* [PATCH 0/8] in-kernel APIC support "v1"
@ 2007-05-09 3:03 Gregory Haskins
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Here is my latest series incorporating the feedback and numerous bugfixes. I
did not keep an official change-log, so its difficult to say what changed off
the top of my head without an interdiff. I will keep a changelog from here on
out. Lets call this drop officially "v1". I will start tracking versions of
the drop so its easier to refer to them in review notes, etc.
Here are a few notes:
A) I implemented Avi's idea for a fd-based signaling mechanism. I didnt quite
get what he meant by "writable-fd". The way I saw it, it should be readable
so that is how I implemented it. If that is not satisfactory, please
elaborate on the writable idea and I will change it over.
B) I changed the controversial kvm_irqdevice_ack() mechanism to use an "out"
structure, instead of an int pointer + return bitmap. Hopefully, this design
puts Avi's mind at ease as the return code is more standard now. In addition,
this API makes it easier to extend, which I take advantage of later in the
series for the TPR-shadow stuff.
C) I changed the irq.task assignment from a lock to a barrier, per review
comments. However, I left the irq.guestmode = 0 assignment in a lock because
I believe it is actually required to eliminate a race. E.g. We want to make
sure that the irq.pending and IPI-method are decided atomically and the
irq.guest-mode is essentially identifiying a critical section. I could be
convinced otherwise, but for now its still there.
D) Patch #8 is for demonstration purposes only. Dont apply it (yet) as it
causes the system to error on VMENTRY. I include it purely so its clear where
I am going.
Overall, this code (excluding patch #8) seems to be working quite well from a
pure functional standpoint. One problem that I see is QEMU remains pretty
busy even when the guest is idle. I have a feeling it has something to do
with the way signals are delivered...TBD. Otherwise, its working from my
perspective. I would love to hear feedback from testers.
An interesting discovery on my part while working on this is that there is an
aparent mis-emulation in the QEMU LAPIC code. The kernel that ships as the
SLED-10 installer (2.6.16.21, I think) maps LINT0 as an NMI and masks off all
interrupts in the 8259 except the PIT. It also leaves the PIT input on the
IOAPIC active.
This means that every timer tick gets delivered both as a FIXED vector from
the IOAPIC, and as an NMI. As far as I can tell from reading google, this is
what linux intended. Note, however, that under QEMU LAPIC, LINT0 is dropped
if the vector is not EXTINT whereas the in-kernel APIC emulates both.
Therefore, cat'ing /proc/interrupts under stock KVM shows only IRQ: 0, and LOC
incrementing, with NMI at 0. The in-kernel patches show NMIs also
incrementing.
I could generate a patch to fix the QEMU code, but what I am not sure of is
whether this was intentionally coded to ignore the LINT0 NMI programming?
Regards,
-Greg
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/8] KVM: Adds support for in-kernel mmio handlers
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
@ 2007-05-09 3:03 ` Gregory Haskins
[not found] ` <20070509030315.23443.93779.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 3:03 ` [PATCH 2/8] KVM: Add irqdevice object Gregory Haskins
` (7 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/kvm.h | 60 +++++++++++++++++++++++++++++++
drivers/kvm/kvm_main.c | 94 ++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 142 insertions(+), 12 deletions(-)
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 9c20d5d..b76631b 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -254,6 +254,65 @@ struct kvm_stat {
u32 light_exits;
};
+struct kvm_io_device {
+ void (*read)(struct kvm_io_device *this,
+ gpa_t addr,
+ int len,
+ void *val);
+ void (*write)(struct kvm_io_device *this,
+ gpa_t addr,
+ int len,
+ const void *val);
+ int (*in_range)(struct kvm_io_device *this, gpa_t addr);
+ void (*destructor)(struct kvm_io_device *this);
+
+ void *private;
+};
+
+static inline void kvm_iodevice_read(struct kvm_io_device *dev,
+ gpa_t addr,
+ int len,
+ void *val)
+{
+ dev->read(dev, addr, len, val);
+}
+
+static inline void kvm_iodevice_write(struct kvm_io_device *dev,
+ gpa_t addr,
+ int len,
+ const void *val)
+{
+ dev->write(dev, addr, len, val);
+}
+
+static inline int kvm_iodevice_inrange(struct kvm_io_device *dev, gpa_t addr)
+{
+ return dev->in_range(dev, addr);
+}
+
+static inline void kvm_iodevice_destructor(struct kvm_io_device *dev)
+{
+ dev->destructor(dev);
+}
+
+/*
+ * It would be nice to use something smarter than a linear search, TBD...
+ * Thankfully we dont expect many devices to register (famous last words :),
+ * so until then it will suffice. At least its abstracted so we can change
+ * in one place.
+ */
+struct kvm_io_bus {
+ int dev_count;
+#define NR_IOBUS_DEVS 6
+ struct kvm_io_device *devs[NR_IOBUS_DEVS];
+};
+
+void kvm_io_bus_init(struct kvm_io_bus *bus);
+void kvm_io_bus_destroy(struct kvm_io_bus *bus);
+struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr);
+void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+ struct kvm_io_device *dev);
+
struct kvm_vcpu {
struct kvm *kvm;
union {
@@ -367,6 +426,7 @@ struct kvm {
unsigned long rmap_overflow;
struct list_head vm_list;
struct file *filp;
+ struct kvm_io_bus mmio_bus;
};
struct descriptor_table {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index a3723dd..2bc5dbb 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -295,6 +295,7 @@ static struct kvm *kvm_create_vm(void)
spin_lock_init(&kvm->lock);
INIT_LIST_HEAD(&kvm->active_mmu_pages);
+ kvm_io_bus_init(&kvm->mmio_bus);
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
struct kvm_vcpu *vcpu = &kvm->vcpus[i];
@@ -392,6 +393,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
spin_lock(&kvm_lock);
list_del(&kvm->vm_list);
spin_unlock(&kvm_lock);
+ kvm_io_bus_destroy(&kvm->mmio_bus);
kvm_free_vcpus(kvm);
kvm_free_physmem(kvm);
kfree(kvm);
@@ -1015,12 +1017,25 @@ static int emulator_write_std(unsigned long addr,
return X86EMUL_UNHANDLEABLE;
}
+static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu,
+ gpa_t addr)
+{
+ /*
+ * Note that its important to have this wrapper function because
+ * in the very near future we will be checking for MMIOs against
+ * the LAPIC as well as the general MMIO bus
+ */
+ return kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr);
+}
+
static int emulator_read_emulated(unsigned long addr,
void *val,
unsigned int bytes,
struct x86_emulate_ctxt *ctxt)
{
- struct kvm_vcpu *vcpu = ctxt->vcpu;
+ struct kvm_vcpu *vcpu = ctxt->vcpu;
+ struct kvm_io_device *mmio_dev;
+ gpa_t gpa;
if (vcpu->mmio_read_completed) {
memcpy(val, vcpu->mmio_data, bytes);
@@ -1029,18 +1044,26 @@ static int emulator_read_emulated(unsigned long addr,
} else if (emulator_read_std(addr, val, bytes, ctxt)
== X86EMUL_CONTINUE)
return X86EMUL_CONTINUE;
- else {
- gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
- if (gpa == UNMAPPED_GVA)
- return X86EMUL_PROPAGATE_FAULT;
- vcpu->mmio_needed = 1;
- vcpu->mmio_phys_addr = gpa;
- vcpu->mmio_size = bytes;
- vcpu->mmio_is_write = 0;
+ gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
+ if (gpa == UNMAPPED_GVA)
+ return X86EMUL_PROPAGATE_FAULT;
- return X86EMUL_UNHANDLEABLE;
+ /*
+ * Is this MMIO handled locally?
+ */
+ mmio_dev = vcpu_find_mmio_dev(vcpu, gpa);
+ if (mmio_dev) {
+ kvm_iodevice_read(mmio_dev, gpa, bytes, val);
+ return X86EMUL_CONTINUE;
}
+
+ vcpu->mmio_needed = 1;
+ vcpu->mmio_phys_addr = gpa;
+ vcpu->mmio_size = bytes;
+ vcpu->mmio_is_write = 0;
+
+ return X86EMUL_UNHANDLEABLE;
}
static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -1068,8 +1091,9 @@ static int emulator_write_emulated(unsigned long addr,
unsigned int bytes,
struct x86_emulate_ctxt *ctxt)
{
- struct kvm_vcpu *vcpu = ctxt->vcpu;
- gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
+ struct kvm_vcpu *vcpu = ctxt->vcpu;
+ struct kvm_io_device *mmio_dev;
+ gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
if (gpa == UNMAPPED_GVA) {
kvm_arch_ops->inject_page_fault(vcpu, addr, 2);
@@ -1079,6 +1103,15 @@ static int emulator_write_emulated(unsigned long addr,
if (emulator_write_phys(vcpu, gpa, val, bytes))
return X86EMUL_CONTINUE;
+ /*
+ * Is this MMIO handled locally?
+ */
+ mmio_dev = vcpu_find_mmio_dev(vcpu, gpa);
+ if (mmio_dev) {
+ kvm_iodevice_write(mmio_dev, gpa, bytes, val);
+ return X86EMUL_CONTINUE;
+ }
+
vcpu->mmio_needed = 1;
vcpu->mmio_phys_addr = gpa;
vcpu->mmio_size = bytes;
@@ -2907,6 +2940,43 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
return NOTIFY_OK;
}
+void kvm_io_bus_init(struct kvm_io_bus *bus)
+{
+ memset(bus, 0, sizeof(*bus));
+}
+
+void kvm_io_bus_destroy(struct kvm_io_bus *bus)
+{
+ int i;
+
+ for (i = 0; i < bus->dev_count; i++) {
+ struct kvm_io_device *pos = bus->devs[i];
+
+ kvm_iodevice_destructor(pos);
+ }
+}
+
+struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr)
+{
+ int i;
+
+ for (i = 0; i < bus->dev_count; i++) {
+ struct kvm_io_device *pos = bus->devs[i];
+
+ if (pos->in_range(pos, addr))
+ return pos;
+ }
+
+ return NULL;
+}
+
+void kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
+{
+ BUG_ON(bus->dev_count > (NR_IOBUS_DEVS-1));
+
+ bus->devs[bus->dev_count++] = dev;
+}
+
static struct notifier_block kvm_cpu_notifier = {
.notifier_call = kvm_cpu_hotplug,
.priority = 20, /* must be > scheduler priority */
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/8] KVM: Add irqdevice object
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 3:03 ` [PATCH 1/8] KVM: Adds support for in-kernel mmio handlers Gregory Haskins
@ 2007-05-09 3:03 ` Gregory Haskins
[not found] ` <20070509030320.23443.51197.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 3:03 ` [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU Gregory Haskins
` (6 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
The current code is geared towards using a user-mode (A)PIC. This patch adds
an "irqdevice" abstraction, and implements a "userint" model to handle the
duties of the original code. Later, we can develop other irqdevice models
to handle objects like LAPIC, IOAPIC, i8259, etc, as appropriate
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/Makefile | 2
drivers/kvm/irqdevice.h | 176 +++++++++++++++++++++++++++++++++++++
drivers/kvm/kvm.h | 107 ++++++++++++++++++++++-
drivers/kvm/kvm_main.c | 58 +++++++++---
drivers/kvm/svm.c | 158 ++++++++++++++++++++++++---------
drivers/kvm/userint.c | 223 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/kvm/vmx.c | 161 +++++++++++++++++++++++++---------
7 files changed, 780 insertions(+), 105 deletions(-)
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c0a789f..540afbc 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -2,7 +2,7 @@
# Makefile for Kernel-based Virtual Machine module
#
-kvm-objs := kvm_main.o mmu.o x86_emulate.o
+kvm-objs := kvm_main.o mmu.o x86_emulate.o userint.o
obj-$(CONFIG_KVM) += kvm.o
kvm-intel-objs = vmx.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/irqdevice.h b/drivers/kvm/irqdevice.h
new file mode 100644
index 0000000..097d179
--- /dev/null
+++ b/drivers/kvm/irqdevice.h
@@ -0,0 +1,176 @@
+/*
+ * Defines an interface for an abstract interrupt controller. The model
+ * consists of a unit with an arbitrary number of input lines N (IRQ0-(N-1)),
+ * an arbitrary number of output lines (INTR) (LINT, EXTINT, NMI, etc), and
+ * methods for completing an interrupt-acknowledge cycle (INTA). A particular
+ * implementation of this model will define various policies, such as
+ * irq-to-vector translation, INTA/auto-EOI policy, etc.
+ *
+ * In addition, the INTR callback mechanism allows the unit to be "wired" to
+ * an interruptible source in a very flexible manner. For instance, an
+ * irqdevice could have its INTR wired to a VCPU (ala LAPIC), or another
+ * interrupt controller (ala cascaded i8259s)
+ *
+ * Copyright (C) 2007 Novell
+ *
+ * Authors:
+ * Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef __IRQDEVICE_H
+#define __IRQDEVICE_H
+
+struct kvm_irqdevice;
+
+typedef enum {
+ kvm_irqpin_localint,
+ kvm_irqpin_extint,
+ kvm_irqpin_smi,
+ kvm_irqpin_nmi,
+ kvm_irqpin_invalid, /* must always be last */
+} kvm_irqpin_t;
+
+
+struct kvm_irqsink {
+ void (*set_intr)(struct kvm_irqsink *this,
+ struct kvm_irqdevice *dev,
+ kvm_irqpin_t pin);
+
+ void *private;
+};
+
+#define KVM_IRQACKDATA_VECTOR_VALID (1 << 0)
+#define KVM_IRQACKDATA_VECTOR_PENDING (1 << 1)
+
+#define KVM_IRQACK_FLAG_PEEK (1 << 0)
+
+struct kvm_irqack_data {
+ int flags;
+ int vector;
+};
+
+struct kvm_irqdevice {
+ int (*ack)(struct kvm_irqdevice *this, int flags,
+ struct kvm_irqack_data *data);
+ int (*set_pin)(struct kvm_irqdevice *this, int pin, int level);
+ void (*destructor)(struct kvm_irqdevice *this);
+
+ void *private;
+ struct kvm_irqsink sink;
+};
+
+/**
+ * kvm_irqdevice_init - initialize the kvm_irqdevice for use
+ * @dev: The device
+ *
+ * Description: Initialize the kvm_irqdevice for use. Should be called before
+ * calling any derived implementation init functions
+ *
+ * Returns: (void)
+ */
+static inline void kvm_irqdevice_init(struct kvm_irqdevice *dev)
+{
+ memset(dev, 0, sizeof(*dev));
+}
+
+/**
+ * kvm_irqdevice_ack - read and ack the highest priority vector from the device
+ * @dev: The device
+ * @flags: Modifies default behavior
+ * [ KVM_IRQACK_FLAG_PEEK - Dont ack vector, just check status ]
+ * @data: A pointer to a kvm_irqack_data structure to hold the result
+ *
+ * Description: Read the highest priority pending vector from the device,
+ * potentially invoking auto-EOI depending on device policy
+ *
+ * Successful return indicates that the *data* structure is valid
+ *
+ * data.flags -
+ * [KVM_IRQACKDATA_VECTOR_VALID - data.vector is valid]
+ * [KVM_IRQACKDATA_VECTOR_PENDING - more vectors are pending]
+ *
+ * Returns: (int)
+ * [-1 = failure]
+ * [ 0 = success]
+ */
+static inline int kvm_irqdevice_ack(struct kvm_irqdevice *dev, int flags,
+ struct kvm_irqack_data *data)
+{
+ return dev->ack(dev, flags, data);
+}
+
+/**
+ * kvm_irqdevice_set_pin - allows the caller to assert/deassert an IRQ
+ * @dev: The device
+ * @pin: The input pin to alter
+ * @level: The value to set (1 = assert, 0 = deassert)
+ *
+ * Description: Allows the caller to assert/deassert an IRQ input pin to the
+ * device according to device policy.
+ *
+ * Returns: (int)
+ * [-1 = failure]
+ * [ 0 = success]
+ */
+static inline int kvm_irqdevice_set_pin(struct kvm_irqdevice *dev, int pin,
+ int level)
+{
+ return dev->set_pin(dev, pin, level);
+}
+
+/**
+ * kvm_irqdevice_register_sink - registers an kvm_irqsink object
+ * @dev: The device
+ * @sink: The sink to register. Data will be copied so building object from
+ * transient storage is ok.
+ *
+ * Description: Registers an kvm_irqsink object as an INTR callback
+ *
+ * Returns: (void)
+ */
+static inline void kvm_irqdevice_register_sink(struct kvm_irqdevice *dev,
+ const struct kvm_irqsink *sink)
+{
+ dev->sink = *sink;
+}
+
+/**
+ * kvm_irqdevice_destructor - destroys an irqdevice
+ * @dev: The device
+ *
+ * Returns: (void)
+ */
+static inline void kvm_irqdevice_destructor(struct kvm_irqdevice *dev)
+{
+ dev->destructor(dev);
+}
+
+/**
+ * kvm_irqdevice_set_intr - invokes a registered INTR callback
+ * @dev: The device
+ * @pin: Identifies the pin to alter -
+ * [ KVM_IRQPIN_LOCALINT (default) - an vector is pending on this
+ * device]
+ * [ KVM_IRQPIN_EXTINT - a vector is pending on an external device]
+ * [ KVM_IRQPIN_SMI - system-management-interrupt pin]
+ * [ KVM_IRQPIN_NMI - non-maskable-interrupt pin
+ *
+ * Description: Invokes a registered INTR callback (if present). This
+ * function is meant to be used privately by a irqdevice
+ * implementation.
+ *
+ * Returns: (void)
+ */
+static inline void kvm_irqdevice_set_intr(struct kvm_irqdevice *dev,
+ kvm_irqpin_t pin)
+{
+ struct kvm_irqsink *sink = &dev->sink;
+ if (sink->set_intr)
+ sink->set_intr(sink, dev, pin);
+}
+
+#endif /* __IRQDEVICE_H */
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index b76631b..059f074 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -13,6 +13,7 @@
#include <linux/mm.h>
#include "vmx.h"
+#include "irqdevice.h"
#include <linux/kvm.h>
#include <linux/kvm_para.h>
@@ -158,6 +159,11 @@ struct vmcs {
struct kvm_vcpu;
+int kvm_user_irqdev_init(struct kvm_irqdevice *dev);
+int kvm_user_irqdev_save(struct kvm_irqdevice *this, void *data);
+int kvm_user_irqdev_restore(struct kvm_irqdevice *this, void *data);
+int kvm_userint_init(struct kvm_vcpu *vcpu);
+
/*
* x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
* 32-bit). The kvm_mmu structure abstracts the details of the current mmu
@@ -313,6 +319,18 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr);
void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
struct kvm_io_device *dev);
+#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
+
+/*
+ * structure for maintaining info for interrupting an executing VCPU
+ */
+struct kvm_vcpu_irq {
+ spinlock_t lock;
+ struct kvm_irqdevice dev;
+ int pending;
+ int deferred;
+};
+
struct kvm_vcpu {
struct kvm *kvm;
union {
@@ -325,9 +343,7 @@ struct kvm_vcpu {
u64 host_tsc;
struct kvm_run *run;
int interrupt_window_open;
- unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
-#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
- unsigned long irq_pending[NR_IRQ_WORDS];
+ struct kvm_vcpu_irq irq;
unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
unsigned long rip; /* needs vcpu_load_rsp_rip() */
@@ -394,6 +410,91 @@ struct kvm_vcpu {
struct kvm_cpuid_entry cpuid_entries[KVM_MAX_CPUID_ENTRIES];
};
+/*
+ * Assumes lock already held
+ */
+static inline int __kvm_vcpu_irq_all_pending(struct kvm_vcpu *vcpu)
+{
+ int pending = vcpu->irq.pending;
+
+ if (vcpu->irq.deferred != -1)
+ __set_bit(kvm_irqpin_localint, &pending);
+
+ return pending;
+}
+
+/*
+ * These two functions are helpers for determining if a standard interrupt
+ * is pending to replace the old "if (vcpu->irq_summary)" logic. If the
+ * caller wants to know about some of the new advanced interrupt types
+ * (SMI, NMI, etc) or to differentiate between localint and extint they will
+ * have to use the new API
+ */
+static inline int __kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
+{
+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
+
+ if (test_bit(kvm_irqpin_localint, &pending) ||
+ test_bit(kvm_irqpin_extint, &pending))
+ return 1;
+
+ return 0;
+}
+
+static inline int kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
+{
+ int ret = 0;
+ int flags;
+
+ spin_lock_irqsave(&vcpu->irq.lock, flags);
+ ret = __kvm_vcpu_irq_pending(vcpu);
+ spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+
+ return ret;
+}
+
+/*
+ * Assumes lock already held
+ */
+static inline int kvm_vcpu_irq_pop(struct kvm_vcpu *vcpu,
+ struct kvm_irqack_data *data)
+{
+ int ret = 0;
+
+ if (vcpu->irq.deferred != -1) {
+ ret = kvm_irqdevice_ack(&vcpu->irq.dev, KVM_IRQACK_FLAG_PEEK,
+ data);
+ data->flags |= KVM_IRQACKDATA_VECTOR_VALID;
+ data->vector = vcpu->irq.deferred;
+ vcpu->irq.deferred = -1;
+ } else
+ ret = kvm_irqdevice_ack(&vcpu->irq.dev, 0, data);
+
+ /*
+ * If there are no more interrupts we must clear the status flag
+ */
+ if (!(data->flags & KVM_IRQACKDATA_VECTOR_PENDING))
+ __clear_bit(kvm_irqpin_localint, &vcpu->irq.pending);
+
+ return ret;
+}
+
+static inline void __kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
+{
+ BUG_ON(vcpu->irq.deferred != -1); /* We can only hold one deferred */
+
+ vcpu->irq.deferred = irq;
+}
+
+static inline void kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
+{
+ int flags;
+
+ spin_lock_irqsave(&vcpu->irq.lock, flags);
+ __kvm_vcpu_irq_push(vcpu, irq);
+ spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+}
+
struct kvm_mem_alias {
gfn_t base_gfn;
unsigned long npages;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 2bc5dbb..199489b 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -300,6 +300,11 @@ static struct kvm *kvm_create_vm(void)
struct kvm_vcpu *vcpu = &kvm->vcpus[i];
mutex_init(&vcpu->mutex);
+
+ memset(&vcpu->irq, 0, sizeof(vcpu->irq));
+ spin_lock_init(&vcpu->irq.lock);
+ vcpu->irq.deferred = -1;
+
vcpu->cpu = -1;
vcpu->kvm = kvm;
vcpu->mmu.root_hpa = INVALID_PAGE;
@@ -367,6 +372,7 @@ static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
vcpu_load(vcpu);
kvm_mmu_destroy(vcpu);
vcpu_put(vcpu);
+ kvm_irqdevice_destructor(&vcpu->irq.dev);
kvm_arch_ops->vcpu_free(vcpu);
free_page((unsigned long)vcpu->run);
vcpu->run = NULL;
@@ -1985,8 +1991,7 @@ static int kvm_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
sregs->efer = vcpu->shadow_efer;
sregs->apic_base = vcpu->apic_base;
- memcpy(sregs->interrupt_bitmap, vcpu->irq_pending,
- sizeof sregs->interrupt_bitmap);
+ kvm_user_irqdev_save(&vcpu->irq.dev, &sregs->interrupt_bitmap);
vcpu_put(vcpu);
@@ -2003,7 +2008,6 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
int mmu_reset_needed = 0;
- int i;
struct descriptor_table dt;
vcpu_load(vcpu);
@@ -2040,12 +2044,8 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
if (mmu_reset_needed)
kvm_mmu_reset_context(vcpu);
- memcpy(vcpu->irq_pending, sregs->interrupt_bitmap,
- sizeof vcpu->irq_pending);
- vcpu->irq_summary = 0;
- for (i = 0; i < NR_IRQ_WORDS; ++i)
- if (vcpu->irq_pending[i])
- __set_bit(i, &vcpu->irq_summary);
+ kvm_user_irqdev_restore(&vcpu->irq.dev,
+ &sregs->interrupt_bitmap[0]);
set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
@@ -2206,14 +2206,8 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
{
if (irq->irq < 0 || irq->irq >= 256)
return -EINVAL;
- vcpu_load(vcpu);
-
- set_bit(irq->irq, vcpu->irq_pending);
- set_bit(irq->irq / BITS_PER_LONG, &vcpu->irq_summary);
- vcpu_put(vcpu);
-
- return 0;
+ return kvm_irqdevice_set_pin(&vcpu->irq.dev, irq->irq, 1);
}
static int kvm_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
@@ -2315,6 +2309,32 @@ out1:
}
/*
+ * This function will be invoked whenever the vcpu->irq.dev raises its INTR
+ * line
+ */
+static void kvm_vcpu_intr(struct kvm_irqsink *this,
+ struct kvm_irqdevice *dev,
+ kvm_irqpin_t pin)
+{
+ struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
+ unsigned long flags;
+
+ spin_lock_irqsave(&vcpu->irq.lock, flags);
+ __set_bit(pin, &vcpu->irq.pending);
+ spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+}
+
+static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
+{
+ struct kvm_irqsink sink = {
+ .set_intr = kvm_vcpu_intr,
+ .private = vcpu
+ };
+
+ kvm_irqdevice_register_sink(&vcpu->irq.dev, &sink);
+}
+
+/*
* Creates some virtual cpus. Good luck creating more than one.
*/
static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
@@ -2361,6 +2381,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
if (r < 0)
goto out_free_vcpus;
+ kvm_irqdevice_init(&vcpu->irq.dev);
+ kvm_vcpu_irqsink_init(vcpu);
+ r = kvm_userint_init(vcpu);
+ if (r < 0)
+ goto out_free_vcpus;
+
kvm_arch_ops->vcpu_load(vcpu);
r = kvm_mmu_setup(vcpu);
if (r >= 0)
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index b621403..4c03881 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -106,24 +106,6 @@ static unsigned get_addr_size(struct kvm_vcpu *vcpu)
(cs_attrib & SVM_SELECTOR_DB_MASK) ? 4 : 2;
}
-static inline u8 pop_irq(struct kvm_vcpu *vcpu)
-{
- int word_index = __ffs(vcpu->irq_summary);
- int bit_index = __ffs(vcpu->irq_pending[word_index]);
- int irq = word_index * BITS_PER_LONG + bit_index;
-
- clear_bit(bit_index, &vcpu->irq_pending[word_index]);
- if (!vcpu->irq_pending[word_index])
- clear_bit(word_index, &vcpu->irq_summary);
- return irq;
-}
-
-static inline void push_irq(struct kvm_vcpu *vcpu, u8 irq)
-{
- set_bit(irq, vcpu->irq_pending);
- set_bit(irq / BITS_PER_LONG, &vcpu->irq_summary);
-}
-
static inline void clgi(void)
{
asm volatile (SVM_CLGI);
@@ -904,7 +886,12 @@ static int pf_interception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
if (is_external_interrupt(exit_int_info))
- push_irq(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
+ /*
+ * An exception was taken while we were trying to inject an
+ * IRQ. We must defer the injection of the vector until
+ * the next window.
+ */
+ kvm_vcpu_irq_push(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
spin_lock(&vcpu->kvm->lock);
@@ -1114,7 +1101,7 @@ static int halt_interception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
vcpu->svm->next_rip = vcpu->svm->vmcb->save.rip + 1;
skip_emulated_instruction(vcpu);
- if (vcpu->irq_summary)
+ if (kvm_vcpu_irq_pending(vcpu))
return 1;
kvm_run->exit_reason = KVM_EXIT_HLT;
@@ -1285,7 +1272,7 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu,
* possible
*/
if (kvm_run->request_interrupt_window &&
- !vcpu->irq_summary) {
+ !kvm_vcpu_irq_pending(vcpu)) {
++vcpu->stat.irq_window_exits;
kvm_run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
return 0;
@@ -1384,60 +1371,135 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
}
-static inline void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
-{
- struct vmcb_control_area *control;
-
- control = &vcpu->svm->vmcb->control;
- control->int_vector = pop_irq(vcpu);
- control->int_ctl &= ~V_INTR_PRIO_MASK;
- control->int_ctl |= V_IRQ_MASK |
- ((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
-}
-
static void kvm_reput_irq(struct kvm_vcpu *vcpu)
{
struct vmcb_control_area *control = &vcpu->svm->vmcb->control;
if (control->int_ctl & V_IRQ_MASK) {
control->int_ctl &= ~V_IRQ_MASK;
- push_irq(vcpu, control->int_vector);
+ kvm_vcpu_irq_push(vcpu, control->int_vector);
}
vcpu->interrupt_window_open =
!(control->int_state & SVM_INTERRUPT_SHADOW_MASK);
}
-static void do_interrupt_requests(struct kvm_vcpu *vcpu,
- struct kvm_run *kvm_run)
+static int do_intr_requests(struct kvm_vcpu *vcpu,
+ struct kvm_run *kvm_run,
+ kvm_irqpin_t pin)
{
struct vmcb_control_area *control = &vcpu->svm->vmcb->control;
+ int handled = 0;
vcpu->interrupt_window_open =
(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
(vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF));
- if (vcpu->interrupt_window_open && vcpu->irq_summary)
+ if (vcpu->interrupt_window_open) {
/*
- * If interrupts enabled, and not blocked by sti or mov ss. Good.
+ * If interrupts enabled, and not blocked by sti or mov ss.
+ * Good.
*/
- kvm_do_inject_irq(vcpu);
+ struct kvm_irqack_data ack;
+ int r = 0;
+
+ memset(&ack, 0, sizeof(ack));
+
+ switch (pin) {
+ case kvm_irqpin_localint:
+ r = kvm_vcpu_irq_pop(vcpu, &ack);
+ break;
+ case kvm_irqpin_extint:
+ printk(KERN_WARNING "KVM: external-interrupts not " \
+ "handled yet\n");
+ __clear_bit(pin, &vcpu->irq.pending);
+ break;
+ case kvm_irqpin_nmi:
+ /*
+ * FIXME: Someday we will handle this using the
+ * specific SVN NMI features. For now, just inject
+ * the NMI as a standard interrupt on vector 2
+ */
+ ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
+ ack.vector = 2;
+ __clear_bit(pin, &vcpu->irq.pending);
+ break;
+ default:
+ panic("KVM: unknown interrupt pin raised: %d\n", pin);
+ break;
+ }
+
+ BUG_ON(r < 0);
+
+ if (ack.flags & KVM_IRQACKDATA_VECTOR_VALID) {
+ control = &vcpu->svm->vmcb->control;
+ control->int_vector = ack.vector;
+ control->int_ctl &= ~V_INTR_PRIO_MASK;
+ control->int_ctl |= V_IRQ_MASK |
+ ((/*control->int_vector >> 4*/ 0xf) <<
+ V_INTR_PRIO_SHIFT);
+
+ handled = 1;
+ }
+ }
/*
* Interrupts blocked. Wait for unblock.
*/
if (!vcpu->interrupt_window_open &&
- (vcpu->irq_summary || kvm_run->request_interrupt_window)) {
+ (__kvm_vcpu_irq_pending(vcpu) ||
+ kvm_run->request_interrupt_window))
control->intercept |= 1ULL << INTERCEPT_VINTR;
- } else
- control->intercept &= ~(1ULL << INTERCEPT_VINTR);
+
+ return handled;
+}
+
+static void clear_pending_controls(struct kvm_vcpu *vcpu)
+{
+ struct vmcb_control_area *control = &vcpu->svm->vmcb->control;
+
+ control->intercept &= ~(1ULL << INTERCEPT_VINTR);
+}
+
+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
+ struct kvm_run *kvm_run)
+{
+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
+
+ clear_pending_controls(vcpu);
+
+ while (pending) {
+ kvm_irqpin_t pin = __fls(pending);
+
+ switch (pin) {
+ case kvm_irqpin_localint:
+ case kvm_irqpin_extint:
+ case kvm_irqpin_nmi:
+ do_intr_requests(vcpu, kvm_run, pin);
+ break;
+ case kvm_irqpin_smi:
+ /* ignored (for now) */
+ printk(KERN_WARNING "KVM: dropping unhandled SMI\n");
+ __clear_bit(pin, &vcpu->irq.pending);
+ break;
+ case kvm_irqpin_invalid:
+ /* drop */
+ break;
+ default:
+ panic("KVM: unknown interrupt pin raised: %d\n", pin);
+ break;
+ }
+
+ __clear_bit(pin, &pending);
+ }
}
static void post_kvm_run_save(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
- kvm_run->ready_for_interrupt_injection = (vcpu->interrupt_window_open &&
- vcpu->irq_summary == 0);
+ kvm_run->ready_for_interrupt_injection =
+ (vcpu->interrupt_window_open &&
+ !kvm_vcpu_irq_pending(vcpu));
kvm_run->if_flag = (vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF) != 0;
kvm_run->cr8 = vcpu->cr8;
kvm_run->apic_base = vcpu->apic_base;
@@ -1452,7 +1514,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
- return (!vcpu->irq_summary &&
+ return (!kvm_vcpu_irq_pending(vcpu) &&
kvm_run->request_interrupt_window &&
vcpu->interrupt_window_open &&
(vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF));
@@ -1482,9 +1544,17 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
again:
+ spin_lock(&vcpu->irq.lock);
+
+ /*
+ * We must inject interrupts (if any) while the irq_lock
+ * is held
+ */
if (!vcpu->mmio_read_completed)
do_interrupt_requests(vcpu, kvm_run);
+ spin_unlock(&vcpu->irq.lock);
+
clgi();
pre_svm_run(vcpu);
diff --git a/drivers/kvm/userint.c b/drivers/kvm/userint.c
new file mode 100644
index 0000000..08d26fa
--- /dev/null
+++ b/drivers/kvm/userint.c
@@ -0,0 +1,223 @@
+/*
+ * User Interrupts IRQ device
+ *
+ * This acts as an extention of an interrupt controller that exists elsewhere
+ * (typically in userspace/QEMU). Because this PIC is a pseudo device that
+ * is downstream from a real emulated PIC, the "IRQ-to-vector" mapping has
+ * already occured. Therefore, this PIC has the following unusal properties:
+ *
+ * 1) It has 256 "pins" which are literal vectors (i.e. no translation)
+ * 2) It only supports "auto-EOI" behavior since it is expected that the
+ * upstream emulated PIC will handle the real EOIs (if applicable)
+ * 3) It only listens to "asserts" on the pins (deasserts are dropped)
+ * because its an auto-EOI device anyway.
+ *
+ * Copyright (C) 2007 Novell
+ *
+ * bitarray code based on original vcpu->irq_pending code,
+ * Copyright (C) 2007 Qumranet
+ *
+ * Authors:
+ * Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "kvm.h"
+
+/*
+ *----------------------------------------------------------------------
+ * optimized bitarray object - works like bitarrays in bitops, but uses
+ * a summary field to accelerate lookups. Assumes external locking
+ *---------------------------------------------------------------------
+ */
+
+struct bitarray {
+ unsigned long summary; /* 1 per word in pending */
+ unsigned long pending[NR_IRQ_WORDS];
+};
+
+static inline int bitarray_pending(struct bitarray *this)
+{
+ return this->summary ? 1 : 0;
+}
+
+static inline int bitarray_findhighest(struct bitarray *this)
+{
+ if (!this->summary)
+ return -1;
+ else {
+ int word_index = __fls(this->summary);
+ int bit_index = __fls(this->pending[word_index]);
+
+ return word_index * BITS_PER_LONG + bit_index;
+ }
+}
+
+static inline void bitarray_set(struct bitarray *this, int nr)
+{
+ __set_bit(nr, &this->pending);
+ __set_bit(nr / BITS_PER_LONG, &this->summary);
+}
+
+static inline void bitarray_clear(struct bitarray *this, int nr)
+{
+ int word = nr / BITS_PER_LONG;
+
+ __clear_bit(nr, &this->pending);
+ if (!this->pending[word])
+ __clear_bit(word, &this->summary);
+}
+
+static inline int bitarray_test(struct bitarray *this, int nr)
+{
+ return test_bit(nr, &this->pending);
+}
+
+static inline int bitarray_test_and_set(struct bitarray *this, int nr, int val)
+{
+ if (bitarray_test(this, nr) != val) {
+ if (val)
+ bitarray_set(this, nr);
+ else
+ bitarray_clear(this, nr);
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ *----------------------------------------------------------------------
+ * userint interface - provides the actual kvm_irqdevice implementation
+ *---------------------------------------------------------------------
+ */
+
+struct kvm_user_irqdev {
+ spinlock_t lock;
+ atomic_t ref_count;
+ struct bitarray pending;
+};
+
+static int user_irqdev_ack(struct kvm_irqdevice *this, int flags,
+ struct kvm_irqack_data *data)
+{
+ struct kvm_user_irqdev *s = (struct kvm_user_irqdev*)this->private;
+
+ spin_lock(&s->lock);
+
+ if (!(flags & KVM_IRQACK_FLAG_PEEK)) {
+ int irq = bitarray_findhighest(&s->pending);
+
+ if (irq > -1) {
+ /*
+ * Automatically clear the interrupt as the EOI
+ * mechanism (if any) will take place in userspace
+ */
+ bitarray_clear(&s->pending, irq);
+
+ data->flags |= KVM_IRQACKDATA_VECTOR_VALID;
+ }
+
+ data->vector = irq;
+ }
+
+ if (bitarray_pending(&s->pending))
+ data->flags |= KVM_IRQACKDATA_VECTOR_PENDING;
+
+ spin_unlock(&s->lock);
+
+ return 0;
+}
+
+static int user_irqdev_set_pin(struct kvm_irqdevice *this, int irq, int level)
+{
+ struct kvm_user_irqdev *s = (struct kvm_user_irqdev*)this->private;
+ int forward = 0;
+
+ spin_lock(&s->lock);
+ forward = bitarray_test_and_set(&s->pending, irq, level);
+ spin_unlock(&s->lock);
+
+ /*
+ * alert the higher layer software we have changes
+ */
+ if (forward)
+ kvm_irqdevice_set_intr(this, kvm_irqpin_localint);
+
+ return 0;
+}
+
+static void user_irqdev_destructor(struct kvm_irqdevice *this)
+{
+ struct kvm_user_irqdev *s = (struct kvm_user_irqdev*)this->private;
+
+ if (atomic_dec_and_test(&s->ref_count))
+ kfree(s);
+}
+
+int kvm_user_irqdev_init(struct kvm_irqdevice *irqdev)
+{
+ struct kvm_user_irqdev *s;
+
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ spin_lock_init(&s->lock);
+
+ irqdev->ack = user_irqdev_ack;
+ irqdev->set_pin = user_irqdev_set_pin;
+ irqdev->destructor = user_irqdev_destructor;
+
+ irqdev->private = s;
+ atomic_inc(&s->ref_count);
+
+ return 0;
+}
+
+int kvm_user_irqdev_save(struct kvm_irqdevice *this, void *data)
+{
+ struct kvm_user_irqdev *s = (struct kvm_user_irqdev*)this->private;
+
+ spin_lock(&s->lock);
+ memcpy(data, s->pending.pending, sizeof s->pending.pending);
+ spin_unlock(&s->lock);
+
+ return 0;
+}
+
+int kvm_user_irqdev_restore(struct kvm_irqdevice *this, void *data)
+{
+ struct kvm_user_irqdev *s = (struct kvm_user_irqdev*)this->private;
+ int i;
+ int forward = 0;
+
+ spin_lock(&s->lock);
+
+ /*
+ * walk the interrupt-bitmap and inject an IRQ for each bit found
+ */
+ for (i = 0; i < 256; ++i) {
+ int val = test_bit(i, data);
+ forward = bitarray_test_and_set(&s->pending, i, val);
+ }
+
+ spin_unlock(&s->lock);
+
+ /*
+ * alert the higher layer software we have changes
+ */
+ if (forward)
+ kvm_irqdevice_set_intr(this, kvm_irqpin_localint);
+
+ return 0;
+}
+
+int kvm_userint_init(struct kvm_vcpu *vcpu)
+{
+ return kvm_user_irqdev_init(&vcpu->irq.dev);
+}
+
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 19edb34..ca858cb 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1301,52 +1301,118 @@ static void inject_rmode_irq(struct kvm_vcpu *vcpu, int irq)
vmcs_writel(GUEST_RSP, (vmcs_readl(GUEST_RSP) & ~0xffff) | (sp - 6));
}
-static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
+static int do_intr_requests(struct kvm_vcpu *vcpu,
+ struct kvm_run *kvm_run,
+ kvm_irqpin_t pin)
{
- int word_index = __ffs(vcpu->irq_summary);
- int bit_index = __ffs(vcpu->irq_pending[word_index]);
- int irq = word_index * BITS_PER_LONG + bit_index;
-
- clear_bit(bit_index, &vcpu->irq_pending[word_index]);
- if (!vcpu->irq_pending[word_index])
- clear_bit(word_index, &vcpu->irq_summary);
-
- if (vcpu->rmode.active) {
- inject_rmode_irq(vcpu, irq);
- return;
- }
- vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
- irq | INTR_TYPE_EXT_INTR | INTR_INFO_VALID_MASK);
-}
-
-
-static void do_interrupt_requests(struct kvm_vcpu *vcpu,
- struct kvm_run *kvm_run)
-{
- u32 cpu_based_vm_exec_control;
+ int handled = 0;
vcpu->interrupt_window_open =
((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
if (vcpu->interrupt_window_open &&
- vcpu->irq_summary &&
- !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK))
+ !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK)) {
/*
- * If interrupts enabled, and not blocked by sti or mov ss. Good.
+ * If interrupts enabled, and not blocked by sti or mov ss.
+ * Good.
*/
- kvm_do_inject_irq(vcpu);
+ struct kvm_irqack_data ack;
+ int r = 0;
+
+ memset(&ack, 0, sizeof(ack));
+
+ switch (pin) {
+ case kvm_irqpin_localint:
+ r = kvm_vcpu_irq_pop(vcpu, &ack);
+ break;
+ case kvm_irqpin_extint:
+ printk(KERN_WARNING "KVM: external-interrupts not " \
+ "handled yet\n");
+ __clear_bit(pin, &vcpu->irq.pending);
+ break;
+ case kvm_irqpin_nmi:
+ /*
+ * FIXME: Someday we will handle this using the
+ * specific VMX NMI features. For now, just inject
+ * the NMI as a standard interrupt on vector 2
+ */
+ ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
+ ack.vector = 2;
+ __clear_bit(pin, &vcpu->irq.pending);
+ break;
+ default:
+ panic("KVM: unknown interrupt pin raised: %d\n", pin);
+ break;
+ }
+
+ BUG_ON(r < 0);
+
+ if (ack.flags & KVM_IRQACKDATA_VECTOR_VALID) {
+ if (vcpu->rmode.active)
+ inject_rmode_irq(vcpu, ack.vector);
+ else
+ vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+ ack.vector |
+ INTR_TYPE_EXT_INTR |
+ INTR_INFO_VALID_MASK);
+
+ handled = 1;
+ }
+ }
- cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
if (!vcpu->interrupt_window_open &&
- (vcpu->irq_summary || kvm_run->request_interrupt_window))
+ (__kvm_vcpu_irq_pending(vcpu) ||
+ kvm_run->request_interrupt_window)) {
/*
* Interrupts blocked. Wait for unblock.
*/
- cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
- else
- cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+ u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cbvec |= CPU_BASED_VIRTUAL_INTR_PENDING;
+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
+ }
+
+ return handled;
+}
+
+static void clear_pending_controls(struct kvm_vcpu *vcpu)
+{
+ u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cbvec &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
+}
+
+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
+ struct kvm_run *kvm_run)
+{
+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
+
+ clear_pending_controls(vcpu);
+
+ while (pending) {
+ kvm_irqpin_t pin = __fls(pending);
+
+ switch (pin) {
+ case kvm_irqpin_localint:
+ case kvm_irqpin_extint:
+ case kvm_irqpin_nmi:
+ do_intr_requests(vcpu, kvm_run, pin);
+ break;
+ case kvm_irqpin_smi:
+ /* ignored (for now) */
+ printk(KERN_WARNING "KVM: dropping unhandled SMI\n");
+ __clear_bit(pin, &vcpu->irq.pending);
+ break;
+ case kvm_irqpin_invalid:
+ /* drop */
+ break;
+ default:
+ panic("KVM: unknown interrupt pin raised: %d\n", pin);
+ break;
+ }
+
+ __clear_bit(pin, &pending);
+ }
}
static void kvm_guest_debug_pre(struct kvm_vcpu *vcpu)
@@ -1397,9 +1463,13 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
if (is_external_interrupt(vect_info)) {
+ /*
+ * An exception was taken while we were trying to inject an
+ * IRQ. We must defer the injection of the vector until
+ * the next window.
+ */
int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
- set_bit(irq, vcpu->irq_pending);
- set_bit(irq / BITS_PER_LONG, &vcpu->irq_summary);
+ kvm_vcpu_irq_push(vcpu, irq);
}
if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) { /* nmi */
@@ -1719,8 +1789,9 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
kvm_run->if_flag = (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) != 0;
kvm_run->cr8 = vcpu->cr8;
kvm_run->apic_base = vcpu->apic_base;
- kvm_run->ready_for_interrupt_injection = (vcpu->interrupt_window_open &&
- vcpu->irq_summary == 0);
+ kvm_run->ready_for_interrupt_injection =
+ (vcpu->interrupt_window_open &&
+ !kvm_vcpu_irq_pending(vcpu));
}
static int handle_interrupt_window(struct kvm_vcpu *vcpu,
@@ -1731,7 +1802,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
* possible
*/
if (kvm_run->request_interrupt_window &&
- !vcpu->irq_summary) {
+ !kvm_vcpu_irq_pending(vcpu)) {
kvm_run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
++vcpu->stat.irq_window_exits;
return 0;
@@ -1742,7 +1813,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
static int handle_halt(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
skip_emulated_instruction(vcpu);
- if (vcpu->irq_summary)
+ if (kvm_vcpu_irq_pending(vcpu))
return 1;
kvm_run->exit_reason = KVM_EXIT_HLT;
@@ -1812,7 +1883,7 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
- return (!vcpu->irq_summary &&
+ return (!kvm_vcpu_irq_pending(vcpu) &&
kvm_run->request_interrupt_window &&
vcpu->interrupt_window_open &&
(vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF));
@@ -1855,11 +1926,19 @@ preempted:
vmcs_writel(HOST_GS_BASE, segment_base(gs_sel));
#endif
+ if (vcpu->guest_debug.enabled)
+ kvm_guest_debug_pre(vcpu);
+
+ spin_lock(&vcpu->irq.lock);
+
+ /*
+ * We must inject interrupts (if any) while the irq.lock
+ * is held
+ */
if (!vcpu->mmio_read_completed)
do_interrupt_requests(vcpu, kvm_run);
- if (vcpu->guest_debug.enabled)
- kvm_guest_debug_pre(vcpu);
+ spin_unlock(&vcpu->irq.lock);
if (vcpu->fpu_active) {
fx_save(vcpu->host_fx_image);
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 3:03 ` [PATCH 1/8] KVM: Adds support for in-kernel mmio handlers Gregory Haskins
2007-05-09 3:03 ` [PATCH 2/8] KVM: Add irqdevice object Gregory Haskins
@ 2007-05-09 3:03 ` Gregory Haskins
[not found] ` <20070509030325.23443.90129.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 3:03 ` [PATCH 4/8] KVM: Adds ability to signal userspace using a file-descriptor Gregory Haskins
` (5 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
The VCPU executes synchronously w.r.t. userspace today, and therefore
interrupt injection is pretty straight forward. However, we will soon need
to be able to inject interrupts asynchronous to the execution of the VCPU
due to the introduction of SMP, paravirtualized drivers, and asynchronous
hypercalls. This patch adds support to the interrupt mechanism to force
a VCPU to VMEXIT when a new interrupt is pending.
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/kvm.h | 2 ++
drivers/kvm/kvm_main.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++-
drivers/kvm/svm.c | 43 +++++++++++++++++++++++++++++++++++
drivers/kvm/vmx.c | 43 +++++++++++++++++++++++++++++++++++
4 files changed, 146 insertions(+), 1 deletions(-)
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 059f074..0f6cc32 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -329,6 +329,8 @@ struct kvm_vcpu_irq {
struct kvm_irqdevice dev;
int pending;
int deferred;
+ struct task_struct *task;
+ int guest_mode;
};
struct kvm_vcpu {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 199489b..a160638 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1868,6 +1868,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
kvm_arch_ops->decache_regs(vcpu);
}
+ vcpu->irq.task = current;
+ smp_wmb();
+
r = kvm_arch_ops->run(vcpu, kvm_run);
out:
@@ -2309,6 +2312,20 @@ out1:
}
/*
+ * This function is invoked whenever we want to interrupt a vcpu that is
+ * currently executing in guest-mode. It currently is a no-op because
+ * the simple delivery of the IPI to execute this function accomplishes our
+ * goal: To cause a VMEXIT. We pass the vcpu (which contains the
+ * vcpu->irq.task, etc) for future use
+ */
+static void kvm_vcpu_guest_intr(void *info)
+{
+#ifdef NOT_YET
+ struct kvm_vcpu *vcpu = (struct kvm_vcpu*)info;
+#endif
+}
+
+/*
* This function will be invoked whenever the vcpu->irq.dev raises its INTR
* line
*/
@@ -2318,10 +2335,50 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
{
struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
unsigned long flags;
+ int direct_ipi = -1;
spin_lock_irqsave(&vcpu->irq.lock, flags);
- __set_bit(pin, &vcpu->irq.pending);
+
+ if (!test_bit(pin, &vcpu->irq.pending)) {
+ /*
+ * Record the change..
+ */
+ __set_bit(pin, &vcpu->irq.pending);
+
+ /*
+ * then wake up the vcpu (if necessary)
+ */
+ if (vcpu->irq.task && (vcpu->irq.task != current)) {
+ if (vcpu->irq.guest_mode) {
+ /*
+ * If we are in guest mode, we can optimize
+ * the IPI by executing a function directly
+ * on the owning processor.
+ */
+ direct_ipi = task_cpu(vcpu->irq.task);
+ BUG_ON(direct_ipi == smp_processor_id());
+ }
+ }
+ }
+
spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+
+ /*
+ * we can safely send the IPI outside of the lock-scope because the
+ * irq.pending has already been updated. This code assumes that
+ * userspace will not sleep on anything other than HLT instructions.
+ * HLT is covered in a race-free way because irq.pending was updated
+ * in the critical section, and handle_halt() which check if any
+ * interrupts are pending before returning to userspace.
+ *
+ * If it turns out that userspace can sleep on conditions other than
+ * HLT, this code will need to be enhanced to allow the irq.pending
+ * flags to be exported to userspace
+ */
+ if (direct_ipi != -1)
+ smp_call_function_single(direct_ipi,
+ kvm_vcpu_guest_intr,
+ vcpu, 0, 0);
}
static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 4c03881..91546ae 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1542,11 +1542,40 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
u16 gs_selector;
u16 ldt_selector;
int r;
+ unsigned long irq_flags;
again:
+ /*
+ * We disable interrupts until the next VMEXIT to eliminate a race
+ * condition for delivery of virtual interrutps. Note that this is
+ * probably not as bad as it sounds, as interrupts will still invoke
+ * a VMEXIT once transitioned to GUEST mode (and thus exit this lock
+ * scope) even if they are disabled.
+ *
+ * FIXME: Do we need to do anything additional to mask IPI/NMIs?
+ */
+ local_irq_save(irq_flags);
+
spin_lock(&vcpu->irq.lock);
/*
+ * If there are any signals pending (virtual interrupt related or
+ * otherwise), don't even bother trying to enter guest mode...
+ */
+ if (signal_pending(current)) {
+ kvm_run->exit_reason = KVM_EXIT_INTR;
+ spin_unlock(&vcpu->irq.lock);
+ local_irq_restore(irq_flags);
+ return -EINTR;
+ }
+
+ /*
+ * There are optimizations we can make when signaling interrupts
+ * if we know the VCPU is in GUEST mode, so mark that here
+ */
+ vcpu->irq.guest_mode = 1;
+
+ /*
* We must inject interrupts (if any) while the irq_lock
* is held
*/
@@ -1688,6 +1717,13 @@ again:
#endif
: "cc", "memory" );
+ /*
+ * FIXME: We'd like to turn on interrupts ASAP, but is this so early
+ * that we will mess up the state of the CPU before we fully
+ * transition from guest to host?
+ */
+ local_irq_restore(irq_flags);
+
if (vcpu->fpu_active) {
fx_save(vcpu->guest_fx_image);
fx_restore(vcpu->host_fx_image);
@@ -1710,6 +1746,13 @@ again:
reload_tss(vcpu);
/*
+ * Signal that we have transitioned back to host mode
+ */
+ spin_lock_irqsave(&vcpu->irq.lock, irq_flags);
+ vcpu->irq.guest_mode = 0;
+ spin_unlock_irqrestore(&vcpu->irq.lock, irq_flags);
+
+ /*
* Profile KVM exit RIPs:
*/
if (unlikely(prof_on == KVM_PROFILING))
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index ca858cb..7b81fff 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1895,6 +1895,7 @@ static int vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
u16 fs_sel, gs_sel, ldt_sel;
int fs_gs_ldt_reload_needed;
int r;
+ unsigned long irq_flags;
preempted:
/*
@@ -1929,9 +1930,37 @@ preempted:
if (vcpu->guest_debug.enabled)
kvm_guest_debug_pre(vcpu);
+ /*
+ * We disable interrupts until the next VMEXIT to eliminate a race
+ * condition for delivery of virtual interrutps. Note that this is
+ * probably not as bad as it sounds, as interrupts will still invoke
+ * a VMEXIT once transitioned to GUEST mode (and thus exit this lock
+ * scope) even if they are disabled.
+ *
+ * FIXME: Do we need to do anything additional to mask IPI/NMIs?
+ */
+ local_irq_save(irq_flags);
+
spin_lock(&vcpu->irq.lock);
/*
+ * If there are any signals pending (virtual interrupt related or
+ * otherwise), don't even bother trying to enter guest mode...
+ */
+ if (signal_pending(current)) {
+ kvm_run->exit_reason = KVM_EXIT_INTR;
+ spin_unlock(&vcpu->irq.lock);
+ local_irq_restore(irq_flags);
+ return -EINTR;
+ }
+
+ /*
+ * There are optimizations we can make when signaling interrupts
+ * if we know the VCPU is in GUEST mode, so mark that here
+ */
+ vcpu->irq.guest_mode = 1;
+
+ /*
* We must inject interrupts (if any) while the irq.lock
* is held
*/
@@ -2072,12 +2101,26 @@ again:
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );
+ /*
+ * FIXME: We'd like to turn on interrupts ASAP, but is this so early
+ * that we will mess up the state of the CPU before we fully
+ * transition from guest to host?
+ */
+ local_irq_restore(irq_flags);
+
++vcpu->stat.exits;
vcpu->interrupt_window_open = (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0;
asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
+ /*
+ * Signal that we have transitioned back to host mode
+ */
+ spin_lock_irqsave(&vcpu->irq.lock, irq_flags);
+ vcpu->irq.guest_mode = 0;
+ spin_unlock_irqrestore(&vcpu->irq.lock, irq_flags);
+
if (unlikely(fail)) {
kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
kvm_run->fail_entry.hardware_entry_failure_reason
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 4/8] KVM: Adds ability to signal userspace using a file-descriptor
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
` (2 preceding siblings ...)
2007-05-09 3:03 ` [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU Gregory Haskins
@ 2007-05-09 3:03 ` Gregory Haskins
2007-05-09 3:03 ` [PATCH 5/8] KVM: Add support for in-kernel LAPIC model Gregory Haskins
` (4 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/kvm.h | 2 +
drivers/kvm/kvm_main.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 84 insertions(+), 0 deletions(-)
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 0f6cc32..b5bfc91 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -331,6 +331,8 @@ struct kvm_vcpu_irq {
int deferred;
struct task_struct *task;
int guest_mode;
+ wait_queue_head_t wq;
+ int usignal;
};
struct kvm_vcpu {
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index a160638..6b40c18 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -40,6 +40,7 @@
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/mount.h>
+#include <linux/poll.h>
#include "x86_emulate.h"
#include "segment_descriptor.h"
@@ -304,6 +305,7 @@ static struct kvm *kvm_create_vm(void)
memset(&vcpu->irq, 0, sizeof(vcpu->irq));
spin_lock_init(&vcpu->irq.lock);
vcpu->irq.deferred = -1;
+ init_waitqueue_head(&vcpu->irq.wq);
vcpu->cpu = -1;
vcpu->kvm = kvm;
@@ -2265,11 +2267,78 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
return 0;
}
+static unsigned int kvm_vcpu_poll(struct file *filp, poll_table *wait)
+{
+ struct kvm_vcpu *vcpu = filp->private_data;
+ unsigned int events = 0;
+ unsigned long flags;
+
+ poll_wait(filp, &vcpu->irq.wq, wait);
+
+ spin_lock_irqsave(&vcpu->irq.lock, flags);
+ if (vcpu->irq.usignal)
+ events |= POLLIN;
+ spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+
+ return events;
+}
+
+static ssize_t kvm_vcpu_read(struct file *filp, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct kvm_vcpu *vcpu = filp->private_data;
+ ssize_t res = -EAGAIN;
+ DECLARE_WAITQUEUE(wait, current);
+ unsigned long flags;
+ int val;
+
+ if (count < sizeof(vcpu->irq.usignal))
+ return -EINVAL;
+
+ spin_lock_irqsave(&vcpu->irq.lock, flags);
+
+ val = vcpu->irq.usignal;
+
+ if (val > 0)
+ res = sizeof(val);
+ else if (!(filp->f_flags & O_NONBLOCK)) {
+ __add_wait_queue(&vcpu->irq.wq, &wait);
+ for (res = 0;;) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (val > 0) {
+ res = sizeof(val);
+ break;
+ }
+ if (signal_pending(current)) {
+ res = -ERESTARTSYS;
+ break;
+ }
+ spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+ schedule();
+ spin_lock_irqsave(&vcpu->irq.lock, flags);
+ }
+ __remove_wait_queue(&vcpu->irq.wq, &wait);
+ __set_current_state(TASK_RUNNING);
+ }
+
+ if (res > 0)
+ vcpu->irq.usignal = 0;
+
+ spin_unlock_irqrestore(&vcpu->irq.lock, flags);
+
+ if (res > 0 && put_user(val, (int __user *) buf))
+ return -EFAULT;
+
+ return res;
+}
+
static struct file_operations kvm_vcpu_fops = {
.release = kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
.compat_ioctl = kvm_vcpu_ioctl,
.mmap = kvm_vcpu_mmap,
+ .poll = kvm_vcpu_poll,
+ .read = kvm_vcpu_read,
};
/*
@@ -2336,6 +2405,7 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
unsigned long flags;
int direct_ipi = -1;
+ int indirect_sig = 0;
spin_lock_irqsave(&vcpu->irq.lock, flags);
@@ -2357,6 +2427,15 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
*/
direct_ipi = task_cpu(vcpu->irq.task);
BUG_ON(direct_ipi == smp_processor_id());
+ } else {
+ /*
+ * otherwise, we must assume that we could be
+ * blocked anywhere, including userspace. Send
+ * a signal to give everyone a chance to get
+ * notification
+ */
+ vcpu->irq.usignal++;
+ indirect_sig = 1;
}
}
}
@@ -2379,6 +2458,9 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
smp_call_function_single(direct_ipi,
kvm_vcpu_guest_intr,
vcpu, 0, 0);
+
+ if (indirect_sig && waitqueue_active(&vcpu->irq.wq))
+ wake_up(&vcpu->irq.wq);
}
static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 5/8] KVM: Add support for in-kernel LAPIC model
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
` (3 preceding siblings ...)
2007-05-09 3:03 ` [PATCH 4/8] KVM: Adds ability to signal userspace using a file-descriptor Gregory Haskins
@ 2007-05-09 3:03 ` Gregory Haskins
2007-05-09 3:03 ` [PATCH 6/8] KVM: Adds support for real NMI injection on VMX processors Gregory Haskins
` (3 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/Makefile | 2
drivers/kvm/kernint.c | 149 +++++
drivers/kvm/kvm.h | 35 +
drivers/kvm/kvm_main.c | 179 +++++-
drivers/kvm/lapic.c | 1412 ++++++++++++++++++++++++++++++++++++++++++++++++
drivers/kvm/svm.c | 13
drivers/kvm/userint.c | 8
drivers/kvm/vmx.c | 16 -
include/linux/kvm.h | 16 +
9 files changed, 1789 insertions(+), 41 deletions(-)
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 540afbc..1aad737 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -2,7 +2,7 @@
# Makefile for Kernel-based Virtual Machine module
#
-kvm-objs := kvm_main.o mmu.o x86_emulate.o userint.o
+kvm-objs := kvm_main.o mmu.o x86_emulate.o userint.o lapic.o kernint.o
obj-$(CONFIG_KVM) += kvm.o
kvm-intel-objs = vmx.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/kernint.c b/drivers/kvm/kernint.c
new file mode 100644
index 0000000..b5cbcae
--- /dev/null
+++ b/drivers/kvm/kernint.c
@@ -0,0 +1,149 @@
+/*
+ * Kernel Interrupt IRQ device
+ *
+ * Provides a model for connecting in-kernel interrupt resources to a VCPU.
+ *
+ * A typical modern x86 processor has the concept of an internal Local-APIC
+ * and some external signal pins. The way in which interrupts are injected is
+ * dependent on whether software enables the LAPIC or not. When enabled,
+ * interrupts are acknowledged through the LAPIC. Otherwise they are through
+ * an externally connected PIC (typically an i8259 on the BSP)
+ *
+ * Copyright (C) 2007 Novell
+ *
+ * Authors:
+ * Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "kvm.h"
+
+struct kvm_kernint {
+ struct kvm_vcpu *vcpu;
+ struct kvm_irqdevice *self_irq;
+ struct kvm_irqdevice *ext_irq;
+ struct kvm_irqdevice apic_irq;
+
+};
+
+static struct kvm_irqdevice *get_irq_dev(struct kvm_kernint *s)
+{
+ struct kvm_irqdevice *dev;
+
+ if (kvm_lapic_enabled(s->vcpu))
+ dev = &s->apic_irq;
+ else
+ dev = s->ext_irq;
+
+ if (!dev)
+ kvm_crash_guest(s->vcpu->kvm);
+
+ return dev;
+}
+
+static int kernint_irqdev_ack(struct kvm_irqdevice *this, int flags,
+ struct kvm_irqack_data *data)
+{
+ struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+ return kvm_irqdevice_ack(get_irq_dev(s), flags, data);
+}
+
+static int kernint_irqdev_set_pin(struct kvm_irqdevice *this,
+ int irq, int level)
+{
+ /* no-op */
+ return 0;
+}
+
+static void kernint_irqdev_destructor(struct kvm_irqdevice *this)
+{
+ struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+ kvm_irqdevice_destructor(&s->apic_irq);
+ kvm_lapic_destroy(s->vcpu);
+ kfree(s);
+}
+
+static void kvm_apic_intr(struct kvm_irqsink *this,
+ struct kvm_irqdevice *dev,
+ kvm_irqpin_t pin)
+{
+ struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+ /*
+ * If the LAPIC sent us an interrupt it *must* be enabled,
+ * just forward it on to the CPU
+ */
+ kvm_irqdevice_set_intr(s->self_irq, pin);
+}
+
+static void kvm_ext_intr(struct kvm_irqsink *this,
+ struct kvm_irqdevice *dev,
+ kvm_irqpin_t pin)
+{
+ struct kvm_kernint *s = (struct kvm_kernint*)this->private;
+
+ /*
+ * If the EXTINT device sent us an interrupt, forward it to the LINT0
+ * pin of the LAPIC
+ */
+ if (pin != kvm_irqpin_localint)
+ return;
+
+ /*
+ * "irq 0" = LINT0, 1 = LINT1
+ */
+ kvm_irqdevice_set_pin(&s->apic_irq, 0, 1);
+}
+
+int kvm_kernint_init(struct kvm_vcpu *vcpu)
+{
+ struct kvm_irqdevice *irqdev = &vcpu->irq.dev;
+ struct kvm_kernint *s;
+ struct kvm_irqsink apicsink;
+
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ s->vcpu = vcpu;
+
+ /*
+ * Configure the irqdevice interface
+ */
+ irqdev->ack = kernint_irqdev_ack;
+ irqdev->set_pin = kernint_irqdev_set_pin;
+ irqdev->destructor = kernint_irqdev_destructor;
+
+ irqdev->private = s;
+ s->self_irq = irqdev;
+
+ /*
+ * Configure the EXTINT device if this is the BSP processor
+ */
+ if (!vcpu_slot(vcpu)) {
+ struct kvm_irqsink extsink = {
+ .set_intr = kvm_ext_intr,
+ .private = s
+ };
+ s->ext_irq = &vcpu->kvm->isa_irq;
+ kvm_irqdevice_register_sink(s->ext_irq, &extsink);
+ }
+
+ /*
+ * Configure the LAPIC device
+ */
+ apicsink.set_intr = kvm_apic_intr;
+ apicsink.private = s;
+
+ kvm_irqdevice_init(&s->apic_irq);
+ kvm_irqdevice_register_sink(&s->apic_irq, &apicsink);
+ kvm_lapic_init(vcpu, &s->apic_irq, 0);
+
+ return 0;
+}
+
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index b5bfc91..60710d8 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -163,6 +163,21 @@ int kvm_user_irqdev_init(struct kvm_irqdevice *dev);
int kvm_user_irqdev_save(struct kvm_irqdevice *this, void *data);
int kvm_user_irqdev_restore(struct kvm_irqdevice *this, void *data);
int kvm_userint_init(struct kvm_vcpu *vcpu);
+int kvm_kernint_init(struct kvm_vcpu *vcpu);
+
+#define KVM_LAPIC_OPTION_USERMODE (1 << 0)
+
+int kvm_lapic_init(struct kvm_vcpu *vcpu, struct kvm_irqdevice *dev,
+ int flags);
+void kvm_lapic_destroy(struct kvm_vcpu *vcpu);
+void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, u64 cr8);
+u64 kvm_lapic_get_tpr(struct kvm_vcpu *vcpu);
+void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 base);
+u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
+void kvm_lapic_save(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
+void kvm_lapic_restore(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
+void kvm_lapic_reset(struct kvm_vcpu *vcpu);
+int kvm_lapic_enabled(struct kvm_vcpu *vcpu);
/*
* x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
@@ -335,6 +350,11 @@ struct kvm_vcpu_irq {
int usignal;
};
+struct kvm_lapic {
+ void *dev;
+ struct kvm_io_device *mmio;
+};
+
struct kvm_vcpu {
struct kvm *kvm;
union {
@@ -348,6 +368,7 @@ struct kvm_vcpu {
struct kvm_run *run;
int interrupt_window_open;
struct kvm_vcpu_irq irq;
+ struct kvm_lapic apic;
unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
unsigned long rip; /* needs vcpu_load_rsp_rip() */
@@ -358,10 +379,8 @@ struct kvm_vcpu {
struct page *para_state_page;
gpa_t hypercall_gpa;
unsigned long cr4;
- unsigned long cr8;
u64 pdptrs[4]; /* pae */
u64 shadow_efer;
- u64 apic_base;
u64 ia32_misc_enable_msr;
int nmsrs;
struct vmx_msr_entry *guest_msrs;
@@ -532,6 +551,8 @@ struct kvm {
struct list_head vm_list;
struct file *filp;
struct kvm_io_bus mmio_bus;
+ int enable_kernel_pic;
+ struct kvm_irqdevice isa_irq;
};
struct descriptor_table {
@@ -606,6 +627,9 @@ void kvm_exit_arch(void);
int kvm_mmu_module_init(void);
void kvm_mmu_module_exit(void);
+int kvm_apicbus_send(struct kvm *kvm, int dest, int trig_mode, int level,
+ int dest_mode, int delivery_mode, int vector);
+
void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
int kvm_mmu_create(struct kvm_vcpu *vcpu);
int kvm_mmu_setup(struct kvm_vcpu *vcpu);
@@ -737,6 +761,13 @@ static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
return (struct kvm_mmu_page *)page_private(page);
}
+static inline int vcpu_slot(struct kvm_vcpu *vcpu)
+{
+ return vcpu - vcpu->kvm->vcpus;
+}
+
+void kvm_crash_guest(struct kvm *kvm);
+
static inline u16 read_fs(void)
{
u16 seg;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 6b40c18..8be8daa 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -297,6 +297,7 @@ static struct kvm *kvm_create_vm(void)
spin_lock_init(&kvm->lock);
INIT_LIST_HEAD(&kvm->active_mmu_pages);
kvm_io_bus_init(&kvm->mmio_bus);
+ kvm_irqdevice_init(&kvm->isa_irq);
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
struct kvm_vcpu *vcpu = &kvm->vcpus[i];
@@ -391,6 +392,23 @@ static void kvm_free_vcpus(struct kvm *kvm)
kvm_free_vcpu(&kvm->vcpus[i]);
}
+/*
+ * The function kills a guest while there still is a user space processes
+ * with a descriptor to it
+ */
+void kvm_crash_guest(struct kvm *kvm)
+{
+ unsigned int i;
+
+ for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+ /*
+ * FIXME: in the future it should send IPI to gracefully
+ * stop the other vCPUs
+ */
+ kvm_free_vcpu(&kvm->vcpus[i]);
+ }
+}
+
static int kvm_dev_release(struct inode *inode, struct file *filp)
{
return 0;
@@ -402,6 +420,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
list_del(&kvm->vm_list);
spin_unlock(&kvm_lock);
kvm_io_bus_destroy(&kvm->mmio_bus);
+ if (kvm->enable_kernel_pic)
+ kvm_irqdevice_destructor(&kvm->isa_irq);
kvm_free_vcpus(kvm);
kvm_free_physmem(kvm);
kfree(kvm);
@@ -607,7 +627,7 @@ void set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
inject_gp(vcpu);
return;
}
- vcpu->cr8 = cr8;
+ kvm_lapic_set_tpr(vcpu, cr8);
}
EXPORT_SYMBOL_GPL(set_cr8);
@@ -908,6 +928,69 @@ out:
return r;
}
+static int kvm_vm_ioctl_enable_kernel_pic(struct kvm *kvm, __u32 val)
+{
+ /*
+ * FIXME: We should not allow this if VCPUs have already been created
+ */
+ if (kvm->enable_kernel_pic)
+ return -EINVAL;
+
+ /*
+ * Someday we may offer two levels of in-kernel PIC support:
+ *
+ * level 0 = (default) compatiblity mode (everything in userspace)
+ * level 1 = LAPIC in kernel, IOAPIC/i8259 in userspace
+ * level 2 = All three in kernel
+ *
+ * For now we only support level 0 and 1. However, you cant set
+ * level 0
+ */
+ if (val != 1)
+ return -EINVAL;
+
+ kvm->enable_kernel_pic = val;
+
+ printk(KERN_INFO "KVM: Setting in-kernel PIC level to %d\n", val);
+
+ /*
+ * installing a user_irqdev model to the kvm->isa_irq device
+ * creates a level-1 environment, where the userspace completely
+ * controls the ISA domain interrupts in the IOAPIC/i8259.
+ * Interrupts come down to the VCPU either as an ISA vector to
+ * this controller, or as an APIC bus message (or both)
+ */
+ kvm_user_irqdev_init(&kvm->isa_irq);
+
+ return 0;
+}
+
+static int kvm_vm_ioctl_isa_interrupt(struct kvm *kvm,
+ struct kvm_interrupt *irq)
+{
+ if (irq->irq < 0 || irq->irq >= 256)
+ return -EINVAL;
+
+ if (!kvm->enable_kernel_pic)
+ return -EINVAL;
+
+ return kvm_irqdevice_set_pin(&kvm->isa_irq, irq->irq, 1);
+}
+
+static int kvm_vm_ioctl_apic_msg(struct kvm *kvm,
+ struct kvm_apic_msg *msg)
+{
+ if (!kvm->enable_kernel_pic)
+ return -EINVAL;
+
+ msg->delivery_mode = (msg->delivery_mode << 8) & 0xF00;
+
+ kvm_apicbus_send(kvm, msg->dest, msg->trig_mode, 1, msg->dest_mode,
+ msg->delivery_mode, msg->vector);
+
+ return 0;
+}
+
static gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
int i;
@@ -1028,10 +1111,16 @@ static int emulator_write_std(unsigned long addr,
static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu,
gpa_t addr)
{
+ struct kvm_io_device *dev = vcpu->apic.mmio;
+
+ /*
+ * First check if the LAPIC will snarf this request
+ */
+ if (dev && dev->in_range(dev, addr))
+ return dev;
+
/*
- * Note that its important to have this wrapper function because
- * in the very near future we will be checking for MMIOs against
- * the LAPIC as well as the general MMIO bus
+ * And then fallback to allow any device to participate
*/
return kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr);
}
@@ -1497,7 +1586,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
data = 3;
break;
case MSR_IA32_APICBASE:
- data = vcpu->apic_base;
+ data = kvm_lapic_get_base(vcpu);
break;
case MSR_IA32_MISC_ENABLE:
data = vcpu->ia32_misc_enable_msr;
@@ -1575,7 +1664,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
case 0x200 ... 0x2ff: /* MTRRs */
break;
case MSR_IA32_APICBASE:
- vcpu->apic_base = data;
+ kvm_lapic_set_base(vcpu, data);
break;
case MSR_IA32_MISC_ENABLE:
vcpu->ia32_misc_enable_msr = data;
@@ -1839,8 +1928,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
- /* re-sync apic's tpr */
- vcpu->cr8 = kvm_run->cr8;
+ if (!vcpu->kvm->enable_kernel_pic)
+ /* re-sync apic's tpr if the APIC is in userspace */
+ kvm_lapic_set_tpr(vcpu, kvm_run->cr8);
if (vcpu->pio.cur_count) {
r = complete_pio(vcpu);
@@ -1992,11 +2082,12 @@ static int kvm_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
sregs->cr2 = vcpu->cr2;
sregs->cr3 = vcpu->cr3;
sregs->cr4 = vcpu->cr4;
- sregs->cr8 = vcpu->cr8;
sregs->efer = vcpu->shadow_efer;
- sregs->apic_base = vcpu->apic_base;
- kvm_user_irqdev_save(&vcpu->irq.dev, &sregs->interrupt_bitmap);
+ kvm_lapic_save(vcpu, sregs);
+
+ if (!vcpu->kvm->enable_kernel_pic)
+ kvm_user_irqdev_save(&vcpu->irq.dev, &sregs->interrupt_bitmap);
vcpu_put(vcpu);
@@ -2028,14 +2119,10 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
mmu_reset_needed |= vcpu->cr3 != sregs->cr3;
vcpu->cr3 = sregs->cr3;
- vcpu->cr8 = sregs->cr8;
-
mmu_reset_needed |= vcpu->shadow_efer != sregs->efer;
#ifdef CONFIG_X86_64
kvm_arch_ops->set_efer(vcpu, sregs->efer);
#endif
- vcpu->apic_base = sregs->apic_base;
-
kvm_arch_ops->decache_cr4_guest_bits(vcpu);
mmu_reset_needed |= vcpu->cr0 != sregs->cr0;
@@ -2049,8 +2136,11 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
if (mmu_reset_needed)
kvm_mmu_reset_context(vcpu);
- kvm_user_irqdev_restore(&vcpu->irq.dev,
- &sregs->interrupt_bitmap[0]);
+ kvm_lapic_restore(vcpu, sregs);
+
+ if (!vcpu->kvm->enable_kernel_pic)
+ kvm_user_irqdev_restore(&vcpu->irq.dev,
+ &sregs->interrupt_bitmap[0]);
set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
@@ -2522,7 +2612,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
kvm_irqdevice_init(&vcpu->irq.dev);
kvm_vcpu_irqsink_init(vcpu);
- r = kvm_userint_init(vcpu);
+
+ if (kvm->enable_kernel_pic)
+ r = kvm_kernint_init(vcpu);
+ else
+ r = kvm_userint_init(vcpu);
+
if (r < 0)
goto out_free_vcpus;
@@ -2644,6 +2739,12 @@ static int kvm_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
return 0;
}
+static int kvm_vcpu_ioctl_apic_reset(struct kvm_vcpu *vcpu)
+{
+ kvm_lapic_reset(vcpu);
+ return 0;
+}
+
static long kvm_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -2813,6 +2914,13 @@ static long kvm_vcpu_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_APIC_RESET: {
+ r = kvm_vcpu_ioctl_apic_reset(vcpu);
+ if (r)
+ goto out;
+ r = 0;
+ break;
+ }
default:
;
}
@@ -2866,6 +2974,41 @@ static long kvm_vm_ioctl(struct file *filp,
goto out;
break;
}
+ case KVM_ENABLE_KERNEL_PIC: {
+ __u32 val;
+
+ r = -EFAULT;
+ if (copy_from_user(&val, argp, sizeof val))
+ goto out;
+ r = kvm_vm_ioctl_enable_kernel_pic(kvm, val);
+ if (r)
+ goto out;
+ break;
+ }
+ case KVM_ISA_INTERRUPT: {
+ struct kvm_interrupt irq;
+
+ r = -EFAULT;
+ if (copy_from_user(&irq, argp, sizeof irq))
+ goto out;
+ r = kvm_vm_ioctl_isa_interrupt(kvm, &irq);
+ if (r)
+ goto out;
+ r = 0;
+ break;
+ }
+ case KVM_APIC_MSG: {
+ struct kvm_apic_msg msg;
+
+ r = -EFAULT;
+ if (copy_from_user(&msg, argp, sizeof msg))
+ goto out;
+ r = kvm_vm_ioctl_apic_msg(kvm, &msg);
+ if (r)
+ goto out;
+ r = 0;
+ break;
+ }
default:
;
}
diff --git a/drivers/kvm/lapic.c b/drivers/kvm/lapic.c
new file mode 100644
index 0000000..f7b04f9
--- /dev/null
+++ b/drivers/kvm/lapic.c
@@ -0,0 +1,1412 @@
+/*
+ * Local APIC virtualization
+ *
+ * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2007 Novell
+ *
+ * Authors:
+ * Dor Laor <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
+ * Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
+ *
+ * Based on Xen 3.0 code, Copyright (c) 2004, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "kvm.h"
+#include <linux/kvm.h>
+#include <linux/mm.h>
+#include <linux/highmem.h>
+#include <linux/smp.h>
+#include <linux/hrtimer.h>
+#include <asm/processor.h>
+#include <asm/io.h>
+#include <asm/msr.h>
+#include <asm/page.h>
+#include <asm/current.h>
+
+/*XXX remove this definition after GFW enabled */
+#define APIC_NO_BIOS
+
+#define PRId64 "d"
+#define PRIx64 "llx"
+#define PRIu64 "u"
+#define PRIo64 "o"
+
+#define APIC_BUS_CYCLE_NS 1
+
+/* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
+#define apic_debug(fmt,arg...)
+
+struct kvm_kern_apic {
+ spinlock_t lock;
+ atomic_t ref_count;
+ int usermode;
+ u32 status;
+ u32 vcpu_id;
+ u64 base_msr;
+ unsigned long base_address;
+ struct kvm_io_device mmio_dev;
+ struct {
+ unsigned long pending;
+ u32 divide_count;
+ ktime_t last_update;
+ struct hrtimer dev;
+ } timer;
+ u32 err_status;
+ u32 err_write_count;
+ struct kvm_vcpu *vcpu;
+ struct kvm_irqdevice *irq_dev;
+ struct page *regs_page;
+ void *regs;
+};
+
+static __inline__ int find_highest_bit(unsigned long *data, int nr_bits)
+{
+ int length = BITS_TO_LONGS(nr_bits);
+ while (length && !data[--length])
+ continue;
+ return __ffs(data[length]) + (length * BITS_PER_LONG);
+}
+
+#define APIC_LVT_NUM 6
+/* 14 is the version for Xeon and Pentium 8.4.8*/
+#define APIC_VERSION (0x14UL | ((APIC_LVT_NUM - 1) << 16))
+#define VLOCAL_APIC_MEM_LENGTH (1 << 12)
+/* followed define is not in apicdef.h */
+#define APIC_SHORT_MASK 0xc0000
+#define APIC_DEST_NOSHORT 0x0
+#define APIC_DEST_MASK 0x800
+#define _APIC_GLOB_DISABLE 0x0
+#define APIC_GLOB_DISABLE_MASK 0x1
+#define APIC_SOFTWARE_DISABLE_MASK 0x2
+#define _APIC_BSP_ACCEPT_PIC 0x3
+#define MAX_APIC_INT_VECTOR 256
+
+#define inject_gp(vcpu) kvm_arch_ops->inject_gp(vcpu, 0);
+
+#define apic_enabled(apic) \
+ (!((apic)->status & \
+ (APIC_GLOB_DISABLE_MASK | APIC_SOFTWARE_DISABLE_MASK)))
+
+#define apic_global_enabled(apic) \
+ (!(test_bit(_APIC_GLOB_DISABLE, &(apic)->status)))
+
+#define LVT_MASK \
+ APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK
+
+#define LINT_MASK \
+ LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY |\
+ APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER
+
+#define KVM_APIC_ID(apic) \
+ (GET_APIC_ID(apic_get_reg(apic, APIC_ID)))
+
+#define apic_lvt_enabled(apic, lvt_type) \
+ (!(apic_get_reg(apic, lvt_type) & APIC_LVT_MASKED))
+
+#define apic_lvt_vector(apic, lvt_type) \
+ (apic_get_reg(apic, lvt_type) & APIC_VECTOR_MASK)
+
+#define apic_lvt_dm(apic, lvt_type) \
+ (apic_get_reg(apic, lvt_type) & APIC_MODE_MASK)
+
+#define apic_lvtt_period(apic) \
+ (apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC)
+
+static inline u32 apic_get_reg(struct kvm_kern_apic *apic, u32 reg)
+{
+ return *((u32 *)(apic->regs + reg));
+}
+
+static inline void apic_set_reg(struct kvm_kern_apic *apic,
+ u32 reg, u32 val)
+{
+ *((u32 *)(apic->regs + reg)) = val;
+}
+
+static unsigned int apic_lvt_mask[APIC_LVT_NUM] =
+{
+ LVT_MASK | APIC_LVT_TIMER_PERIODIC, /* LVTT */
+ LVT_MASK | APIC_MODE_MASK, /* LVTTHMR */
+ LVT_MASK | APIC_MODE_MASK, /* LVTPC */
+ LINT_MASK, LINT_MASK, /* LVT0-1 */
+ LVT_MASK /* LVTERR */
+};
+
+#define ASSERT(x) \
+ if (!(x)) { \
+ printk(KERN_EMERG "assertion failed %s: %d: %s\n", \
+ __FILE__, __LINE__, #x); \
+ BUG(); \
+ }
+
+static int apic_find_highest_irr(struct kvm_kern_apic *apic)
+{
+ int result;
+
+ result = find_highest_bit((unsigned long *)(apic->regs + APIC_IRR),
+ MAX_APIC_INT_VECTOR);
+
+ ASSERT( result == 0 || result >= 16);
+
+ return result;
+}
+
+
+static int apic_find_highest_isr(struct kvm_kern_apic *apic)
+{
+ int result;
+
+ result = find_highest_bit((unsigned long *)(apic->regs + APIC_ISR),
+ MAX_APIC_INT_VECTOR);
+
+ ASSERT( result == 0 || result >= 16);
+
+ return result;
+}
+
+static void apic_dropref(struct kvm_kern_apic *apic)
+{
+ if (atomic_dec_and_test(&apic->ref_count)) {
+
+ spin_lock_bh(&apic->lock);
+
+ hrtimer_cancel(&apic->timer.dev);
+
+ if (apic->regs_page) {
+ __free_page(apic->regs_page);
+ apic->regs_page = 0;
+ }
+
+ spin_unlock_bh(&apic->lock);
+
+ kfree(apic);
+ }
+}
+
+#if 0
+static void apic_dump_state(struct kvm_kern_apic *apic)
+{
+ u64 *tmp;
+
+ printk(KERN_INFO "%s begin\n", __FUNCTION__);
+
+ printk(KERN_INFO "status = 0x%08x\n", apic->status);
+ printk(KERN_INFO "base_msr=0x%016llx, apicbase = 0x%08lx\n",
+ apic->base_msr, apic->base_address);
+
+ tmp = (u64*)(apic->regs + APIC_IRR);
+ printk(KERN_INFO "IRR = 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
+ tmp[3], tmp[2], tmp[1], tmp[0]);
+ tmp = (u64*)(apic->regs + APIC_ISR);
+ printk(KERN_INFO "ISR = 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
+ tmp[3], tmp[2], tmp[1], tmp[0]);
+ tmp = (u64*)(apic->regs + APIC_TMR);
+ printk(KERN_INFO "TMR = 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
+ tmp[3], tmp[2], tmp[1], tmp[0]);
+
+ printk(KERN_INFO "APIC_ID=0x%08x\n", apic_get_reg(apic, APIC_ID));
+ printk(KERN_INFO "APIC_TASKPRI=0x%08x\n",
+ apic_get_reg(apic, APIC_TASKPRI) & 0xff);
+ printk(KERN_INFO "APIC_PROCPRI=0x%08x\n",
+ apic_get_reg(apic, APIC_PROCPRI));
+
+ printk(KERN_INFO "APIC_DFR=0x%08x\n",
+ apic_get_reg(apic, APIC_DFR) | 0x0FFFFFFF);
+ printk(KERN_INFO "APIC_LDR=0x%08x\n",
+ apic_get_reg(apic, APIC_LDR) & APIC_LDR_MASK);
+ printk(KERN_INFO "APIC_SPIV=0x%08x\n",
+ apic_get_reg(apic, APIC_SPIV) & 0x3ff);
+ printk(KERN_INFO "APIC_ESR=0x%08x\n",
+ apic_get_reg(apic, APIC_ESR));
+ printk(KERN_INFO "APIC_ICR=0x%08x\n",
+ apic_get_reg(apic, APIC_ICR) & ~(1 << 12));
+ printk(KERN_INFO "APIC_ICR2=0x%08x\n",
+ apic_get_reg(apic, APIC_ICR2) & 0xff000000);
+
+ printk(KERN_INFO "APIC_LVTERR=0x%08x\n",
+ apic_get_reg(apic, APIC_LVTERR));
+ printk(KERN_INFO "APIC_LVT1=0x%08x\n",
+ apic_get_reg(apic, APIC_LVT1));
+ printk(KERN_INFO "APIC_LVT0=0x%08x\n",
+ apic_get_reg(apic, APIC_LVT0));
+ printk(KERN_INFO "APIC_LVTPC=0x%08x\n",
+ apic_get_reg(apic, APIC_LVTPC));
+ printk(KERN_INFO "APIC_LVTTHMR=0x%08x\n",
+ apic_get_reg(apic, APIC_LVTTHMR));
+ printk(KERN_INFO "APIC_LVTT=0x%08x\n",
+ apic_get_reg(apic, APIC_LVTT));
+
+ printk(KERN_INFO "APIC_TMICT=0x%08x\n",
+ apic_get_reg(apic, APIC_TMICT));
+ printk(KERN_INFO "APIC_TDCR=0x%08x\n",
+ apic_get_reg(apic, APIC_TDCR));
+
+ printk(KERN_INFO "%s end\n", __FUNCTION__);
+}
+#endif
+
+
+static int apic_update_ppr(struct kvm_kern_apic *apic)
+{
+ u32 tpr, isrv, ppr, orig_ppr;
+ int irq;
+ int masked = 0;
+ int forward = 0;
+
+ ppr = apic_get_reg(apic, APIC_PROCPRI);
+ orig_ppr = ppr;
+
+ /*
+ * Before we change anything, see if the only pending vectors we have
+ * are anything masked by PPR
+ */
+ irq = apic_find_highest_irr(apic);
+ if (irq && ((irq & 0xf0) <= ppr))
+ masked = true;
+
+ /*
+ * Compute the PPR value based on the current settings of TPR/ISR
+ */
+ tpr = apic_get_reg(apic, APIC_TASKPRI);
+ irq = apic_find_highest_isr(apic);
+ isrv = (irq >> 4) & 0xf;
+
+ if ((tpr >> 4) >= isrv)
+ ppr = tpr & 0xff;
+ else
+ ppr = isrv << 4; /* low 4 bits of PPR have to be cleared */
+
+ apic_set_reg(apic, APIC_PROCPRI, ppr);
+
+ if (masked) {
+ /*
+ * If we get here its because there were vectors that
+ * were masked by PPR. Check again to see if anything is
+ * now available
+ */
+ irq = apic_find_highest_irr(apic);
+ if ((irq & 0xf0) > ppr)
+ forward = 1;
+ }
+
+ apic_debug("%s: ppr 0x%x (old) 0x%x (new), isr 0x%x, isrv 0x%x\n",
+ __FUNCTION__, orig_ppr, ppr, irq, isrv);
+
+ return forward;
+}
+
+static void apic_set_tpr(struct kvm_kern_apic *apic, u32 tpr)
+{
+ int forward = 0;
+
+ apic_debug("new value = %x\n", tpr);
+
+ apic_set_reg(apic, APIC_TASKPRI, tpr);
+ forward = apic_update_ppr(apic);
+
+ if (forward) {
+ spin_unlock_bh(&apic->lock);
+ kvm_irqdevice_set_intr(apic->irq_dev, kvm_irqpin_localint);
+ spin_lock_bh(&apic->lock);
+ }
+}
+
+static int apic_match_dest(struct kvm_kern_apic *target,
+ int dest,
+ int dest_mode,
+ int delivery_mode)
+{
+ int result = 0;
+
+ spin_lock_bh(&target->lock);
+
+ if (!dest_mode) /* Physical */
+ result = (GET_APIC_ID(apic_get_reg(target, APIC_ID)) == dest);
+ else { /* Logical */
+ u32 ldr = apic_get_reg(target, APIC_LDR);
+
+ /* Flat mode */
+ if (apic_get_reg(target, APIC_DFR) == APIC_DFR_FLAT)
+ result = GET_APIC_LOGICAL_ID(ldr) & dest;
+ else {
+ if ((delivery_mode == APIC_DM_LOWEST) &&
+ (dest == 0xff)) {
+ printk(KERN_ALERT "Broadcast IPI " \
+ "with lowest priority "
+ "delivery mode\n");
+ spin_unlock_bh(&target->lock);
+ kvm_crash_guest(target->vcpu->kvm);
+ return 0;
+ }
+ if (GET_APIC_LOGICAL_ID(ldr) == (dest & 0xf))
+ result = (GET_APIC_LOGICAL_ID(ldr) >> 4) &
+ (dest >> 4);
+ else
+ result = 0;
+ }
+ }
+
+ spin_unlock_bh(&target->lock);
+
+ return result;
+}
+
+/*
+ * Add a pending IRQ into lapic.
+ * Return 1 if successfully added and 0 if discarded.
+ */
+static int __apic_accept_irq(struct kvm_kern_apic *apic,
+ int delivery_mode,
+ int vector,
+ int level,
+ int trig_mode)
+{
+ kvm_irqpin_t pin = kvm_irqpin_invalid;
+
+ switch (delivery_mode) {
+ case APIC_DM_FIXED:
+ case APIC_DM_LOWEST:
+ if (unlikely(!apic_enabled(apic)))
+ break;
+
+ if (test_and_set_bit(vector, apic->regs + APIC_IRR)
+ && trig_mode) {
+ apic_debug("level trig mode repeatedly for vector " \
+ "%d\n", vector);
+ break;
+ }
+
+ if (trig_mode) {
+ apic_debug("level trig mode for vector %d\n", vector);
+ set_bit(vector, apic->regs + APIC_TMR);
+ }
+
+ apic_debug("FIXED/LOWEST interrupt for vector %d\n", vector);
+ pin = kvm_irqpin_localint;
+ break;
+ case APIC_DM_REMRD:
+ printk(KERN_WARNING "%s: Ignore deliver mode %d\n",
+ __FUNCTION__, delivery_mode);
+ break;
+ case APIC_DM_EXTINT:
+ apic_debug("EXTINT interrupt\n");
+ pin = kvm_irqpin_extint;
+ break;
+ case APIC_DM_SMI:
+ apic_debug("SMI interrupt\n");
+ pin = kvm_irqpin_smi;
+ break;
+ case APIC_DM_NMI:
+ apic_debug("NMI interrupt\n");
+ pin = kvm_irqpin_nmi;
+ break;
+ case APIC_DM_INIT:
+ case APIC_DM_STARTUP: /* FIXME: currently no support for SMP */
+ default:
+ printk(KERN_ALERT "TODO: support interrupt type %x\n",
+ delivery_mode);
+ kvm_crash_guest(apic->vcpu->kvm);
+ break;
+ }
+
+ if (likely(pin != kvm_irqpin_invalid)) {
+ /*
+ * temp release of the lock to transmit
+ */
+ spin_unlock_bh(&apic->lock);
+ kvm_irqdevice_set_intr(apic->irq_dev, pin);
+ spin_lock_bh(&apic->lock);
+
+ return 1;
+ } else
+ return 0;
+}
+
+static int apic_accept_irq(struct kvm_kern_apic *apic,
+ int delivery_mode,
+ int vector,
+ int level,
+ int trig_mode)
+{
+ int ret;
+
+ spin_lock_bh(&apic->lock);
+ ret = __apic_accept_irq(apic, delivery_mode, vector,
+ level, trig_mode);
+ spin_unlock_bh(&apic->lock);
+
+ return ret;
+}
+
+static void apic_set_eoi(struct kvm_kern_apic *apic)
+{
+ int vector = apic_find_highest_isr(apic);
+ int forward;
+
+ /*
+ * Not every write EOI will has corresponding ISR,
+ * one example is when Kernel check timer on setup_IO_APIC
+ */
+ if (!vector)
+ return;
+
+ __clear_bit(vector, apic->regs + APIC_ISR);
+ forward = apic_update_ppr(apic);
+
+ __clear_bit(vector, apic->regs + APIC_TMR);
+
+ if (forward) {
+ spin_unlock_bh(&apic->lock);
+ kvm_irqdevice_set_intr(apic->irq_dev, kvm_irqpin_localint);
+ spin_lock_bh(&apic->lock);
+ }
+}
+
+static int apic_check_vector(struct kvm_kern_apic *apic,u32 dm, u32 vector)
+{
+ if ((dm == APIC_DM_FIXED) && (vector < 16)) {
+ apic->err_status |= 0x40;
+ __apic_accept_irq(apic, APIC_DM_FIXED,
+ apic_lvt_vector(apic, APIC_LVTERR), 0, 0);
+ apic_debug("%s: check failed "
+ " dm %x vector %x\n", __FUNCTION__, dm, vector);
+ return 0;
+ }
+ return 1;
+}
+
+int kvm_apicbus_send(struct kvm *kvm, int dest, int trig_mode, int level,
+ int dest_mode, int delivery_mode, int vector)
+{
+ int i;
+ u32 lpr_map = 0;
+
+ apic_debug("%s: %d %d %d %d %d %d\n", __FUNCTION__,
+ dest, trig_mode, level, dest_mode, delivery_mode, vector);
+
+ for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+ struct kvm_kern_apic *target;
+ target = kvm->vcpus[i].apic.dev;
+
+ if (!target)
+ continue;
+
+ if (apic_match_dest(target, dest, dest_mode, delivery_mode)) {
+ if (delivery_mode == APIC_DM_LOWEST)
+ __set_bit(target->vcpu_id, &lpr_map);
+ else
+ apic_accept_irq(target, delivery_mode,
+ vector, level, trig_mode);
+ }
+ }
+
+ if (delivery_mode == APIC_DM_LOWEST) {
+ struct kvm_kern_apic *target;
+
+ /* Currently only UP is supported */
+ target = kvm->vcpus[0].apic.dev;
+
+ if (target)
+ apic_accept_irq(target, delivery_mode,
+ vector, level, trig_mode);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_apicbus_send);
+
+static void apic_send_ipi(struct kvm_kern_apic *apic)
+{
+ u32 icr_low = apic_get_reg(apic, APIC_ICR);
+ u32 icr_high = apic_get_reg(apic, APIC_ICR2);
+
+ unsigned int dest = GET_APIC_DEST_FIELD(icr_high);
+ unsigned int short_hand = icr_low & APIC_SHORT_MASK;
+ unsigned int trig_mode = icr_low & APIC_INT_LEVELTRIG;
+ unsigned int level = icr_low & APIC_INT_ASSERT;
+ unsigned int dest_mode = icr_low & APIC_DEST_MASK;
+ unsigned int delivery_mode = icr_low & APIC_MODE_MASK;
+ unsigned int vector = icr_low & APIC_VECTOR_MASK;
+
+ apic_debug("icr_high 0x%x, icr_low 0x%x, "
+ "short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, "
+ "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n",
+ icr_high, icr_low, short_hand, dest,
+ trig_mode, level, dest_mode, delivery_mode, vector);
+
+ /*
+ * We unlock here because we would enter this function in a lock
+ * state and we dont want to remain this way while we transmit
+ */
+ spin_unlock_bh(&apic->lock);
+
+ if (short_hand == APIC_DEST_NOSHORT)
+ /*
+ * If no short-hand notation is in use, just forward the
+ * message onto the apicbus and let the bus handle the routing.
+ */
+ kvm_apicbus_send(apic->vcpu->kvm, dest, trig_mode, level,
+ dest_mode, delivery_mode, vector);
+ else {
+ /*
+ * Otherwise we need to consider the short-hand to find the
+ * correct targets.
+ */
+ unsigned int i;
+
+ for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+ struct kvm_kern_apic *target;
+ int result = 0;
+
+ target = apic->vcpu->kvm->vcpus[i].apic.dev;
+
+ if (!target)
+ continue;
+
+ switch (short_hand) {
+ case APIC_DEST_SELF:
+ if (target == apic)
+ result = 1;
+ break;
+ case APIC_DEST_ALLINC:
+ result = 1;
+ break;
+
+ case APIC_DEST_ALLBUT:
+ if (target != apic)
+ result = 1;
+ break;
+ }
+
+ if (result)
+ apic_accept_irq(target, delivery_mode,
+ vector, level, trig_mode);
+ }
+ }
+
+ /*
+ * Relock before returning
+ */
+ spin_lock_bh(&apic->lock);
+
+}
+
+static u32 apic_get_tmcct(struct kvm_kern_apic *apic)
+{
+ u32 counter_passed;
+ ktime_t passed, now = apic->timer.dev.base->get_time();
+ u32 tmcct = apic_get_reg(apic, APIC_TMCCT);
+
+ ASSERT(apic != NULL);
+
+ if (unlikely(ktime_to_ns(now) <=
+ ktime_to_ns(apic->timer.last_update))) {
+ /* Wrap around */
+ passed = ktime_add(
+ ({ (ktime_t){
+ .tv64 = KTIME_MAX -
+ (apic->timer.last_update).tv64 };
+ }), now);
+ apic_debug("time elapsed\n");
+ } else
+ passed = ktime_sub(now, apic->timer.last_update);
+
+ counter_passed = ktime_to_ns(passed) /
+ (APIC_BUS_CYCLE_NS * apic->timer.divide_count);
+ tmcct -= counter_passed;
+
+ if (tmcct <= 0) {
+ if (unlikely(!apic_lvtt_period(apic))) {
+ tmcct = 0;
+ } else {
+ do {
+ tmcct += apic_get_reg(apic, APIC_TMICT);
+ } while ( tmcct <= 0 );
+ }
+ }
+
+ apic->timer.last_update = now;
+ apic_set_reg(apic, APIC_TMCCT, tmcct);
+
+ return tmcct;
+}
+
+/*
+ *----------------------------------------------------------------------
+ * MMIO
+ *----------------------------------------------------------------------
+ */
+
+#define align(val, len) ((val + (len-1)) & ~len)
+
+static int validate_mmio(struct kvm_kern_apic *apic, gpa_t address, int len)
+{
+ /*
+ * According to IA 32 Manual, all registers should be accessed with
+ * 32 bits alignment.
+ */
+ if (align(address, 4) != align(address+len, 4)) {
+ printk(KERN_WARNING "KVM: MMIO request to %d bytes at %ld " \
+ "is not 32 bit aligned. Injecting #GP\n",
+ len, address);
+ inject_gp(apic->vcpu);
+ return 0;
+ }
+
+ return 1;
+}
+
+static u32 __apic_read(struct kvm_kern_apic *apic,
+ unsigned int offset)
+{
+ u32 val = 0;
+
+ if (offset > APIC_TDCR)
+ return 0;
+
+ switch (offset) {
+ case APIC_ARBPRI:
+ printk(KERN_WARNING "access local APIC ARBPRI register " \
+ "which is for P6\n");
+ break;
+
+ case APIC_TMCCT: /* Timer CCR */
+ val = apic_get_tmcct(apic);
+ break;
+
+ case APIC_ESR:
+ apic->err_write_count = 0;
+ /* fall through */
+ default:
+ val = apic_get_reg(apic, offset);
+ break;
+ }
+
+ return val;
+}
+
+static void apic_mmio_read(struct kvm_io_device *this,
+ gpa_t address,
+ int len,
+ void *data)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)this->private;
+ unsigned int offset = address - apic->base_address;
+ unsigned char alignment = offset & 0x3;
+ u32 val;
+
+ if (!validate_mmio(apic, address, len))
+ return;
+
+ spin_lock_bh(&apic->lock);
+ val = __apic_read(apic, offset & ~0x3);
+ spin_unlock_bh(&apic->lock);
+
+ switch (len) {
+ case 1:
+ case 2:
+ case 4:
+ memcpy(data, (char*)((char*)&val + alignment), len);
+ break;
+ default:
+ printk(KERN_ALERT "Local APIC read with len = %x, " \
+ "should be 1,2, or 4 instead\n", len);
+ inject_gp(apic->vcpu);
+ break;
+ }
+}
+
+static void apic_mmio_write(struct kvm_io_device *this,
+ gpa_t address,
+ int len,
+ const void *data)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)this->private;
+ unsigned int offset = address - apic->base_address;
+ unsigned char alignment = offset & 0x3;
+ u32 val;
+
+ if (!validate_mmio(apic, address, len))
+ return;
+
+ spin_lock_bh(&apic->lock);
+
+ switch (len) {
+ case 1:
+ case 2: {
+ unsigned int tmp;
+
+ /*
+ * Some kernels will access with byte/word alignment
+ */
+ apic_debug("Notice: Local APIC write with len = %x\n", len);
+ tmp = __apic_read(apic, offset & ~0x3);
+ switch (len) {
+ case 1:
+ val = *(u8*)data;
+
+ val = (tmp & ~(0xff << (8*alignment))) |
+ ((val & 0xff) << (8*alignment));
+ break;
+
+ case 2:
+ if (alignment != 0x0 && alignment != 0x2) {
+ printk(KERN_ALERT "alignment error for apic " \
+ "with len == 2\n");
+ inject_gp(apic->vcpu);
+ }
+
+ /*
+ * assumes 16 bit alignment on the pointer.
+ * Mis-alignment is a host-side issue, however, so
+ * we crash
+ */
+ BUG_ON(((long)data & 0x1));
+
+ val = *(u16*)data;
+
+ val = (tmp & ~(0xffff << (8*alignment))) |
+ ((val & 0xffff) << (8*alignment));
+ break;
+ }
+
+ break;
+ }
+ case 4:
+ memcpy(&val, data, 4);
+ break;
+ default:
+ printk(KERN_ALERT "Local APIC write with len = %x, " \
+ "should be 1,2, or 4 instead\n", len);
+ inject_gp(apic->vcpu);
+ break;
+ }
+
+ /* too common printing */
+ if (offset != APIC_EOI)
+ apic_debug("%s: offset 0x%x with length 0x%x, and value is " \
+ "0x%lx\n",
+ __FUNCTION__, offset, len, val);
+
+ offset &= 0xff0;
+
+ switch (offset) {
+ case APIC_ID: /* Local APIC ID */
+ apic_set_reg(apic, APIC_ID, val);
+ break;
+
+ case APIC_TASKPRI:
+ apic_set_tpr(apic, val & 0xff);
+ break;
+
+ case APIC_EOI:
+ apic_set_eoi(apic);
+ break;
+
+ case APIC_LDR:
+ apic_set_reg(apic, APIC_LDR, val & APIC_LDR_MASK);
+ break;
+
+ case APIC_DFR:
+ apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
+ break;
+
+ case APIC_SPIV:
+ apic_set_reg(apic, APIC_SPIV, val & 0x3ff);
+ if (!(val & APIC_SPIV_APIC_ENABLED)) {
+ int i;
+ u32 lvt_val;
+
+ apic->status |= APIC_SOFTWARE_DISABLE_MASK;
+ for (i = 0; i < APIC_LVT_NUM; i++) {
+ lvt_val = apic_get_reg(apic,
+ APIC_LVTT +
+ 0x10 * i);
+ apic_set_reg(apic, APIC_LVTT + 0x10 * i,
+ lvt_val | APIC_LVT_MASKED);
+ }
+
+ if ((apic_get_reg(apic, APIC_LVT0) &
+ APIC_MODE_MASK) == APIC_DM_EXTINT)
+ clear_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+ } else {
+ apic->status &= ~APIC_SOFTWARE_DISABLE_MASK;
+ if ((apic_get_reg(apic, APIC_LVT0) &
+ APIC_MODE_MASK) == APIC_DM_EXTINT)
+ set_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+ }
+ break;
+
+ case APIC_ESR:
+ apic->err_write_count = !apic->err_write_count;
+ if (!apic->err_write_count)
+ apic->err_status = 0;
+ break;
+
+ case APIC_ICR:
+ /* No delay here, so we always clear the pending bit*/
+ apic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
+ apic_send_ipi(apic);
+ break;
+
+ case APIC_ICR2:
+ apic_set_reg(apic, APIC_ICR2, val & 0xff000000);
+ break;
+
+ case APIC_LVTT:
+ case APIC_LVTTHMR:
+ case APIC_LVTPC:
+ case APIC_LVT0:
+ case APIC_LVT1:
+ case APIC_LVTERR:
+ {
+ if (apic->status & APIC_SOFTWARE_DISABLE_MASK)
+ val |= APIC_LVT_MASKED;
+
+ val &= apic_lvt_mask[(offset - APIC_LVTT) >> 4];
+ apic_set_reg(apic, offset, val);
+
+ /* On hardware, when write vector less than 0x20 will error */
+ if (!(val & APIC_LVT_MASKED))
+ apic_check_vector(apic, apic_lvt_dm(apic, offset),
+ apic_lvt_vector(apic, offset));
+ if (!apic->vcpu_id && (offset == APIC_LVT0)) {
+ if ((val & APIC_MODE_MASK) == APIC_DM_EXTINT)
+ if (val & APIC_LVT_MASKED)
+ clear_bit(_APIC_BSP_ACCEPT_PIC,
+ &apic->status);
+ else
+ set_bit(_APIC_BSP_ACCEPT_PIC,
+ &apic->status);
+ else
+ clear_bit(_APIC_BSP_ACCEPT_PIC,
+ &apic->status);
+ }
+ }
+ break;
+
+ case APIC_TMICT:
+ {
+ ktime_t now = apic->timer.dev.base->get_time();
+ u32 offset;
+
+ apic_set_reg(apic, APIC_TMICT, val);
+ apic_set_reg(apic, APIC_TMCCT, val);
+ apic->timer.last_update = now;
+ offset = APIC_BUS_CYCLE_NS * apic->timer.divide_count * val;
+
+ /* Make sure the lock ordering is coherent */
+ spin_unlock_bh(&apic->lock);
+ hrtimer_cancel(&apic->timer.dev);
+ hrtimer_start(&apic->timer.dev,
+ ktime_add_ns(now, offset),
+ HRTIMER_MODE_ABS);
+
+ apic_debug("%s: bus cycle is %"PRId64"ns, now 0x%016"PRIx64", "
+ "timer initial count 0x%x, offset 0x%x, "
+ "expire @ 0x%016"PRIx64".\n", __FUNCTION__,
+ APIC_BUS_CYCLE_NS, ktime_to_ns(now),
+ apic_get_reg(apic, APIC_TMICT),
+ offset, ktime_to_ns(ktime_add_ns(now, offset)));
+ }
+ return;
+
+ case APIC_TDCR:
+ {
+ unsigned int tmp1, tmp2;
+
+ tmp1 = val & 0xf;
+ tmp2 = ((tmp1 & 0x3) | ((tmp1 & 0x8) >> 1)) + 1;
+ apic->timer.divide_count = 0x1 << (tmp2 & 0x7);
+
+ apic_set_reg(apic, APIC_TDCR, val);
+
+ apic_debug("timer divide count is 0x%x\n",
+ apic->timer.divide_count);
+ }
+ break;
+
+ default:
+ printk(KERN_WARNING "Local APIC Write to read-only register\n");
+ break;
+ }
+
+ spin_unlock_bh(&apic->lock);
+}
+
+static int apic_mmio_range(struct kvm_io_device *this, gpa_t addr)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)this->private;
+ int ret = 0;
+
+ spin_lock_bh(&apic->lock);
+
+ if (apic_global_enabled(apic) &&
+ (addr >= apic->base_address) &&
+ (addr < (apic->base_address + VLOCAL_APIC_MEM_LENGTH)))
+ ret = 1;
+
+ spin_unlock_bh(&apic->lock);
+
+ return ret;
+}
+
+static void apic_mmio_destructor(struct kvm_io_device *this)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)this->private;
+
+ apic_dropref(apic);
+}
+
+static void apic_mmio_register(struct kvm_kern_apic *apic)
+{
+ /* Register ourselves with the MMIO subsystem */
+ struct kvm_io_device *dev = &apic->mmio_dev;
+
+ dev->read = apic_mmio_read;
+ dev->write = apic_mmio_write;
+ dev->in_range = apic_mmio_range;
+ dev->destructor = apic_mmio_destructor;
+
+ dev->private = apic;
+ atomic_inc(&apic->ref_count);
+
+ apic->vcpu->apic.mmio = dev;
+}
+
+/*
+ *----------------------------------------------------------------------
+ * LAPIC interface
+ *----------------------------------------------------------------------
+ */
+
+void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, u64 cr8)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)vcpu->apic.dev;
+
+ spin_lock_bh(&apic->lock);
+ apic_set_tpr(apic, ((cr8 & 0x0f) << 4));
+ spin_unlock_bh(&apic->lock);
+}
+
+u64 kvm_lapic_get_tpr(struct kvm_vcpu *vcpu)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)vcpu->apic.dev;
+ u64 tpr;
+
+ spin_lock_bh(&apic->lock);
+ tpr = (u64)apic_get_reg(apic, APIC_TASKPRI);
+ spin_unlock_bh(&apic->lock);
+
+ return (tpr & 0xf0) >> 4;
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_get_tpr);
+
+void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)vcpu->apic.dev;
+
+ spin_lock_bh(&apic->lock);
+ if (apic->vcpu_id)
+ value &= ~MSR_IA32_APICBASE_BSP;
+
+ apic->base_msr = value;
+ apic->base_address = apic->base_msr & MSR_IA32_APICBASE_BASE;
+
+ /* with FSB delivery interrupt, we can restart APIC functionality */
+ if (!(value & MSR_IA32_APICBASE_ENABLE))
+ set_bit(_APIC_GLOB_DISABLE, &apic->status);
+ else
+ clear_bit(_APIC_GLOB_DISABLE, &apic->status);
+
+ apic_debug("apic base msr is 0x%016"PRIx64", and base address is " \
+ "0x%lx.\n", apic->base_msr, apic->base_address);
+
+ spin_unlock_bh(&apic->lock);
+}
+
+u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)vcpu->apic.dev;
+ u64 base;
+
+ spin_lock_bh(&apic->lock);
+ base = apic->base_msr;
+ spin_unlock_bh(&apic->lock);
+
+ return base;
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_get_base);
+
+void kvm_lapic_save(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ /*
+ * FIXME: This needs to support the entire register set when
+ * enabled
+ */
+ sregs->cr8 = kvm_lapic_get_tpr(vcpu);
+ sregs->apic_base = kvm_lapic_get_base(vcpu);
+}
+
+void kvm_lapic_restore(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ /*
+ * FIXME: This needs to support the entire register set when
+ * enabled
+ */
+ kvm_lapic_set_tpr(vcpu, sregs->cr8);
+ kvm_lapic_set_base(vcpu, sregs->apic_base);
+}
+
+void kvm_lapic_reset(struct kvm_vcpu *vcpu)
+{
+ struct kvm_kern_apic *apic;
+ int i;
+
+ apic_debug("%s\n", __FUNCTION__);
+
+ ASSERT(vcpu);
+ apic = vcpu->apic.dev;
+ ASSERT(apic != NULL);
+
+ /* Stop the timer in case it's a reset to an active apic */
+ hrtimer_cancel(&apic->timer.dev);
+
+ spin_lock_bh(&apic->lock);
+
+ apic_set_reg(apic, APIC_ID, vcpu_slot(vcpu) << 24);
+ apic_set_reg(apic, APIC_LVR, APIC_VERSION);
+
+ for (i = 0; i < APIC_LVT_NUM; i++)
+ apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
+
+ apic_set_reg(apic, APIC_DFR, 0xffffffffU);
+ apic_set_reg(apic, APIC_SPIV, 0xff);
+ apic_set_reg(apic, APIC_TASKPRI, 0);
+ apic_set_reg(apic, APIC_LDR, 0);
+ apic_set_reg(apic, APIC_ESR, 0);
+ apic_set_reg(apic, APIC_ICR, 0);
+ apic_set_reg(apic, APIC_ICR2, 0);
+ apic_set_reg(apic, APIC_TDCR, 0);
+ apic_set_reg(apic, APIC_TMICT, 0);
+ memset((void*)(apic->regs + APIC_IRR), 0, KVM_IRQ_BITMAP_SIZE(u8));
+ memset((void*)(apic->regs + APIC_ISR), 0, KVM_IRQ_BITMAP_SIZE(u8));
+ memset((void*)(apic->regs + APIC_TMR), 0, KVM_IRQ_BITMAP_SIZE(u8));
+
+ apic->base_msr =
+ MSR_IA32_APICBASE_ENABLE |
+ APIC_DEFAULT_PHYS_BASE;
+ if (vcpu_slot(vcpu) == 0)
+ apic->base_msr |= MSR_IA32_APICBASE_BSP;
+ apic->base_address = apic->base_msr & MSR_IA32_APICBASE_BASE;
+
+ apic->timer.divide_count = 0;
+ apic->timer.pending = 0;
+ apic->status = 0;
+
+#ifdef APIC_NO_BIOS
+ /*
+ * XXX According to mp specification, BIOS will enable LVT0/1,
+ * remove it after BIOS enabled
+ */
+ if (!vcpu_slot(vcpu)) {
+ apic_set_reg(apic, APIC_LVT0, APIC_MODE_EXTINT << 8);
+ apic_set_reg(apic, APIC_LVT1, APIC_MODE_NMI << 8);
+ set_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+ }
+#endif
+
+ spin_unlock_bh(&apic->lock);
+
+ printk(KERN_INFO "%s: vcpu=%p, id=%d, base_msr=" \
+ "0x%016"PRIx64", base_address=0x%0lx.\n", __FUNCTION__, vcpu,
+ GET_APIC_ID(apic_get_reg(apic, APIC_ID)),
+ apic->base_msr, apic->base_address);
+}
+
+int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)vcpu->apic.dev;
+ int ret = 0;
+
+ spin_lock_bh(&apic->lock);
+ if (!apic->usermode)
+ ret = apic_enabled(apic);
+ spin_unlock_bh(&apic->lock);
+
+ return ret;
+}
+
+/*
+ *----------------------------------------------------------------------
+ * timer interface
+ *----------------------------------------------------------------------
+ */
+static int __apic_timer_fn(struct kvm_kern_apic *apic)
+{
+ u32 vector;
+ ktime_t now;
+ int result = 0;
+
+ if (unlikely(!apic_enabled(apic) ||
+ !apic_lvt_enabled(apic, APIC_LVTT))) {
+ apic_debug("%s: time interrupt although apic is down\n",
+ __FUNCTION__);
+ return 0;
+ }
+
+ vector = apic_lvt_vector(apic, APIC_LVTT);
+ now = apic->timer.dev.base->get_time();
+ apic->timer.last_update = now;
+ apic->timer.pending++;
+
+ __apic_accept_irq(apic, APIC_DM_FIXED, vector, 1, 0);
+
+ if (apic_lvtt_period(apic)) {
+ u32 offset;
+ u32 tmict = apic_get_reg(apic, APIC_TMICT);
+
+ apic_set_reg(apic, APIC_TMCCT, tmict);
+ offset = APIC_BUS_CYCLE_NS * apic->timer.divide_count * tmict;
+
+ result = 1;
+ apic->timer.dev.expires = ktime_add_ns(now, offset);
+
+ apic_debug("%s: now 0x%016"PRIx64", expire @ 0x%016"PRIx64", "
+ "timer initial count 0x%x, timer current count 0x%x.\n",
+ __FUNCTION__,
+ ktime_to_ns(now), ktime_add_ns(now, offset),
+ apic_get_reg(apic, APIC_TMICT),
+ apic_get_reg(apic, APIC_TMCCT));
+ } else {
+ apic_set_reg(apic, APIC_TMCCT, 0);
+ apic_debug("%s: now 0x%016"PRIx64", "
+ "timer initial count 0x%x, timer current count 0x%x.\n",
+ __FUNCTION__,
+ ktime_to_ns(now), apic_get_reg(apic, APIC_TMICT),
+ apic_get_reg(apic, APIC_TMCCT));
+ }
+
+ return result;
+}
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
+{
+ struct kvm_kern_apic *apic;
+ int restart_timer = 0;
+
+ apic = container_of(data, struct kvm_kern_apic, timer.dev);
+
+ spin_lock_bh(&apic->lock);
+ restart_timer = __apic_timer_fn(apic);
+ spin_unlock_bh(&apic->lock);
+
+ if (restart_timer)
+ return HRTIMER_RESTART;
+ else
+ return HRTIMER_NORESTART;
+}
+
+/*
+ *----------------------------------------------------------------------
+ * IRQDEVICE interface
+ *----------------------------------------------------------------------
+ */
+
+static int apic_irqdev_ack(struct kvm_irqdevice *this, int flags,
+ struct kvm_irqack_data *data)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)this->private;
+ int irq;
+
+ apic_debug("LAPIC ACK attempt\n");
+
+ spin_lock_bh(&apic->lock);
+
+ if (!apic_enabled(apic))
+ goto out;
+
+ if (!(flags & KVM_IRQACK_FLAG_PEEK)) {
+ irq = apic_find_highest_irr(apic);
+ if ((irq & 0xf0) > apic_get_reg(apic, APIC_PROCPRI)) {
+ BUG_ON (irq < 0x10);
+
+ __set_bit(irq, apic->regs + APIC_ISR);
+ __clear_bit(irq, apic->regs + APIC_IRR);
+ apic_update_ppr(apic);
+
+ /*
+ * We have to special case the timer interrupt
+ * because we want the vector to stay pending
+ * for each tick of the clock, even for a backlog.
+ * Therefore, if this was a timer vector and we
+ * still have ticks pending, keep IRR set
+ */
+ if (irq == apic_lvt_vector(apic, APIC_LVTT)) {
+ BUG_ON(!apic->timer.pending);
+ apic->timer.pending--;
+ if (apic->timer.pending)
+ __set_bit(irq, apic->regs + APIC_IRR);
+ }
+
+ data->flags |= KVM_IRQACKDATA_VECTOR_VALID;
+ data->vector = irq;
+ }
+ else
+ data->vector = -1;
+
+ apic_debug("ACK for vector %d\n", data->vector);
+ }
+
+ /*
+ * See if there is anything still pending. Don't forget that we may
+ * have entered this function with vector=NULL just to check pending
+ * status
+ */
+ irq = apic_find_highest_irr(apic);
+ if (irq) {
+ /*
+ * we check TASKPRI (as opposed to PROCPRI) because we
+ * want to know the threshold against which the CPU is masking
+ * interrupts, not the APIC itself. The PROCPRI register
+ * factors in the ISR in addition to the TPR. Therefore, we
+ * would never see a match here if we looked at PPR since
+ * we just injected the highest ISR during this call
+ */
+ if (irq > apic_get_reg(apic, APIC_TASKPRI))
+ data->flags |= KVM_IRQACKDATA_VECTOR_PENDING;
+ }
+
+ out:
+ spin_unlock_bh(&apic->lock);
+
+ return 0;
+}
+
+static int apic_irqdev_set_pin(struct kvm_irqdevice *this, int irq, int level)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)this->private;
+ int lvt = 0;
+
+ spin_lock_bh(&apic->lock);
+
+ if (!apic_enabled(apic)) {
+ /*
+ * If the LAPIC is disabled, we simply forward the interrupt
+ * on to the output line
+ */
+ __apic_accept_irq(apic, APIC_DM_EXTINT, 0, level, 1);
+ goto out;
+ }
+
+ /*
+ * pin "0" is LINT0, and "1" is LINT1
+ */
+ BUG_ON(irq > 1);
+
+ switch(irq) {
+ case 0:
+ lvt = APIC_LVT0;
+ break;
+ case 1:
+ lvt = APIC_LVT1;
+ break;
+ }
+
+ if (apic_lvt_enabled(apic, lvt))
+ __apic_accept_irq(apic,
+ apic_lvt_dm(apic, lvt),
+ apic_lvt_vector(apic, lvt),
+ level,
+ 1);
+
+
+ out:
+ spin_unlock_bh(&apic->lock);
+
+ return 0;
+}
+
+static void apic_irqdev_destructor(struct kvm_irqdevice *this)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)this->private;
+
+ apic_dropref(apic);
+}
+
+static void apic_irqdev_register(struct kvm_kern_apic *apic,
+ struct kvm_irqdevice *dev)
+{
+ dev->ack = apic_irqdev_ack;
+ dev->set_pin = apic_irqdev_set_pin;
+ dev->destructor = apic_irqdev_destructor;
+
+ dev->private = apic;
+ atomic_inc(&apic->ref_count);
+
+ apic->irq_dev = dev;
+}
+
+int kvm_lapic_init(struct kvm_vcpu *vcpu,
+ struct kvm_irqdevice *irq_dev, int flags)
+{
+ struct kvm_kern_apic *apic = NULL;
+ struct kvm_io_device *mmio_dev = NULL;
+
+ ASSERT(vcpu != NULL);
+ apic_debug("apic_init %d\n", vcpu_slot(vcpu));
+
+ apic = kzalloc(sizeof(*apic), GFP_KERNEL);
+ if (!apic)
+ goto nomem;
+
+ spin_lock_init(&apic->lock);
+ atomic_inc(&apic->ref_count);
+ apic->vcpu_id = vcpu_slot(vcpu);
+
+ apic->regs_page = alloc_page(GFP_KERNEL);
+ if ( apic->regs_page == NULL ) {
+ printk(KERN_ALERT "malloc apic regs error for vcpu %x\n",
+ vcpu_slot(vcpu));
+ goto nomem;
+ }
+ apic->regs = page_address(apic->regs_page);
+ memset(apic->regs, 0, PAGE_SIZE);
+
+ apic->vcpu = vcpu;
+ vcpu->apic.dev = apic;
+
+ if (!(flags & KVM_LAPIC_OPTION_USERMODE)) {
+ apic_irqdev_register(apic, irq_dev);
+ apic_mmio_register(apic);
+ } else
+ apic->usermode = 1;
+
+ hrtimer_init(&apic->timer.dev, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+ apic->timer.dev.function = apic_timer_fn;
+
+ kvm_lapic_reset(vcpu);
+ return 0;
+
+ nomem:
+ if (mmio_dev)
+ kfree(mmio_dev);
+
+ if (apic)
+ apic_dropref(apic);
+
+ return -ENOMEM;
+}
+
+void kvm_lapic_destroy(struct kvm_vcpu *vcpu)
+{
+ struct kvm_kern_apic *apic = vcpu->apic.dev;
+
+ if (vcpu->apic.mmio)
+ kvm_iodevice_destructor(vcpu->apic.mmio);
+
+ apic_dropref(apic);
+}
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 91546ae..b9ace21 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -570,9 +570,6 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
fx_init(vcpu);
vcpu->fpu_active = 1;
- vcpu->apic_base = 0xfee00000 |
- /*for vcpu 0*/ MSR_IA32_APICBASE_BSP |
- MSR_IA32_APICBASE_ENABLE;
return 0;
@@ -1410,9 +1407,9 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
r = kvm_vcpu_irq_pop(vcpu, &ack);
break;
case kvm_irqpin_extint:
- printk(KERN_WARNING "KVM: external-interrupts not " \
- "handled yet\n");
- __clear_bit(pin, &vcpu->irq.pending);
+ r = kvm_irqdevice_ack(&vcpu->kvm->isa_irq, 0, &ack);
+ if (!(ack.flags & KVM_IRQACKDATA_VECTOR_PENDING))
+ __clear_bit(pin, &vcpu->irq.pending);
break;
case kvm_irqpin_nmi:
/*
@@ -1501,8 +1498,8 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
(vcpu->interrupt_window_open &&
!kvm_vcpu_irq_pending(vcpu));
kvm_run->if_flag = (vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF) != 0;
- kvm_run->cr8 = vcpu->cr8;
- kvm_run->apic_base = vcpu->apic_base;
+ kvm_run->cr8 = kvm_lapic_get_tpr(vcpu);
+ kvm_run->apic_base = kvm_lapic_get_base(vcpu);
}
/*
diff --git a/drivers/kvm/userint.c b/drivers/kvm/userint.c
index 08d26fa..16987e1 100644
--- a/drivers/kvm/userint.c
+++ b/drivers/kvm/userint.c
@@ -218,6 +218,12 @@ int kvm_user_irqdev_restore(struct kvm_irqdevice *this, void *data)
int kvm_userint_init(struct kvm_vcpu *vcpu)
{
- return kvm_user_irqdev_init(&vcpu->irq.dev);
+ int ret;
+
+ ret = kvm_user_irqdev_init(&vcpu->irq.dev);
+ if (ret < 0)
+ return ret;
+
+ return kvm_lapic_init(vcpu, NULL, KVM_LAPIC_OPTION_USERMODE);
}
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 7b81fff..bee4831 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1083,10 +1083,6 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
memset(vcpu->regs, 0, sizeof(vcpu->regs));
vcpu->regs[VCPU_REGS_RDX] = get_rdx_init_val();
- vcpu->cr8 = 0;
- vcpu->apic_base = 0xfee00000 |
- /*for vcpu 0*/ MSR_IA32_APICBASE_BSP |
- MSR_IA32_APICBASE_ENABLE;
fx_init(vcpu);
@@ -1327,9 +1323,9 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
r = kvm_vcpu_irq_pop(vcpu, &ack);
break;
case kvm_irqpin_extint:
- printk(KERN_WARNING "KVM: external-interrupts not " \
- "handled yet\n");
- __clear_bit(pin, &vcpu->irq.pending);
+ r = kvm_irqdevice_ack(&vcpu->kvm->isa_irq, 0, &ack);
+ if (!(ack.flags & KVM_IRQACKDATA_VECTOR_PENDING))
+ __clear_bit(pin, &vcpu->irq.pending);
break;
case kvm_irqpin_nmi:
/*
@@ -1690,7 +1686,7 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
return 1;
case 8:
vcpu_load_rsp_rip(vcpu);
- vcpu->regs[reg] = vcpu->cr8;
+ vcpu->regs[reg] = kvm_lapic_get_tpr(vcpu);
vcpu_put_rsp_rip(vcpu);
skip_emulated_instruction(vcpu);
return 1;
@@ -1787,8 +1783,8 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
kvm_run->if_flag = (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) != 0;
- kvm_run->cr8 = vcpu->cr8;
- kvm_run->apic_base = vcpu->apic_base;
+ kvm_run->cr8 = kvm_lapic_get_tpr(vcpu);
+ kvm_run->apic_base = kvm_lapic_get_base(vcpu);
kvm_run->ready_for_interrupt_injection =
(vcpu->interrupt_window_open &&
!kvm_vcpu_irq_pending(vcpu));
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index e6edca8..a83606b 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -231,6 +231,17 @@ struct kvm_dirty_log {
};
};
+/* for KVM_APIC */
+struct kvm_apic_msg {
+ /* in */
+ __u32 dest;
+ __u32 trig_mode;
+ __u32 dest_mode;
+ __u32 delivery_mode;
+ __u32 vector;
+ __u32 padding;
+};
+
struct kvm_cpuid_entry {
__u32 function;
__u32 eax;
@@ -282,6 +293,9 @@ struct kvm_signal_mask {
#define KVM_CREATE_VCPU _IO(KVMIO, 0x41)
#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
#define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias)
+#define KVM_ENABLE_KERNEL_PIC _IOW(KVMIO, 0x44, __u32)
+#define KVM_ISA_INTERRUPT _IOW(KVMIO, 0x45, struct kvm_interrupt)
+#define KVM_APIC_MSG _IOW(KVMIO, 0x46, struct kvm_apic_msg)
/*
* ioctls for vcpu fds
@@ -300,5 +314,5 @@ struct kvm_signal_mask {
#define KVM_SET_SIGNAL_MASK _IOW(KVMIO, 0x8b, struct kvm_signal_mask)
#define KVM_GET_FPU _IOR(KVMIO, 0x8c, struct kvm_fpu)
#define KVM_SET_FPU _IOW(KVMIO, 0x8d, struct kvm_fpu)
-
+#define KVM_APIC_RESET _IO(KVMIO, 0x8e)
#endif
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 6/8] KVM: Adds support for real NMI injection on VMX processors
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
` (4 preceding siblings ...)
2007-05-09 3:03 ` [PATCH 5/8] KVM: Add support for in-kernel LAPIC model Gregory Haskins
@ 2007-05-09 3:03 ` Gregory Haskins
[not found] ` <20070509030340.23443.84153.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 3:03 ` [PATCH 7/8] KVM: Adds basic plumbing to support TPR shadow features Gregory Haskins
` (2 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/vmx.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++----
drivers/kvm/vmx.h | 3 +++
2 files changed, 61 insertions(+), 5 deletions(-)
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index bee4831..1c99bc9 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1148,7 +1148,14 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
PIN_BASED_VM_EXEC_CONTROL,
PIN_BASED_EXT_INTR_MASK /* 20.6.1 */
| PIN_BASED_NMI_EXITING /* 20.6.1 */
+ | PIN_BASED_VIRTUAL_NMI /* 20.6.1 */
);
+
+ if (!(vmcs_read32(PIN_BASED_VM_EXEC_CONTROL) & PIN_BASED_VIRTUAL_NMI))
+ printk(KERN_WARNING "KVM: Warning - Host processor does " \
+ "not support virtual-NMI injection. Using IRQ " \
+ "method\n");
+
vmcs_write32_fixedbits(MSR_IA32_VMX_PROCBASED_CTLS,
CPU_BASED_VM_EXEC_CONTROL,
CPU_BASED_HLT_EXITING /* 20.6.2 */
@@ -1297,6 +1304,43 @@ static void inject_rmode_irq(struct kvm_vcpu *vcpu, int irq)
vmcs_writel(GUEST_RSP, (vmcs_readl(GUEST_RSP) & ~0xffff) | (sp - 6));
}
+static int do_nmi_requests(struct kvm_vcpu *vcpu)
+{
+ int nmi_window = 0;
+
+ BUG_ON(!(test_bit(kvm_irqpin_nmi, &vcpu->irq.pending)));
+
+ nmi_window =
+ (((vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 0xb) == 0)
+ && (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD)
+ & INTR_INFO_VALID_MASK));
+
+ if (nmi_window) {
+ if (vcpu->rmode.active)
+ inject_rmode_irq(vcpu, 2);
+ else
+ vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+ 2 |
+ INTR_TYPE_NMI |
+ INTR_INFO_VALID_MASK);
+
+ __clear_bit(kvm_irqpin_nmi, &vcpu->irq.pending);
+ } else {
+ /*
+ * NMIs blocked. Wait for unblock.
+ */
+ u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cbvec |= CPU_BASED_NMI_EXITING;
+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
+ }
+
+ /*
+ * nmi_window correctly reflects whether we handled this interrupt
+ * or not, so just return it as the "handled" indicator
+ */
+ return nmi_window;
+}
+
static int do_intr_requests(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run,
kvm_irqpin_t pin)
@@ -1329,9 +1373,11 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
break;
case kvm_irqpin_nmi:
/*
- * FIXME: Someday we will handle this using the
- * specific VMX NMI features. For now, just inject
- * the NMI as a standard interrupt on vector 2
+ * We should only get here if the processor does
+ * not support virtual NMIs. Inject the NMI as a
+ * standard interrupt on vector 2. The implication is
+ * that NMIs are going to be subject to RFLAGS.IF
+ * masking, unfortunately.
*/
ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
ack.vector = 2;
@@ -1374,7 +1420,8 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
static void clear_pending_controls(struct kvm_vcpu *vcpu)
{
u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
- cbvec &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+ cbvec &= ~(CPU_BASED_VIRTUAL_INTR_PENDING
+ | CPU_BASED_NMI_EXITING);
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
}
@@ -1391,7 +1438,6 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
switch (pin) {
case kvm_irqpin_localint:
case kvm_irqpin_extint:
- case kvm_irqpin_nmi:
do_intr_requests(vcpu, kvm_run, pin);
break;
case kvm_irqpin_smi:
@@ -1399,6 +1445,13 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
printk(KERN_WARNING "KVM: dropping unhandled SMI\n");
__clear_bit(pin, &vcpu->irq.pending);
break;
+ case kvm_irqpin_nmi:
+ if (vmcs_read32(PIN_BASED_VM_EXEC_CONTROL)
+ & PIN_BASED_VIRTUAL_NMI)
+ do_nmi_requests(vcpu);
+ else
+ do_intr_requests(vcpu, kvm_run, pin);
+ break;
case kvm_irqpin_invalid:
/* drop */
break;
diff --git a/drivers/kvm/vmx.h b/drivers/kvm/vmx.h
index d0dc93d..d3fe017 100644
--- a/drivers/kvm/vmx.h
+++ b/drivers/kvm/vmx.h
@@ -35,6 +35,7 @@
#define CPU_BASED_CR8_LOAD_EXITING 0x00080000
#define CPU_BASED_CR8_STORE_EXITING 0x00100000
#define CPU_BASED_TPR_SHADOW 0x00200000
+#define CPU_BASED_NMI_EXITING 0x00400000
#define CPU_BASED_MOV_DR_EXITING 0x00800000
#define CPU_BASED_UNCOND_IO_EXITING 0x01000000
#define CPU_BASED_ACTIVATE_IO_BITMAP 0x02000000
@@ -44,6 +45,7 @@
#define PIN_BASED_EXT_INTR_MASK 0x1
#define PIN_BASED_NMI_EXITING 0x8
+#define PIN_BASED_VIRTUAL_NMI 0x20
#define VM_EXIT_ACK_INTR_ON_EXIT 0x00008000
#define VM_EXIT_HOST_ADD_SPACE_SIZE 0x00000200
@@ -221,6 +223,7 @@ enum vmcs_field {
#define VECTORING_INFO_VALID_MASK INTR_INFO_VALID_MASK
#define INTR_TYPE_EXT_INTR (0 << 8) /* external interrupt */
+#define INTR_TYPE_NMI (2 << 8) /* non-maskable interrupt */
#define INTR_TYPE_EXCEPTION (3 << 8) /* processor exception */
/*
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 7/8] KVM: Adds basic plumbing to support TPR shadow features
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
` (5 preceding siblings ...)
2007-05-09 3:03 ` [PATCH 6/8] KVM: Adds support for real NMI injection on VMX processors Gregory Haskins
@ 2007-05-09 3:03 ` Gregory Haskins
2007-05-09 3:03 ` [PATCH 8/8] KVM: Adds support for TPR shadowing under VMX processors Gregory Haskins
2007-05-13 12:02 ` [PATCH 0/8] in-kernel APIC support "v1" Avi Kivity
8 siblings, 0 replies; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/irqdevice.h | 3 +++
drivers/kvm/kvm.h | 1 +
drivers/kvm/lapic.c | 15 +++++++++++++++
3 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/drivers/kvm/irqdevice.h b/drivers/kvm/irqdevice.h
index 097d179..173313d 100644
--- a/drivers/kvm/irqdevice.h
+++ b/drivers/kvm/irqdevice.h
@@ -45,12 +45,14 @@ struct kvm_irqsink {
#define KVM_IRQACKDATA_VECTOR_VALID (1 << 0)
#define KVM_IRQACKDATA_VECTOR_PENDING (1 << 1)
+#define KVM_IRQACKDATA_NEXT_VALID (1 << 2)
#define KVM_IRQACK_FLAG_PEEK (1 << 0)
struct kvm_irqack_data {
int flags;
int vector;
+ int next;
};
struct kvm_irqdevice {
@@ -92,6 +94,7 @@ static inline void kvm_irqdevice_init(struct kvm_irqdevice *dev)
* data.flags -
* [KVM_IRQACKDATA_VECTOR_VALID - data.vector is valid]
* [KVM_IRQACKDATA_VECTOR_PENDING - more vectors are pending]
+ * [KVM_IRQACKDATA_NEXT_VALID - next-vector is valid]
*
* Returns: (int)
* [-1 = failure]
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 60710d8..4ae616f 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -178,6 +178,7 @@ void kvm_lapic_save(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
void kvm_lapic_restore(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
void kvm_lapic_reset(struct kvm_vcpu *vcpu);
int kvm_lapic_enabled(struct kvm_vcpu *vcpu);
+void *kvm_lapic_get_regs(struct kvm_vcpu *vcpu);
/*
* x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
diff --git a/drivers/kvm/lapic.c b/drivers/kvm/lapic.c
index f7b04f9..da51710 100644
--- a/drivers/kvm/lapic.c
+++ b/drivers/kvm/lapic.c
@@ -1140,6 +1140,13 @@ int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
return ret;
}
+void *kvm_lapic_get_regs(struct kvm_vcpu *vcpu)
+{
+ struct kvm_kern_apic *apic = (struct kvm_kern_apic*)vcpu->apic.dev;
+ return apic->regs;
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_get_regs);
+
/*
*----------------------------------------------------------------------
* timer interface
@@ -1278,6 +1285,14 @@ static int apic_irqdev_ack(struct kvm_irqdevice *this, int flags,
*/
if (irq > apic_get_reg(apic, APIC_TASKPRI))
data->flags |= KVM_IRQACKDATA_VECTOR_PENDING;
+
+ /*
+ * We report the next pending vector here so that the system
+ * can asses TPR thresholds for TPR-shadowing purposes
+ * (if applicable)
+ */
+ data->next = irq;
+ data->flags |= KVM_IRQACKDATA_NEXT_VALID;
}
out:
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 8/8] KVM: Adds support for TPR shadowing under VMX processors
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
` (6 preceding siblings ...)
2007-05-09 3:03 ` [PATCH 7/8] KVM: Adds basic plumbing to support TPR shadow features Gregory Haskins
@ 2007-05-09 3:03 ` Gregory Haskins
[not found] ` <20070509030350.23443.35387.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-13 12:02 ` [PATCH 0/8] in-kernel APIC support "v1" Avi Kivity
8 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 3:03 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---
drivers/kvm/vmx.c | 32 ++++++++++++++++++++++++++------
1 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 1c99bc9..7745bb9 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1159,13 +1159,26 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
vmcs_write32_fixedbits(MSR_IA32_VMX_PROCBASED_CTLS,
CPU_BASED_VM_EXEC_CONTROL,
CPU_BASED_HLT_EXITING /* 20.6.2 */
- | CPU_BASED_CR8_LOAD_EXITING /* 20.6.2 */
- | CPU_BASED_CR8_STORE_EXITING /* 20.6.2 */
+ | CPU_BASED_TPR_SHADOW /* 20.6.2 */
| CPU_BASED_ACTIVATE_IO_BITMAP /* 20.6.2 */
| CPU_BASED_MOV_DR_EXITING
| CPU_BASED_USE_TSC_OFFSETING /* 21.3 */
);
+ if (!(vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) & CPU_BASED_TPR_SHADOW)) {
+ u32 cbvec;
+
+ cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cbvec |= CPU_BASED_CR8_LOAD_EXITING; /* 20.6.2 */
+ cbvec |= CPU_BASED_CR8_STORE_EXITING; /* 20.6.2 */
+ vmcs_write32_fixedbits(MSR_IA32_VMX_PROCBASED_CTLS,
+ CPU_BASED_VM_EXEC_CONTROL,
+ cbvec);
+
+ printk(KERN_WARNING "KVM: Warning - Host processor does " \
+ "not support TPR-shadow\n");
+ }
+
vmcs_write32(EXCEPTION_BITMAP, 1 << PF_VECTOR);
vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0);
@@ -1239,7 +1252,7 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */
#ifdef CONFIG_X86_64
- vmcs_writel(VIRTUAL_APIC_PAGE_ADDR, 0);
+ vmcs_writel(VIRTUAL_APIC_PAGE_ADDR, kvm_lapic_get_regs(vcpu));
vmcs_writel(TPR_THRESHOLD, 0);
#endif
@@ -1346,6 +1359,9 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
kvm_irqpin_t pin)
{
int handled = 0;
+ struct kvm_irqack_data ack;
+
+ memset(&ack, 0, sizeof(ack));
vcpu->interrupt_window_open =
((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
@@ -1357,11 +1373,8 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
* If interrupts enabled, and not blocked by sti or mov ss.
* Good.
*/
- struct kvm_irqack_data ack;
int r = 0;
- memset(&ack, 0, sizeof(ack));
-
switch (pin) {
case kvm_irqpin_localint:
r = kvm_vcpu_irq_pop(vcpu, &ack);
@@ -1414,6 +1427,13 @@ static int do_intr_requests(struct kvm_vcpu *vcpu,
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
}
+#ifdef CONFIG_X86_64
+ if (ack.flags & KVM_IRQACKDATA_NEXT_VALID)
+ vmcs_write32(TPR_THRESHOLD, ack.next >> 4);
+ else
+ vmcs_write32(TPR_THRESHOLD, 0);
+#endif
+
return handled;
}
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/8] KVM: Adds support for in-kernel mmiohandlers
[not found] ` <20070509030315.23443.93779.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
@ 2007-05-09 9:51 ` Dor Laor
0 siblings, 0 replies; 30+ messages in thread
From: Dor Laor @ 2007-05-09 9:51 UTC (permalink / raw)
To: Gregory Haskins, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>+static inline void kvm_iodevice_read(struct kvm_io_device *dev,
>+ gpa_t addr,
>+ int len,
>+ void *val)
>+{
>+ dev->read(dev, addr, len, val);
>+}
>+
>+static inline void kvm_iodevice_write(struct kvm_io_device *dev,
>+ gpa_t addr,
>+ int len,
>+ const void *val)
>+{
>+ dev->write(dev, addr, len, val);
>+}
>+
>+static inline int kvm_iodevice_inrange(struct kvm_io_device *dev,
gpa_t
>addr)
>+{
>+ return dev->in_range(dev, addr);
>+}
What's the motivation of the above wrappers?
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/8] KVM: Add irqdevice object
[not found] ` <20070509030320.23443.51197.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
@ 2007-05-09 15:16 ` Dor Laor
[not found] ` <64F9B87B6B770947A9F8391472E032160BBA6157-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Dor Laor @ 2007-05-09 15:16 UTC (permalink / raw)
To: Gregory Haskins, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>-static inline void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
>-{
>- struct vmcb_control_area *control;
>-
>- control = &vcpu->svm->vmcb->control;
>- control->int_vector = pop_irq(vcpu);
>- control->int_ctl &= ~V_INTR_PRIO_MASK;
>- control->int_ctl |= V_IRQ_MASK |
>- ((/*control->int_vector >> 4*/ 0xf) <<
V_INTR_PRIO_SHIFT);
>-}
>-
Keep up the good work, looks like its converging.
BTW, what types of VMs are running with your apic?
I have some comments below:
> static void kvm_reput_irq(struct kvm_vcpu *vcpu)
> {
> struct vmcb_control_area *control = &vcpu->svm->vmcb->control;
>
> if (control->int_ctl & V_IRQ_MASK) {
> control->int_ctl &= ~V_IRQ_MASK;
>- push_irq(vcpu, control->int_vector);
>+ kvm_vcpu_irq_push(vcpu, control->int_vector);
> }
>
> vcpu->interrupt_window_open =
> !(control->int_state & SVM_INTERRUPT_SHADOW_MASK);
> }
>
>-static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>- struct kvm_run *kvm_run)
>+static int do_intr_requests(struct kvm_vcpu *vcpu,
>+ struct kvm_run *kvm_run,
>+ kvm_irqpin_t pin)
> {
> struct vmcb_control_area *control = &vcpu->svm->vmcb->control;
>+ int handled = 0;
>
> vcpu->interrupt_window_open =
> (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
> (vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF));
>
>- if (vcpu->interrupt_window_open && vcpu->irq_summary)
>+ if (vcpu->interrupt_window_open) {
> /*
>- * If interrupts enabled, and not blocked by sti or mov
ss.
>Good.
>+ * If interrupts enabled, and not blocked by sti or mov
ss.
>+ * Good.
> */
>- kvm_do_inject_irq(vcpu);
>+ struct kvm_irqack_data ack;
>+ int r = 0;
>+
>+ memset(&ack, 0, sizeof(ack));
>+
>+ switch (pin) {
>+ case kvm_irqpin_localint:
>+ r = kvm_vcpu_irq_pop(vcpu, &ack);
>+ break;
>+ case kvm_irqpin_extint:
>+ printk(KERN_WARNING "KVM: external-interrupts
not " \
>+ "handled yet\n");
>+ __clear_bit(pin, &vcpu->irq.pending);
>+ break;
>+ case kvm_irqpin_nmi:
>+ /*
>+ * FIXME: Someday we will handle this using the
>+ * specific SVN NMI features. For now, just
inject
>+ * the NMI as a standard interrupt on vector 2
>+ */
>+ ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
>+ ack.vector = 2;
>+ __clear_bit(pin, &vcpu->irq.pending);
>+ break;
>+ default:
>+ panic("KVM: unknown interrupt pin raised: %d\n",
pin);
>+ break;
>+ }
>+
>+ BUG_ON(r < 0);
>+
The above code should be arch-generic.
>+ if (ack.flags & KVM_IRQACKDATA_VECTOR_VALID) {
>+ control = &vcpu->svm->vmcb->control;
>+ control->int_vector = ack.vector;
>+ control->int_ctl &= ~V_INTR_PRIO_MASK;
>+ control->int_ctl |= V_IRQ_MASK |
>+ ((/*control->int_vector >> 4*/ 0xf) <<
>+ V_INTR_PRIO_SHIFT);
>+
>+ handled = 1;
>+ }
>+ }
>
> /*
> * Interrupts blocked. Wait for unblock.
> */
> if (!vcpu->interrupt_window_open &&
>- (vcpu->irq_summary || kvm_run->request_interrupt_window)) {
>+ (__kvm_vcpu_irq_pending(vcpu) ||
>+ kvm_run->request_interrupt_window))
> control->intercept |= 1ULL << INTERCEPT_VINTR;
>- } else
>- control->intercept &= ~(1ULL << INTERCEPT_VINTR);
>+
>+ return handled;
>+}
>+
>+static void clear_pending_controls(struct kvm_vcpu *vcpu)
>+{
>+ struct vmcb_control_area *control = &vcpu->svm->vmcb->control;
>+
>+ control->intercept &= ~(1ULL << INTERCEPT_VINTR);
>+}
>+
IMHO I think that do_interrupt_requests and do_intr_requests can be
united into one. The switch(pin) in both of them is unnatural.
>+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>+ struct kvm_run *kvm_run)
>+{
>+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
>+
>+ clear_pending_controls(vcpu);
>+
>+ while (pending) {
>+ kvm_irqpin_t pin = __fls(pending);
>+
>+ switch (pin) {
>+ case kvm_irqpin_localint:
>+ case kvm_irqpin_extint:
>+ case kvm_irqpin_nmi:
>+ do_intr_requests(vcpu, kvm_run, pin);
>+ break;
>+ case kvm_irqpin_smi:
>+ /* ignored (for now) */
>+ printk(KERN_WARNING "KVM: dropping unhandled
SMI\n");
>+ __clear_bit(pin, &vcpu->irq.pending);
>+ break;
>+ case kvm_irqpin_invalid:
>+ /* drop */
>+ break;
>+ default:
>+ panic("KVM: unknown interrupt pin raised: %d\n",
pin);
>+ break;
>+ }
>+
>+ __clear_bit(pin, &pending);
>+ }
> }
Seems like you can inject several irq at once using the above while
loop, but you only do one push in case external interrupt got in the way
and prevented the injection.
>
> static void post_kvm_run_save(struct kvm_vcpu *vcpu,
> struct kvm_run *kvm_run)
> {
>- kvm_run->ready_for_interrupt_injection =
(vcpu->interrupt_window_open
>&&
>- vcpu->irq_summary ==
0);
>+ kvm_run->ready_for_interrupt_injection =
>+ (vcpu->interrupt_window_open &&
>+ !kvm_vcpu_irq_pending(vcpu));
> kvm_run->if_flag = (vcpu->svm->vmcb->save.rflags &
X86_EFLAGS_IF) !=
>0;
> kvm_run->cr8 = vcpu->cr8;
> kvm_run->apic_base = vcpu->apic_base;
>@@ -1452,7 +1514,7 @@ static void post_kvm_run_save(struct kvm_vcpu
*vcpu,
> static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
> struct kvm_run *kvm_run)
> {
>- return (!vcpu->irq_summary &&
>+ return (!kvm_vcpu_irq_pending(vcpu) &&
> kvm_run->request_interrupt_window &&
> vcpu->interrupt_window_open &&
> (vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF));
>@@ -1482,9 +1544,17 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu,
>struct kvm_run *kvm_run)
> int r;
>
> again:
>+ spin_lock(&vcpu->irq.lock);
>+
>+ /*
>+ * We must inject interrupts (if any) while the irq_lock
>+ * is held
>+ */
> if (!vcpu->mmio_read_completed)
> do_interrupt_requests(vcpu, kvm_run);
>
>+ spin_unlock(&vcpu->irq.lock);
>+
> clgi();
>
> pre_svm_run(vcpu);
>diff --git a/drivers/kvm/userint.c b/drivers/kvm/userint.c
>new file mode 100644
>index 0000000..08d26fa
>--- /dev/null
>+++ b/drivers/kvm/userint.c
>@@ -0,0 +1,223 @@
>+/*
>+ * User Interrupts IRQ device
>+ *
>+ * This acts as an extention of an interrupt controller that exists
>elsewhere
>+ * (typically in userspace/QEMU). Because this PIC is a pseudo device
>that
>+ * is downstream from a real emulated PIC, the "IRQ-to-vector" mapping
has
>+ * already occured. Therefore, this PIC has the following unusal
>properties:
>+ *
>+ * 1) It has 256 "pins" which are literal vectors (i.e. no
translation)
>+ * 2) It only supports "auto-EOI" behavior since it is expected that
the
>+ * upstream emulated PIC will handle the real EOIs (if applicable)
>+ * 3) It only listens to "asserts" on the pins (deasserts are dropped)
>+ * because its an auto-EOI device anyway.
>+ *
>+ * Copyright (C) 2007 Novell
>+ *
>+ * bitarray code based on original vcpu->irq_pending code,
>+ * Copyright (C) 2007 Qumranet
>+ *
>+ * Authors:
>+ * Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
>+ *
>+ * This work is licensed under the terms of the GNU GPL, version 2.
See
>+ * the COPYING file in the top-level directory.
>+ *
>+ */
>+
>+#include "kvm.h"
>+
>+/*
>+
*----------------------------------------------------------------------
>+ * optimized bitarray object - works like bitarrays in bitops, but
uses
>+ * a summary field to accelerate lookups. Assumes external locking
>+
*---------------------------------------------------------------------
>+ */
>+
>+struct bitarray {
>+ unsigned long summary; /* 1 per word in pending */
>+ unsigned long pending[NR_IRQ_WORDS];
>+};
>+
>+static inline int bitarray_pending(struct bitarray *this)
>+{
>+ return this->summary ? 1 : 0;
>+}
>+
>+static inline int bitarray_findhighest(struct bitarray *this)
>+{
>+ if (!this->summary)
>+ return -1;
>+ else {
No need for else, simpler.
>+ int word_index = __fls(this->summary);
>+ int bit_index = __fls(this->pending[word_index]);
>+
>+ return word_index * BITS_PER_LONG + bit_index;
>+ }
>+}
>+
>+static inline void bitarray_set(struct bitarray *this, int nr)
>+{
>+ __set_bit(nr, &this->pending);
>+ __set_bit(nr / BITS_PER_LONG, &this->summary);
>+}
>+
>+static inline void bitarray_clear(struct bitarray *this, int nr)
>+{
>+ int word = nr / BITS_PER_LONG;
>+
>+ __clear_bit(nr, &this->pending);
>+ if (!this->pending[word])
>+ __clear_bit(word, &this->summary);
>+}
>+
>+static inline int bitarray_test(struct bitarray *this, int nr)
>+{
>+ return test_bit(nr, &this->pending);
>+}
>+
>+static inline int bitarray_test_and_set(struct bitarray *this, int nr,
int
>val)
>+{
>+ if (bitarray_test(this, nr) != val) {
>+ if (val)
>+ bitarray_set(this, nr);
>+ else
>+ bitarray_clear(this, nr);
>+ return 1;
>+ }
>+
>+ return 0;
>+}
>+
>+/*
>+
*----------------------------------------------------------------------
>+ * userint interface - provides the actual kvm_irqdevice
implementation
>+
*---------------------------------------------------------------------
>+ */
>+
>+struct kvm_user_irqdev {
>+ spinlock_t lock;
>+ atomic_t ref_count;
>+ struct bitarray pending;
>+};
>+
>+static int user_irqdev_ack(struct kvm_irqdevice *this, int flags,
>+ struct kvm_irqack_data *data)
>+{
>+ struct kvm_user_irqdev *s = (struct
kvm_user_irqdev*)this->private;
>+
>+ spin_lock(&s->lock);
>+
>+ if (!(flags & KVM_IRQACK_FLAG_PEEK)) {
>+ int irq = bitarray_findhighest(&s->pending);
>+
>+ if (irq > -1) {
>+ /*
>+ * Automatically clear the interrupt as the EOI
>+ * mechanism (if any) will take place in
userspace
>+ */
>+ bitarray_clear(&s->pending, irq);
>+
>+ data->flags |= KVM_IRQACKDATA_VECTOR_VALID;
>+ }
>+
>+ data->vector = irq;
>+ }
>+
>+ if (bitarray_pending(&s->pending))
>+ data->flags |= KVM_IRQACKDATA_VECTOR_PENDING;
>+
>+ spin_unlock(&s->lock);
>+
>+ return 0;
>+}
>+
>+static int user_irqdev_set_pin(struct kvm_irqdevice *this, int irq,
int
>level)
>+{
>+ struct kvm_user_irqdev *s = (struct
kvm_user_irqdev*)this->private;
>+ int forward = 0;
>+
>+ spin_lock(&s->lock);
>+ forward = bitarray_test_and_set(&s->pending, irq, level);
>+ spin_unlock(&s->lock);
>+
>+ /*
>+ * alert the higher layer software we have changes
>+ */
>+ if (forward)
>+ kvm_irqdevice_set_intr(this, kvm_irqpin_localint);
>+
>+ return 0;
>+}
>+
>+static void user_irqdev_destructor(struct kvm_irqdevice *this)
>+{
>+ struct kvm_user_irqdev *s = (struct
kvm_user_irqdev*)this->private;
>+
>+ if (atomic_dec_and_test(&s->ref_count))
>+ kfree(s);
>+}
>+
>+int kvm_user_irqdev_init(struct kvm_irqdevice *irqdev)
>+{
>+ struct kvm_user_irqdev *s;
>+
>+ s = kzalloc(sizeof(*s), GFP_KERNEL);
>+ if (!s)
>+ return -ENOMEM;
>+
>+ spin_lock_init(&s->lock);
>+
>+ irqdev->ack = user_irqdev_ack;
>+ irqdev->set_pin = user_irqdev_set_pin;
>+ irqdev->destructor = user_irqdev_destructor;
>+
>+ irqdev->private = s;
>+ atomic_inc(&s->ref_count);
>+
>+ return 0;
>+}
>+
>+int kvm_user_irqdev_save(struct kvm_irqdevice *this, void *data)
>+{
>+ struct kvm_user_irqdev *s = (struct
kvm_user_irqdev*)this->private;
>+
>+ spin_lock(&s->lock);
>+ memcpy(data, s->pending.pending, sizeof s->pending.pending);
>+ spin_unlock(&s->lock);
>+
>+ return 0;
>+}
>+
>+int kvm_user_irqdev_restore(struct kvm_irqdevice *this, void *data)
>+{
>+ struct kvm_user_irqdev *s = (struct
kvm_user_irqdev*)this->private;
>+ int i;
>+ int forward = 0;
>+
>+ spin_lock(&s->lock);
>+
>+ /*
>+ * walk the interrupt-bitmap and inject an IRQ for each bit
found
>+ */
>+ for (i = 0; i < 256; ++i) {
>+ int val = test_bit(i, data);
>+ forward = bitarray_test_and_set(&s->pending, i, val);
>+ }
>+
You run over forward each loop but below you use it.
>+ spin_unlock(&s->lock);
>+
>+ /*
>+ * alert the higher layer software we have changes
>+ */
>+ if (forward)
>+ kvm_irqdevice_set_intr(this, kvm_irqpin_localint);
>+
>+ return 0;
>+}
>+
>+int kvm_userint_init(struct kvm_vcpu *vcpu)
>+{
>+ return kvm_user_irqdev_init(&vcpu->irq.dev);
>+}
>+
>diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
>index 19edb34..ca858cb 100644
>--- a/drivers/kvm/vmx.c
>+++ b/drivers/kvm/vmx.c
>@@ -1301,52 +1301,118 @@ static void inject_rmode_irq(struct kvm_vcpu
>*vcpu, int irq)
> vmcs_writel(GUEST_RSP, (vmcs_readl(GUEST_RSP) & ~0xffff) | (sp -
6));
> }
>
>-static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
>+static int do_intr_requests(struct kvm_vcpu *vcpu,
>+ struct kvm_run *kvm_run,
>+ kvm_irqpin_t pin)
> {
>- int word_index = __ffs(vcpu->irq_summary);
>- int bit_index = __ffs(vcpu->irq_pending[word_index]);
>- int irq = word_index * BITS_PER_LONG + bit_index;
>-
>- clear_bit(bit_index, &vcpu->irq_pending[word_index]);
>- if (!vcpu->irq_pending[word_index])
>- clear_bit(word_index, &vcpu->irq_summary);
>-
>- if (vcpu->rmode.active) {
>- inject_rmode_irq(vcpu, irq);
>- return;
>- }
>- vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>- irq | INTR_TYPE_EXT_INTR |
INTR_INFO_VALID_MASK);
>-}
>-
>-
>-static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>- struct kvm_run *kvm_run)
>-{
>- u32 cpu_based_vm_exec_control;
>+ int handled = 0;
>
> vcpu->interrupt_window_open =
> ((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
> (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
>
> if (vcpu->interrupt_window_open &&
>- vcpu->irq_summary &&
>- !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
INTR_INFO_VALID_MASK))
>+ !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
INTR_INFO_VALID_MASK))
>{
> /*
>- * If interrupts enabled, and not blocked by sti or mov
ss.
>Good.
>+ * If interrupts enabled, and not blocked by sti or mov
ss.
>+ * Good.
> */
>- kvm_do_inject_irq(vcpu);
>+ struct kvm_irqack_data ack;
>+ int r = 0;
>+
>+ memset(&ack, 0, sizeof(ack));
>+
>+ switch (pin) {
>+ case kvm_irqpin_localint:
>+ r = kvm_vcpu_irq_pop(vcpu, &ack);
>+ break;
>+ case kvm_irqpin_extint:
>+ printk(KERN_WARNING "KVM: external-interrupts
not " \
>+ "handled yet\n");
>+ __clear_bit(pin, &vcpu->irq.pending);
>+ break;
>+ case kvm_irqpin_nmi:
>+ /*
>+ * FIXME: Someday we will handle this using the
>+ * specific VMX NMI features. For now, just
inject
>+ * the NMI as a standard interrupt on vector 2
>+ */
>+ ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
>+ ack.vector = 2;
>+ __clear_bit(pin, &vcpu->irq.pending);
>+ break;
>+ default:
>+ panic("KVM: unknown interrupt pin raised: %d\n",
pin);
>+ break;
>+ }
>+
>+ BUG_ON(r < 0);
>+
>+ if (ack.flags & KVM_IRQACKDATA_VECTOR_VALID) {
>+ if (vcpu->rmode.active)
>+ inject_rmode_irq(vcpu, ack.vector);
>+ else
>+ vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>+ ack.vector |
>+ INTR_TYPE_EXT_INTR |
>+ INTR_INFO_VALID_MASK);
>+
>+ handled = 1;
>+ }
>+ }
>
>- cpu_based_vm_exec_control =
vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
> if (!vcpu->interrupt_window_open &&
>- (vcpu->irq_summary || kvm_run->request_interrupt_window))
>+ (__kvm_vcpu_irq_pending(vcpu) ||
>+ kvm_run->request_interrupt_window)) {
> /*
> * Interrupts blocked. Wait for unblock.
> */
>- cpu_based_vm_exec_control |=
CPU_BASED_VIRTUAL_INTR_PENDING;
>- else
>- cpu_based_vm_exec_control &=
~CPU_BASED_VIRTUAL_INTR_PENDING;
>- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
cpu_based_vm_exec_control);
>+ u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>+ cbvec |= CPU_BASED_VIRTUAL_INTR_PENDING;
>+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
>+ }
>+
>+ return handled;
>+}
>+
>+static void clear_pending_controls(struct kvm_vcpu *vcpu)
>+{
>+ u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>+ cbvec &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
>+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
>+}
>+
>+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>+ struct kvm_run *kvm_run)
>+{
>+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
>+
>+ clear_pending_controls(vcpu);
>+
>+ while (pending) {
>+ kvm_irqpin_t pin = __fls(pending);
>+
>+ switch (pin) {
>+ case kvm_irqpin_localint:
>+ case kvm_irqpin_extint:
>+ case kvm_irqpin_nmi:
>+ do_intr_requests(vcpu, kvm_run, pin);
>+ break;
>+ case kvm_irqpin_smi:
>+ /* ignored (for now) */
>+ printk(KERN_WARNING "KVM: dropping unhandled
SMI\n");
>+ __clear_bit(pin, &vcpu->irq.pending);
>+ break;
>+ case kvm_irqpin_invalid:
>+ /* drop */
>+ break;
>+ default:
>+ panic("KVM: unknown interrupt pin raised: %d\n",
pin);
>+ break;
>+ }
>+
>+ __clear_bit(pin, &pending);
>+ }
> }
Now the do_interrupt_requests function is totally arch-generalized.
I think that do_intr_requests should be generalized too.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/8] KVM: Add irqdevice object
[not found] ` <64F9B87B6B770947A9F8391472E032160BBA6157-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
@ 2007-05-09 18:04 ` Gregory Haskins
[not found] ` <4641D4D8.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 18:04 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Dor Laor
>>> On Wed, May 9, 2007 at 11:16 AM, in message
<64F9B87B6B770947A9F8391472E032160BBA6157-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>,
"Dor Laor" <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>>- static inline void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
>>-{
>>- struct vmcb_control_area *control;
>>-
>>- control = &vcpu- >svm- >vmcb- >control;
>>- control- >int_vector = pop_irq(vcpu);
>>- control- >int_ctl &= ~V_INTR_PRIO_MASK;
>>- control- >int_ctl |= V_IRQ_MASK |
>>- ((/*control- >int_vector >> 4*/ 0xf) <<
> V_INTR_PRIO_SHIFT);
>>-}
>>-
> Keep up the good work, looks like its converging.
> BTW, what types of VMs are running with your apic?
>
> I have some comments below:
>
>
>> static void kvm_reput_irq(struct kvm_vcpu *vcpu)
>> {
>> struct vmcb_control_area *control = &vcpu- >svm- >vmcb- >control;
>>
>> if (control- >int_ctl & V_IRQ_MASK) {
>> control- >int_ctl &= ~V_IRQ_MASK;
>>- push_irq(vcpu, control- >int_vector);
>>+ kvm_vcpu_irq_push(vcpu, control- >int_vector);
>> }
>>
>> vcpu- >interrupt_window_open =
>> !(control- >int_state & SVM_INTERRUPT_SHADOW_MASK);
>> }
>>
>>- static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>>- struct kvm_run *kvm_run)
>>+static int do_intr_requests(struct kvm_vcpu *vcpu,
>>+ struct kvm_run *kvm_run,
>>+ kvm_irqpin_t pin)
>> {
>> struct vmcb_control_area *control = &vcpu- >svm- >vmcb- >control;
>>+ int handled = 0;
>>
>> vcpu- >interrupt_window_open =
>> (!(control- >int_state & SVM_INTERRUPT_SHADOW_MASK) &&
>> (vcpu- >svm- >vmcb- >save.rflags & X86_EFLAGS_IF));
>>
>>- if (vcpu- >interrupt_window_open && vcpu- >irq_summary)
>>+ if (vcpu- >interrupt_window_open) {
>> /*
>>- * If interrupts enabled, and not blocked by sti or mov
> ss.
>>Good.
>>+ * If interrupts enabled, and not blocked by sti or mov
> ss.
>>+ * Good.
>> */
>>- kvm_do_inject_irq(vcpu);
>>+ struct kvm_irqack_data ack;
>>+ int r = 0;
>>+
>>+ memset(&ack, 0, sizeof(ack));
>>+
>>+ switch (pin) {
>>+ case kvm_irqpin_localint:
>>+ r = kvm_vcpu_irq_pop(vcpu, &ack);
>>+ break;
>>+ case kvm_irqpin_extint:
>>+ printk(KERN_WARNING "KVM: external- interrupts
> not " \
>>+ "handled yet\n");
>>+ __clear_bit(pin, &vcpu- >irq.pending);
>>+ break;
>>+ case kvm_irqpin_nmi:
>>+ /*
>>+ * FIXME: Someday we will handle this using the
>>+ * specific SVN NMI features. For now, just
> inject
>>+ * the NMI as a standard interrupt on vector 2
>>+ */
>>+ ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
>>+ ack.vector = 2;
>>+ __clear_bit(pin, &vcpu- >irq.pending);
>>+ break;
>>+ default:
>>+ panic("KVM: unknown interrupt pin raised: %d\n",
> pin);
>>+ break;
>>+ }
>>+
>>+ BUG_ON(r < 0);
>>+
>
> The above code should be arch- generic.
>
>>+ if (ack.flags & KVM_IRQACKDATA_VECTOR_VALID) {
>>+ control = &vcpu- >svm- >vmcb- >control;
>>+ control- >int_vector = ack.vector;
>>+ control- >int_ctl &= ~V_INTR_PRIO_MASK;
>>+ control- >int_ctl |= V_IRQ_MASK |
>>+ ((/*control- >int_vector >> 4*/ 0xf) <<
>>+ V_INTR_PRIO_SHIFT);
>>+
>>+ handled = 1;
>>+ }
>>+ }
>>
>> /*
>> * Interrupts blocked. Wait for unblock.
>> */
>> if (!vcpu- >interrupt_window_open &&
>>- (vcpu- >irq_summary || kvm_run- >request_interrupt_window)) {
>>+ (__kvm_vcpu_irq_pending(vcpu) ||
>>+ kvm_run- >request_interrupt_window))
>> control- >intercept |= 1ULL << INTERCEPT_VINTR;
>>- } else
>>- control- >intercept &= ~(1ULL << INTERCEPT_VINTR);
>>+
>>+ return handled;
>>+}
>>+
>>+static void clear_pending_controls(struct kvm_vcpu *vcpu)
>>+{
>>+ struct vmcb_control_area *control = &vcpu- >svm- >vmcb- >control;
>>+
>>+ control- >intercept &= ~(1ULL << INTERCEPT_VINTR);
>>+}
>>+
>
> IMHO I think that do_interrupt_requests and do_intr_requests can be
> united into one. The switch(pin) in both of them is unnatural.
I think this will start to make more sense when you get further down my patch queue. This patch is just laying the groundwork, and therefore it seems unnecessarily convoluted. Take a look at my vmx-nmi.patch, for instance.
>
>>+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>>+ struct kvm_run *kvm_run)
>>+{
>>+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
>>+
>>+ clear_pending_controls(vcpu);
>>+
>>+ while (pending) {
>>+ kvm_irqpin_t pin = __fls(pending);
>>+
>>+ switch (pin) {
>>+ case kvm_irqpin_localint:
>>+ case kvm_irqpin_extint:
>>+ case kvm_irqpin_nmi:
>>+ do_intr_requests(vcpu, kvm_run, pin);
>>+ break;
>>+ case kvm_irqpin_smi:
>>+ /* ignored (for now) */
>>+ printk(KERN_WARNING "KVM: dropping unhandled
> SMI\n");
>>+ __clear_bit(pin, &vcpu- >irq.pending);
>>+ break;
>>+ case kvm_irqpin_invalid:
>>+ /* drop */
>>+ break;
>>+ default:
>>+ panic("KVM: unknown interrupt pin raised: %d\n",
> pin);
>>+ break;
>>+ }
>>+
>>+ __clear_bit(pin, &pending);
>>+ }
>> }
>
>
> Seems like you can inject several irq at once using the above while
> loop, but you only do one push in case external interrupt got in the way
> and prevented the injection.
I didn't quite understand what you were getting at with the comments about the external interrupts getting in the way, but I think the gist of your comment is "is this broken to push more than one interrupt?"
If so, the answer is that we only ever push one interrupt at a time, but we run through each pending pin to give each handler (do_intr_requests, and later, do_nmi_requests/do_smi_requests) an opportunity to update the PENDING/WINDOW type flags.
For instance (this is based on the entire patch series, which includes the NMI work) if both an NMI and localint are pending, the NMI will get injected, and the localint will set the IRQ_WINDOW_EXITING feature.
That is the intention, anyway. I understand VMX much better than SVN, so theres a good chance I flubbed this up. ;) Let me know if you see anything that differs from what i described.
>
>>
>> static void post_kvm_run_save(struct kvm_vcpu *vcpu,
>> struct kvm_run *kvm_run)
>> {
>>- kvm_run- >ready_for_interrupt_injection =
> (vcpu- >interrupt_window_open
>>&&
>>- vcpu- >irq_summary ==
> 0);
>>+ kvm_run- >ready_for_interrupt_injection =
>>+ (vcpu- >interrupt_window_open &&
>>+ !kvm_vcpu_irq_pending(vcpu));
>> kvm_run- >if_flag = (vcpu- >svm- >vmcb- >save.rflags &
> X86_EFLAGS_IF) !=
>>0;
>> kvm_run- >cr8 = vcpu- >cr8;
>> kvm_run- >apic_base = vcpu- >apic_base;
>>@@ - 1452,7 +1514,7 @@ static void post_kvm_run_save(struct kvm_vcpu
> *vcpu,
>> static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
>> struct kvm_run *kvm_run)
>> {
>>- return (!vcpu- >irq_summary &&
>>+ return (!kvm_vcpu_irq_pending(vcpu) &&
>> kvm_run- >request_interrupt_window &&
>> vcpu- >interrupt_window_open &&
>> (vcpu- >svm- >vmcb- >save.rflags & X86_EFLAGS_IF));
>>@@ - 1482,9 +1544,17 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu,
>>struct kvm_run *kvm_run)
>> int r;
>>
>> again:
>>+ spin_lock(&vcpu- >irq.lock);
>>+
>>+ /*
>>+ * We must inject interrupts (if any) while the irq_lock
>>+ * is held
>>+ */
>> if (!vcpu- >mmio_read_completed)
>> do_interrupt_requests(vcpu, kvm_run);
>>
>>+ spin_unlock(&vcpu- >irq.lock);
>>+
>> clgi();
>>
>> pre_svm_run(vcpu);
>>diff -- git a/drivers/kvm/userint.c b/drivers/kvm/userint.c
>>new file mode 100644
>>index 0000000..08d26fa
>>--- /dev/null
>>+++ b/drivers/kvm/userint.c
>>@@ - 0,0 +1,223 @@
>>+/*
>>+ * User Interrupts IRQ device
>>+ *
>>+ * This acts as an extention of an interrupt controller that exists
>>elsewhere
>>+ * (typically in userspace/QEMU). Because this PIC is a pseudo device
>>that
>>+ * is downstream from a real emulated PIC, the "IRQ- to- vector" mapping
> has
>>+ * already occured. Therefore, this PIC has the following unusal
>>properties:
>>+ *
>>+ * 1) It has 256 "pins" which are literal vectors (i.e. no
> translation)
>>+ * 2) It only supports "auto- EOI" behavior since it is expected that
> the
>>+ * upstream emulated PIC will handle the real EOIs (if applicable)
>>+ * 3) It only listens to "asserts" on the pins (deasserts are dropped)
>>+ * because its an auto- EOI device anyway.
>>+ *
>>+ * Copyright (C) 2007 Novell
>>+ *
>>+ * bitarray code based on original vcpu- >irq_pending code,
>>+ * Copyright (C) 2007 Qumranet
>>+ *
>>+ * Authors:
>>+ * Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
>>+ *
>>+ * This work is licensed under the terms of the GNU GPL, version 2.
> See
>>+ * the COPYING file in the top- level directory.
>>+ *
>>+ */
>>+
>>+#include "kvm.h"
>>+
>>+/*
>>+
> *----------------------------------------------------------------------
>>+ * optimized bitarray object - works like bitarrays in bitops, but
> uses
>>+ * a summary field to accelerate lookups. Assumes external locking
>>+
> *---------------------------------------------------------------------
>>+ */
>>+
>>+struct bitarray {
>>+ unsigned long summary; /* 1 per word in pending */
>>+ unsigned long pending[NR_IRQ_WORDS];
>>+};
>>+
>>+static inline int bitarray_pending(struct bitarray *this)
>>+{
>>+ return this- >summary ? 1 : 0;
>>+}
>>+
>>+static inline int bitarray_findhighest(struct bitarray *this)
>>+{
>>+ if (!this- >summary)
>>+ return - 1;
>>+ else {
>
> No need for else, simpler.
>
>>+ int word_index = __fls(this- >summary);
>>+ int bit_index = __fls(this- >pending[word_index]);
>>+
>>+ return word_index * BITS_PER_LONG + bit_index;
>>+ }
>>+}
>>+
>>+static inline void bitarray_set(struct bitarray *this, int nr)
>>+{
>>+ __set_bit(nr, &this- >pending);
>>+ __set_bit(nr / BITS_PER_LONG, &this- >summary);
>>+}
>>+
>>+static inline void bitarray_clear(struct bitarray *this, int nr)
>>+{
>>+ int word = nr / BITS_PER_LONG;
>>+
>>+ __clear_bit(nr, &this- >pending);
>>+ if (!this- >pending[word])
>>+ __clear_bit(word, &this- >summary);
>>+}
>>+
>>+static inline int bitarray_test(struct bitarray *this, int nr)
>>+{
>>+ return test_bit(nr, &this- >pending);
>>+}
>>+
>>+static inline int bitarray_test_and_set(struct bitarray *this, int nr,
> int
>>val)
>>+{
>>+ if (bitarray_test(this, nr) != val) {
>>+ if (val)
>>+ bitarray_set(this, nr);
>>+ else
>>+ bitarray_clear(this, nr);
>>+ return 1;
>>+ }
>>+
>>+ return 0;
>>+}
>>+
>>+/*
>>+
> *----------------------------------------------------------------------
>>+ * userint interface - provides the actual kvm_irqdevice
> implementation
>>+
> *---------------------------------------------------------------------
>>+ */
>>+
>>+struct kvm_user_irqdev {
>>+ spinlock_t lock;
>>+ atomic_t ref_count;
>>+ struct bitarray pending;
>>+};
>>+
>>+static int user_irqdev_ack(struct kvm_irqdevice *this, int flags,
>>+ struct kvm_irqack_data *data)
>>+{
>>+ struct kvm_user_irqdev *s = (struct
> kvm_user_irqdev*)this- >private;
>>+
>>+ spin_lock(&s- >lock);
>>+
>>+ if (!(flags & KVM_IRQACK_FLAG_PEEK)) {
>>+ int irq = bitarray_findhighest(&s- >pending);
>>+
>>+ if (irq > - 1) {
>>+ /*
>>+ * Automatically clear the interrupt as the EOI
>>+ * mechanism (if any) will take place in
> userspace
>>+ */
>>+ bitarray_clear(&s- >pending, irq);
>>+
>>+ data- >flags |= KVM_IRQACKDATA_VECTOR_VALID;
>>+ }
>>+
>>+ data- >vector = irq;
>>+ }
>>+
>>+ if (bitarray_pending(&s- >pending))
>>+ data- >flags |= KVM_IRQACKDATA_VECTOR_PENDING;
>>+
>>+ spin_unlock(&s- >lock);
>>+
>>+ return 0;
>>+}
>>+
>>+static int user_irqdev_set_pin(struct kvm_irqdevice *this, int irq,
> int
>>level)
>>+{
>>+ struct kvm_user_irqdev *s = (struct
> kvm_user_irqdev*)this- >private;
>>+ int forward = 0;
>>+
>>+ spin_lock(&s- >lock);
>>+ forward = bitarray_test_and_set(&s- >pending, irq, level);
>>+ spin_unlock(&s- >lock);
>>+
>>+ /*
>>+ * alert the higher layer software we have changes
>>+ */
>>+ if (forward)
>>+ kvm_irqdevice_set_intr(this, kvm_irqpin_localint);
>>+
>>+ return 0;
>>+}
>>+
>>+static void user_irqdev_destructor(struct kvm_irqdevice *this)
>>+{
>>+ struct kvm_user_irqdev *s = (struct
> kvm_user_irqdev*)this- >private;
>>+
>>+ if (atomic_dec_and_test(&s- >ref_count))
>>+ kfree(s);
>>+}
>>+
>>+int kvm_user_irqdev_init(struct kvm_irqdevice *irqdev)
>>+{
>>+ struct kvm_user_irqdev *s;
>>+
>>+ s = kzalloc(sizeof(*s), GFP_KERNEL);
>>+ if (!s)
>>+ return - ENOMEM;
>>+
>>+ spin_lock_init(&s- >lock);
>>+
>>+ irqdev- >ack = user_irqdev_ack;
>>+ irqdev- >set_pin = user_irqdev_set_pin;
>>+ irqdev- >destructor = user_irqdev_destructor;
>>+
>>+ irqdev- >private = s;
>>+ atomic_inc(&s- >ref_count);
>>+
>>+ return 0;
>>+}
>>+
>>+int kvm_user_irqdev_save(struct kvm_irqdevice *this, void *data)
>>+{
>>+ struct kvm_user_irqdev *s = (struct
> kvm_user_irqdev*)this- >private;
>>+
>>+ spin_lock(&s- >lock);
>>+ memcpy(data, s- >pending.pending, sizeof s- >pending.pending);
>>+ spin_unlock(&s- >lock);
>>+
>>+ return 0;
>>+}
>>+
>>+int kvm_user_irqdev_restore(struct kvm_irqdevice *this, void *data)
>>+{
>>+ struct kvm_user_irqdev *s = (struct
> kvm_user_irqdev*)this- >private;
>>+ int i;
>>+ int forward = 0;
>>+
>>+ spin_lock(&s- >lock);
>>+
>>+ /*
>>+ * walk the interrupt- bitmap and inject an IRQ for each bit
> found
>>+ */
>>+ for (i = 0; i < 256; ++i) {
>>+ int val = test_bit(i, data);
>>+ forward = bitarray_test_and_set(&s- >pending, i, val);
>>+ }
>>+
>
> You run over forward each loop but below you use it.
Doh! Good eyes. Thanks
>
>>+ spin_unlock(&s- >lock);
>>+
>>+ /*
>>+ * alert the higher layer software we have changes
>>+ */
>>+ if (forward)
>>+ kvm_irqdevice_set_intr(this, kvm_irqpin_localint);
>>+
>>+ return 0;
>>+}
>>+
>>+int kvm_userint_init(struct kvm_vcpu *vcpu)
>>+{
>>+ return kvm_user_irqdev_init(&vcpu- >irq.dev);
>>+}
>>+
>>diff -- git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
>>index 19edb34..ca858cb 100644
>>--- a/drivers/kvm/vmx.c
>>+++ b/drivers/kvm/vmx.c
>>@@ - 1301,52 +1301,118 @@ static void inject_rmode_irq(struct kvm_vcpu
>>*vcpu, int irq)
>> vmcs_writel(GUEST_RSP, (vmcs_readl(GUEST_RSP) & ~0xffff) | (sp -
> 6));
>> }
>>
>>- static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
>>+static int do_intr_requests(struct kvm_vcpu *vcpu,
>>+ struct kvm_run *kvm_run,
>>+ kvm_irqpin_t pin)
>> {
>>- int word_index = __ffs(vcpu- >irq_summary);
>>- int bit_index = __ffs(vcpu- >irq_pending[word_index]);
>>- int irq = word_index * BITS_PER_LONG + bit_index;
>>-
>>- clear_bit(bit_index, &vcpu- >irq_pending[word_index]);
>>- if (!vcpu- >irq_pending[word_index])
>>- clear_bit(word_index, &vcpu- >irq_summary);
>>-
>>- if (vcpu- >rmode.active) {
>>- inject_rmode_irq(vcpu, irq);
>>- return;
>>- }
>>- vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>>- irq | INTR_TYPE_EXT_INTR |
> INTR_INFO_VALID_MASK);
>>-}
>>-
>>-
>>- static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>>- struct kvm_run *kvm_run)
>>-{
>>- u32 cpu_based_vm_exec_control;
>>+ int handled = 0;
>>
>> vcpu- >interrupt_window_open =
>> ((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>> (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
>>
>> if (vcpu- >interrupt_window_open &&
>>- vcpu- >irq_summary &&
>>- !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
> INTR_INFO_VALID_MASK))
>>+ !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
> INTR_INFO_VALID_MASK))
>>{
>> /*
>>- * If interrupts enabled, and not blocked by sti or mov
> ss.
>>Good.
>>+ * If interrupts enabled, and not blocked by sti or mov
> ss.
>>+ * Good.
>> */
>>- kvm_do_inject_irq(vcpu);
>>+ struct kvm_irqack_data ack;
>>+ int r = 0;
>>+
>>+ memset(&ack, 0, sizeof(ack));
>>+
>>+ switch (pin) {
>>+ case kvm_irqpin_localint:
>>+ r = kvm_vcpu_irq_pop(vcpu, &ack);
>>+ break;
>>+ case kvm_irqpin_extint:
>>+ printk(KERN_WARNING "KVM: external- interrupts
> not " \
>>+ "handled yet\n");
>>+ __clear_bit(pin, &vcpu- >irq.pending);
>>+ break;
>>+ case kvm_irqpin_nmi:
>>+ /*
>>+ * FIXME: Someday we will handle this using the
>>+ * specific VMX NMI features. For now, just
> inject
>>+ * the NMI as a standard interrupt on vector 2
>>+ */
>>+ ack.flags |= KVM_IRQACKDATA_VECTOR_VALID;
>>+ ack.vector = 2;
>>+ __clear_bit(pin, &vcpu- >irq.pending);
>>+ break;
>>+ default:
>>+ panic("KVM: unknown interrupt pin raised: %d\n",
> pin);
>>+ break;
>>+ }
>>+
>>+ BUG_ON(r < 0);
>>+
>>+ if (ack.flags & KVM_IRQACKDATA_VECTOR_VALID) {
>>+ if (vcpu- >rmode.active)
>>+ inject_rmode_irq(vcpu, ack.vector);
>>+ else
>>+ vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>>+ ack.vector |
>>+ INTR_TYPE_EXT_INTR |
>>+ INTR_INFO_VALID_MASK);
>>+
>>+ handled = 1;
>>+ }
>>+ }
>>
>>- cpu_based_vm_exec_control =
> vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>> if (!vcpu- >interrupt_window_open &&
>>- (vcpu- >irq_summary || kvm_run- >request_interrupt_window))
>>+ (__kvm_vcpu_irq_pending(vcpu) ||
>>+ kvm_run- >request_interrupt_window)) {
>> /*
>> * Interrupts blocked. Wait for unblock.
>> */
>>- cpu_based_vm_exec_control |=
> CPU_BASED_VIRTUAL_INTR_PENDING;
>>- else
>>- cpu_based_vm_exec_control &=
> ~CPU_BASED_VIRTUAL_INTR_PENDING;
>>- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
> cpu_based_vm_exec_control);
>>+ u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>>+ cbvec |= CPU_BASED_VIRTUAL_INTR_PENDING;
>>+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
>>+ }
>>+
>>+ return handled;
>>+}
>>+
>>+static void clear_pending_controls(struct kvm_vcpu *vcpu)
>>+{
>>+ u32 cbvec = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>>+ cbvec &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
>>+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cbvec);
>>+}
>>+
>>+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>>+ struct kvm_run *kvm_run)
>>+{
>>+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
>>+
>>+ clear_pending_controls(vcpu);
>>+
>>+ while (pending) {
>>+ kvm_irqpin_t pin = __fls(pending);
>>+
>>+ switch (pin) {
>>+ case kvm_irqpin_localint:
>>+ case kvm_irqpin_extint:
>>+ case kvm_irqpin_nmi:
>>+ do_intr_requests(vcpu, kvm_run, pin);
>>+ break;
>>+ case kvm_irqpin_smi:
>>+ /* ignored (for now) */
>>+ printk(KERN_WARNING "KVM: dropping unhandled
> SMI\n");
>>+ __clear_bit(pin, &vcpu- >irq.pending);
>>+ break;
>>+ case kvm_irqpin_invalid:
>>+ /* drop */
>>+ break;
>>+ default:
>>+ panic("KVM: unknown interrupt pin raised: %d\n",
> pin);
>>+ break;
>>+ }
>>+
>>+ __clear_bit(pin, &pending);
>>+ }
>> }
>
> Now the do_interrupt_requests function is totally arch- generalized.
> I think that do_intr_requests should be generalized too.
While I definitely agree that there is likely some kind of generalization to be made somewhere in here, note that this does become more and more arch specific as we move down the series. I will try to pull some of this stuff out to kvm_main
Thanks for the review!
-Greg
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/8] KVM: Add irqdevice object
[not found] ` <4641D4D8.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
@ 2007-05-09 22:12 ` Dor Laor
[not found] ` <64F9B87B6B770947A9F8391472E032160BBA6471-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Dor Laor @ 2007-05-09 22:12 UTC (permalink / raw)
To: Gregory Haskins, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>>>+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>>>+ struct kvm_run *kvm_run)
>>>+{
>>>+ int pending = __kvm_vcpu_irq_all_pending(vcpu);
>>>+
>>>+ clear_pending_controls(vcpu);
>>>+
>>>+ while (pending) {
>>>+ kvm_irqpin_t pin = __fls(pending);
>>>+
>>>+ switch (pin) {
>>>+ case kvm_irqpin_localint:
>>>+ case kvm_irqpin_extint:
>>>+ case kvm_irqpin_nmi:
>>>+ do_intr_requests(vcpu, kvm_run, pin);
>>>+ break;
>>>+ case kvm_irqpin_smi:
>>>+ /* ignored (for now) */
>>>+ printk(KERN_WARNING "KVM: dropping unhandled
>> SMI\n");
>>>+ __clear_bit(pin, &vcpu- >irq.pending);
>>>+ break;
>>>+ case kvm_irqpin_invalid:
>>>+ /* drop */
>>>+ break;
>>>+ default:
>>>+ panic("KVM: unknown interrupt pin raised: %d\n",
>> pin);
>>>+ break;
>>>+ }
>>>+
>>>+ __clear_bit(pin, &pending);
>>>+ }
>>> }
>>
>>
>> Seems like you can inject several irq at once using the above while
>> loop, but you only do one push in case external interrupt got in the
way
>> and prevented the injection.
>
>I didn't quite understand what you were getting at with the comments
about
>the external interrupts getting in the way, but I think the gist of
your
>comment is "is this broken to push more than one interrupt?"
>
>If so, the answer is that we only ever push one interrupt at a time,
but we
>run through each pending pin to give each handler (do_intr_requests,
and
>later, do_nmi_requests/do_smi_requests) an opportunity to update the
>PENDING/WINDOW type flags.
>
>For instance (this is based on the entire patch series, which includes
the
>NMI work) if both an NMI and localint are pending, the NMI will get
>injected, and the localint will set the IRQ_WINDOW_EXITING feature.
>
>That is the intention, anyway. I understand VMX much better than SVN,
so
>theres a good chance I flubbed this up. ;) Let me know if you see
>anything that differs from what i described.
Ahh, I got it now, I missed that the next irq source won't be injected.
Thanks for clearing this out for me.
I wonder if VMX or SVM have an option of injecting several virq at once.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/8] KVM: Add irqdevice object
[not found] ` <64F9B87B6B770947A9F8391472E032160BBA6471-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
@ 2007-05-09 22:47 ` Gregory Haskins
[not found] ` <4642170B.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-09 22:47 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Dor Laor
>>> On Wed, May 9, 2007 at 6:12 PM, in message
<64F9B87B6B770947A9F8391472E032160BBA6471-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>,
"Dor Laor" <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
> I wonder if VMX or SVM have an option of injecting several virq at once.
VMX definitely does not (IIUC). I don't know enough about SVN to say for sure, but my gut tells me that its not likely. A physical CPU can only accept a single vector at a time, so I bet the virtual ones do as well. If there is more than one pending, I think the intention is to both inject the virq and set the IRQ_WINDOW_EXIT feature to exit on the next RFLAGS.IF window.
Regards,
-Greg
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] in-kernel APIC support "v1"
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
` (7 preceding siblings ...)
2007-05-09 3:03 ` [PATCH 8/8] KVM: Adds support for TPR shadowing under VMX processors Gregory Haskins
@ 2007-05-13 12:02 ` Avi Kivity
[not found] ` <4646FE71.5080009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
8 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2007-05-13 12:02 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
> Here is my latest series incorporating the feedback and numerous bugfixes. I
> did not keep an official change-log, so its difficult to say what changed off
> the top of my head without an interdiff. I will keep a changelog from here on
> out. Lets call this drop officially "v1". I will start tracking versions of
> the drop so its easier to refer to them in review notes, etc.
>
> Here are a few notes:
>
> A) I implemented Avi's idea for a fd-based signaling mechanism. I didnt quite
> get what he meant by "writable-fd". The way I saw it, it should be readable
> so that is how I implemented it. If that is not satisfactory, please
> elaborate on the writable idea and I will change it over.
>
I think that it should be writable, as the vcpu wants interrupts to be
pushed into it (a write op) rather than it indicates it wants data to be
pulled out of it.
> B) I changed the controversial kvm_irqdevice_ack() mechanism to use an "out"
> structure, instead of an int pointer + return bitmap. Hopefully, this design
> puts Avi's mind at ease as the return code is more standard now. In addition,
> this API makes it easier to extend, which I take advantage of later in the
> series for the TPR-shadow stuff.
>
It should certainly be cleaner code-wise. It's probably still complex
in terms of the number of cases it generates, but as I said before, I
have no better idea to offer.
> C) I changed the irq.task assignment from a lock to a barrier, per review
> comments. However, I left the irq.guestmode = 0 assignment in a lock because
> I believe it is actually required to eliminate a race. E.g. We want to make
> sure that the irq.pending and IPI-method are decided atomically and the
> irq.guest-mode is essentially identifiying a critical section. I could be
> convinced otherwise, but for now its still there.
>
Ok.
> D) Patch #8 is for demonstration purposes only. Dont apply it (yet) as it
> causes the system to error on VMENTRY. I include it purely so its clear where
> I am going.
>
It's certainly the right direction. But we don't support Windows x84
yet, which would be the primary (only?) beneficiary?
> Overall, this code (excluding patch #8) seems to be working quite well from a
> pure functional standpoint. One problem that I see is QEMU remains pretty
> busy even when the guest is idle. I have a feeling it has something to do
> with the way signals are delivered...TBD. Otherwise, its working from my
> perspective. I would love to hear feedback from testers.
>
Is this on all guests? Windows+APIC only? What exactly do you mean by
high?
An oprofile run should clear the mystery.
> An interesting discovery on my part while working on this is that there is an
> aparent mis-emulation in the QEMU LAPIC code. The kernel that ships as the
> SLED-10 installer (2.6.16.21, I think) maps LINT0 as an NMI and masks off all
> interrupts in the 8259 except the PIT. It also leaves the PIT input on the
> IOAPIC active.
>
> This means that every timer tick gets delivered both as a FIXED vector from
> the IOAPIC, and as an NMI. As far as I can tell from reading google, this is
> what linux intended. Note, however, that under QEMU LAPIC, LINT0 is dropped
> if the vector is not EXTINT whereas the in-kernel APIC emulates both.
> Therefore, cat'ing /proc/interrupts under stock KVM shows only IRQ: 0, and LOC
> incrementing, with NMI at 0. The in-kernel patches show NMIs also
> incrementing.
>
> I could generate a patch to fix the QEMU code, but what I am not sure of is
> whether this was intentionally coded to ignore the LINT0 NMI programming?
>
>
See 654501f79be082925c623806c00a27021565035f in kvm-userspace.git. I
confess I didn't delve deeply into it, but it seems related.
I wouldn't be surprised if indeed there's a qemu bug in this area.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/8] KVM: Add irqdevice object
[not found] ` <4642170B.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
@ 2007-05-13 12:05 ` Avi Kivity
0 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2007-05-13 12:05 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
>>>> On Wed, May 9, 2007 at 6:12 PM, in message
>>>>
> <64F9B87B6B770947A9F8391472E032160BBA6471-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>,
> "Dor Laor" <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>> I wonder if VMX or SVM have an option of injecting several virq at once.
>>
>
> VMX definitely does not (IIUC). I don't know enough about SVN to say for sure, but my gut tells me that its not likely. A physical CPU can only accept a single vector at a time, so I bet the virtual ones do as well. If there is more than one pending, I think the intention is to both inject the virq and set the IRQ_WINDOW_EXIT feature to exit on the next RFLAGS.IF window.
>
>
IIRC SVM can queue a single interrupt. You tell it it's priority level,
and it will inject it when eflags.if and tpr permit. Should be very useful.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] in-kernel APIC support "v1"
[not found] ` <4646FE71.5080009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-13 14:09 ` Gregory Haskins
[not found] ` <4646E3D1.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-13 14:09 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>>> On Sun, May 13, 2007 at 8:02 AM, in message <4646FE71.5080009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> Gregory Haskins wrote:
>> Here is my latest series incorporating the feedback and numerous bugfixes.
> I
>> did not keep an official change- log, so its difficult to say what changed off
>> the top of my head without an interdiff. I will keep a changelog from here
> on
>> out. Lets call this drop officially "v1". I will start tracking versions
> of
>> the drop so its easier to refer to them in review notes, etc.
>>
>> Here are a few notes:
>>
>> A) I implemented Avi's idea for a fd- based signaling mechanism. I didnt
> quite
>> get what he meant by "writable- fd". The way I saw it, it should be readable
>> so that is how I implemented it. If that is not satisfactory, please
>> elaborate on the writable idea and I will change it over.
>>
>
> I think that it should be writable, as the vcpu wants interrupts to be
> pushed into it (a write op) rather than it indicates it wants data to be
> pulled out of it.
Ok, I think we might just be confusing terms. What you describe is essentially what I do, but I don't do it via an explicit fd based write(). The PIC posts an interrupt to the VCPU, which "writes" to the irq.usignal state. This would then trigger an event to any listeners on the fd.
>
>> B) I changed the controversial kvm_irqdevice_ack() mechanism to use an "out"
>> structure, instead of an int pointer + return bitmap. Hopefully, this
> design
>> puts Avi's mind at ease as the return code is more standard now. In
> addition,
>> this API makes it easier to extend, which I take advantage of later in the
>> series for the TPR- shadow stuff.
>>
>
> It should certainly be cleaner code- wise. It's probably still complex
> in terms of the number of cases it generates, but as I said before, I
> have no better idea to offer.
Ok
>> D) Patch #8 is for demonstration purposes only. Dont apply it (yet) as it
>> causes the system to error on VMENTRY. I include it purely so its clear
> where
>> I am going.
>>
>
> It's certainly the right direction. But we don't support Windows x84
> yet, which would be the primary (only?) beneficiary?
Agreed. x86_64 TPR consumers only (and only MOV-to-CR8 users at that...if 64 bit windows still uses MMIO we're hosed ;)
>
>> Overall, this code (excluding patch #8) seems to be working quite well from
> a
>> pure functional standpoint. One problem that I see is QEMU remains pretty
>> busy even when the guest is idle. I have a feeling it has something to do
>> with the way signals are delivered...TBD. Otherwise, its working from my
>> perspective. I would love to hear feedback from testers.
>>
>
> Is this on all guests? Windows+APIC only? What exactly do you mean by
> high?
Yeah, all guests and it was really high (70-85% on QEMU in top). However its moot now since I found/fixed the issue in v3. It was that the guests were looping on HLT.
>
> An oprofile run should clear the mystery.
>
>> An interesting discovery on my part while working on this is that there is
> an
>> aparent mis- emulation in the QEMU LAPIC code. The kernel that ships as the
>> SLED- 10 installer (2.6.16.21, I think) maps LINT0 as an NMI and masks off all
>> interrupts in the 8259 except the PIT. It also leaves the PIT input on the
>> IOAPIC active.
>>
>> This means that every timer tick gets delivered both as a FIXED vector from
>> the IOAPIC, and as an NMI. As far as I can tell from reading google, this
> is
>> what linux intended. Note, however, that under QEMU LAPIC, LINT0 is dropped
>> if the vector is not EXTINT whereas the in- kernel APIC emulates both.
>> Therefore, cat'ing /proc/interrupts under stock KVM shows only IRQ: 0, and
> LOC
>> incrementing, with NMI at 0. The in- kernel patches show NMIs also
>> incrementing.
>>
>> I could generate a patch to fix the QEMU code, but what I am not sure of is
>> whether this was intentionally coded to ignore the LINT0 NMI programming?
>>
>>
>
> See 654501f79be082925c623806c00a27021565035f in kvm- userspace.git. I
> confess I didn't delve deeply into it, but it seems related.
I don't have access to the repo right this second, but I will take a look later.
>
> I wouldn't be surprised if indeed there's a qemu bug in this area.
Ok, I will try to post a patch to fix this then. It might not clean up easy as I think depending on the QEMU code, TBD.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] in-kernel APIC support "v1"
[not found] ` <4646E3D1.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
@ 2007-05-13 15:45 ` Avi Kivity
0 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2007-05-13 15:45 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
>> I think that it should be writable, as the vcpu wants interrupts to be
>> pushed into it (a write op) rather than it indicates it wants data to be
>> pulled out of it.
>>
>
> Ok, I think we might just be confusing terms. What you describe is essentially what I do, but I don't do it via an explicit fd based write(). The PIC posts an interrupt to the VCPU, which "writes" to the irq.usignal state. This would then trigger an event to any listeners on the fd.
>
>
Sorry, my fault. I kept saying "writable fd" while neglecting to
mention that I don't see the need for any actual write. It's more like
a "conceptual write".
>> It's certainly the right direction. But we don't support Windows x84
>> yet, which would be the primary (only?) beneficiary?
>>
>
> Agreed. x86_64 TPR consumers only (and only MOV-to-CR8 users at that...if 64 bit windows still uses MMIO we're hosed ;)
>
>
Well, cr8 was certainly designed for Windows. Nothing else uses it
AFAIK. I'd be greatly surprised if Windows x86 doesn't use it.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <20070509030325.23443.90129.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
@ 2007-05-14 9:34 ` Avi Kivity
[not found] ` <46482D2E.7040809-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2007-05-14 9:34 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
> The VCPU executes synchronously w.r.t. userspace today, and therefore
> interrupt injection is pretty straight forward. However, we will soon need
> to be able to inject interrupts asynchronous to the execution of the VCPU
> due to the introduction of SMP, paravirtualized drivers, and asynchronous
> hypercalls. This patch adds support to the interrupt mechanism to force
> a VCPU to VMEXIT when a new interrupt is pending.
>
>
Comments below are fairly minor, but worthwhile IMO.
> Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
> ---
>
> drivers/kvm/kvm.h | 2 ++
> drivers/kvm/kvm_main.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++-
> drivers/kvm/svm.c | 43 +++++++++++++++++++++++++++++++++++
> drivers/kvm/vmx.c | 43 +++++++++++++++++++++++++++++++++++
> 4 files changed, 146 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> index 059f074..0f6cc32 100644
> --- a/drivers/kvm/kvm.h
> +++ b/drivers/kvm/kvm.h
> @@ -329,6 +329,8 @@ struct kvm_vcpu_irq {
> struct kvm_irqdevice dev;
> int pending;
> int deferred;
> + struct task_struct *task;
> + int guest_mode;
>
->guest_mode can be folded into ->task, by specifying that ->task !=
NULL is equivalent to ->guest_mode != 0. This will make the rest of the
code easier to read.
> };
>
> struct kvm_vcpu {
> diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
> index 199489b..a160638 100644
> --- a/drivers/kvm/kvm_main.c
> +++ b/drivers/kvm/kvm_main.c
> @@ -1868,6 +1868,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> kvm_arch_ops->decache_regs(vcpu);
> }
>
> + vcpu->irq.task = current;
> + smp_wmb();
> +
>
This is best moved where ->guest_mode is set.
> +/*
> * This function will be invoked whenever the vcpu->irq.dev raises its INTR
> * line
> */
> @@ -2318,10 +2335,50 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
> {
> struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
> unsigned long flags;
> + int direct_ipi = -1;
>
> spin_lock_irqsave(&vcpu->irq.lock, flags);
>
irqs are always enabled here, so spin_lock_irq() (and a corresponding
spin_unlock_irq) is sufficient.
> static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
> diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
> index 4c03881..91546ae 100644
> --- a/drivers/kvm/svm.c
> +++ b/drivers/kvm/svm.c
> @@ -1542,11 +1542,40 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> u16 gs_selector;
> u16 ldt_selector;
> int r;
> + unsigned long irq_flags;
>
> again:
> + /*
> + * We disable interrupts until the next VMEXIT to eliminate a race
> + * condition for delivery of virtual interrutps. Note that this is
> + * probably not as bad as it sounds, as interrupts will still invoke
> + * a VMEXIT once transitioned to GUEST mode (and thus exit this lock
> + * scope) even if they are disabled.
> + *
> + * FIXME: Do we need to do anything additional to mask IPI/NMIs?
>
You can remove the FIXME.
> + */
> + local_irq_save(irq_flags);
>
Interrupts are always enabled here, so local_irq_disable() suffices.
> @@ -1688,6 +1717,13 @@ again:
> #endif
> : "cc", "memory" );
>
> + /*
> + * FIXME: We'd like to turn on interrupts ASAP, but is this so early
> + * that we will mess up the state of the CPU before we fully
> + * transition from guest to host?
> + */
>
You can remove the FIXME. Pre-patch enabled interrupts much earlier.
> + local_irq_restore(irq_flags);
> +
> if (vcpu->fpu_active) {
> fx_save(vcpu->guest_fx_image);
> fx_restore(vcpu->host_fx_image);
> @@ -1710,6 +1746,13 @@ again:
> reload_tss(vcpu);
>
> /*
> + * Signal that we have transitioned back to host mode
> + */
> + spin_lock_irqsave(&vcpu->irq.lock, irq_flags);
> + vcpu->irq.guest_mode = 0;
> + spin_unlock_irqrestore(&vcpu->irq.lock, irq_flags);
>
>> Don't you need to check interrupts here?
> No, we assume that host userspace won't sleep.
Right, I forgot again.
> (prof_on == KVM_PROFILING))
> diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
> index ca858cb..7b81fff 100644
> --- a/drivers/kvm/vmx.c
> +++ b/drivers/kvm/vmx.c
> @@ -1895,6 +1895,7 @@ static int vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> u16 fs_sel, gs_sel, ldt_sel;
> int fs_gs_ldt_reload_needed;
> int r;
> + unsigned long irq_flags;
>
> preempted:
> /*
> @@ -1929,9 +1930,37 @@ preempted:
> if (vcpu->guest_debug.enabled)
> kvm_guest_debug_pre(vcpu);
>
> + /*
> + * We disable interrupts until the next VMEXIT to eliminate a race
> + * condition for delivery of virtual interrutps. Note that this is
> + * probably not as bad as it sounds, as interrupts will still invoke
> + * a VMEXIT once transitioned to GUEST mode (and thus exit this lock
> + * scope) even if they are disabled.
> + *
> + * FIXME: Do we need to do anything additional to mask IPI/NMIs?
> + */
> + local_irq_save(irq_flags);
> +
>
Pretty much same comments apply here. One day we'll unify some of this
code.
[...]
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/8] KVM: Adds support for real NMI injection on VMX processors
[not found] ` <20070509030340.23443.84153.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
@ 2007-05-14 9:38 ` Avi Kivity
0 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2007-05-14 9:38 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
> Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
> ---
>
> drivers/kvm/vmx.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++----
> drivers/kvm/vmx.h | 3 +++
> 2 files changed, 61 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
> index bee4831..1c99bc9 100644
> --- a/drivers/kvm/vmx.c
> +++ b/drivers/kvm/vmx.c
> @@ -1148,7 +1148,14 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
> PIN_BASED_VM_EXEC_CONTROL,
> PIN_BASED_EXT_INTR_MASK /* 20.6.1 */
> | PIN_BASED_NMI_EXITING /* 20.6.1 */
> + | PIN_BASED_VIRTUAL_NMI /* 20.6.1 */
> );
> +
> + if (!(vmcs_read32(PIN_BASED_VM_EXEC_CONTROL) & PIN_BASED_VIRTUAL_NMI))
> + printk(KERN_WARNING "KVM: Warning - Host processor does " \
> + "not support virtual-NMI injection. Using IRQ " \
> + "method\n");
>
Warning is too severe here. Things work (right?), there's nothing the
user can do about it, and no need to alert kvm-devel. KERN_DEBUG is
sufficient.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 8/8] KVM: Adds support for TPR shadowing under VMX processors
[not found] ` <20070509030350.23443.35387.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
@ 2007-05-14 11:09 ` Avi Kivity
[not found] ` <46484376.6090304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2007-05-14 11:09 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
> Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
> ---
>
>
How was this tested?
> + printk(KERN_WARNING "KVM: Warning - Host processor does " \
> + "not support TPR-shadow\n");
>
KERN_DEBUG.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <46482D2E.7040809-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-14 15:19 ` Gregory Haskins
[not found] ` <464845AD.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-14 15:19 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>>> On Mon, May 14, 2007 at 5:34 AM, in message <46482D2E.7040809-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> Gregory Haskins wrote:
>> The VCPU executes synchronously w.r.t. userspace today, and therefore
>> interrupt injection is pretty straight forward. However, we will soon need
>> to be able to inject interrupts asynchronous to the execution of the VCPU
>> due to the introduction of SMP, paravirtualized drivers, and asynchronous
>> hypercalls. This patch adds support to the interrupt mechanism to force
>> a VCPU to VMEXIT when a new interrupt is pending.
>>
>>
>
> Comments below are fairly minor, but worthwhile IMO.
>
>
>
>> Signed- off- by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
>> ---
>>
>> drivers/kvm/kvm.h | 2 ++
>> drivers/kvm/kvm_main.c | 59
> +++++++++++++++++++++++++++++++++++++++++++++++-
>> drivers/kvm/svm.c | 43 +++++++++++++++++++++++++++++++++++
>> drivers/kvm/vmx.c | 43 +++++++++++++++++++++++++++++++++++
>> 4 files changed, 146 insertions(+), 1 deletions(- )
>>
>> diff -- git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
>> index 059f074..0f6cc32 100644
>> --- a/drivers/kvm/kvm.h
>> +++ b/drivers/kvm/kvm.h
>> @@ - 329,6 +329,8 @@ struct kvm_vcpu_irq {
>> struct kvm_irqdevice dev;
>> int pending;
>> int deferred;
>> + struct task_struct *task;
>> + int guest_mode;
>>
>
> - >guest_mode can be folded into - >task, by specifying that - >task !=
> NULL is equivalent to - >guest_mode != 0. This will make the rest of the
> code easier to read.
The problem with doing it this way is that its no longer possible to detect the optimizing condition of "irq.task != current" when injecting interrupts. This means that userspace will be inadvertently sending itself a signal every time it injects interrupts, which IMHO is undesirable.
>
>> };
>>
>> struct kvm_vcpu {
>> diff -- git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
>> index 199489b..a160638 100644
>> --- a/drivers/kvm/kvm_main.c
>> +++ b/drivers/kvm/kvm_main.c
>> @@ - 1868,6 +1868,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *kvm_run)
>> kvm_arch_ops- >decache_regs(vcpu);
>> }
>>
>> + vcpu- >irq.task = current;
>> + smp_wmb();
>> +
>>
>
> This is best moved where - >guest_mode is set.
I can do this, but its common to all platforms so I figured it was best to be out here?
>
>> +/*
>> * This function will be invoked whenever the vcpu- >irq.dev raises its INTR
>> * line
>> */
>> @@ - 2318,10 +2335,50 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>> {
>> struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this- >private;
>> unsigned long flags;
>> + int direct_ipi = - 1;
>>
>> spin_lock_irqsave(&vcpu- >irq.lock, flags);
>>
>
> irqs are always enabled here, so spin_lock_irq() (and a corresponding
> spin_unlock_irq) is sufficient.
This and the rest of your comments make sense. Consider them all acked.
>
>> static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
>> diff -- git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
>> index 4c03881..91546ae 100644
>> --- a/drivers/kvm/svm.c
>> +++ b/drivers/kvm/svm.c
>> @@ - 1542,11 +1542,40 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct
> kvm_run *kvm_run)
>> u16 gs_selector;
>> u16 ldt_selector;
>> int r;
>> + unsigned long irq_flags;
>>
>> again:
>> + /*
>> + * We disable interrupts until the next VMEXIT to eliminate a race
>> + * condition for delivery of virtual interrutps. Note that this is
>> + * probably not as bad as it sounds, as interrupts will still invoke
>> + * a VMEXIT once transitioned to GUEST mode (and thus exit this lock
>> + * scope) even if they are disabled.
>> + *
>> + * FIXME: Do we need to do anything additional to mask IPI/NMIs?
>>
>
> You can remove the FIXME.
>
>> + */
>> + local_irq_save(irq_flags);
>>
>
> Interrupts are always enabled here, so local_irq_disable() suffices.
>
>> @@ - 1688,6 +1717,13 @@ again:
>> #endif
>> : "cc", "memory" );
>>
>> + /*
>> + * FIXME: We'd like to turn on interrupts ASAP, but is this so early
>> + * that we will mess up the state of the CPU before we fully
>> + * transition from guest to host?
>> + */
>>
>
> You can remove the FIXME. Pre- patch enabled interrupts much earlier.
>
>> + local_irq_restore(irq_flags);
>> +
>> if (vcpu- >fpu_active) {
>> fx_save(vcpu- >guest_fx_image);
>> fx_restore(vcpu- >host_fx_image);
>> @@ - 1710,6 +1746,13 @@ again:
>> reload_tss(vcpu);
>>
>> /*
>> + * Signal that we have transitioned back to host mode
>> + */
>> + spin_lock_irqsave(&vcpu- >irq.lock, irq_flags);
>> + vcpu- >irq.guest_mode = 0;
>> + spin_unlock_irqrestore(&vcpu- >irq.lock, irq_flags);
>>
>
> >> Don't you need to check interrupts here?
> > No, we assume that host userspace won't sleep.
> Right, I forgot again.
>
>
>> (prof_on == KVM_PROFILING))
>> diff -- git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
>> index ca858cb..7b81fff 100644
>> --- a/drivers/kvm/vmx.c
>> +++ b/drivers/kvm/vmx.c
>> @@ - 1895,6 +1895,7 @@ static int vmx_vcpu_run(struct kvm_vcpu *vcpu, struct
> kvm_run *kvm_run)
>> u16 fs_sel, gs_sel, ldt_sel;
>> int fs_gs_ldt_reload_needed;
>> int r;
>> + unsigned long irq_flags;
>>
>> preempted:
>> /*
>> @@ - 1929,9 +1930,37 @@ preempted:
>> if (vcpu- >guest_debug.enabled)
>> kvm_guest_debug_pre(vcpu);
>>
>> + /*
>> + * We disable interrupts until the next VMEXIT to eliminate a race
>> + * condition for delivery of virtual interrutps. Note that this is
>> + * probably not as bad as it sounds, as interrupts will still invoke
>> + * a VMEXIT once transitioned to GUEST mode (and thus exit this lock
>> + * scope) even if they are disabled.
>> + *
>> + * FIXME: Do we need to do anything additional to mask IPI/NMIs?
>> + */
>> + local_irq_save(irq_flags);
>> +
>>
>
> Pretty much same comments apply here. One day we'll unify some of this
> code.
>
>
> [...]
>
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 8/8] KVM: Adds support for TPR shadowing under VMX processors
[not found] ` <46484376.6090304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-14 15:28 ` Gregory Haskins
0 siblings, 0 replies; 30+ messages in thread
From: Gregory Haskins @ 2007-05-14 15:28 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>>> On Mon, May 14, 2007 at 7:09 AM, in message <46484376.6090304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> Gregory Haskins wrote:
>> Signed- off- by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
>> ---
>>
>>
>
> How was this tested?
Its busted. Don't use it ;) Its just for example/comment only. I will exclude it from future submissions.
>
>> + printk(KERN_WARNING "KVM: Warning - Host processor does " \
>> + "not support TPR- shadow\n");
>>
>
> KERN_DEBUG.
Ack.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <464845AD.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
@ 2007-05-14 15:45 ` Avi Kivity
[not found] ` <46488426.8090705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2007-05-14 15:45 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
>>> index 059f074..0f6cc32 100644
>>> --- a/drivers/kvm/kvm.h
>>> +++ b/drivers/kvm/kvm.h
>>> @@ - 329,6 +329,8 @@ struct kvm_vcpu_irq {
>>> struct kvm_irqdevice dev;
>>> int pending;
>>> int deferred;
>>> + struct task_struct *task;
>>> + int guest_mode;
>>>
>>>
>> - >guest_mode can be folded into - >task, by specifying that - >task !=
>> NULL is equivalent to - >guest_mode != 0. This will make the rest of the
>> code easier to read.
>>
>
> The problem with doing it this way is that its no longer possible to detect the optimizing condition of "irq.task != current" when injecting interrupts. This means that userspace will be inadvertently sending itself a signal every time it injects interrupts, which IMHO is undesirable.
>
>
I meant keeping ->task and dropping ->guest_mode. Or did I
misunderstand something?
>>>
>>> + vcpu- >irq.task = current;
>>> + smp_wmb();
>>> +
>>>
>>>
>> This is best moved where - >guest_mode is set.
>>
>
> I can do this, but its common to all platforms so I figured it was best to be out here?
>
>
Well, it scatters the logic. If we can merge guest_mode and task it's
moot anyway.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <46488426.8090705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-14 18:19 ` Gregory Haskins
[not found] ` <46486FD4.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-14 18:19 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>>> On Mon, May 14, 2007 at 11:45 AM, in message <46488426.8090705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> Gregory Haskins wrote:
>>>> index 059f074..0f6cc32 100644
>>>> --- a/drivers/kvm/kvm.h
>>>> +++ b/drivers/kvm/kvm.h
>>>> @@ - 329,6 +329,8 @@ struct kvm_vcpu_irq {
>>>> struct kvm_irqdevice dev;
>>>> int pending;
>>>> int deferred;
>>>> + struct task_struct *task;
>>>> + int guest_mode;
>>>>
>>>>
>>> - >guest_mode can be folded into - >task, by specifying that - >task !=
>>> NULL is equivalent to - >guest_mode != 0. This will make the rest of the
>>> code easier to read.
>>>
>>
>> The problem with doing it this way is that its no longer possible to detect
> the optimizing condition of "irq.task != current" when injecting interrupts.
> This means that userspace will be inadvertently sending itself a signal every
> time it injects interrupts, which IMHO is undesirable.
>>
>>
>
> I meant keeping - >task and dropping - >guest_mode. Or did I
> misunderstand something?
Its possible that I am actually misunderstanding you instead, but from my perspective those two variables are tracking orthogonal state. irq.task is keeping track of the thread that is running the CPU. This will tend to get set once (on the first entry to kvm_run() and stay unchanged for the duration of the VM. irq.guest_mode, on the other hand, will track whether the vcpu is in (or near) guest mode (to switch between direct_ipi and eventfd wakeup methods).
I like having both states tracked, because it allows me to optimize the vcpu interrupt if the context of the injection is the same as the execution. E.g. if the single QEMU thread calls KVM_RUN and then KVM_INTERRUPT, I can skip sending an eventfd because I know the irq.task == current and its pointless.
(Note that in the original designs, irq.task was also used to designate a target for send_sig. Perhaps it is no longer logical to have this scoped to the vcpu.irq structure anymore? E.g. should I make it vcpu.task?)
>
>>>>
>>>> + vcpu- >irq.task = current;
>>>> + smp_wmb();
>>>> +
>>>>
>>>>
>>> This is best moved where - >guest_mode is set.
>>>
>>
>> I can do this, but its common to all platforms so I figured it was best to
> be out here?
>>
>>
>
> Well, it scatters the logic. If we can merge guest_mode and task it's
> moot anyway.
Sounds reasonable. If you convince me to condense this it goes away outright, otherwise I will move it together. ;)
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <46486FD4.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
@ 2007-05-15 7:28 ` Avi Kivity
[not found] ` <46496125.5020909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2007-05-15 7:28 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
>>>> On Mon, May 14, 2007 at 11:45 AM, in message <46488426.8090705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
>>>>
> Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>> Gregory Haskins wrote:
>>
>>>>> index 059f074..0f6cc32 100644
>>>>> --- a/drivers/kvm/kvm.h
>>>>> +++ b/drivers/kvm/kvm.h
>>>>> @@ - 329,6 +329,8 @@ struct kvm_vcpu_irq {
>>>>> struct kvm_irqdevice dev;
>>>>> int pending;
>>>>> int deferred;
>>>>> + struct task_struct *task;
>>>>> + int guest_mode;
>>>>>
>>>>>
>>>>>
>>>> - >guest_mode can be folded into - >task, by specifying that - >task !=
>>>> NULL is equivalent to - >guest_mode != 0. This will make the rest of the
>>>> code easier to read.
>>>>
>>>>
>>> The problem with doing it this way is that its no longer possible to detect
>>>
>> the optimizing condition of "irq.task != current" when injecting interrupts.
>> This means that userspace will be inadvertently sending itself a signal every
>> time it injects interrupts, which IMHO is undesirable.
>>
>>>
>>>
>> I meant keeping - >task and dropping - >guest_mode. Or did I
>> misunderstand something?
>>
>
> Its possible that I am actually misunderstanding you instead, but from my perspective those two variables are tracking orthogonal state. irq.task is keeping track of the thread that is running the CPU. This will tend to get set once (on the first entry to kvm_run() and stay unchanged for the duration of the VM. irq.guest_mode, on the other hand, will track whether the vcpu is in (or near) guest mode (to switch between direct_ipi and eventfd wakeup methods).
>
> I like having both states tracked, because it allows me to optimize the vcpu interrupt if the context of the injection is the same as the execution. E.g. if the single QEMU thread calls KVM_RUN and then KVM_INTERRUPT, I can skip sending an eventfd because I know the irq.task == current and its pointless.
>
You can't rely on irq.task if !guest_mode. Under the current design,
the task may have exited and you'd be dereferencing unallocated memory.
While it won't oops or cause anything bad to happen (and current qemu
can't trigger this), it isn't nice.
Later we'll have vcpu and thread_info point to each other and then you
can do that kind of optimization.
Oh, and nobody said that the task waiting on the event is the same as
the task running the vcpu.
> (Note that in the original designs, irq.task was also used to designate a target for send_sig. Perhaps it is no longer logical to have this scoped to the vcpu.irq structure anymore? E.g. should I make it vcpu.task?)
>
I think so.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <46496125.5020909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-15 11:56 ` Gregory Haskins
[not found] ` <4649679C.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-15 11:56 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>>> On Tue, May 15, 2007 at 3:28 AM, in message <46496125.5020909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> Gregory Haskins wrote:
>
>>
>> I like having both states tracked, because it allows me to optimize the vcpu
> interrupt if the context of the injection is the same as the execution. E.g.
> if the single QEMU thread calls KVM_RUN and then KVM_INTERRUPT, I can skip
> sending an eventfd because I know the irq.task == current and its pointless.
>
>>
>
> You can't rely on irq.task if !guest_mode. Under the current design,
> the task may have exited and you'd be dereferencing unallocated memory.
> While it won't oops or cause anything bad to happen (and current qemu
> can't trigger this), it isn't nice.
Hmm...that is a good point.
>
> Later we'll have vcpu and thread_info point to each other and then you
> can do that kind of optimization.
I am not familiar with thread_info, but if it solves the dangling pointer problem, that sounds great. It sounds like you are in favor of leaving this optimization for a later time. As long as you are ok with every interrupt related ioctl such as KVM_APIC_MSG, and KVM_ISA_INTERRUPT posting a signal to itself, we can pull this for now. Conversely, if the thread_info approach isn't hard, I would prefer to get this right now, as the double interrupt thing seems nasty to me.
Alternatively, perhaps I can just replace irq.task with irq.pid? And I could also replace irq.guest_mode with irq.guest_cpu. I will then record the pid where today I record the task. Likewise, I can extract the guest_cpu (using task_cpu(current)) where today I assign irq.guest_mode = 1. That would effectively remove the dangling pointer problem while retaining the features that I like.
Thoughts?
>
> Oh, and nobody said that the task waiting on the event is the same as
> the task running the vcpu.
I'm a little confused by this statement. I don't use irq.task for assigning a target for the event. That is all self contained in the eventfd. Its true that some of the older designs used this as the send_sig() target, but the assumption there was we were posting a signal to the entire PID, not a specific TID. That could have been a bad assumption, but its moot now. Let me know if you meant something else.
>
>> (Note that in the original designs, irq.task was also used to designate a
> target for send_sig. Perhaps it is no longer logical to have this scoped to
> the vcpu.irq structure anymore? E.g. should I make it vcpu.task?)
>>
>
> I think so.
>
>
Ok. Assuming you accept my pid/guest_cpu idea above, I will make it "pid_t vcpu.owner", and "int vcpu.irq.guest_cpu" (where it will be -1 if not in guest_mode)
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <4649679C.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
@ 2007-05-16 10:05 ` Avi Kivity
[not found] ` <464AD772.4050007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2007-05-16 10:05 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
>
>> Later we'll have vcpu and thread_info point to each other and then you
>> can do that kind of optimization.
>>
>
> I am not familiar with thread_info, but if it solves the dangling pointer problem, that sounds great.
It's fairly tricky, but it's the right way forward. It means a 1:1
association of a vcpu and a thread for the lifetime of the vcpu.
> It sounds like you are in favor of leaving this optimization for a later time. As long as you are ok with every interrupt related ioctl such as KVM_APIC_MSG, and KVM_ISA_INTERRUPT posting a signal to itself, we can pull this for now.
Perhaps you can disable this by noticing that you're injecting an
interrupt now (another icky variable in struct kvm_vcpu).
> Conversely, if the thread_info approach isn't hard, I would prefer to get this right now, as the double interrupt thing seems nasty to me.
>
> Alternatively, perhaps I can just replace irq.task with irq.pid? And I could also replace irq.guest_mode with irq.guest_cpu. I will then record the pid where today I record the task. Likewise, I can extract the guest_cpu (using task_cpu(current)) where today I assign irq.guest_mode = 1. That would effectively remove the dangling pointer problem while retaining the features that I like.
>
pid can dangle just the same as a task pointer, only much worse.
> Thoughts?
>
>
>> Oh, and nobody said that the task waiting on the event is the same as
>> the task running the vcpu.
>>
>
> I'm a little confused by this statement. I don't use irq.task for assigning a target for the event. That is all self contained in the eventfd. Its true that some of the older designs used this as the send_sig() target, but the assumption there was we were posting a signal to the entire PID, not a specific TID. That could have been a bad assumption, but its moot now. Let me know if you meant something else.
>
Strike that. I was confused.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <464AD772.4050007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-16 12:10 ` Gregory Haskins
[not found] ` <464ABC67.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Gregory Haskins @ 2007-05-16 12:10 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>>> On Wed, May 16, 2007 at 6:05 AM, in message <464AD772.4050007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> Gregory Haskins wrote:
>>
>>> Later we'll have vcpu and thread_info point to each other and then you
>>> can do that kind of optimization.
>>>
>>
>> I am not familiar with thread_info, but if it solves the dangling pointer
> problem, that sounds great.
>
> It's fairly tricky, but it's the right way forward. It means a 1:1
> association of a vcpu and a thread for the lifetime of the vcpu.
Any pointers would be appreciated. Otherwise I will hit up google.
>
>> It sounds like you are in favor of leaving this optimization for a later
> time. As long as you are ok with every interrupt related ioctl such as
> KVM_APIC_MSG, and KVM_ISA_INTERRUPT posting a signal to itself, we can pull
> this for now.
>
> Perhaps you can disable this by noticing that you're injecting an
> interrupt now (another icky variable in struct kvm_vcpu).
I don't think this can be made to work without having the same problem that we face already. Detecting self-injection is the same problem at ioctl entry point as it is at irqdevice::intr
>
>> Conversely, if the thread_info approach isn't hard, I would prefer to get
> this right now, as the double interrupt thing seems nasty to me.
>>
>> Alternatively, perhaps I can just replace irq.task with irq.pid? And I
> could also replace irq.guest_mode with irq.guest_cpu. I will then record the
> pid where today I record the task. Likewise, I can extract the guest_cpu
> (using task_cpu(current)) where today I assign irq.guest_mode = 1. That
> would effectively remove the dangling pointer problem while retaining the
> features that I like.
>>
>
> pid can dangle just the same as a task pointer, only much worse.
I agree that it can dangle briefly if userspace is using something like thread-pooling to execute VCPUs. I don't see how it can be worse, however. And also note that having it dangling doesn't have the nasty problem that the task pointer does: dereferencing an invalid pointer.
However, that being said: I do see how either of these solutions leave a potential race condition against missing some signals: If a thread that was executing the VCPU changes roles, AND it injects an interrupt before the new thread starts executing...we would miss the signal. This is not a realistic scenario today, but it is a hole. Hmmm..... how does that thread_info stuff work? ;)
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU
[not found] ` <464ABC67.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
@ 2007-05-16 12:18 ` Avi Kivity
0 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2007-05-16 12:18 UTC (permalink / raw)
To: Gregory Haskins; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Gregory Haskins wrote:
>>>> On Wed, May 16, 2007 at 6:05 AM, in message <464AD772.4050007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
>>>>
> Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>> Gregory Haskins wrote:
>>
>>>
>>>
>>>> Later we'll have vcpu and thread_info point to each other and then you
>>>> can do that kind of optimization.
>>>>
>>>>
>>> I am not familiar with thread_info, but if it solves the dangling pointer
>>>
>> problem, that sounds great.
>>
>> It's fairly tricky, but it's the right way forward. It means a 1:1
>> association of a vcpu and a thread for the lifetime of the vcpu.
>>
>
> Any pointers would be appreciated. Otherwise I will hit up google.
>
>
task_struct (include/linux/sched.h) is probably a better fit.
Basically, like each task has attributes like its registers, fpu state,
and open file table, it would also have a vcpu attribute if it's part of
a virtual machine.
>>> It sounds like you are in favor of leaving this optimization for a later
>>>
>> time. As long as you are ok with every interrupt related ioctl such as
>> KVM_APIC_MSG, and KVM_ISA_INTERRUPT posting a signal to itself, we can pull
>> this for now.
>>
>> Perhaps you can disable this by noticing that you're injecting an
>> interrupt now (another icky variable in struct kvm_vcpu).
>>
>
> I don't think this can be made to work without having the same problem that we face already. Detecting self-injection is the same problem at ioctl entry point as it is at irqdevice::intr
>
Ok.
>
>>> Conversely, if the thread_info approach isn't hard, I would prefer to get
>>>
>> this right now, as the double interrupt thing seems nasty to me.
>>
>>> Alternatively, perhaps I can just replace irq.task with irq.pid? And I
>>>
>> could also replace irq.guest_mode with irq.guest_cpu. I will then record the
>> pid where today I record the task. Likewise, I can extract the guest_cpu
>> (using task_cpu(current)) where today I assign irq.guest_mode = 1. That
>> would effectively remove the dangling pointer problem while retaining the
>> features that I like.
>>
>>>
>>>
>> pid can dangle just the same as a task pointer, only much worse.
>>
>
> I agree that it can dangle briefly if userspace is using something like thread-pooling to execute VCPUs. I don't see how it can be worse, however. And also note that having it dangling doesn't have the nasty problem that the task pointer does: dereferencing an invalid pointer.
>
> However, that being said: I do see how either of these solutions leave a potential race condition against missing some signals: If a thread that was executing the VCPU changes roles, AND it injects an interrupt before the new thread starts executing...we would miss the signal. This is not a realistic scenario today, but it is a hole. Hmmm..... how does that thread_info stuff work? ;)
>
These corner cases don't need to work well as VMs, they just need not to
be expoitable.
--
error compiling committee.c: too many arguments to function
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2007-05-16 12:18 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-09 3:03 [PATCH 0/8] in-kernel APIC support "v1" Gregory Haskins
[not found] ` <20070509023731.23443.86578.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 3:03 ` [PATCH 1/8] KVM: Adds support for in-kernel mmio handlers Gregory Haskins
[not found] ` <20070509030315.23443.93779.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 9:51 ` [PATCH 1/8] KVM: Adds support for in-kernel mmiohandlers Dor Laor
2007-05-09 3:03 ` [PATCH 2/8] KVM: Add irqdevice object Gregory Haskins
[not found] ` <20070509030320.23443.51197.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-09 15:16 ` Dor Laor
[not found] ` <64F9B87B6B770947A9F8391472E032160BBA6157-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
2007-05-09 18:04 ` Gregory Haskins
[not found] ` <4641D4D8.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-05-09 22:12 ` Dor Laor
[not found] ` <64F9B87B6B770947A9F8391472E032160BBA6471-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
2007-05-09 22:47 ` Gregory Haskins
[not found] ` <4642170B.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-05-13 12:05 ` Avi Kivity
2007-05-09 3:03 ` [PATCH 3/8] KVM: Adds ability to preempt an executing VCPU Gregory Haskins
[not found] ` <20070509030325.23443.90129.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-14 9:34 ` Avi Kivity
[not found] ` <46482D2E.7040809-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-14 15:19 ` Gregory Haskins
[not found] ` <464845AD.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-05-14 15:45 ` Avi Kivity
[not found] ` <46488426.8090705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-14 18:19 ` Gregory Haskins
[not found] ` <46486FD4.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-05-15 7:28 ` Avi Kivity
[not found] ` <46496125.5020909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-15 11:56 ` Gregory Haskins
[not found] ` <4649679C.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-05-16 10:05 ` Avi Kivity
[not found] ` <464AD772.4050007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-16 12:10 ` Gregory Haskins
[not found] ` <464ABC67.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-05-16 12:18 ` Avi Kivity
2007-05-09 3:03 ` [PATCH 4/8] KVM: Adds ability to signal userspace using a file-descriptor Gregory Haskins
2007-05-09 3:03 ` [PATCH 5/8] KVM: Add support for in-kernel LAPIC model Gregory Haskins
2007-05-09 3:03 ` [PATCH 6/8] KVM: Adds support for real NMI injection on VMX processors Gregory Haskins
[not found] ` <20070509030340.23443.84153.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-14 9:38 ` Avi Kivity
2007-05-09 3:03 ` [PATCH 7/8] KVM: Adds basic plumbing to support TPR shadow features Gregory Haskins
2007-05-09 3:03 ` [PATCH 8/8] KVM: Adds support for TPR shadowing under VMX processors Gregory Haskins
[not found] ` <20070509030350.23443.35387.stgit-sLgBBP33vUGnsjUZhwzVf9HuzzzSOjJt@public.gmane.org>
2007-05-14 11:09 ` Avi Kivity
[not found] ` <46484376.6090304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-14 15:28 ` Gregory Haskins
2007-05-13 12:02 ` [PATCH 0/8] in-kernel APIC support "v1" Avi Kivity
[not found] ` <4646FE71.5080009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-13 14:09 ` Gregory Haskins
[not found] ` <4646E3D1.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-05-13 15:45 ` Avi Kivity
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox