From: "Gregory Haskins"
Subject: KVM in-kernel APIC update
Date: Tue, 03 Apr 2007 18:31:53 -0400
Message-ID: <46128F80.BA47.005A.0@novell.com>
List-Id: kvm.vger.kernel.org

Hi all,

Attached is a snapshot of my current efforts on the kernel side of the in-kernel APIC work. Feedback welcome.

This patch deals explicitly only with the LAPIC. We need to make some decisions about the extent of the in-kernel emulation we want to support. For instance, do we bring the entire ISA interrupt subsystem down (e.g. the singleton IOAPIC and dual 8259s), or only a subset? The QEMU patch I posted previously should allow us to support any combination.

This patch merges the existing kernel-apic branch, and then adds the following:

*) Support for the SVM+VMX breakout. The original code was written back when VMX was the one and only platform.

*) Refactored vcpu->irq_XX into an abstract interface. The goal is to move IRQ handling out of the core code so that we can support both "user-mode" and in-kernel interrupts. This allows us to preserve ABI compatibility with older user-space that might still do userspace IRQ handling. Additionally, this abstraction should make ports to platforms that don't use xAPICs (or that use a different flavor, e.g. the IA64 SAPIC) a little easier.

*) Cursory NMI handling. The existing code in both the kernel-apic branch and KVM trunk has no real notion of NMIs; everything uses a flat model with all interrupts subject to the current masking conditions.
IIRC, IPIs tend to use NMI vectoring, so I think this will be key later if we want to model SMP system behavior more faithfully. Even if they don't, we may want NMIs in the future for other reasons. The new IRQ abstraction has a notion of NMI filtering, so it should be fairly easy to do the right thing at IRQ-injection time down the road. Right now this feature of the interface is unused.

My current thinking is that we should at least move the IOAPIC into the kernel as well. That gives sufficient control to generate ISA bus interrupts for guests that understand APICs. If we want to generate ISA interrupts for legacy guests that talk to the 8259s, that alone will prove insufficient. The good news is that moving the 8259s down as well is probably not a huge deal either, especially since I have already prepped the usermode side. Thoughts?

So here's a question for you guys out there: what is the expected use of the in-kernel APIC? My interests lie in the ability to send IPIs for SMP, as well as in injecting asynchronous hypercall interrupts. I assume there are other reasons too, such as PV device interrupts, and I would like to make sure I am seeing the big picture before making any bad design decisions.

My question is: how do we expect the PV devices to look from a bus perspective? The current Bochs/QEMU system model paints a fairly simple ISA architecture utilizing a single IOAPIC + dual 8259 setup. Do we expect in-kernel injected IRQs to follow the ISA model (e.g. legacy or PCI interrupts limited to IRQ0-15), or do we want to expand on this? The PCI hypercall device introduced a while back would be an example of something ISA based. Alternatives would be to utilize unused "pins" (such as IRQ16-23) on IOAPIC #0, or to introduce an entirely new bus/IOAPICs just for KVM, etc. If the latter, we also need to decide what the resource conveyance model and vector allocation policy should be.
For instance, do we publish said resources formally in the MP/ACPI tables in Bochs? Doing so would allow MP/ACPI-compliant OSes like Linux to route the IRQ naturally. Or do we do something more direct, as we do for KVM discovery via wrmsr?

Cheers,
-Greg

[Attachment: kvm-apic.patch]

commit e1ff23affe598e90ffe21f5b0942296da89741ee
Author: Gregory Haskins
Date:   Tue Apr 3 16:32:50 2007 -0400

    KVM - Baseline support for in-kernel APIC

    This patch is not yet complete.  It is a starting point based upon the
    original work done by Dor Laor.  There are additional changes on the
    kernel side that are needed to complete this work, in addition to
    usermode-side changes.

    Signed-off-by: Gregory Haskins

diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c0a789f..7532eae 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -2,7 +2,7 @@
 # Makefile for Kernel-based Virtual Machine module
 #
 
-kvm-objs := kvm_main.o mmu.o x86_emulate.o
+kvm-objs := kvm_main.o mmu.o x86_emulate.o kvm_apic.o kvm_userint.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index fceeb84..080cb1b 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -157,6 +157,51 @@ struct vmcs {
 
 struct kvm_vcpu;
 
+struct kvm_irqinfo {
+	int vector;
+	int nmi;
+};
+
+#define KVM_IRQFLAGS_NMI  (1 << 0)
+#define KVM_IRQFLAGS_PEEK (1 << 1)
+
+struct kvm_irqdevice {
+	int  (*pending)(struct kvm_irqdevice *this, int flags);
+	int  (*read)(struct kvm_irqdevice *this, int flags,
+		     struct kvm_irqinfo *info);
+	int  (*inject)(struct kvm_irqdevice *this, int irq, int flags);
+	int  (*summary)(struct kvm_irqdevice *this, void *data);
+	void (*destructor)(struct kvm_irqdevice *this);
+
+	void *private;
+};
+
+#define MAX_APIC_INT_VECTOR 256
+
+struct kvm_apic {
+	u32 status;
+	u32 vcpu_id;
+	spinlock_t lock;
+	u32 pcpu_lock_owner;
+	atomic_t timer_pending;
+	u64 apic_base_msr;
+	unsigned long base_address;
+	u32 timer_divide_count;
+	struct hrtimer apic_timer;
+	int intr_pending_count[MAX_APIC_INT_VECTOR];
+	ktime_t timer_last_update;
+	struct {
+		int deliver_mode;
+		int source[6];
+	} direct_intr;
+	u32 err_status;
+	u32 err_write_count;
+	struct kvm_vcpu *vcpu;
+	struct page *regs_page;
+	void *regs;
+	struct kvm_irqdevice ext; /* Used for external/NMI interrupts */
+};
+
 /*
  * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
  * 32-bit). The kvm_mmu structure abstracts the details of the current mmu
@@ -236,6 +281,11 @@ struct kvm_pio_request {
 	int rep;
 };
 
+#define KVM_VCPU_INIT_SIPI_SIPI_STATE_NORM	0
+#define KVM_VCPU_INIT_SIPI_SIPI_STATE_WAIT_SIPI	1
+
+#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
+
 struct kvm_vcpu {
 	struct kvm *kvm;
 	union {
@@ -248,12 +298,9 @@ struct kvm_vcpu {
 	u64 host_tsc;
 	struct kvm_run *run;
 	int interrupt_window_open;
-	unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
-#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
-	unsigned long irq_pending[NR_IRQ_WORDS];
 	unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
 	unsigned long rip;      /* needs vcpu_load_rsp_rip() */
-
+	struct kvm_irqdevice irq_dev;
 	unsigned long cr0;
 	unsigned long cr2;
 	unsigned long cr3;
@@ -261,10 +308,8 @@ struct kvm_vcpu {
 	struct page *para_state_page;
 	gpa_t hypercall_gpa;
 	unsigned long cr4;
-	unsigned long cr8;
 	u64 pdptrs[4]; /* pae */
 	u64 shadow_efer;
-	u64 apic_base;
 	u64 ia32_misc_enable_msr;
 	int nmsrs;
 	struct vmx_msr_entry *guest_msrs;
@@ -298,6 +343,11 @@ struct kvm_vcpu {
 	int sigset_active;
 	sigset_t sigset;
 
+	struct kvm_apic apic;
+	wait_queue_head_t halt_wq;
+	/* For AP startup */
+	unsigned long init_sipi_sipi_state;
+
 	struct {
 		int active;
 		u8 save_iopl;
@@ -319,6 +369,23 @@ struct kvm_mem_alias {
 	gfn_t target_gfn;
 };
 
+#define kvm_irq_pending(dev, flags)     (dev)->pending(dev, flags)
+#define kvm_irq_read(dev, flags, info)  (dev)->read(dev, flags, info)
+#define kvm_irq_inject(dev, irq, flags) (dev)->inject(dev, irq, flags)
+#define kvm_irq_summary(dev, data)      (dev)->summary(dev, data)
+
+#define kvm_vcpu_irq_pending(vcpu, flags) \
+	kvm_irq_pending(&vcpu->irq_dev, flags)
+#define kvm_vcpu_irq_read(vcpu, flags, info) \
+	kvm_irq_read(&vcpu->irq_dev, flags, info)
+#define kvm_vcpu_irq_inject(vcpu, irq, flags) \
+	kvm_irq_inject(&vcpu->irq_dev, irq, flags)
+#define kvm_vcpu_irq_summary(vcpu, data) \
+	kvm_irq_summary(&vcpu->irq_dev, data)
+
+int kvm_userint_init(struct kvm_vcpu *vcpu);
+
 struct kvm_memory_slot {
 	gfn_t base_gfn;
 	unsigned long npages;
@@ -345,6 +412,7 @@ struct kvm {
 	unsigned long rmap_overflow;
 	struct list_head vm_list;
 	struct file *filp;
+	struct kvm_options options;
 };
 
 struct kvm_stat {
@@ -564,6 +632,13 @@ static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
 	return (struct kvm_mmu_page *)page_private(page);
 }
 
+static inline int vcpu_slot(struct kvm_vcpu *vcpu)
+{
+	return vcpu - vcpu->kvm->vcpus;
+}
+
+void kvm_crash_guest(struct kvm *kvm);
+
 static inline u16 read_fs(void)
 {
 	u16 seg;
diff --git a/drivers/kvm/kvm_apic.c b/drivers/kvm/kvm_apic.c
new file mode 100644
index 0000000..f3e7edc
--- /dev/null
+++ b/drivers/kvm/kvm_apic.c
@@ -0,0 +1,1289 @@
+/*
+ * kvm_apic.c: Local APIC virtualization
+ *
+ *
+ * Copyright (C) 2006 Qumranet, Inc.
+ *
+ * Authors:
+ *	Dor Laor
+ *	Gregory Haskins
+ *
+ * Copyright (c) 2004, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include "kvm.h"
+#include "kvm_apic.h"
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* XXX remove this definition after GFW enabled */
+#define APIC_NO_BIOS
+
+#define PRId64 "d"
+#define PRIx64 "llx"
+#define PRIu64 "u"
+#define PRIo64 "o"
+
+#define APIC_BUS_CYCLE_NS 1
+
+static unsigned int apic_lvt_mask[APIC_LVT_NUM] =
+{
+	LVT_MASK | APIC_LVT_TIMER_PERIODIC,	/* LVTT */
+	LVT_MASK | APIC_MODE_MASK,		/* LVTTHMR */
+	LVT_MASK | APIC_MODE_MASK,		/* LVTPC */
+	LINT_MASK, LINT_MASK,			/* LVT0-1 */
+	LVT_MASK				/* LVTERR */
+};
+
+#define ASSERT(x)							\
+	if (!(x)) {							\
+		printk(KERN_EMERG "assertion failed %s: %d: %s\n",	\
+		       __FILE__, __LINE__, #x);				\
+		BUG();							\
+	}
+
+static int apic_find_highest_irr(struct kvm_apic *apic)
+{
+	int result;
+
+	result = find_highest_bit((unsigned long *)(apic->regs + APIC_IRR),
+				  MAX_APIC_INT_VECTOR);
+
+	ASSERT(result == 0 || result >= 16);
+
+	return result;
+}
+
+static int apic_find_highest_isr(struct kvm_apic *apic)
+{
+	int result;
+
+	result = find_highest_bit((unsigned long *)(apic->regs + APIC_ISR),
+				  MAX_APIC_INT_VECTOR);
+
+	ASSERT(result == 0 || result >= 16);
+
+	return result;
+}
+
+static u32 apic_update_ppr(struct kvm_apic *apic)
+{
+	u32 tpr, isrv, ppr;
+	int isr;
+
+	tpr = kvm_apic_get_reg(apic, APIC_TASKPRI);
+	isr = apic_find_highest_isr(apic);
+	isrv = (isr >> 4) & 0xf;
+
+	if ((tpr >> 4) >= isrv)
+		ppr = tpr & 0xff;
+	else
+		ppr = isrv << 4;	/* low 4 bits of PPR have to be cleared */
+
+	kvm_apic_set_reg(apic, APIC_PROCPRI, ppr);
+
+	pr_debug("%s: ppr 0x%x, isr 0x%x, isrv 0x%x\n",
+		 __FUNCTION__, ppr, isr, isrv);
+
+	return ppr;
+}
+
+void kvm_apic_update_tpr(struct kvm_apic *apic, unsigned long cr8)
+{
+	spin_lock_bh(&apic->lock);
+	kvm_apic_set_reg(apic, APIC_TASKPRI, ((cr8 & 0x0f) << 4));
+	apic_update_ppr(apic);
+	spin_unlock_bh(&apic->lock);
+}
+
+/*
+ * This function does not need a lock because it reads an atomic value
+ */
+unsigned long kvm_apic_read_tpr(struct kvm_apic *apic)
+{
+	unsigned long tpr = (unsigned long)kvm_apic_get_reg(apic, APIC_TASKPRI);
+	return (tpr & 0xf0) >> 4;
+}
+EXPORT_SYMBOL_GPL(kvm_apic_read_tpr);
+
+/*
+ * This is only for fixed delivery mode
+ */
+static int apic_match_dest(struct kvm_vcpu *vcpu,
+			   struct kvm_apic *source,
+			   int short_hand,
+			   int dest,
+			   int dest_mode,
+			   int delivery_mode)
+{
+	int result = 0;
+	struct kvm_apic *target = &vcpu->apic;
+
+	pr_debug("target %p, source %p, dest 0x%x, dest_mode 0x%x, "
+		 "short_hand 0x%x, delivery_mode 0x%x\n",
+		 target, source, dest, dest_mode, short_hand,
+		 delivery_mode);
+
+	if (unlikely(target == NULL) &&
+	    ((delivery_mode != APIC_DM_INIT) &&
+	     (delivery_mode != APIC_DM_STARTUP) &&
+	     (delivery_mode != APIC_DM_NMI))) {
+
+		pr_debug("uninitialized target vcpu %p, "
+			 "delivery_mode 0x%x, dest 0x%x.\n",
+			 vcpu, delivery_mode, dest);
+		return result;
+	}
+
+	switch (short_hand) {
+	case APIC_DEST_NOSHORT:	/* no shorthand */
+		if (!dest_mode)	/* Physical */
+			result = (((target != NULL) ?
+				   GET_APIC_ID(kvm_apic_get_reg(target, APIC_ID)) :
+				   vcpu_slot(vcpu))) == dest;
+		else {	/* Logical */
+			u32 ldr;
+			if (target == NULL)
+				break;
+			ldr = kvm_apic_get_reg(target, APIC_LDR);
+
+			/* Flat mode */
+			if (kvm_apic_get_reg(target, APIC_DFR) == APIC_DFR_FLAT)
+				result = GET_APIC_LOGICAL_ID(ldr) & dest;
+			else {
+				if (delivery_mode == APIC_DM_LOWEST &&
+				    dest == 0xff) {
+					printk(KERN_ALERT "Broadcast IPI with lowest priority "
+					       "delivery mode\n");
+					kvm_crash_guest(vcpu->kvm);
+				}
+				result = (GET_APIC_LOGICAL_ID(ldr) == (dest & 0xf)) ?
+					 (GET_APIC_LOGICAL_ID(ldr) >> 4) & (dest >> 4) : 0;
+			}
+		}
+		break;
+
+	case APIC_DEST_SELF:
+		if (target == source)
+			result = 1;
+		break;
+
+	case APIC_DEST_ALLINC:
+		result = 1;
+		break;
+
+	case APIC_DEST_ALLBUT:
+		if (target != source)
+			result = 1;
+		break;
+
+	default:
+		break;
+	}
+
+	return result;
+}
+
+/*
+ * Add a pending IRQ into the lapic.
+ * Return 1 if successfully added and 0 if discarded.
+ */
+static int apic_accept_irq(struct kvm_apic *apic,
+			   int delivery_mode,
+			   int vector,
+			   int level,
+			   int trig_mode)
+{
+	int result = 0;
+
+	switch (delivery_mode) {
+	case APIC_DM_FIXED:
+	case APIC_DM_LOWEST:
+		/* FIXME: add logic for vcpu on reset */
+		if (unlikely(apic == NULL || !apic_enabled(apic)))
+			break;
+
+		if (test_and_set_bit(vector, apic->regs + APIC_IRR) && trig_mode) {
+			pr_debug("level trig mode repeatedly for vector %d\n",
+				 vector);
+			break;
+		}
+
+		if (trig_mode) {
+			pr_debug("level trig mode for vector %d\n", vector);
+			set_bit(vector, apic->regs + APIC_TMR);
+		}
+
+		/*
+		 * FIXME(kvm): When we have SMP support we will need to wake up
+		 * a sleeping/running vcpu to inject the interrupt into it.
+		 */
+		result = 1;
+		break;
+
+	case APIC_DM_REMRD:
+	case APIC_DM_SMI:
+		printk(KERN_WARNING "%s: ignoring delivery mode %d\n",
+		       __FUNCTION__, delivery_mode);
+		break;
+
+	case APIC_DM_NMI:
+	case APIC_DM_EXTINT:
+		kvm_irq_inject(&apic->ext, vector,
+			       (delivery_mode == APIC_DM_NMI) ?
+			       KVM_IRQFLAGS_NMI : 0);
+		break;
+
+	case APIC_DM_INIT:
+		if (trig_mode && !(level & APIC_INT_ASSERT))	/* De-assert */
+			printk(KERN_INFO "This kvm_apic is for P4; nothing to do "
+			       "for a de-assert INIT\n");
+		else {
+			/* FIXME(xen): How to check the situation after vcpu reset? */
+			if (apic->vcpu->launched) {
+				printk(KERN_ALERT "Resetting a kvm vcpu is not supported yet\n");
+				kvm_crash_guest(apic->vcpu->kvm);
+			}
+			apic->vcpu->init_sipi_sipi_state =
+				KVM_VCPU_INIT_SIPI_SIPI_STATE_WAIT_SIPI;
+			result = 1;
+		}
+		break;
+
+	case APIC_DM_STARTUP:	/* FIXME: currently no support for SMP */
+		printk(KERN_ALERT "%s: SMP not supported yet\n", __FUNCTION__);
+		kvm_crash_guest(apic->vcpu->kvm);
+		break;
+
+	default:
+		printk(KERN_ALERT "TODO: unsupported interrupt type %x\n",
+		       delivery_mode);
+		kvm_crash_guest(apic->vcpu->kvm);
+		break;
+	}
+
+	return result;
+}
+
+static int _apic_inject(struct kvm_irqdevice *this, int irq, int flags)
+{
+	struct kvm_apic *apic = (struct kvm_apic *)this->private;
+	int ret;
+	int apic_type = (flags & KVM_IRQFLAGS_NMI) ? APIC_DM_NMI :
+						     APIC_DM_EXTINT;
+
+	spin_lock_bh(&apic->lock);
+	ret = apic_accept_irq(apic, apic_type, irq, 0, 1);
+	spin_unlock_bh(&apic->lock);
+
+	return ret;
+}
+
+#if 0
+int kvm_apic_receive_msg(struct kvm_vcpu *vcpu, struct kvm_apic_msg *msg)
+{
+	pr_debug("%s: vcpu(%d), delivery_mode(%d), vector(%x), trig_mode(%d)\n",
+		 __FUNCTION__, vcpu_slot(vcpu), msg->delivery_mode,
+		 msg->vector, msg->trig_mode);
+
+	spin_lock_bh(&vcpu->apic.lock);
+	apic_accept_irq(vcpu->apic, msg->delivery_mode, msg->vector, 1,
+			msg->trig_mode);
+	spin_unlock_bh(&vcpu->apic.lock);
+
+	return 0;
+}
+#endif
+
+/*
+ * This function is used by both the ioapic and the local APIC.
+ * The bitmap is indexed by vcpu_id.
+ * The implementation is trivial because there is no SMP support yet.
+ */
+struct kvm_apic *kvm_apic_round_robin(struct kvm_vcpu *vcpu,
+				      u8 dest_mode,
+				      u8 vector,
+				      u32 bitmap)
+{
+	if (dest_mode == 0) {	/* Physical mode */
+		pr_debug("%s: lowest priority for physical mode\n", __FUNCTION__);
+		return NULL;
+	}
+
+	if (!bitmap) {
+		pr_debug("%s: no bit set in bitmap\n", __FUNCTION__);
+		return NULL;
+	}
+
+	/* FIXME: add SMP support */
+	return &vcpu->apic;
+}
+
+static void apic_EOI_set(struct kvm_apic *apic)
+{
+	int vector = apic_find_highest_isr(apic);
+
+	/*
+	 * Not every EOI write has a corresponding ISR bit set;
+	 * one example is when the kernel checks the timer in setup_IO_APIC.
+	 */
+	if (!vector)
+		return;
+
+	clear_bit(vector, apic->regs + APIC_ISR);
+	apic_update_ppr(apic);
+
+	clear_bit(vector, apic->regs + APIC_TMR);
+}
+
+static int apic_check_vector(struct kvm_apic *apic, u32 dm, u32 vector)
+{
+	if (dm == APIC_DM_FIXED && vector < 16) {
+		apic->err_status |= 0x40;
+		apic_accept_irq(apic, APIC_DM_FIXED,
+				apic_lvt_vector(apic, APIC_LVTERR), 0, 0);
+		pr_debug("%s: check failed, dm %x vector %x\n",
+			 __FUNCTION__, dm, vector);
+		return 0;
+	}
+	return 1;
+}
+
+static void apic_ipi(struct kvm_vcpu *vcpu)
+{
+	struct kvm_apic *apic = &vcpu->apic;
+	u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
+	u32 icr_high = kvm_apic_get_reg(apic, APIC_ICR2);
+
+	unsigned int dest = GET_APIC_DEST_FIELD(icr_high);
+	unsigned int short_hand = icr_low & APIC_SHORT_MASK;
+	unsigned int trig_mode = icr_low & APIC_INT_LEVELTRIG;
+	unsigned int level = icr_low & APIC_INT_ASSERT;
+	unsigned int dest_mode = icr_low & APIC_DEST_MASK;
+	unsigned int delivery_mode = icr_low & APIC_MODE_MASK;
+	unsigned int vector = icr_low & APIC_VECTOR_MASK;
+
+	struct kvm_apic *target;
+	u32 lpr_map = 0;
+
+	pr_debug("icr_high 0x%x, icr_low 0x%x, "
+		 "short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, "
+		 "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n",
+		 icr_high, icr_low, short_hand, dest,
+		 trig_mode, level, dest_mode, delivery_mode, vector);
+
+	if (apic_match_dest(vcpu, apic, short_hand, dest, dest_mode,
+			    delivery_mode)) {
+		if (delivery_mode == APIC_DM_LOWEST)
+			set_bit(vcpu_slot(vcpu), &lpr_map);
+		else
+			apic_accept_irq(apic, delivery_mode,
+					vector, level, trig_mode);
+	}
+
+	if (delivery_mode == APIC_DM_LOWEST) {
+		/* Currently only UP is supported, so target == apic */
+		target = kvm_apic_round_robin(vcpu, dest_mode, vector, lpr_map);
+
+		if (target)
+			apic_accept_irq(target, delivery_mode,
+					vector, level, trig_mode);
+	}
+}
+
+static u32 apic_get_tmcct(struct kvm_apic *apic)
+{
+	u32 counter_passed;
+	ktime_t passed, now = apic->apic_timer.base->get_time();
+	u32 tmcct = kvm_apic_get_reg(apic, APIC_TMCCT);
+
+	ASSERT(apic != NULL);
+
+	if (unlikely(ktime_to_ns(now) <= ktime_to_ns(apic->timer_last_update))) {
+		/* Wrap around */
+		passed = ktime_add(
+			({ (ktime_t){ .tv64 = KTIME_MAX - (apic->timer_last_update).tv64 }; }),
+			now);
+		pr_debug("time elapsed\n");
+	} else
+		passed = ktime_sub(now, apic->timer_last_update);
+
+	counter_passed = ktime_to_ns(passed) /
+			 (APIC_BUS_CYCLE_NS * apic->timer_divide_count);
+	tmcct -= counter_passed;
+
+	if (tmcct <= 0) {
+		if (unlikely(!apic_lvtt_period(apic))) {
+			tmcct = 0;
+		} else {
+			do {
+				tmcct += kvm_apic_get_reg(apic, APIC_TMICT);
+			} while (tmcct <= 0);
+		}
+	}
+
+	apic->timer_last_update = now;
+	kvm_apic_set_reg(apic, APIC_TMCCT, tmcct);
+
+	return tmcct;
+}
+
+static void kvm_apic_read_aligned(struct kvm_apic *apic,
+				  unsigned int offset,
+				  unsigned int len,
+				  unsigned int *result)
+{
+	ASSERT(len == 4 && offset > 0 && offset <= APIC_TDCR);
+	*result = 0;
+
+	switch (offset) {
+	case APIC_ARBPRI:
+		printk(KERN_WARNING "access to local APIC ARBPRI register, "
+		       "which is for P6\n");
+		break;
+
+	case APIC_TMCCT:	/* Timer CCR */
+		*result = apic_get_tmcct(apic);
+		break;
+
+	case APIC_ESR:
+		apic->err_write_count = 0;
+		*result = kvm_apic_get_reg(apic, offset);
+		break;
+
+	default:
+		*result = kvm_apic_get_reg(apic, offset);
+		break;
+	}
+}
+
+static unsigned long __kvm_apic_read(struct kvm_vcpu *vcpu,
+				     unsigned long address,
+				     unsigned long len)
+{
+	unsigned int alignment;
+	unsigned int tmp;
+	unsigned long result;
+	struct kvm_apic *apic = &vcpu->apic;
+	unsigned int offset = address - apic->base_address;
+
+	if (offset > APIC_TDCR)
+		return 0;
+
+	/* some kernel bugs cause byte-sized reads here */
+	if (len != 4)
+		pr_debug("read with len=0x%lx, should be 4 instead\n", len);
+
+	alignment = offset & 0x3;
+
+	kvm_apic_read_aligned(apic, offset & ~0x3, 4, &tmp);
+	switch (len) {
+	case 1:
+		result = *((unsigned char *)&tmp + alignment);
+		break;
+
+	case 2:
+		ASSERT(alignment != 3);
+		result = *(unsigned short *)((unsigned char *)&tmp + alignment);
+		break;
+
+	case 4:
+		ASSERT(alignment == 0);
+		result = *(unsigned int *)((unsigned char *)&tmp + alignment);
+		break;
+
+	default:
+		printk(KERN_ALERT "Local APIC read with len=0x%lx, "
+		       "should be 4 instead\n", len);
+		kvm_crash_guest(vcpu->kvm);
+		result = 0;	/* to make gcc happy */
+		break;
+	}
+
+	pr_debug("%s: offset 0x%x with length 0x%lx, "
+		 "and the result is 0x%lx\n", __FUNCTION__, offset, len, result);
+
+	return result;
+}
+
+unsigned long kvm_apic_read(struct kvm_vcpu *vcpu,
+			    unsigned long address,
+			    unsigned long len)
+{
+	unsigned long result;
+
+	spin_lock_bh(&vcpu->apic.lock);
+	result = __kvm_apic_read(vcpu, address, len);
+	spin_unlock_bh(&vcpu->apic.lock);
+
+	return result;
+}
+
+void kvm_apic_write(struct kvm_vcpu *vcpu,
+		    unsigned long address,
+		    unsigned long len,
+		    unsigned long val)
+{
+	struct kvm_apic *apic = &vcpu->apic;
+	unsigned int offset = address - apic->base_address;
+
+	spin_lock_bh(&apic->lock);
+
+	/* too common to print */
+	if (offset != APIC_EOI)
+		pr_debug("%s: offset 0x%x with length 0x%lx, and value is 0x%lx\n",
+			 __FUNCTION__, offset, len, val);
+
+	/*
+	 * According to the IA-32 manual, all registers should be accessed
+	 * with 32-bit alignment.
+	 */
+	if (len != 4) {
+		unsigned int tmp;
+		unsigned char alignment;
+
+		/* Some kernels do access with byte/word alignment */
+		pr_debug("Notice: Local APIC write with len = %lx\n", len);
+		alignment = offset & 0x3;
+		tmp = __kvm_apic_read(vcpu, offset & ~0x3, 4);
+		switch (len) {
+		case 1:
+			/*
+			 * XXX: saddr is a tmp variable from the caller, so this
+			 * should be OK, but we should still change the
+			 * following references to val to a local variable later.
+			 */
+			val = (tmp & ~(0xff << (8 * alignment))) |
+			      ((val & 0xff) << (8 * alignment));
+			break;
+
+		case 2:
+			if (alignment != 0x0 && alignment != 0x2) {
+				printk(KERN_ALERT "alignment error for apic "
+				       "with len == 2\n");
+				kvm_crash_guest(vcpu->kvm);
+			}
+
+			val = (tmp & ~(0xffff << (8 * alignment))) |
+			      ((val & 0xffff) << (8 * alignment));
+			break;
+
+		case 3:
+			/* can this happen? */
+			printk(KERN_ALERT "apic_write with len = 3!\n");
+			kvm_crash_guest(vcpu->kvm);
+			break;
+
+		default:
+			printk(KERN_ALERT "Local APIC write with len = %lx, "
+			       "should be 4 instead\n", len);
+			kvm_crash_guest(vcpu->kvm);
+			break;
+		}
+	}
+
+	offset &= 0xff0;
+
+	switch (offset) {
+	case APIC_ID:	/* Local APIC ID */
+		kvm_apic_set_reg(apic, APIC_ID, val);
+		break;
+
+	case APIC_TASKPRI:
+		kvm_apic_set_reg(apic, APIC_TASKPRI, val & 0xff);
+		apic_update_ppr(apic);
+		break;
+
+	case APIC_EOI:
+		apic_EOI_set(apic);
+		break;
+
+	case APIC_LDR:
+		kvm_apic_set_reg(apic, APIC_LDR, val & APIC_LDR_MASK);
+		break;
+
+	case APIC_DFR:
+		kvm_apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
+		break;
+
+	case APIC_SPIV:
+		kvm_apic_set_reg(apic, APIC_SPIV, val & 0x3ff);
+		if (!(val & APIC_SPIV_APIC_ENABLED)) {
+			int i;
+			u32 lvt_val;
+
+			apic->status |= APIC_SOFTWARE_DISABLE_MASK;
+			for (i = 0; i < APIC_LVT_NUM; i++) {
+				lvt_val = kvm_apic_get_reg(apic, APIC_LVTT + 0x10 * i);
+				kvm_apic_set_reg(apic, APIC_LVTT + 0x10 * i,
+						 lvt_val | APIC_LVT_MASKED);
+			}
+
+			if ((kvm_apic_get_reg(apic, APIC_LVT0) & APIC_MODE_MASK)
+			    == APIC_DM_EXTINT)
+				clear_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+		} else {
+			apic->status &= ~APIC_SOFTWARE_DISABLE_MASK;
+			if ((kvm_apic_get_reg(apic, APIC_LVT0) & APIC_MODE_MASK)
+			    == APIC_DM_EXTINT)
+				set_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+		}
+		break;
+
+	case APIC_ESR:
+		apic->err_write_count = !apic->err_write_count;
+		if (!apic->err_write_count)
+			apic->err_status = 0;
+		break;
+
+	case APIC_ICR:
+		/* No delay here, so we always clear the pending bit */
+		kvm_apic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
+		apic_ipi(vcpu);
+		break;
+
+	case APIC_ICR2:
+		kvm_apic_set_reg(apic, APIC_ICR2, val & 0xff000000);
+		break;
+
+	case APIC_LVTT:
+	case APIC_LVTTHMR:
+	case APIC_LVTPC:
+	case APIC_LVT0:
+	case APIC_LVT1:
+	case APIC_LVTERR:
+	{
+		if (apic->status & APIC_SOFTWARE_DISABLE_MASK)
+			val |= APIC_LVT_MASKED;
+
+		val &= apic_lvt_mask[(offset - APIC_LVTT) >> 4];
+		kvm_apic_set_reg(apic, offset, val);
+
+		/* On hardware, writing a vector below 0x20 signals an error */
+		if (!(val & APIC_LVT_MASKED))
+			apic_check_vector(apic, apic_lvt_dm(apic, offset),
+					  apic_lvt_vector(apic, offset));
+		if (!vcpu_slot(vcpu) && (offset == APIC_LVT0)) {
+			if ((val & APIC_MODE_MASK) == APIC_DM_EXTINT)
+				if (val & APIC_LVT_MASKED)
+					clear_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+				else
+					set_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+			else
+				clear_bit(_APIC_BSP_ACCEPT_PIC, &apic->status);
+		}
+	}
+		break;
+
+	case APIC_TMICT:
+	{
+		ktime_t now = apic->apic_timer.base->get_time();
+		u32 offset;
+
+		kvm_apic_set_reg(apic, APIC_TMICT, val);
+		kvm_apic_set_reg(apic, APIC_TMCCT, val);
+		apic->timer_last_update = now;
+		offset = APIC_BUS_CYCLE_NS * apic->timer_divide_count * val;
+
+		/* Make sure the lock ordering is coherent */
+		spin_unlock_bh(&apic->lock);
+		hrtimer_cancel(&apic->apic_timer);
+		hrtimer_start(&apic->apic_timer, ktime_add_ns(now, offset),
+			      HRTIMER_ABS);
+
+		pr_debug("%s: bus cycle is %"PRId64"ns, now 0x%016"PRIx64", "
+			 "timer initial count 0x%x, offset 0x%x, "
+			 "expire @ 0x%016"PRIx64".\n", __FUNCTION__,
+			 APIC_BUS_CYCLE_NS, ktime_to_ns(now),
+			 kvm_apic_get_reg(apic, APIC_TMICT),
+			 offset, ktime_to_ns(ktime_add_ns(now, offset)));
+	}
+		return;
+
+	case APIC_TDCR:
+	{
+		unsigned int tmp1, tmp2;
+
+		tmp1 = val & 0xf;
+		tmp2 = ((tmp1 & 0x3) | ((tmp1 & 0x8) >> 1)) + 1;
+		apic->timer_divide_count = 0x1 << (tmp2 & 0x7);
+
+		kvm_apic_set_reg(apic, APIC_TDCR, val);
+
+		pr_debug("timer divide count is 0x%x\n",
+			 apic->timer_divide_count);
+	}
+		break;
+
+	default:
+		printk(KERN_WARNING "Local APIC write to read-only register\n");
+		break;
+	}
+
+	spin_unlock_bh(&apic->lock);
+}
+
+int kvm_apic_range(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	struct kvm_apic *apic = &vcpu->apic;
+
+	spin_lock_bh(&apic->lock);
+
+	if (apic_global_enabled(apic) &&
+	    (addr >= apic->base_address) &&
+	    (addr < apic->base_address + VLOCAL_APIC_MEM_LENGTH)) {
+		spin_unlock_bh(&apic->lock);
+		return 1;
+	}
+	spin_unlock_bh(&apic->lock);
+
+	return 0;
+}
+
+void kvm_apic_msr_set(struct kvm_apic *apic, u64 value)
+{
+	/* When the apic is disabled */
+	if (apic == NULL)
+		return;
+
+	spin_lock_bh(&apic->lock);
+	if (apic->vcpu_id)
+		value &= ~MSR_IA32_APICBASE_BSP;
+
+	apic->apic_base_msr = value;
+	apic->base_address = apic->apic_base_msr & MSR_IA32_APICBASE_BASE;
+
+	/* with FSB-delivered interrupts, we can restart APIC functionality */
+	if (!(value & MSR_IA32_APICBASE_ENABLE))
+		set_bit(_APIC_GLOB_DISABLE, &apic->status);
+	else
+		clear_bit(_APIC_GLOB_DISABLE, &apic->status);
+
+	pr_debug("apic base msr is 0x%016"PRIx64", and base address is 0x%lx.\n",
+		 apic->apic_base_msr, apic->base_address);
+
+	spin_unlock_bh(&apic->lock);
+}
+
+static int __apic_timer_fn(struct kvm_apic *apic)
+{
+	struct kvm_vcpu *vcpu;
+	u32 timer_vector;
+	ktime_t now;
+	int result = HRTIMER_NORESTART;
+
+	if (unlikely(!apic_enabled(apic) || !apic_lvt_enabled(apic, APIC_LVTT))) {
+		pr_debug("%s: timer interrupt although apic is down\n",
+			 __FUNCTION__);
+		return HRTIMER_NORESTART;
+	}
+
+	vcpu = apic->vcpu;
+	timer_vector = apic_lvt_vector(apic, APIC_LVTT);
+	now = apic->apic_timer.base->get_time();
+	apic->timer_last_update = now;
+
+	if (test_and_set_bit(timer_vector, apic->regs + APIC_IRR)) {
+		apic->intr_pending_count[timer_vector]++;
+		pr_debug("%s: increasing intr_pending_count to %d\n",
+			 __FUNCTION__, apic->intr_pending_count[timer_vector]);
+	}
+
+	if (apic_lvtt_period(apic)) {
+		u32 offset;
+		u32 tmict = kvm_apic_get_reg(apic, APIC_TMICT);
+
+		kvm_apic_set_reg(apic, APIC_TMCCT, tmict);
+		offset = APIC_BUS_CYCLE_NS * apic->timer_divide_count * tmict;
+
+		result = HRTIMER_RESTART;
+		apic->apic_timer.expires = ktime_add_ns(now, offset);
+
+		pr_debug("%s: now 0x%016"PRIx64", expire @ 0x%016"PRIx64", "
+			 "timer initial count 0x%x, timer current count 0x%x.\n",
+			 __FUNCTION__,
+			 ktime_to_ns(now), ktime_to_ns(apic->apic_timer.expires),
+			 kvm_apic_get_reg(apic, APIC_TMICT),
+			 kvm_apic_get_reg(apic, APIC_TMCCT));
+	} else {
+		kvm_apic_set_reg(apic, APIC_TMCCT, 0);
+		pr_debug("%s: now 0x%016"PRIx64", "
+			 "timer initial count 0x%x, timer current count 0x%x.\n",
+			 __FUNCTION__,
+			 ktime_to_ns(now), kvm_apic_get_reg(apic, APIC_TMICT),
+			 kvm_apic_get_reg(apic, APIC_TMCCT));
+	}
+
+	wake_up_interruptible(&vcpu->halt_wq);
+	return result;
+}
+
+void apic_check_pending_timer(struct kvm_apic *apic)
+{
+	int timer_restart;
+
+	if (!apic || !apic_enabled(apic))
+		return;
+
+	if (!atomic_read(&apic->timer_pending))
+		return;
+
+	atomic_dec(&apic->timer_pending);
+	timer_restart = __apic_timer_fn(apic);
+	if (timer_restart == HRTIMER_RESTART) {
+		pr_debug("%s: restarting timer\n", __FUNCTION__);
+		hrtimer_start(&apic->apic_timer, apic->apic_timer.expires,
+			      HRTIMER_ABS);
+	}
+}
+EXPORT_SYMBOL_GPL(apic_check_pending_timer);
+
+static void receive_apic_ipi(void *arg)
+{
+	int cpu = smp_processor_id();
+
+	pr_debug("%s: cpu(%d)\n", __FUNCTION__, cpu);
+}
+
+static int apic_timer_fn(struct hrtimer *timer)
+{
+	struct kvm_apic *apic;
+	int restart_timer = HRTIMER_NORESTART;
+	u32 apic_lock_owner;
+
+	apic = container_of(timer, struct kvm_apic, apic_timer);
+
+	while (!spin_trylock_bh(&apic->lock)) {
+		/*
+		 * Send an IPI in order to cause a vmexit on the lock holder.
+		 * The IPI receiver will handle the timer function.
+		 */
+		if ((apic_lock_owner = apic->pcpu_lock_owner) != -1) {
+			BUG_ON(smp_processor_id() == apic_lock_owner);
+			pr_debug("%s: cpu(%d) send ipi to %d\n",
+				 __FUNCTION__,
+				 smp_processor_id(),
+				 apic_lock_owner);
+
+			atomic_inc(&apic->timer_pending);
+			smp_call_function_single(apic_lock_owner,
+						 receive_apic_ipi,
+						 apic, 0, 1);
+			return restart_timer;
+		}
+		cpu_relax();
+	}
+
+	restart_timer = __apic_timer_fn(apic);
+	spin_unlock_bh(&apic->lock);
+
+	return restart_timer;
+}
+
+static int apic_read_irr(struct kvm_apic *apic)
+{
+	if (apic && apic_enabled(apic)) {
+		int highest_irr = apic_find_highest_irr(apic);
+
+		if ((highest_irr & 0xf0) > kvm_apic_get_reg(apic, APIC_PROCPRI)) {
+			if (highest_irr < 0x10) {
+				u32 err_vector;
+
+				apic->err_status |= 0x20;
+				err_vector = apic_lvt_vector(apic, APIC_LVTERR);
+
+				pr_debug("Sending an illegal vector 0x%x.\n",
+					 highest_irr);
+
+				set_bit(err_vector, apic->regs + APIC_IRR);
+				highest_irr = err_vector;
+			}
+
+			return highest_irr;
+		}
+	}
+
+	return 0;
+}
+
+static void apic_post_interrupt(struct kvm_apic *apic, int vector,
+				int deliver_mode)
+{
+	if (unlikely(apic == NULL))
+		return;
+
+	switch (deliver_mode) {
+	case APIC_DM_FIXED:
+	case APIC_DM_LOWEST:
+		set_bit(vector, apic->regs + APIC_ISR);
+		clear_bit(vector, apic->regs + APIC_IRR);
+		apic_update_ppr(apic);
+
+		if (vector == apic_lvt_vector(apic, APIC_LVTT)) {
+			apic->intr_pending_count[vector]--;
+			if (apic->intr_pending_count[vector] > 0)
+				test_and_set_bit(vector, apic->regs + APIC_IRR);
+		}
+		break;
+
+	/* XXX deal with these later */
+	case APIC_DM_REMRD:
+		pr_debug("%s: ignoring deliver mode %d\n", __FUNCTION__,
+			 deliver_mode);
+		break;
+
+	case APIC_DM_SMI:
+	case APIC_DM_NMI:
+	case APIC_DM_INIT:
+	case APIC_DM_STARTUP:
+	case APIC_DM_EXTINT:
+		apic->direct_intr.deliver_mode &= (1 << (deliver_mode >> 8));
+		break;
+
+	default:
+		pr_debug("%s: invalid deliver mode\n", __FUNCTION__);
+		break;
+	}
+}
+
+static int apic_pending_irr(struct kvm_apic *apic)
+{
+	if (apic && apic_enabled(apic)) {
+		int highest_irr = apic_find_highest_irr(apic);
+
+		if ((highest_irr & 0xf0) > kvm_apic_get_reg(apic, APIC_PROCPRI))
+			return 1;
+	}
+	return 0;
+}
+
+static int _apic_pending(struct kvm_irqdevice *this, int flags)
+{
+	struct kvm_apic *apic = (struct kvm_apic *)this->private;
+
+	if (!(flags & KVM_IRQFLAGS_NMI) && apic_pending_irr(apic))
+		return 1;
+
+	if (kvm_irq_pending(&apic->ext, flags))
+		return 1;
+
+	return 0;
+}
+
+static int _apic_read(struct kvm_irqdevice *this, int flags,
+		      struct kvm_irqinfo *info)
+{
+	struct kvm_apic *apic = (struct kvm_apic *)this->private;
+
+	int state = 0;
+	int vector;
+
+	/*
+	 * We consider the IRR if the APIC is present, enabled, and we
+	 * are not filtering out based on NMI
+	 */
+	if (!(flags & KVM_IRQFLAGS_NMI) && apic_pending_irr(apic))
+		state |= 1;
+	if (kvm_irq_pending(&apic->ext, flags))
+		state |= 2;
+
+	switch (state) {
+	case 0: {
+		/* No interrupts are pending */
+		return 0;
+	}
+	case 1: {
+		/* Only APIC interrupts are pending */
+		vector = apic_read_irr(apic);
+		apic_post_interrupt(apic, vector, APIC_DM_FIXED);
+		if (info) {
+			info->vector = vector;
+			info->nmi = 0;
+		}
+		return vector;
+	}
+	case 2: {
+		/* Only external/NMI interrupts are pending */
+		return kvm_irq_read(&apic->ext, flags, info);
+	}
+	case 3: {
+		/* We have a conflict.
Figure out which is higher pri */ + int highest_apic_irq =3D apic_read_irr(apic); + int highest_ext_irq =3D kvm_irq_read(&apic->ext,=20 + KVM_IRQFLAGS_PEEK,=20 + info); + if(highest_apic_irq > highest_ext_irq) { + apic_post_interrupt(apic, highest_apic_irq,=20 + APIC_DM_FIXED); + if(info) { + info->vector =3D highest_apic_irq; + info->nmi =3D 0; + } + return highest_apic_irq; + } else { + return kvm_irq_read(&apic->ext, flags, info); + } + } + } + + return 0; +} + +static void kvm_apic_dump_state(struct kvm_apic *apic) +{ + u64 *tmp; + + printk(KERN_INFO "%s begin\n", __FUNCTION__); +=09 + printk(KERN_INFO "status =3D 0x%08x\n", apic->status); + printk(KERN_INFO "apic_base_msr=3D0x%016llx, apicbase =3D = 0x%08lx\n", apic->apic_base_msr, apic->base_address); +=09 + tmp =3D (u64*)(apic->regs + APIC_IRR); + printk(KERN_INFO "IRR =3D 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n= ", tmp[3], tmp[2], tmp[1], tmp[0]); + tmp =3D (u64*)(apic->regs + APIC_ISR); + printk(KERN_INFO "ISR =3D 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n= ", tmp[3], tmp[2], tmp[1], tmp[0]); + tmp =3D (u64*)(apic->regs + APIC_TMR); + printk(KERN_INFO "TMR =3D 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n= ", tmp[3], tmp[2], tmp[1], tmp[0]); + + printk(KERN_INFO "APIC_ID=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_ID)); + printk(KERN_INFO "APIC_TASKPRI=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_TASKPRI) & 0xff); + printk(KERN_INFO "APIC_PROCPRI=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_PROCPRI)); +=09 + printk(KERN_INFO "APIC_DFR=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_DFR) | 0x0FFFFFFF); + printk(KERN_INFO "APIC_LDR=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_LDR) & APIC_LDR_MASK); + printk(KERN_INFO "APIC_SPIV=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_SPIV) & 0x3ff); + printk(KERN_INFO "APIC_ESR=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_ESR)); + printk(KERN_INFO "APIC_ICR=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_ICR) & ~(1 << 12)); + printk(KERN_INFO "APIC_ICR2=3D0x%08x\n", kvm_apic_get_reg(apic, = 
APIC_ICR2) & 0xff000000); + + printk(KERN_INFO "APIC_LVTERR=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_LVTERR)); + printk(KERN_INFO "APIC_LVT1=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_LVT1)); + printk(KERN_INFO "APIC_LVT0=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_LVT0)); + printk(KERN_INFO "APIC_LVTPC=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_LVTPC)); + printk(KERN_INFO "APIC_LVTTHMR=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_LVTTHMR)); + printk(KERN_INFO "APIC_LVTT=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_LVTT)); +=09 + printk(KERN_INFO "APIC_TMICT=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_TMICT)); + printk(KERN_INFO "APIC_TDCR=3D0x%08x\n", kvm_apic_get_reg(apic, = APIC_TDCR)); + =20 + printk(KERN_INFO "%s end\n", __FUNCTION__); +} + +#if 0 +/* + * kvm_apic_set not need to be locked with the apic->lock since it is = called when the + * guest is stopped. + */ +int kvm_apic_set(struct kvm_apic *apic, struct kvm_apic_state* as) +{ + pr_debug("%s begin\n", __FUNCTION__); +=09 + apic->vcpu_id =3D as->vcpu; + apic->status =3D as->status; + kvm_apic_reset(apic); + kvm_apic_msr_set(apic, as->apicbase); + memset(&apic->intr_pending_count, 0, sizeof(int) * MAX_APIC_INT_VEC= TOR); + memcpy((void*)(apic->regs + APIC_IRR), &as->irr, sizeof(as->irr)); + memcpy((void*)(apic->regs + APIC_ISR), &as->isr, sizeof(as->isr)); + memcpy((void*)(apic->regs + APIC_TMR), &as->tmr, sizeof(as->tmr)); + kvm_apic_set_reg(apic, APIC_ID, as->id); + kvm_apic_set_reg(apic, APIC_TASKPRI, as->tpr & 0xff); + kvm_apic_set_reg(apic, APIC_PROCPRI, as->ppr); +=09 + kvm_apic_set_reg(apic, APIC_DFR, as->dfr | 0x0FFFFFFF); + kvm_apic_set_reg(apic, APIC_LDR, as->ldr & APIC_LDR_MASK); + kvm_apic_set_reg(apic, APIC_SPIV, as->spurious_vec & 0x3ff); + kvm_apic_set_reg(apic, APIC_ESR, as->esr); + kvm_apic_set_reg(apic, APIC_ICR, as->icr[0] & ~(1 << 12)); + kvm_apic_set_reg(apic, APIC_ICR2, as->icr[1] & 0xff000000); + + kvm_apic_set_reg(apic, APIC_LVTERR, as->lvterr); + kvm_apic_set_reg(apic, APIC_LVT1, 
as->lvt1); + kvm_apic_set_reg(apic, APIC_LVT0, as->lvt0); + kvm_apic_set_reg(apic, APIC_LVTPC, as->lvtpc); + kvm_apic_set_reg(apic, APIC_LVTTHMR, as->lvtthmr); + kvm_apic_set_reg(apic, APIC_LVTT, as->lvtt); +=09 + kvm_apic_write(apic->vcpu, (unsigned long)(apic->regs + APIC_TDCR),= 4, as->divide_conf); + kvm_apic_write(apic->vcpu, (unsigned long)(apic->regs + APIC_TMICT)= , 4, as->initial_count); + + pr_debug("%s end\n", __FUNCTION__); + kvm_apic_dump_state(apic); + return 0; +} + +int kvm_apic_get(struct kvm_apic *apic, struct kvm_apic_state* as) +{ + pr_debug("%s begin apic=3D%p\n", __FUNCTION__, apic); + + spin_lock_bh(&apic->lock); + as->status =3D apic->status; + as->apicbase =3D apic->apic_base_msr; + memcpy(as->irr, (void*)(apic->regs + APIC_IRR), sizeof(as->irr)); + memcpy(as->isr, (void*)(apic->regs + APIC_ISR), sizeof(as->isr)); + memcpy(as->tmr, (void*)(apic->regs + APIC_TMR), sizeof(as->tmr)); + + as->id =3D kvm_apic_get_reg(apic, APIC_ID); + as->tpr =3D kvm_apic_get_reg(apic, APIC_TASKPRI) & 0xff; + as->ppr =3D kvm_apic_get_reg(apic, APIC_PROCPRI); + =20 + as->dfr =3D kvm_apic_get_reg(apic, APIC_DFR) | 0x0FFFFFFF; + as->ldr =3D kvm_apic_get_reg(apic, APIC_LDR) & APIC_LDR_MASK; + as->spurious_vec =3D kvm_apic_get_reg(apic, APIC_SPIV) & 0x3ff; + as->esr =3D kvm_apic_get_reg(apic, APIC_ESR); + as->icr[0] =3D kvm_apic_get_reg(apic, APIC_ICR) & ~(1 << 12); + as->icr[1] =3D kvm_apic_get_reg(apic, APIC_ICR2) & 0xff000000; + + as->lvterr =3D kvm_apic_get_reg(apic, APIC_LVTERR); + as->lvt1 =3D kvm_apic_get_reg(apic, APIC_LVT1); + as->lvt0 =3D kvm_apic_get_reg(apic, APIC_LVT0); + as->lvtpc =3D kvm_apic_get_reg(apic, APIC_LVTPC); + as->lvtthmr =3D kvm_apic_get_reg(apic, APIC_LVTTHMR); + as->lvtt =3D kvm_apic_get_reg(apic, APIC_LVTT); + as->initial_count =3D kvm_apic_get_reg(apic, APIC_TMICT); + as->divide_conf =3D kvm_apic_get_reg(apic, APIC_TDCR); + =20 + kvm_apic_dump_state(apic); + + spin_unlock_bh(&apic->lock); + return 0; +} +#endif + +int 
kvm_apic_reset(struct kvm_apic *apic) +{ + struct kvm_vcpu *vcpu; + int i; + + printk(KERN_INFO "%s\n", __FUNCTION__); + ASSERT(apic !=3D NULL); + vcpu =3D apic->vcpu; + ASSERT(vcpu !=3D NULL); +=09 + + /* Stop the timer in case it's a reset an active apic */ + if (apic->apic_timer.function) + hrtimer_cancel(&apic->apic_timer); + + spin_lock_bh(&apic->lock); + + kvm_apic_set_reg(apic, APIC_ID, vcpu_slot(vcpu) << 24); + kvm_apic_set_reg(apic, APIC_LVR, APIC_VERSION); + + for (i =3D 0; i < APIC_LVT_NUM; i++) + kvm_apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKE= D); + + kvm_apic_set_reg(apic, APIC_DFR, 0xffffffffU); + kvm_apic_set_reg(apic, APIC_SPIV, 0xff);=20 + kvm_apic_set_reg(apic, APIC_TASKPRI, 0); + kvm_apic_set_reg(apic, APIC_LDR, 0); + kvm_apic_set_reg(apic, APIC_ESR, 0); + kvm_apic_set_reg(apic, APIC_ICR, 0); + kvm_apic_set_reg(apic, APIC_ICR2, 0); + kvm_apic_set_reg(apic, APIC_TDCR, 0); + kvm_apic_set_reg(apic, APIC_TMICT, 0); + memset((void*)(apic->regs + APIC_IRR), 0, KVM_IRQ_BITMAP_SIZE(u8));= + memset((void*)(apic->regs + APIC_ISR), 0, KVM_IRQ_BITMAP_SIZE(u8));= + memset((void*)(apic->regs + APIC_TMR), 0, KVM_IRQ_BITMAP_SIZE(u8));= +=09 + apic->apic_base_msr =3D MSR_IA32_APICBASE_ENABLE | APIC_DEFAULT_PHY= S_BASE; + if (vcpu_slot(vcpu) =3D=3D 0) + apic->apic_base_msr |=3D MSR_IA32_APICBASE_BSP; + apic->base_address =3D apic->apic_base_msr & MSR_IA32_APICBASE_BAS= E; + + hrtimer_init(&apic->apic_timer, CLOCK_MONOTONIC, HRTIMER_ABS); + apic->apic_timer.function =3D apic_timer_fn; + apic->timer_divide_count =3D 0; + apic->status =3D 0; + apic->pcpu_lock_owner =3D -1; + memset(&apic->intr_pending_count, 0, sizeof(int) * MAX_APIC_INT_VE= CTOR); + +#ifdef APIC_NO_BIOS + /* + * XXX According to mp specific, BIOS will enable LVT0/1, + * remove it after BIOS enabled + */ + if (!vcpu_slot(vcpu)) { + kvm_apic_set_reg(apic, APIC_LVT0, APIC_MODE_EXTINT << 8); + kvm_apic_set_reg(apic, APIC_LVT1, APIC_MODE_NMI << 8); + set_bit(_APIC_BSP_ACCEPT_PIC, 
&apic->status); + } +#endif + + spin_unlock_bh(&apic->lock); + + printk(KERN_INFO "%s: vcpu=3D%p, id=3D%d, apic_apic_base_msr=3D0x%= 016"PRIx64", " + "base_address=3D0x%0lx.\n", + __FUNCTION__, vcpu, GET_APIC_ID(kvm_apic_get_reg= (apic, APIC_ID)), + apic->apic_base_msr, apic->base_address); +=09 + return 1; +} + +static int _apic_summary(struct kvm_irqdevice *this, void *data) +{ + struct kvm_apic *apic =3D (struct kvm_apic*)this->private; +=09 + /* FIXME */ + return kvm_irq_summary(&apic->ext, data); +} + +/* + * Should be called with the apic lock held + */ +static void _apic_destructor(struct kvm_irqdevice *this) +{ + struct kvm_apic *apic =3D (struct kvm_apic*)this->private; + + if (apic->regs_page) { + if (apic->apic_timer.function) + hrtimer_cancel(&apic->apic_timer); + __free_page(apic->regs_page); + apic->regs_page =3D 0; + } +} + +/* + * Should be called by vcpu_setup + */ +int kvm_apic_init(struct kvm_vcpu *vcpu) +{ + struct kvm_irqdevice *dev =3D &vcpu->irq_dev; + + dev->pending =3D _apic_pending; + dev->read =3D _apic_read; + dev->inject =3D _apic_inject; + dev->summary =3D _apic_summary; + dev->destructor =3D _apic_destructor; + + dev->private =3D &vcpu->apic; + + struct kvm_apic *apic =3D &vcpu->apic; +=09 + ASSERT(vcpu !=3D NULL); + pr_debug("apic_init %d\n", vcpu_slot(vcpu)); +=09 + apic->regs_page =3D alloc_page(GFP_KERNEL); + if ( apic->regs_page =3D=3D NULL ) { + printk(KERN_ALERT "malloc apic regs error for vcpu %x\n", = vcpu_slot(vcpu)); + return -ENOMEM; + } + apic->regs =3D page_address(apic->regs_page); + memset(apic->regs, 0, PAGE_SIZE); + + apic->vcpu =3D vcpu; + spin_lock_init(&apic->lock); +=09 + kvm_apic_reset(apic); + return 0; +} + + diff --git a/drivers/kvm/kvm_apic.h b/drivers/kvm/kvm_apic.h new file mode 100644 index 0000000..b5e6d72 --- /dev/null +++ b/drivers/kvm/kvm_apic.h @@ -0,0 +1,142 @@ +/* + * kvm_apic.h: Local APIC virtualization=20 + * + * Copyright (C) 2006 Qumranet, Inc. 
+ *
+ * Authors:
+ *   Dor Laor
+ *
+ * Copyright (c) 2004, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#ifndef __KVM_APIC_H__
+#define __KVM_APIC_H__
+
+#include "kvm.h"
+#include
+#include
+
+static __inline__ int find_highest_bit(unsigned long *data, int nr_bits)
+{
+	int length = BITS_TO_LONGS(nr_bits);
+	while (length && !data[--length])
+		continue;
+	return __ffs(data[length]) + (length * BITS_PER_LONG);
+}
+
+#define APIC_LVT_NUM			6
+/* 14 is the version for Xeon and Pentium 8.4.8 */
+#define APIC_VERSION			(0x14UL | ((APIC_LVT_NUM - 1) << 16))
+#define VLOCAL_APIC_MEM_LENGTH		(1 << 12)
+/* the following defines are not in apicdef.h */
+#define APIC_SHORT_MASK			0xc0000
+#define APIC_DEST_NOSHORT		0x0
+#define APIC_DEST_MASK			0x800
+#define _APIC_GLOB_DISABLE		0x0
+#define APIC_GLOB_DISABLE_MASK		0x1
+#define APIC_SOFTWARE_DISABLE_MASK	0x2
+#define _APIC_BSP_ACCEPT_PIC		0x3
+
+#define apic_enabled(apic) \
+	(!((apic)->status & \
+	   (APIC_GLOB_DISABLE_MASK | APIC_SOFTWARE_DISABLE_MASK)))
+
+#define apic_global_enabled(apic) \
+	(!(test_bit(_APIC_GLOB_DISABLE, &(apic)->status)))
+
+#define LVT_MASK \
+	(APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK)
+
+#define LINT_MASK \
+	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
+	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
+
+#define KVM_APIC_ID(apic) \
+	(GET_APIC_ID(kvm_apic_get_reg(apic, APIC_ID)))
+
+#define apic_lvt_enabled(apic, lvt_type) \
+	(!(kvm_apic_get_reg(apic, lvt_type) & APIC_LVT_MASKED))
+
+#define apic_lvt_vector(apic, lvt_type) \
+	(kvm_apic_get_reg(apic, lvt_type) & APIC_VECTOR_MASK)
+
+#define apic_lvt_dm(apic, lvt_type) \
+	(kvm_apic_get_reg(apic, lvt_type) & APIC_MODE_MASK)
+
+#define apic_lvtt_period(apic) \
+	(kvm_apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC)
+
+
+static inline int kvm_apic_set_irq(struct kvm_apic *apic, u8 vec, u8 trigger)
+{
+	int ret;
+
+	ret = test_and_set_bit(vec, apic->regs + APIC_IRR);
+	if (trigger)
+		set_bit(vec, apic->regs + APIC_TMR);
+
+	/* We may need to wake up the target vcpu, besides setting the pending bit here */
+	return ret;
+}
+
+static inline u32 kvm_apic_get_reg(struct kvm_apic *apic, u32 reg)
+{
+	return *((u32 *)(apic->regs + reg));
+}
+
+static inline void kvm_apic_set_reg(struct kvm_apic *apic, u32 reg, u32 val)
+{
+	*((u32 *)(apic->regs + reg)) = val;
+}
+
+
+void kvm_apic_post_injection(struct kvm_vcpu *vcpu, int vector, int deliver_mode);
+
+int kvm_cpu_get_apic_interrupt(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_pending_irq(struct kvm_vcpu *vcpu);
+
+extern int kvm_apic_init(struct kvm_vcpu *vcpu);
+
+extern void kvm_apic_msr_set(struct kvm_apic *apic, u64 value);
+
+int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
+
+struct kvm_apic *kvm_apic_round_robin(struct kvm_vcpu *vcpu,
+				      u8 dest_mode,
+				      u8 vector,
+				      u32 bitmap);
+
+u64 kvm_get_apictime_scheduled(struct kvm_vcpu *vcpu);
+
+int kvm_apic_range(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_apic_write(struct kvm_vcpu *vcpu, unsigned long address,
+		    unsigned long len, unsigned long val);
+unsigned long kvm_apic_read(struct kvm_vcpu *vcpu, unsigned long address,
+			    unsigned long len);
+void kvm_free_apic(struct kvm_vcpu *vcpu);
+unsigned long kvm_apic_read_tpr(struct kvm_apic *apic);
+void kvm_apic_update_tpr(struct kvm_apic *apic, unsigned long cr8);
+
+#if 0
+int kvm_apic_receive_msg(struct kvm_vcpu *vcpu, struct kvm_apic_msg *msg);
+int kvm_apic_reset(struct kvm_apic *apic);
+
+int kvm_apic_get(struct kvm_apic *apic, struct kvm_apic_state *as);
+int kvm_apic_set(struct kvm_apic *apic, struct kvm_apic_state *as);
+#endif
+
+void apic_check_pending_timer(struct kvm_apic *apic);
+#endif /* __KVM_APIC_H__ */
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 4473174..0df9070 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -16,6 +16,7 @@
  */
 
 #include "kvm.h"
+#include "kvm_apic.h"
 
 #include
 #include
@@ -381,6 +382,22 @@ static void kvm_free_vcpus(struct kvm *kvm)
 		kvm_free_vcpu(&kvm->vcpus[i]);
 }
 
+/*
+ * This function kills a guest while there is still a user space process
+ * with a descriptor to it
+ */
+void kvm_crash_guest(struct kvm *kvm)
+{
+	unsigned int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		/* FIXME: in the future it should send an IPI to gracefully
+		 * stop the other vCPUs
+		 */
+		kvm_free_vcpu(&kvm->vcpus[i]);
+	}
+}
+
 static int kvm_dev_release(struct inode *inode, struct file *filp)
 {
 	return 0;
@@ -597,7 +614,7 @@ void set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
 		inject_gp(vcpu);
 		return;
 	}
-	vcpu->cr8 = cr8;
+	kvm_apic_update_tpr(&vcpu->apic, cr8);
 }
 EXPORT_SYMBOL_GPL(set_cr8);
 
@@ -1015,12 +1032,37 @@ static int emulator_write_std(unsigned long addr,
 	return X86EMUL_UNHANDLEABLE;
 }
 
+struct kvm_mmio_handler {
+	unsigned long (*read)(struct kvm_vcpu *v,
+			      unsigned long addr,
+			      unsigned long length);
+	void (*write)(struct kvm_vcpu *v,
+		      unsigned long addr,
+		      unsigned long length,
+		      unsigned long val);
+	int (*in_range)(struct kvm_vcpu *v, unsigned long addr);
+};
+
+struct kvm_mmio_handler apic_mmio_handler = {
+	kvm_apic_read,
+	kvm_apic_write,
+	kvm_apic_range,
+};
+
+#define KVM_MMIO_HANDLER_NR_ARRAY_SIZE 1
+static struct kvm_mmio_handler *kvm_mmio_handlers[KVM_MMIO_HANDLER_NR_ARRAY_SIZE] =
+{
+	&apic_mmio_handler,
+};
+
 static int emulator_read_emulated(unsigned long addr,
 				  unsigned long *val,
 				  unsigned int bytes,
 				  struct x86_emulate_ctxt *ctxt)
 {
 	struct kvm_vcpu *vcpu = ctxt->vcpu;
+	gpa_t gpa;
+	int i;
 
 	if (vcpu->mmio_read_completed) {
 		memcpy(val, vcpu->mmio_data, bytes);
@@ -1029,18 +1071,23 @@ static int emulator_read_emulated(unsigned long addr,
 	} else if (emulator_read_std(addr, val, bytes, ctxt)
 		   == X86EMUL_CONTINUE)
 		return X86EMUL_CONTINUE;
-	else {
-		gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
+
+	gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
+	if (gpa == UNMAPPED_GVA)
+		return vcpu_printf(vcpu, "not present\n"), X86EMUL_PROPAGATE_FAULT;
 
-		if (gpa == UNMAPPED_GVA)
-			return X86EMUL_PROPAGATE_FAULT;
-		vcpu->mmio_needed = 1;
-		vcpu->mmio_phys_addr = gpa;
-		vcpu->mmio_size = bytes;
-		vcpu->mmio_is_write = 0;
+	for (i = 0; i < KVM_MMIO_HANDLER_NR_ARRAY_SIZE; i++)
+		if (kvm_mmio_handlers[i]->in_range(vcpu, gpa)) {
+			*val = kvm_mmio_handlers[i]->read(vcpu, gpa, bytes);
+			return X86EMUL_CONTINUE;
+		}
 
-		return X86EMUL_UNHANDLEABLE;
-	}
+	vcpu->mmio_needed = 1;
+	vcpu->mmio_phys_addr = gpa;
+	vcpu->mmio_size = bytes;
+	vcpu->mmio_is_write = 0;
+
+	return X86EMUL_UNHANDLEABLE;
 }
 
 static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -1070,6 +1117,7 @@ static int emulator_write_emulated(unsigned long addr,
 {
 	struct kvm_vcpu *vcpu = ctxt->vcpu;
 	gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
+	int i;
 
 	if (gpa == UNMAPPED_GVA)
 		return X86EMUL_PROPAGATE_FAULT;
@@ -1077,6 +1125,12 @@ static int emulator_write_emulated(unsigned long addr,
 	if (emulator_write_phys(vcpu, gpa, val, bytes))
 		return X86EMUL_CONTINUE;
 
+	for (i = 0; i < KVM_MMIO_HANDLER_NR_ARRAY_SIZE; i++)
+		if (kvm_mmio_handlers[i]->in_range(vcpu, gpa)) {
+			kvm_mmio_handlers[i]->write(vcpu, gpa, bytes, val);
+			return X86EMUL_CONTINUE;
+		}
+
 	vcpu->mmio_needed = 1;
 	vcpu->mmio_phys_addr = gpa;
 	vcpu->mmio_size = bytes;
@@ -1479,7 +1533,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 		data = 3;
 		break;
 	case MSR_IA32_APICBASE:
-		data = vcpu->apic_base;
+		data = vcpu->apic.apic_base_msr;
+		break;
+	case MSR_IA32_TIME_STAMP_COUNTER:
+		/* FIXME */
+		/* data = guest_read_tsc(); */
 		break;
 	case MSR_IA32_MISC_ENABLE:
 		data = vcpu->ia32_misc_enable_msr;
@@ -1557,7 +1615,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	case 0x200 ... 0x2ff: /* MTRRs */
 		break;
 	case MSR_IA32_APICBASE:
-		vcpu->apic_base = data;
+		kvm_apic_msr_set(&vcpu->apic, data);
 		break;
 	case MSR_IA32_MISC_ENABLE:
 		vcpu->ia32_misc_enable_msr = data;
@@ -1812,9 +1870,6 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
-	/* re-sync apic's tpr */
-	vcpu->cr8 = kvm_run->cr8;
-
 	if (kvm_run->io_completed) {
 		if (vcpu->pio.count) {
 			r = complete_pio(vcpu);
@@ -1953,12 +2008,11 @@ static int kvm_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 	sregs->cr2 = vcpu->cr2;
 	sregs->cr3 = vcpu->cr3;
 	sregs->cr4 = vcpu->cr4;
-	sregs->cr8 = vcpu->cr8;
+	sregs->cr8 = kvm_apic_read_tpr(&vcpu->apic);
 	sregs->efer = vcpu->shadow_efer;
-	sregs->apic_base = vcpu->apic_base;
+	sregs->apic_base = vcpu->apic.apic_base_msr;
 
-	memcpy(sregs->interrupt_bitmap, vcpu->irq_pending,
-	       sizeof sregs->interrupt_bitmap);
+	kvm_vcpu_irq_summary(vcpu, &sregs->interrupt_bitmap);
 
 	vcpu_put(vcpu);
 
@@ -1991,13 +2045,13 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	mmu_reset_needed |= vcpu->cr3 != sregs->cr3;
 	vcpu->cr3 = sregs->cr3;
 
-	vcpu->cr8 = sregs->cr8;
+	kvm_apic_update_tpr(&vcpu->apic, sregs->cr8);
 
 	mmu_reset_needed |= vcpu->shadow_efer != sregs->efer;
 #ifdef CONFIG_X86_64
 	kvm_arch_ops->set_efer(vcpu, sregs->efer);
 #endif
-	vcpu->apic_base = sregs->apic_base;
+	kvm_apic_msr_set(&vcpu->apic, sregs->apic_base);
 
 	kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
 
@@ -2012,12 +2066,18 @@ static int kvm_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	if (mmu_reset_needed)
 		kvm_mmu_reset_context(vcpu);
 
-	memcpy(vcpu->irq_pending, sregs->interrupt_bitmap,
-	       sizeof vcpu->irq_pending);
-	vcpu->irq_summary = 0;
-	for (i = 0; i < NR_IRQ_WORDS; ++i)
-		if (vcpu->irq_pending[i])
-			__set_bit(i, &vcpu->irq_summary);
+	/* walk the interrupt bitmap and inject an IRQ for each bit found */
+	for (i = 0; i < NR_IRQ_WORDS; ++i) {
+		unsigned long word = sregs->interrupt_bitmap[i];
+		while (word) {
+			int bit_index = __ffs(word);
+			int irq = i * BITS_PER_LONG + bit_index;
+
+			kvm_vcpu_irq_inject(vcpu, irq, 0);
+
+			clear_bit(bit_index, &word);
+		}
+	}
 
 	set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
 	set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
@@ -2178,14 +2238,8 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
 {
 	if (irq->irq < 0 || irq->irq >= 256)
 		return -EINVAL;
-	vcpu_load(vcpu);
-
-	set_bit(irq->irq, vcpu->irq_pending);
-	set_bit(irq->irq / BITS_PER_LONG, &vcpu->irq_summary);
 
-	vcpu_put(vcpu);
-
-	return 0;
+	return kvm_vcpu_irq_inject(vcpu, irq->irq, 0);
 }
 
 static int kvm_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
@@ -2332,10 +2386,16 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 	if (r < 0)
 		goto out_free_vcpus;
 
+	if (kvm->options.apic_enabled)
+		kvm_apic_init(vcpu);
+	else
+		kvm_userint_init(vcpu);
+
 	kvm_arch_ops->vcpu_load(vcpu);
 	r = kvm_mmu_setup(vcpu);
-	if (r >= 0)
+	if (r >= 0) {
 		r = kvm_arch_ops->vcpu_setup(vcpu);
+	}
 	vcpu_put(vcpu);
 
 	if (r < 0)
@@ -2672,6 +2732,20 @@ static long kvm_vm_ioctl(struct file *filp,
 			goto out;
 		break;
 	}
+	case KVM_GET_OPTIONS: {
+		r = -EFAULT;
+		if (copy_to_user(argp, &kvm->options, sizeof(kvm->options)))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_SET_OPTIONS: {
+		r = -EFAULT;
+		if (copy_from_user(&kvm->options, argp, sizeof(kvm->options)))
+			goto out;
+		r = 0;
+		break;
+	}
 	default:
 		;
 	}
diff --git a/drivers/kvm/kvm_userint.c b/drivers/kvm/kvm_userint.c
new file mode 100644
index 0000000..218ae81
--- /dev/null
+++ b/drivers/kvm/kvm_userint.c
@@ -0,0 +1,165 @@
+/*
+ * kvm_userint.c: User Interrupts IRQ device - This acts as an extension
+ *                of an interrupt controller that exists elsewhere (typically
+ *                in userspace/QEMU)
+ *
+ * Copyright (C) 2007 Qumranet
+ * Copyright (C) 2007 Novell
+ *
+ * Authors:
+ *   Gregory Haskins
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#include "kvm.h"
+
+/*----------------------------------------------------------------------
+ * optimized bitarray object - works like bitarrays in bitops, but uses
+ * a summary field to accelerate lookups
+ *---------------------------------------------------------------------*/
+
+struct bitarray {
+	unsigned long summary; /* 1 bit per word in pending */
+	unsigned long pending[NR_IRQ_WORDS];
+};
+
+static int bitarray_pending(struct bitarray *this)
+{
+	return this->summary ? 1 : 0;
+}
+
+static int bitarray_findhighest(struct bitarray *this)
+{
+	int word_index, bit_index;
+
+	if (!this->summary)
+		return 0;
+
+	word_index = __ffs(this->summary);
+	bit_index = __ffs(this->pending[word_index]);
+
+	return word_index * BITS_PER_LONG + bit_index;
+}
+
+static void bitarray_set(struct bitarray *this, int nr)
+{
+	set_bit(nr, this->pending);
+	set_bit(nr / BITS_PER_LONG, &this->summary);
+}
+
+static void bitarray_clear(struct bitarray *this, int nr)
+{
+	int word = nr / BITS_PER_LONG;
+
+	clear_bit(nr, this->pending);
+	if (!this->pending[word])
+		clear_bit(word, &this->summary);
+}
+
+static int bitarray_test(struct bitarray *this, int nr)
+{
+	return test_bit(nr, this->pending);
+}
+
+/*----------------------------------------------------------------------
+ * userint interface - provides the actual kvm_irqdevice implementation
+ *---------------------------------------------------------------------*/
+
+typedef struct {
+	struct bitarray irq_pending;
+	struct bitarray nmi_pending;
+} kvm_userint;
+
+static int userint_pending(struct kvm_irqdevice *this, int flags)
+{
+	kvm_userint *s = (kvm_userint *)this->private;
+
+	if (flags & KVM_IRQFLAGS_NMI)
+		return bitarray_pending(&s->nmi_pending);
+	else
+		return bitarray_pending(&s->irq_pending);
+}
+
+static int userint_read(struct kvm_irqdevice *this, int flags,
+			struct kvm_irqinfo *info)
+{
+	kvm_userint *s = (kvm_userint *)this->private;
+	int irq;
+
+	if (flags & KVM_IRQFLAGS_NMI)
+		irq = bitarray_findhighest(&s->nmi_pending);
+	else
+		irq = bitarray_findhighest(&s->irq_pending);
+
+	if (!irq)
+		return 0;
+
+	if (info) {
+		info->vector = irq;
+		if (bitarray_test(&s->nmi_pending, irq))
+			info->nmi = 1;
+		else
+			info->nmi = 0;
+	}
+
+	if (!(flags & KVM_IRQFLAGS_PEEK)) {
+		/*
+		 * If the "peek" flag is not set, automatically clear the
+		 * interrupt, as the EOI mechanism (if any) will take place
+		 * in userspace
+		 */
+		bitarray_clear(&s->irq_pending, irq);
+		bitarray_clear(&s->nmi_pending, irq);
+	}
+
+	return irq;
+}
+
+static int userint_inject(struct kvm_irqdevice *this, int irq, int flags)
+{
+	kvm_userint *s = (kvm_userint *)this->private;
+
+	bitarray_set(&s->irq_pending, irq);
+	if (flags & KVM_IRQFLAGS_NMI)
+		bitarray_set(&s->nmi_pending, irq);
+
+	return 0;
+}
+
+static int userint_summary(struct kvm_irqdevice *this, void *data)
+{
+	kvm_userint *s = (kvm_userint *)this->private;
+
+	memcpy(data, s->irq_pending.pending, sizeof s->irq_pending.pending);
+
+	return 0;
+}
+
+static void userint_destructor(struct kvm_irqdevice *this)
+{
+	kfree(this->private);
+}
+
+int kvm_userint_init(struct kvm_vcpu *vcpu)
+{
+	struct kvm_irqdevice *dev = &vcpu->irq_dev;
+
+	dev->pending = userint_pending;
+	dev->read = userint_read;
+	dev->inject = userint_inject;
+	dev->summary = userint_summary;
+	dev->destructor = userint_destructor;
+
+	dev->private = kzalloc(sizeof(kvm_userint), GFP_KERNEL);
+
+	return 0;
+}
+
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index b7e1410..0e0a291 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -22,6 +22,7 @@
 #include
 
 #include "kvm_svm.h"
+#include "kvm_apic.h"
 #include "x86_emulate.h"
 
 MODULE_AUTHOR("Qumranet");
@@ -108,20 +109,12 @@ static unsigned get_addr_size(struct kvm_vcpu *vcpu)
 
 static inline u8 pop_irq(struct kvm_vcpu *vcpu)
 {
-	int word_index = __ffs(vcpu->irq_summary);
-	int bit_index = __ffs(vcpu->irq_pending[word_index]);
-	int irq = word_index * BITS_PER_LONG + bit_index;
-
-	clear_bit(bit_index, &vcpu->irq_pending[word_index]);
-	if (!vcpu->irq_pending[word_index])
-		clear_bit(word_index, &vcpu->irq_summary);
-	return irq;
+	return kvm_vcpu_irq_read(vcpu, 0, NULL);
 }
 
 static inline void push_irq(struct kvm_vcpu *vcpu, u8 irq)
 {
-	set_bit(irq, vcpu->irq_pending);
-	set_bit(irq / BITS_PER_LONG, &vcpu->irq_summary);
+	kvm_vcpu_irq_inject(vcpu, irq, 0);
 }
 
 static inline void clgi(void)
@@ -587,9 +580,6 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	init_vmcb(vcpu->svm->vmcb);
 
 	fx_init(vcpu);
-	vcpu->apic_base = 0xfee00000 |
-			/*for vcpu 0*/ MSR_IA32_APICBASE_BSP |
-			MSR_IA32_APICBASE_ENABLE;
 
 	return 0;
 
@@ -1092,7 +1082,7 @@ static int halt_interception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	vcpu->svm->next_rip = vcpu->svm->vmcb->save.rip + 1;
 	skip_emulated_instruction(vcpu);
-	if (vcpu->irq_summary)
+	if (kvm_vcpu_irq_pending(vcpu, 0))
 		return 1;
 
 	kvm_run->exit_reason = KVM_EXIT_HLT;
@@ -1263,7 +1253,7 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu,
 	 * possible
 	 */
 	if (kvm_run->request_interrupt_window &&
-	    !vcpu->irq_summary) {
+	    !kvm_vcpu_irq_pending(vcpu, 0)) {
 		++kvm_stat.irq_window_exits;
 		kvm_run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
 		return 0;
@@ -1399,7 +1389,7 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 		(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
 		 (vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF));
 
-	if (vcpu->interrupt_window_open && vcpu->irq_summary)
+	if (vcpu->interrupt_window_open && kvm_vcpu_irq_pending(vcpu, 0))
 		/*
 		 * If interrupts enabled, and not blocked by sti or mov ss. Good.
 		 */
@@ -1409,7 +1399,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	 * Interrupts blocked.  Wait for unblock.
 	 */
 	if (!vcpu->interrupt_window_open &&
-	    (vcpu->irq_summary || kvm_run->request_interrupt_window)) {
+	    (kvm_vcpu_irq_pending(vcpu, 0) ||
+	     kvm_run->request_interrupt_window)) {
 		control->intercept |= 1ULL << INTERCEPT_VINTR;
 	} else
 		control->intercept &= ~(1ULL << INTERCEPT_VINTR);
@@ -1418,11 +1409,13 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
 			      struct kvm_run *kvm_run)
 {
-	kvm_run->ready_for_interrupt_injection = (vcpu->interrupt_window_open &&
-						  vcpu->irq_summary == 0);
+	kvm_run->ready_for_interrupt_injection =
+		(vcpu->interrupt_window_open &&
+		 !kvm_vcpu_irq_pending(vcpu, 0));
+
 	kvm_run->if_flag = (vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF) != 0;
-	kvm_run->cr8 = vcpu->cr8;
-	kvm_run->apic_base = vcpu->apic_base;
+	kvm_run->cr8 = kvm_apic_read_tpr(&vcpu->apic);
+	kvm_run->apic_base = vcpu->apic.apic_base_msr;
 }
 
 /*
@@ -1434,7 +1427,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
 static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
 					struct kvm_run *kvm_run)
 {
-	return (!vcpu->irq_summary &&
+	return (!kvm_vcpu_irq_pending(vcpu, 0) &&
 		kvm_run->request_interrupt_window &&
 		vcpu->interrupt_window_open &&
 		(vcpu->svm->vmcb->save.rflags & X86_EFLAGS_IF));
@@ -1489,6 +1482,8 @@ again:
 	fx_save(vcpu->host_fx_image);
 	fx_restore(vcpu->guest_fx_image);
 
+	apic_check_pending_timer(&vcpu->apic);
+
 	asm volatile (
 #ifdef CONFIG_X86_64
 		"push %%rbx; push %%rcx; push %%rdx;"
@@ -1601,6 +1596,8 @@ again:
 	fx_save(vcpu->guest_fx_image);
 	fx_restore(vcpu->host_fx_image);
 
+	apic_check_pending_timer(&vcpu->apic);
+
 	if ((vcpu->svm->vmcb->save.dr7 & 0xff))
 		load_db_regs(vcpu->svm->host_db_regs);
 
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 61a6116..c250e33 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -18,6 +18,7 @@
 #include "kvm.h"
 #include "vmx.h"
 #include "kvm_vmx.h"
+#include "kvm_apic.h"
 #include
 #include
 #include
@@ -994,10 +995,6 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
 
 	memset(vcpu->regs, 0, sizeof(vcpu->regs));
 	vcpu->regs[VCPU_REGS_RDX] = get_rdx_init_val();
-	vcpu->cr8 = 0;
-	vcpu->apic_base = 0xfee00000 |
-			/*for vcpu 0*/ MSR_IA32_APICBASE_BSP |
-			MSR_IA32_APICBASE_ENABLE;
 
 	fx_init(vcpu);
 
@@ -1219,13 +1216,8 @@ static void inject_rmode_irq(struct kvm_vcpu *vcpu, int irq)
 
 static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
 {
-	int word_index = __ffs(vcpu->irq_summary);
-	int bit_index = __ffs(vcpu->irq_pending[word_index]);
-	int irq = word_index * BITS_PER_LONG + bit_index;
-
-	clear_bit(bit_index, &vcpu->irq_pending[word_index]);
-	if (!vcpu->irq_pending[word_index])
-		clear_bit(word_index, &vcpu->irq_summary);
+	int irq = kvm_vcpu_irq_read(vcpu, 0, NULL);
+	BUG_ON(!irq);
 
 	if (vcpu->rmode.active) {
 		inject_rmode_irq(vcpu, irq);
@@ -1246,7 +1238,7 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 		(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
 
 	if (vcpu->interrupt_window_open &&
-	    vcpu->irq_summary &&
+	    kvm_vcpu_irq_pending(vcpu, 0) &&
 	    !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK))
 		/*
 		 * If interrupts enabled, and not blocked by sti or mov ss. Good.
 		 */
@@ -1255,7 +1247,7 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 
 	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	if (!vcpu->interrupt_window_open &&
-	    (vcpu->irq_summary || kvm_run->request_interrupt_window))
+	    (kvm_vcpu_irq_pending(vcpu, 0) || kvm_run->request_interrupt_window))
 		/*
 		 * Interrupts blocked.  Wait for unblock.
 		 */
@@ -1314,8 +1306,8 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	if (is_external_interrupt(vect_info)) {
 		int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
-		set_bit(irq, vcpu->irq_pending);
-		set_bit(irq / BITS_PER_LONG, &vcpu->irq_summary);
+		/* FIXME: Is this right? */
+		kvm_vcpu_irq_inject(vcpu, irq, 0);
 	}
 
 	if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) { /* nmi */
@@ -1520,7 +1512,7 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 			printk(KERN_DEBUG "handle_cr: read CR8 "
 			       "cpu erratum AA15\n");
 			vcpu_load_rsp_rip(vcpu);
-			vcpu->regs[reg] = vcpu->cr8;
+			vcpu->regs[reg] = kvm_apic_read_tpr(&vcpu->apic);
 			vcpu_put_rsp_rip(vcpu);
 			skip_emulated_instruction(vcpu);
 			return 1;
@@ -1617,10 +1609,10 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
 			      struct kvm_run *kvm_run)
 {
 	kvm_run->if_flag = (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) != 0;
-	kvm_run->cr8 = vcpu->cr8;
-	kvm_run->apic_base = vcpu->apic_base;
-	kvm_run->ready_for_interrupt_injection = (vcpu->interrupt_window_open &&
-						  vcpu->irq_summary == 0);
+	kvm_run->cr8 = kvm_apic_read_tpr(&vcpu->apic);
+	kvm_run->apic_base = vcpu->apic.apic_base_msr;
+	kvm_run->ready_for_interrupt_injection =
+		(vcpu->interrupt_window_open && !kvm_vcpu_irq_pending(vcpu, 0));
 }
 
 static int handle_interrupt_window(struct kvm_vcpu *vcpu,
@@ -1631,7 +1623,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
 	 * possible
 	 */
 	if (kvm_run->request_interrupt_window &&
-	    !vcpu->irq_summary) {
+	    !kvm_vcpu_irq_pending(vcpu, 0)) {
 		kvm_run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
 		++kvm_stat.irq_window_exits;
 		return 0;
@@ -1642,7 +1634,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
 static int handle_halt(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	skip_emulated_instruction(vcpu);
-	if (vcpu->irq_summary)
+	if (kvm_vcpu_irq_pending(vcpu, 0))
 		return 1;
 
 	kvm_run->exit_reason = KVM_EXIT_HLT;
@@ -1713,7 +1705,7 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
 					struct kvm_run *kvm_run)
 {
-	return (!vcpu->irq_summary &&
+	return (!kvm_vcpu_irq_pending(vcpu, 0) &&
 		kvm_run->request_interrupt_window &&
vcpu->interrupt_window_open && (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF)); @@ -1763,6 +1755,8 @@ again: save_msrs(vcpu->host_msrs, vcpu->nmsrs); load_msrs(vcpu->guest_msrs, NR_BAD_MSRS); =20 + apic_check_pending_timer(&vcpu->apic); + asm ( /* Store host registers */ "pushf \n\t" @@ -1905,6 +1899,8 @@ again: } ++kvm_stat.exits; =20 + apic_check_pending_timer(&vcpu->apic); + save_msrs(vcpu->guest_msrs, NR_BAD_MSRS); load_msrs(vcpu->host_msrs, NR_BAD_MSRS); =20 diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 07bf353..6d0a4de 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -233,6 +233,10 @@ struct kvm_dirty_log { }; }; =20 +struct kvm_options { + __u32 apic_enabled; +}; + struct kvm_cpuid_entry { __u32 function; __u32 eax; @@ -284,6 +288,8 @@ struct kvm_signal_mask { #define KVM_CREATE_VCPU _IO(KVMIO, 0x41) #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log) #define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alia= s) +#define KVM_GET_OPTIONS _IOW(KVMIO, 0x44, struct kvm_options) +#define KVM_SET_OPTIONS _IOW(KVMIO, 0x45, struct kvm_options) =20 /* * ioctls for vcpu fds --=__Part200779C9.0__= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV --=__Part200779C9.0__= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ kvm-devel mailing list kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org https://lists.sourceforge.net/lists/listinfo/kvm-devel --=__Part200779C9.0__=--