public inbox for kvm@vger.kernel.org
* PowerPC 440 progress
@ 2007-09-18 22:42 Hollis Blanchard
  2007-09-19  0:32 ` Tim Anderson
  0 siblings, 1 reply; 2+ messages in thread
From: Hollis Blanchard @ 2007-09-18 22:42 UTC
  To: kvm-ppc-devel; +Cc: kvm-devel

[-- Attachment #1: Type: text/plain, Size: 4133 bytes --]

With the attached patch, we can now execute a 440 Linux guest on a 440
host through many initcalls:

        CPU clock-frequency <- 0x27bc86ae (667MHz)
        CPU timebase-frequency <- 0x27bc86ae (667MHz)
        /plb: clock-frequency <- 9ef21ab (167MHz)
        /plb/opb: clock-frequency <- 4f790d5 (83MHz)
        /plb/opb/ebc: clock-frequency <- 34fb5e3 (56MHz)
        /plb/opb/serial@ef600300: clock-frequency <- a8c000 (11MHz)
        /plb/opb/serial@ef600400: clock-frequency <- a8c000 (11MHz)
        /plb/opb/serial@ef600500: clock-frequency <- a8c000 (11MHz)
        /plb/opb/serial@ef600600: clock-frequency <- a8c000 (11MHz)
        Memory <- <0x0 0x0 0x9000000> (144MB)
        ENET0: local-mac-address <- 00:00:00:00:00:00
        ENET1: local-mac-address <- 00:00:00:00:00:00
        
        zImage starting: loaded at 0x00400000 (sp: 0x00fffe98)
        Allocating 0x263c5c bytes for kernel ...
        gunzipping (0x00000000 <- 0x0040b000:0x00661acc)...done 0x243a9c bytes
        
        Linux/PowerPC load: 
        Finalizing device tree... flat tree at 0x66e3a0
        id mach(): done
        MMU:enter
        MMU:hw init
        MMU:mapin
        MMU:setio
        MMU:exit
        Using Bamboo machine description
        Linux version 2.6.23-rc1 (hollisb@basalt) (gcc version 3.4.2) #88 Tue Sep 18 17:18:36 CDT 2007
        console [udbg0] enabled
        setup_arch: bootmem
        arch: exit
        Zone PFN ranges:
          DMA             0 ->    36864
          Normal      36864 ->    36864
        Movable zone start PFN for each node
        early_node_map[1] active PFN ranges
            0:        0 ->    36864
        Built 1 zonelists in Zone order.  Total pages: 36576
        Kernel command line: console=ttyS0 debug
        UIC0 (32 IRQ sources) at DCR 0xc0
        UIC1 (32 IRQ sources) at DCR 0xd0
        PID hash table entries: 1024 (order: 10, 4096 bytes)
        time_init: decrementer frequency = 666.666670 MHz
        time_init: processor frequency   = 666.666670 MHz
        Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
        Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
        Memory: 143500k/147456k available (2192k kernel code, 3816k reserved, 100k data, 127k bss, 124k init)
        Calibrating delay loop... 1167.36 BogoMIPS (lpj=2334720)
        Mount-cache hash table entries: 512
        NET: Registered protocol family 16
        
        PCI: Probing PCI hardware
        NET: Registered protocol family 2
        IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
        TCP established hash table entries: 8192 (order: 4, 65536 bytes)
        TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
        TCP: Hash tables configured (established 8192 bind 8192)
        TCP reno registered
        io scheduler noop registered
        io scheduler anticipatory registered (default)
        io scheduler deadline registered
        io scheduler cfq registered
        Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled

The guest currently seems to be stuck in the serial driver reading IER.
Qemu doesn't seem to be getting the accesses though, so more debugging
is required.

Also, signal delivery and scheduling other host tasks are now working,
which makes for a nicer development environment. If you run "gdb qemu"
on the host, you can at least do a post-mortem of guest memory.

Interesting note (at least, I thought it was interesting): since the
guest can read the timebase without trapping, we must always report the
real timebase frequency to the guest.
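
As a rough illustration of why the reported frequency matters (this is not
part of the patch): a guest converts raw timebase ticks to wall time by
dividing by the frequency it was told. The helper name below is hypothetical;
the 666666670 Hz value is the 0x27bc86ae shown in the boot log above, and the
400 MHz figure is an invented "wrong" value used only for contrast.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a timebase tick delta to microseconds, given the timebase
 * frequency in Hz that the guest believes it is running at. */
static uint64_t tb_ticks_to_usecs(uint64_t ticks, uint64_t tb_hz)
{
	assert(tb_hz != 0);
	return ticks * 1000000ULL / tb_hz;
}
```

With the real frequency, 666666670 ticks come out as one second. If the guest
were told 400 MHz instead, the same tick count would be interpreted as roughly
1.67 seconds, so guest timekeeping would drift from the host's.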

The easiest way to do this right now was to implement DCR-read
passthrough, since that's where the Linux bootwrapper gets the
frequencies for the device tree. Long-term, we may want to have qemu
supply a device tree itself (but it still must report the real
frequency).
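
The passthrough in the attached patch handles the CPR0 address/data register
pair: the guest's CFGADDR write is only cached, and a CFGDATA read replays
that address on the host before reading the data register. A minimal
user-space sketch of the idea, assuming a simulated host register file (the
fake_host_cpr0 and guest_dcr_state types and all function names here are
hypothetical stand-ins, not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCRN_CPR0_CFGADDR 0xc
#define DCRN_CPR0_CFGDATA 0xd
#define FAKE_CPR0_REGS    256

/* Simulated host hardware: an address latch plus indirect registers. */
struct fake_host_cpr0 {
	uint32_t cfgaddr;
	uint32_t regs[FAKE_CPR0_REGS];
};

/* Per-guest state: the address the guest last wrote to CFGADDR. */
struct guest_dcr_state {
	uint32_t cpr0_cfgaddr;
};

static void host_mtdcr(struct fake_host_cpr0 *h, unsigned int dcrn, uint32_t val)
{
	if (dcrn == DCRN_CPR0_CFGADDR)
		h->cfgaddr = val;
}

static uint32_t host_mfdcr(const struct fake_host_cpr0 *h, unsigned int dcrn)
{
	if (dcrn == DCRN_CPR0_CFGADDR)
		return h->cfgaddr;
	if (dcrn == DCRN_CPR0_CFGDATA)
		return h->regs[h->cfgaddr % FAKE_CPR0_REGS];
	return 0;
}

/* Guest write: only the address latch is emulated, and only in software. */
static void guest_mtdcr(struct guest_dcr_state *g, unsigned int dcrn, uint32_t val)
{
	assert(g != NULL);
	if (dcrn == DCRN_CPR0_CFGADDR)
		g->cpr0_cfgaddr = val;
}

/* Guest read: CFGDATA is passed through to the (simulated) host hardware. */
static uint32_t guest_mfdcr(struct guest_dcr_state *g, struct fake_host_cpr0 *h,
                            unsigned int dcrn)
{
	assert(g != NULL && h != NULL);
	switch (dcrn) {
	case DCRN_CPR0_CFGADDR:
		return g->cpr0_cfgaddr;
	case DCRN_CPR0_CFGDATA:
		host_mtdcr(h, DCRN_CPR0_CFGADDR, g->cpr0_cfgaddr);
		return host_mfdcr(h, DCRN_CPR0_CFGDATA);
	default:
		return 0; /* the patch forwards other DCRs to qemu instead */
	}
}
```

Note that the replayed address stays in the host latch afterwards, so this
approach presumably relies on host code always writing CFGADDR before each of
its own CFGDATA reads.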

Another interesting note: since the guest can read SPRG4-7 without
trapping, we must context-switch those registers.
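
A small sketch of what that context switch amounts to, written as a
user-space simulation. The struct and function names are hypothetical, and
saving the host's values on entry is an assumption made for the example; the
patch itself only states that the guest values are loaded into the real SPRGs
when resuming the guest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_USER_SPRGS 4	/* SPRG4..SPRG7 */

/* Stand-in for the physical registers. */
struct fake_cpu {
	uint32_t sprg[NR_USER_SPRGS];
};

/* Stand-in for the per-vcpu save area. */
struct fake_vcpu {
	uint32_t guest_sprg[NR_USER_SPRGS];	/* updated by trapped mtspr */
	uint32_t host_sprg[NR_USER_SPRGS];	/* assumed: host copy saved on entry */
};

/* Before running the guest: stash host values, install guest values so
 * untrapped guest reads see what the guest last wrote. */
static void sprg_enter_guest(struct fake_cpu *cpu, struct fake_vcpu *vcpu)
{
	assert(cpu != NULL && vcpu != NULL);
	memcpy(vcpu->host_sprg, cpu->sprg, sizeof(cpu->sprg));
	memcpy(cpu->sprg, vcpu->guest_sprg, sizeof(cpu->sprg));
}

/* After leaving the guest: put the host values back. Guest writes trap,
 * so vcpu->guest_sprg is already current and needs no read-back. */
static void sprg_exit_guest(struct fake_cpu *cpu, const struct fake_vcpu *vcpu)
{
	assert(cpu != NULL && vcpu != NULL);
	memcpy(cpu->sprg, vcpu->host_sprg, sizeof(cpu->sprg));
}
```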

Signed-off-by: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

-- 
Hollis Blanchard
IBM Linux Technology Center

[-- Attachment #2: kvm_powerpc --]
[-- Type: text/plain, Size: 66726 bytes --]

PowerPC 440 KVM implementation.
Signed-off-by: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

diff --git a/arch/powerpc/boot/dts/bamboo.dts b/arch/powerpc/boot/dts/bamboo.dts
--- a/arch/powerpc/boot/dts/bamboo.dts
+++ b/arch/powerpc/boot/dts/bamboo.dts
@@ -247,6 +247,6 @@
 
 	chosen {
 		linux,stdout-path = "/plb/opb/serial@ef600300";
-		bootargs = "console=ttyS0,115200";
+		/* bootargs = "console=ttyS0,115200"; */
 	};
 };
diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -3,14 +3,14 @@
 #
 menuconfig VIRTUALIZATION
 	bool "Virtualization"
-	depends on X86
+	depends on (X86 || PPC)
 	default y
 
 if VIRTUALIZATION
 
 config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
-	depends on X86 && EXPERIMENTAL
+	depends on EXPERIMENTAL
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 	---help---
@@ -46,4 +46,14 @@ config KVM_AMD
 	  Provides support for KVM on AMD processors equipped with the AMD-V
 	  (SVM) extensions.
 
+config KVM_POWERPC
+	bool
+
+config KVM_POWERPC_440
+	tristate "KVM guest support for PowerPC 440"
+	depends on KVM && 44x
+	select KVM_POWERPC
+	---help---
+	  Provides support for running PowerPC 440 virtual machines.
+
 endif # VIRTUALIZATION
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -8,3 +8,5 @@ obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
 kvm-amd-objs = svm.o
 obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+
+obj-$(CONFIG_KVM_POWERPC) += powerpc/
diff --git a/drivers/kvm/powerpc/Makefile b/drivers/kvm/powerpc/Makefile
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/Makefile
@@ -0,0 +1,35 @@
+
+obj-y += hack.o emulate.o tlb.o
+obj-$(CONFIG_KVM_POWERPC_440) += exceptions_44x.o
+AFLAGS_exceptions_44x.o := -I$(src)
+
+
+# XXX something is wrong with these dependencies
+$(obj)/exceptions_44x.o: $(obj)/kvm-offsets.h
+$(obj)/kvm-offsets.h: $(obj)/kvm-offsets.s Kbuild
+	$(call cmd,offsets)
+$(obj)/kvm-offsets.s: $(src)/kvm-offsets.c
+	$(Q)mkdir -p $(dir $@)
+	$(call if_changed_dep,cc_s_c)
+
+# Default sed regexp - multiline due to syntax constraints
+define sed-y
+	"/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}"
+endef
+
+quiet_cmd_offsets = GEN     $@
+define cmd_offsets
+	(set -e; \
+	 echo "#ifndef __KVM_OFFSETS_H__"; \
+	 echo "#define __KVM_OFFSETS_H__"; \
+	 echo "/*"; \
+	 echo " * DO NOT MODIFY."; \
+	 echo " *"; \
+	 echo " * This file was generated by Kbuild"; \
+	 echo " *"; \
+	 echo " */"; \
+	 echo ""; \
+	 sed -ne $(sed-y) $<; \
+	 echo ""; \
+	 echo "#endif" ) > $@
+endef
diff --git a/drivers/kvm/powerpc/emulate.c b/drivers/kvm/powerpc/emulate.c
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/emulate.c
@@ -0,0 +1,585 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#include <linux/jiffies.h>
+#include <linux/timer.h>
+#include <linux/types.h>
+#include <linux/string.h>
+
+#include <asm/dcr.h>
+#include <asm/time.h>
+
+#include "kvm.h"
+#include "tlb.h"
+
+#define DCRN_CPR0_CFGADDR	0xc
+#define DCRN_CPR0_CFGDATA	0xd
+
+static u32 cpr0_cfgaddr;
+
+/* Instruction decoding */
+static inline unsigned int get_op(u32 inst)
+{
+	return inst >> 26;
+}
+
+static inline unsigned int get_xop(u32 inst)
+{
+	return (inst >> 1) & 0x3ff;
+}
+
+static inline unsigned int get_sprn(u32 inst)
+{
+	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+}
+
+static inline unsigned int get_dcrn(u32 inst)
+{
+	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+}
+
+static inline unsigned int get_rt(u32 inst)
+{
+	return (inst >> 21) & 0x1f;
+}
+
+static inline unsigned int get_rs(u32 inst)
+{
+	return (inst >> 21) & 0x1f;
+}
+
+static inline unsigned int get_ra(u32 inst)
+{
+	return (inst >> 16) & 0x1f;
+}
+
+static inline unsigned int get_rb(u32 inst)
+{
+	return (inst >> 11) & 0x1f;
+}
+
+static inline unsigned int get_rc(u32 inst)
+{
+	return inst & 0x1;
+}
+
+static inline unsigned int get_ws(u32 inst)
+{
+	return (inst >> 11) & 0x1f;
+}
+
+static inline unsigned int get_d(u32 inst)
+{
+	return inst & 0xffff;
+}
+
+static inline int kvm_valid_shadow_tlbe(const struct kvm_vcpu *vcpu,
+                                        const struct tlbe *tlbe)
+{
+	unsigned int index;
+	gpa_t gpa;
+
+	if (!get_tlb_v(tlbe))
+		return 0;
+
+	/* Does it match current guest AS? */
+	if (get_tlb_ts(tlbe) != !!(vcpu->guest_msr & MSR_IS))
+		return 0;
+
+	/* Does it collide with the KVM interrupt handler mapping? */
+	/* XXX this nasty test will eventually be removed */
+	index = tlbe - vcpu->guest_tlb;
+	if (index == vcpu->trampoline_tlbe)
+		return 0;
+
+	gpa = get_tlb_raddr(tlbe);
+	if (gpa >= vcpu->kvm->ram_size)
+		return 0;
+
+	return 1;
+}
+
+static inline int emul_tlbwe(struct kvm_vcpu *vcpu, u32 inst)
+{
+	struct tlbe *tlbe;
+	struct tlbe *shadow_tlbe;
+	unsigned int ra;
+	unsigned int rs;
+	unsigned int ws;
+	unsigned int index;
+
+	ra = get_ra(inst);
+	rs = get_rs(inst);
+	ws = get_ws(inst);
+
+	index = vcpu->gpr[ra];
+
+	tlbe = &vcpu->guest_tlb[index];
+	shadow_tlbe = &vcpu->shadow_tlb[index];
+
+	switch (ws) {
+	case PPC44x_TLB_PAGEID:
+		tlbe->mmucr = vcpu->mmucr;
+		tlbe->word0 = vcpu->gpr[rs];
+
+		if (kvm_valid_shadow_tlbe(vcpu, tlbe)) {
+			/* XXX Make sure (va, size) doesn't overlap any other
+			 * entries. 440x6 user manual says the result would be
+			 * "undefined." */
+			u32 epn = get_tlb_eaddr(tlbe);
+
+			/* Insert only 4KB mappings, and force TS on. */
+			shadow_tlbe->word0 = epn | PPC44x_TLB_VALID |
+			                     PPC44x_TLB_TS | PPC44x_TLB_4K;
+			shadow_tlbe->mmucr = tlbe->mmucr;
+		}
+
+		break;
+
+	case PPC44x_TLB_XLAT:
+		tlbe->word1 = vcpu->gpr[rs];
+
+		if (kvm_valid_shadow_tlbe(vcpu, tlbe)) {
+			gpa_t gpaddr = get_tlb_raddr(tlbe);
+			hpa_t hpaddr = gpa_to_hpa(vcpu, gpaddr);
+			/* XXX check hpaddr */
+
+			shadow_tlbe->word1 = ((hpaddr >> 32) & 0xf) |
+					     (hpaddr & 0xfffffc00);
+		} else {
+			/* Make sure it's marked invalid. */
+			shadow_tlbe->word0 = 0;
+		}
+
+		break;
+
+	case PPC44x_TLB_ATTRIB:
+		tlbe->word2 = vcpu->gpr[rs];
+		if (kvm_valid_shadow_tlbe(vcpu, tlbe)) {
+			shadow_tlbe->word2 = kvm_tlb_shadow_attrib(tlbe);
+		}
+		break;
+
+	default:
+		return EMULATE_FAIL;
+	}
+
+#if 0
+	printk("guest tlbe:  %08x %08x %08x %08x\n", tlbe->mmucr, tlbe->word0,
+	       tlbe->word1, tlbe->word2);
+	printk("shadow tlbe: %08x %08x %08x %08x\n", shadow_tlbe->mmucr,
+	       shadow_tlbe->word0, shadow_tlbe->word1, shadow_tlbe->word2);
+#endif
+
+	return EMULATE_DONE;
+}
+
+static inline int emul_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                            unsigned int bytes, int is_bigendian)
+{
+	run->mmio.phys_addr = vcpu->paddr_accessed;
+	run->mmio.len = bytes;
+	run->mmio.is_write = 0;
+	vcpu->pending_mmio_be = is_bigendian;
+
+	return EMULATE_MMIO_ASSIST;
+}
+
+static inline int emul_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                             u32 val, unsigned int bytes, int is_bigendian)
+{
+	void *data = run->mmio.data;
+
+	run->mmio.phys_addr = vcpu->paddr_accessed;
+	run->mmio.len = bytes;
+	run->mmio.is_write = 1;
+	vcpu->pending_mmio_be = is_bigendian;
+
+	if (is_bigendian) {
+		switch (bytes) {
+		case 4: *(u32 *)data = val; break;
+		case 2: *(u16 *)data = val; break;
+		case 1: *(u8 *)data = val; break;
+		}
+	} else {
+		switch (bytes) {
+		case 4: st_le32(data, val); break;
+		case 2: st_le16(data, val); break;
+		case 1: *(u8 *)data = val; break;
+		}
+	}
+
+	return EMULATE_MMIO_ASSIST;
+}
+
+static inline void kvm_emulate_dec(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->tcr & TCR_DIE) {
+		/* The decrementer ticks at the same rate as the timebase, so
+		 * that's how we convert the guest DEC value to the number of
+		 * host ticks. */
+		unsigned long nr_jiffies = vcpu->dec / tb_ticks_per_jiffy;
+
+		mod_timer(&vcpu->dec_timer, get_jiffies_64() + nr_jiffies);
+		//printk("DEC timer: %lx\n", vcpu->dec_timer.expires);
+	} else {
+		del_timer(&vcpu->dec_timer);
+	}
+}
+
+int kvm_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
+{
+	u32 inst = vcpu->last_inst;
+	u32 ea;
+	int ra;
+	int rb;
+	int rc;
+	int rs;
+	int rt;
+	int sprn;
+	int dcrn;
+	enum emulation_result emulated = EMULATE_DONE;
+	int advance = 1;
+
+	switch (get_op(inst)) {
+	case 3:							/* trap */
+		printk("trap!\n");
+		queue_exception(vcpu, PPC44x_INTERRUPT_PROGRAM);
+		advance = 0;
+		break;
+
+	case 19:
+		switch (get_xop(inst)) {
+		case 50:					/* rfi */
+			vcpu->pc = vcpu->srr0;
+			vcpu->guest_msr = vcpu->srr1;
+			vcpu->shadow_msr &= ~GUEST_MSR_MASK;
+			vcpu->shadow_msr |= vcpu->guest_msr & GUEST_MSR_MASK;
+			/*
+			printk("rfi: pc %x msr %x (%x)\n", vcpu->pc,
+			       vcpu->guest_msr, vcpu->shadow_msr);
+			*/
+			advance = 0;
+			break;
+
+		default:
+			emulated = EMULATE_FAIL;
+			break;
+		}
+		break;
+
+	case 31:
+		switch (get_xop(inst)) {
+
+		case 83:					/* mfmsr */
+			rt = get_rt(inst);
+			vcpu->gpr[rt] = vcpu->guest_msr;
+			break;
+
+		case 87:					/* lbzx */
+			rt = get_rt(inst);
+			vcpu->pending_io_gpr = rt;
+			emulated = emul_load(run, vcpu, 1, 1);
+			break;
+
+		case 131:					/* wrtee */
+			rs = get_rs(inst);
+			vcpu->guest_msr = (vcpu->guest_msr & ~MSR_EE)
+			                  | (vcpu->gpr[rs] & MSR_EE);
+			break;
+
+		case 146:					/* mtmsr */
+			rs = get_rs(inst);
+			vcpu->guest_msr = vcpu->gpr[rs];
+			vcpu->shadow_msr &= ~GUEST_MSR_MASK;
+			vcpu->shadow_msr |= vcpu->guest_msr & GUEST_MSR_MASK;
+			break;
+
+		case 163:					/* wrteei */
+			vcpu->guest_msr = (vcpu->guest_msr & ~MSR_EE)
+			                  | (inst & MSR_EE);
+			break;
+
+		case 215:					/* stbx */
+			rs = get_rs(inst);
+			vcpu->pending_io_gpr = rs;
+			emulated = emul_store(run, vcpu, vcpu->gpr[rs], 1, 1);
+			break;
+
+		case 323:					/* mfdcr */
+			dcrn = get_dcrn(inst);
+			rt = get_rt(inst);
+
+			/* emulate some access in kernel */
+			switch (dcrn) {
+			case DCRN_CPR0_CFGADDR:
+				vcpu->gpr[rt] = cpr0_cfgaddr;
+				emulated = EMULATE_DONE;
+				break;
+			case DCRN_CPR0_CFGDATA:
+				local_irq_disable();
+				mtdcr(DCRN_CPR0_CFGADDR, cpr0_cfgaddr);
+				vcpu->gpr[rt] = mfdcr(DCRN_CPR0_CFGDATA);
+				local_irq_enable();
+				emulated = EMULATE_DONE;
+				break;
+			default:
+				run->dcr.dcrn = dcrn;
+				run->dcr.data =  0;
+				run->dcr.is_write = 0;
+				vcpu->pending_io_gpr = rt;
+				emulated = EMULATE_DCR_ASSIST;
+			}	
+
+			break;
+
+		case 339:					/* mfspr */
+			sprn = get_sprn(inst);
+			rt = get_rt(inst);
+			switch (sprn) {
+			case SPRN_SRR0: vcpu->gpr[rt] = vcpu->srr0; break;
+			case SPRN_SRR1: vcpu->gpr[rt] = vcpu->srr1; break;
+			case SPRN_MMUCR: vcpu->gpr[rt] = vcpu->mmucr; break;
+			case SPRN_PID: vcpu->gpr[rt] = vcpu->pid; break;
+			case SPRN_IVPR: vcpu->gpr[rt] = vcpu->ivpr; break;
+			case SPRN_CCR0: vcpu->gpr[rt] = vcpu->ccr0; break;
+			case SPRN_CCR1: vcpu->gpr[rt] = vcpu->ccr1; break;
+			case SPRN_PVR: vcpu->gpr[rt] = vcpu->pvr; break;
+			case SPRN_DEAR: vcpu->gpr[rt] = vcpu->dear; break;
+			case SPRN_ESR: vcpu->gpr[rt] = vcpu->esr; break;
+			case SPRN_DBCR0: vcpu->gpr[rt] = vcpu->dbcr0; break;
+			case SPRN_DBCR1: vcpu->gpr[rt] = vcpu->dbcr1; break;
+
+			/* Note: mftb and TBRL/TBWL are user-accessible, so
+			 * the guest can always access the real TB anyways.
+			 * In fact, we probably will never see these traps. */
+			case SPRN_TBWL: vcpu->gpr[rt] = mftbl(); break;
+			case SPRN_TBWU: vcpu->gpr[rt] = mftbu(); break;
+
+			case SPRN_SPRG0: vcpu->gpr[rt] = vcpu->sprg0; break;
+			case SPRN_SPRG1: vcpu->gpr[rt] = vcpu->sprg1; break;
+			case SPRN_SPRG2: vcpu->gpr[rt] = vcpu->sprg2; break;
+			case SPRN_SPRG3: vcpu->gpr[rt] = vcpu->sprg3; break;
+			/* Note: SPRG4-7 are user-readable, so we don't get
+			 * a trap. */
+
+			case SPRN_IVOR0: vcpu->gpr[rt] = vcpu->ivor[0]; break;
+			case SPRN_IVOR1: vcpu->gpr[rt] = vcpu->ivor[1]; break;
+			case SPRN_IVOR2: vcpu->gpr[rt] = vcpu->ivor[2]; break;
+			case SPRN_IVOR3: vcpu->gpr[rt] = vcpu->ivor[3]; break;
+			case SPRN_IVOR4: vcpu->gpr[rt] = vcpu->ivor[4]; break;
+			case SPRN_IVOR5: vcpu->gpr[rt] = vcpu->ivor[5]; break;
+			case SPRN_IVOR6: vcpu->gpr[rt] = vcpu->ivor[6]; break;
+			case SPRN_IVOR7: vcpu->gpr[rt] = vcpu->ivor[7]; break;
+			case SPRN_IVOR8: vcpu->gpr[rt] = vcpu->ivor[8]; break;
+			case SPRN_IVOR9: vcpu->gpr[rt] = vcpu->ivor[9]; break;
+			case SPRN_IVOR10: vcpu->gpr[rt] = vcpu->ivor[10]; break;
+			case SPRN_IVOR11: vcpu->gpr[rt] = vcpu->ivor[11]; break;
+			case SPRN_IVOR12: vcpu->gpr[rt] = vcpu->ivor[12]; break;
+			case SPRN_IVOR13: vcpu->gpr[rt] = vcpu->ivor[13]; break;
+			case SPRN_IVOR14: vcpu->gpr[rt] = vcpu->ivor[14]; break;
+			case SPRN_IVOR15: vcpu->gpr[rt] = vcpu->ivor[15]; break;
+
+			default:
+				printk("mfspr: unknown spr %x\n", sprn);
+				vcpu->gpr[rt] = 0;
+				break;
+			}
+			break;
+
+		case 451:					/* mtdcr */
+			dcrn = get_dcrn(inst);
+			rs = get_rs(inst);
+
+			/* emulate some access in kernel */
+			switch (dcrn) {
+			case DCRN_CPR0_CFGADDR:
+				cpr0_cfgaddr = vcpu->gpr[rs];
+				emulated = EMULATE_DONE;
+				break;
+			default:
+				run->dcr.dcrn = dcrn;
+				run->dcr.data = vcpu->gpr[rs];
+				run->dcr.is_write = 1;
+				emulated = EMULATE_DCR_ASSIST;
+			}
+
+			break;
+
+		case 467:					/* mtspr */
+			sprn = get_sprn(inst);
+			rs = get_rs(inst);
+			switch (sprn) {
+			case SPRN_SRR0: vcpu->srr0 = vcpu->gpr[rs]; break;
+			case SPRN_SRR1: vcpu->srr1 = vcpu->gpr[rs]; break;
+			case SPRN_MMUCR: vcpu->mmucr = vcpu->gpr[rs]; break;
+			case SPRN_PID: vcpu->pid = vcpu->gpr[rs]; break;
+			case SPRN_CCR0: vcpu->ccr0 = vcpu->gpr[rs]; break;
+			case SPRN_CCR1: vcpu->ccr1 = vcpu->gpr[rs]; break;
+			case SPRN_DEAR: vcpu->dear = vcpu->gpr[rs]; break;
+			case SPRN_ESR: vcpu->esr = vcpu->gpr[rs]; break;
+			case SPRN_DBCR0: vcpu->dbcr0 = vcpu->gpr[rs]; break;
+			case SPRN_DBCR1: vcpu->dbcr1 = vcpu->gpr[rs]; break;
+
+			/* XXX We need to context-switch the timebase for
+			 * watchdog and FIT. */
+			case SPRN_TBWL: break;
+			case SPRN_TBWU: break;
+
+			case SPRN_DEC:
+				vcpu->dec = vcpu->gpr[rs];
+				kvm_emulate_dec(vcpu);
+				break;
+
+			case SPRN_TSR: break;
+			case SPRN_TCR:
+				vcpu->tcr = vcpu->gpr[rs];
+				kvm_emulate_dec(vcpu);
+				break;
+
+			case SPRN_SPRG0: vcpu->sprg0 = vcpu->gpr[rs]; break;
+			case SPRN_SPRG1: vcpu->sprg1 = vcpu->gpr[rs]; break;
+			case SPRN_SPRG2: vcpu->sprg2 = vcpu->gpr[rs]; break;
+			case SPRN_SPRG3: vcpu->sprg3 = vcpu->gpr[rs]; break;
+
+			/* Note: SPRG4-7 are user-readable. These values are
+			 * loaded into the real SPRGs when resuming the
+			 * guest. */
+			case SPRN_SPRG4: vcpu->sprg4 = vcpu->gpr[rs]; break;
+			case SPRN_SPRG5: vcpu->sprg5 = vcpu->gpr[rs]; break;
+			case SPRN_SPRG6: vcpu->sprg6 = vcpu->gpr[rs]; break;
+			case SPRN_SPRG7: vcpu->sprg7 = vcpu->gpr[rs]; break;
+
+			case SPRN_IVPR: vcpu->ivpr = vcpu->gpr[rs]; break;
+			case SPRN_IVOR0: vcpu->ivor[0] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR1: vcpu->ivor[1] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR2: vcpu->ivor[2] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR3: vcpu->ivor[3] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR4: vcpu->ivor[4] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR5: vcpu->ivor[5] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR6: vcpu->ivor[6] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR7: vcpu->ivor[7] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR8: vcpu->ivor[8] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR9: vcpu->ivor[9] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR10: vcpu->ivor[10] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR11: vcpu->ivor[11] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR12: vcpu->ivor[12] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR13: vcpu->ivor[13] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR14: vcpu->ivor[14] = vcpu->gpr[rs]; break;
+			case SPRN_IVOR15: vcpu->ivor[15] = vcpu->gpr[rs]; break;
+
+			default:
+				printk("mtspr: unknown spr %x\n", sprn);
+				emulated = EMULATE_FAIL;
+				break;
+			}
+			break;
+
+		case 566:					/* tlbsync */
+			break;
+
+		case 946:					/* tlbre */
+			break;
+
+		case 978:					/* tlbwe */
+			emulated = emul_tlbwe(vcpu, inst);
+			break;
+
+		case 914:	{				/* tlbsx */
+			int index;
+			unsigned int as = get_mmucr_sts(vcpu);
+			unsigned int pid = get_mmucr_stid(vcpu);
+
+			rt = get_rt(inst);
+			ra = get_ra(inst);
+			rb = get_rb(inst);
+			rc = get_rc(inst);
+
+			ea = vcpu->gpr[rb];
+			if (ra)
+				ea += vcpu->gpr[ra];
+
+			index = tlb_search(vcpu, ea, pid, as, NULL);
+			if (index < 0) {
+				/* XXX handle Rc */
+			}
+			vcpu->gpr[rt] = index;
+
+			}
+			break;
+
+		case 966:					/* iccci */
+			break;
+
+		default:
+			printk("unknown: op %d xop %d\n", get_op(inst),
+				get_xop(inst));
+			emulated = EMULATE_FAIL;
+			break;
+		}
+		break;
+
+	case 32:						/* lwz */
+		rt = get_rt(inst);
+		vcpu->pending_io_gpr = rt;
+		emulated = emul_load(run, vcpu, 4, 1);
+		break;
+
+	case 34:						/* lbz */
+		rt = get_rt(inst);
+		vcpu->pending_io_gpr = rt;
+		emulated = emul_load(run, vcpu, 1, 1);
+		break;
+
+	case 36:						/* stw */
+		rs = get_rs(inst);
+		vcpu->pending_io_gpr = rs;
+		emulated = emul_store(run, vcpu, vcpu->gpr[rs], 4, 1);
+		break;
+
+	case 38:						/* stb */
+		rs = get_rs(inst);
+		vcpu->pending_io_gpr = rs;
+		emulated = emul_store(run, vcpu, vcpu->gpr[rs], 1, 1);
+		break;
+
+	case 40:						/* lhz */
+		rt = get_rt(inst);
+		vcpu->pending_io_gpr = rt;
+		emulated = emul_load(run, vcpu, 2, 1);
+		break;
+
+	case 44:						/* sth */
+		rs = get_rs(inst);
+		vcpu->pending_io_gpr = rs;
+		emulated = emul_store(run, vcpu, vcpu->gpr[rs], 2, 1);
+		break;
+
+	default:
+		printk("unknown op %d\n", get_op(inst));
+		emulated = EMULATE_FAIL;
+		break;
+	}
+
+	if ((emulated == EMULATE_DONE) && advance)
+		vcpu->pc += 4; /* Advance past emulated instruction. */
+
+	return emulated;
+}
diff --git a/drivers/kvm/powerpc/exceptions_44x.S b/drivers/kvm/powerpc/exceptions_44x.S
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/exceptions_44x.S
@@ -0,0 +1,508 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/reg.h>
+#include <asm/mmu-44x.h>
+#include <asm/page.h>
+
+#include "kvm.h"
+#include "kvm-offsets.h"
+
+#define VCPU_GPR(n)     (VCPU_GPRS + (n * 4))
+
+/* The host stack layout: */
+#define HOST_R1         0 /* Implied by stwu. */
+#define HOST_CALLEE_LR  4
+#define HOST_RUN        8
+/* r2 is special: it holds 'current', and it is made nonvolatile in the
+ * kernel with the -ffixed-r2 gcc option. */
+#define HOST_R2         12
+#define HOST_NV_GPRS    16
+#define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * 4))
+#define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4)
+#define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */
+#define HOST_STACK_LR   (HOST_STACK_SIZE + 4) /* In caller stack frame. */
+
+.macro KVM_HANDLER ivor_nr
+_GLOBAL(kvm_trampoline_handler_\ivor_nr)
+	/* Get pointer to vcpu and record exit number. */
+	mtspr	SPRN_SPRG0, r4
+	mfspr	r4, SPRN_SPRG1
+	stw	r5, VCPU_GPR(r5)(r4)
+	li	r5, \ivor_nr
+	/* This branch is fixed up at install time to jump to
+	 * kvm_trampoline_resume_host(). */
+	b	.
+.endm
+
+_GLOBAL(kvm_trampoline_start)
+
+KVM_HANDLER PPC44x_INTERRUPT_CRITICAL
+KVM_HANDLER PPC44x_INTERRUPT_MACHINE_CHECK
+KVM_HANDLER PPC44x_INTERRUPT_DATA_STORAGE
+KVM_HANDLER PPC44x_INTERRUPT_INST_STORAGE
+KVM_HANDLER PPC44x_INTERRUPT_EXTERNAL
+KVM_HANDLER PPC44x_INTERRUPT_ALIGNMENT
+KVM_HANDLER PPC44x_INTERRUPT_PROGRAM
+KVM_HANDLER PPC44x_INTERRUPT_FP_UNAVAIL
+KVM_HANDLER PPC44x_INTERRUPT_SYSCALL
+KVM_HANDLER PPC44x_INTERRUPT_AP_UNAVAIL
+KVM_HANDLER PPC44x_INTERRUPT_DECREMENTER
+KVM_HANDLER PPC44x_INTERRUPT_FIT
+KVM_HANDLER PPC44x_INTERRUPT_WATCHDOG
+KVM_HANDLER PPC44x_INTERRUPT_DTLB_MISS
+KVM_HANDLER PPC44x_INTERRUPT_ITLB_MISS
+KVM_HANDLER PPC44x_INTERRUPT_DEBUG
+
+/* Registers:
+ *  SPRG0: guest r4
+ *  r4: vcpu pointer
+ *  r5: KVM exit number
+ */
+_GLOBAL(kvm_trampoline_resume_host)
+	stw	r3, VCPU_GPR(r3)(r4)
+	stw	r6, VCPU_GPR(r6)(r4)
+	stw	r7, VCPU_GPR(r7)(r4)
+	stw	r8, VCPU_GPR(r8)(r4)
+	stw	r9, VCPU_GPR(r9)(r4)
+	mfcr	r3
+	stw	r3, VCPU_CR(r4)
+
+	cmpwi	cr0, r5, PPC44x_INTERRUPT_PROGRAM
+	cmpwi	cr1, r5, PPC44x_INTERRUPT_DTLB_MISS
+	cror	4*cr0+eq, 4*cr0+eq, 4*cr1+eq
+	bne	cr0, ..skip_inst_copy
+	/* Save the faulting instruction for possible emulation. */
+	mfspr	r9, SPRN_SRR0
+	mfmsr	r8
+	ori	r7, r8, MSR_DS
+	mtmsr	r7
+	isync
+	lwz	r9, 0(r9)
+	mtmsr	r8
+	isync
+	stw	r9, VCPU_LAST_INST(r4)
+	/* Also grab DEAR and ESR before the host can clobber them. */
+	mfspr	r9, SPRN_DEAR
+	stw	r9, VCPU_FAULT_DEAR(r4)
+	mfspr	r9, SPRN_ESR
+	stw	r9, VCPU_FAULT_ESR(r4)
+..skip_inst_copy:
+
+	/* Reload all host TLB mappings. Unfortunately we must skip the
+	 * trampoline mapping here. */
+	/* Future optimization: only reload host kernel mappings here, and do
+	 * the rest in heavyweight_exit. */
+	lwz	r9, VCPU_TRAMPOLINE_TLBE(r4)
+	mfspr	r8, SPRN_MMUCR			/* Save MMUCR. */
+	addi	r3, r4, VCPU_HOST_TLB - 4
+	li	r6, 0
+1:
+	cmpwi	cr0, r6, PPC44x_TLB_SIZE
+	cmpl	cr1, r6, r9
+	/* Have we walked past the last TLB entry? */
+	beq	cr0, ..host_tlb_reload_done
+	/* Is this the trampoline entry? */
+	beq	cr1, ..skip_entry
+	lwzu	r7, 4(r3)
+	mtspr	SPRN_MMUCR, r7
+	lwzu	r7, 4(r3)
+	tlbwe	r7, r6, PPC44x_TLB_PAGEID
+	lwzu	r7, 4(r3)
+	tlbwe	r7, r6, PPC44x_TLB_XLAT
+	lwzu	r7, 4(r3)
+	tlbwe	r7, r6, PPC44x_TLB_ATTRIB
+	addi	r6, r6, 1
+	b	1b
+..skip_entry:
+	addi	r3, r3, 16
+	addi	r6, r6, 1
+	b	1b
+
+..host_tlb_reload_done:
+	isync
+
+	/* We can jump into the host kernel now that it's completely mapped
+	 * (we still have a modified entry 0 though). */
+	mfctr	r7
+	LOAD_REG_ADDR(r6, resume_host_continued)
+	stw	r7, VCPU_CTR(r4)
+	mtctr	r6
+	bctr
+_GLOBAL(kvm_trampoline_resume_host_len)
+	.long	. - kvm_trampoline_resume_host
+
+_GLOBAL(kvm_trampoline_resume_guest)
+	/* Load all shadow TLB mappings. For simplicity, this includes a reload
+	 * of the trampoline mapping. */
+	mfspr	r8, SPRN_MMUCR			/* Save host MMUCR. */
+	addi	r3, r4, VCPU_SHADOW_TLB - 4
+	li	r6, 0
+1:
+	lwzu	r7, 4(r3)
+	/* Set TID. (The other MMUCR bits are restored later.) */
+	mtspr	SPRN_MMUCR, r7
+	lwzu	r7, 4(r3)
+	tlbwe	r7, r6, PPC44x_TLB_PAGEID
+	lwzu	r7, 4(r3)
+	tlbwe	r7, r6, PPC44x_TLB_XLAT
+	lwzu	r7, 4(r3)
+	tlbwe	r7, r6, PPC44x_TLB_ATTRIB
+	addi	r6, r6, 1
+	cmpwi	r6, PPC44x_TLB_SIZE
+	blt	1b
+	mtspr	SPRN_MMUCR, r8			/* Restore host MMUCR. */
+
+	/* Finish loading guest volatiles and jump to guest. */
+	lwz	r3, VCPU_CTR(r4)
+	mtctr	r3
+	lwz	r3, VCPU_CR(r4)
+	mtcr	r3
+	lwz	r5, VCPU_GPR(r5)(r4)
+	lwz	r6, VCPU_GPR(r6)(r4)
+	lwz	r7, VCPU_GPR(r7)(r4)
+	lwz	r8, VCPU_GPR(r8)(r4)
+	lwz	r3, VCPU_PC(r4)
+	mtsrr0	r3
+	lwz	r3, VCPU_SHADOW_MSR(r4)
+	mtsrr1	r3
+	lwz	r3, VCPU_GPR(r3)(r4)
+	lwz	r4, VCPU_GPR(r4)(r4)
+	rfi
+_GLOBAL(kvm_trampoline_resume_guest_len)
+	.long	. - kvm_trampoline_resume_guest
+
+_GLOBAL(kvm_trampoline_handler_len)
+	.long	kvm_trampoline_handler_1 - kvm_trampoline_handler_0
+
+
+/* Registers:
+ * SPRG0: guest r4
+ * r4: vcpu pointer
+ * r5: KVM exit number
+ * r8: MMUCR
+ * r9: TLB entry # of the trampoline mapping
+ */
+resume_host_continued:
+	/* Switch back to the linear vcpu mapping. */
+	lwz	r4, VCPU_LINEAR(r4)
+
+	/* We're done with the trampoline mapping now. */
+	mulli	r3, r9, TLBE_BYTES
+	add	r3, r3, r4
+	lwz	r7, (VCPU_HOST_TLB + 0)(r3)
+	mtspr	SPRN_MMUCR, r7
+	lwz	r7, (VCPU_HOST_TLB + 4)(r3)
+	tlbwe	r7, r9, PPC44x_TLB_PAGEID
+	lwz	r7, (VCPU_HOST_TLB + 8)(r3)
+	tlbwe	r7, r9, PPC44x_TLB_XLAT
+	lwz	r7, (VCPU_HOST_TLB + 12)(r3)
+	tlbwe	r7, r9, PPC44x_TLB_ATTRIB
+	mtspr	SPRN_MMUCR, r8			/* Restore MMUCR. */
+	/* Don't need isync here because nothing in the host is using the
+	 * trampoline mapping in the first place. */
+
+	/* Save remaining volatile guest register state to vcpu. */
+	stw	r0, VCPU_GPR(r0)(r4)
+	stw	r1, VCPU_GPR(r1)(r4)
+	stw	r2, VCPU_GPR(r2)(r4)
+	stw	r10, VCPU_GPR(r10)(r4)
+	stw	r11, VCPU_GPR(r11)(r4)
+	stw	r12, VCPU_GPR(r12)(r4)
+	stw	r13, VCPU_GPR(r13)(r4)
+	mflr	r3
+	stw	r3, VCPU_LR(r4)
+	mfxer	r3
+	stw	r3, VCPU_XER(r4)
+	mfspr	r3, SPRN_SPRG0
+	stw	r3, VCPU_GPR(r4)(r4)
+	mfspr	r3, SPRN_SRR0
+	stw	r3, VCPU_PC(r4)
+
+	/* Program and DTLB interrupts save the complete GPR state for
+	 * emulation. */
+	cmpwi	cr0, r5, PPC44x_INTERRUPT_PROGRAM
+	cmpwi	cr1, r5, PPC44x_INTERRUPT_DTLB_MISS
+	cror	4*cr0+eq, 4*cr0+eq, 4*cr1+eq
+	bne	..skip_nv_store
+	stw	r14, VCPU_GPR(r14)(r4)
+	stw	r15, VCPU_GPR(r15)(r4)
+	stw	r16, VCPU_GPR(r16)(r4)
+	stw	r17, VCPU_GPR(r17)(r4)
+	stw	r18, VCPU_GPR(r18)(r4)
+	stw	r19, VCPU_GPR(r19)(r4)
+	stw	r20, VCPU_GPR(r20)(r4)
+	stw	r21, VCPU_GPR(r21)(r4)
+	stw	r22, VCPU_GPR(r22)(r4)
+	stw	r23, VCPU_GPR(r23)(r4)
+	stw	r24, VCPU_GPR(r24)(r4)
+	stw	r25, VCPU_GPR(r25)(r4)
+	stw	r26, VCPU_GPR(r26)(r4)
+	stw	r27, VCPU_GPR(r27)(r4)
+	stw	r28, VCPU_GPR(r28)(r4)
+	stw	r29, VCPU_GPR(r29)(r4)
+	stw	r30, VCPU_GPR(r30)(r4)
+	stw	r31, VCPU_GPR(r31)(r4)
+..skip_nv_store:
+
+	/* Restore host stack pointer before IVPR. */
+	lwz	r1, VCPU_HOST_STACK(r4)
+
+	/* Restore host IVPR before re-enabling interrupts. We cheat and know
+	 * that Linux IVPR is always 0xc0000000. */
+	lis	r3, 0xc000
+	mtspr	SPRN_IVPR, r3
+
+	/* Switch to kernel stack and jump to handler. */
+	LOAD_REG_ADDR(r3, kvm_handle_exit)
+	mtctr	r3
+	lwz	r3, HOST_RUN(r1)
+	lwz	r2, HOST_R2(r1)
+	/* Save vcpu pointer to nonvolatile register. */
+	stw	r14, VCPU_GPR(r14)(r4)
+	mr	r14, r4
+	bctrl
+	/* Restore stack and replace vcpu pointer. */
+	mr	r4, r14
+	lwz	r14, VCPU_GPR(r14)(r4)
+
+	/* Program interrupts restore complete GPR state. */
+	cmpwi	r3, RESUME_GUEST_NV
+	bne	..skip_nv_load
+	lwz	r15, VCPU_GPR(r15)(r4)
+	lwz	r16, VCPU_GPR(r16)(r4)
+	lwz	r17, VCPU_GPR(r17)(r4)
+	lwz	r18, VCPU_GPR(r18)(r4)
+	lwz	r19, VCPU_GPR(r19)(r4)
+	lwz	r20, VCPU_GPR(r20)(r4)
+	lwz	r21, VCPU_GPR(r21)(r4)
+	lwz	r22, VCPU_GPR(r22)(r4)
+	lwz	r23, VCPU_GPR(r23)(r4)
+	lwz	r24, VCPU_GPR(r24)(r4)
+	lwz	r25, VCPU_GPR(r25)(r4)
+	lwz	r26, VCPU_GPR(r26)(r4)
+	lwz	r27, VCPU_GPR(r27)(r4)
+	lwz	r28, VCPU_GPR(r28)(r4)
+	lwz	r29, VCPU_GPR(r29)(r4)
+	lwz	r30, VCPU_GPR(r30)(r4)
+	lwz	r31, VCPU_GPR(r31)(r4)
+..skip_nv_load:
+
+	/* Should we return to the guest? */
+	cmpwi	r3, 0
+	bgt	lightweight_exit
+
+heavyweight_exit:
+	/* Not returning to guest.
+	 * Note: We don't need to touch SPRs here because the guest has been
+	 * running the whole time using the host's SPR values. */
+
+	/* We already saved guest volatile register state; now save the
+	 * non-volatiles. */
+	stw	r14, VCPU_GPR(r14)(r4)
+	stw	r15, VCPU_GPR(r15)(r4)
+	stw	r16, VCPU_GPR(r16)(r4)
+	stw	r17, VCPU_GPR(r17)(r4)
+	stw	r18, VCPU_GPR(r18)(r4)
+	stw	r19, VCPU_GPR(r19)(r4)
+	stw	r20, VCPU_GPR(r20)(r4)
+	stw	r21, VCPU_GPR(r21)(r4)
+	stw	r22, VCPU_GPR(r22)(r4)
+	stw	r23, VCPU_GPR(r23)(r4)
+	stw	r24, VCPU_GPR(r24)(r4)
+	stw	r25, VCPU_GPR(r25)(r4)
+	stw	r26, VCPU_GPR(r26)(r4)
+	stw	r27, VCPU_GPR(r27)(r4)
+	stw	r28, VCPU_GPR(r28)(r4)
+	stw	r29, VCPU_GPR(r29)(r4)
+	stw	r30, VCPU_GPR(r30)(r4)
+	stw	r31, VCPU_GPR(r31)(r4)
+
+	/* Load host non-volatile register state from host stack. */
+	lwz	r14, HOST_NV_GPR(r14)(r1)
+	lwz	r15, HOST_NV_GPR(r15)(r1)
+	lwz	r16, HOST_NV_GPR(r16)(r1)
+	lwz	r17, HOST_NV_GPR(r17)(r1)
+	lwz	r18, HOST_NV_GPR(r18)(r1)
+	lwz	r19, HOST_NV_GPR(r19)(r1)
+	lwz	r20, HOST_NV_GPR(r20)(r1)
+	lwz	r21, HOST_NV_GPR(r21)(r1)
+	lwz	r22, HOST_NV_GPR(r22)(r1)
+	lwz	r23, HOST_NV_GPR(r23)(r1)
+	lwz	r24, HOST_NV_GPR(r24)(r1)
+	lwz	r25, HOST_NV_GPR(r25)(r1)
+	lwz	r26, HOST_NV_GPR(r26)(r1)
+	lwz	r27, HOST_NV_GPR(r27)(r1)
+	lwz	r28, HOST_NV_GPR(r28)(r1)
+	lwz	r29, HOST_NV_GPR(r29)(r1)
+	lwz	r30, HOST_NV_GPR(r30)(r1)
+	lwz	r31, HOST_NV_GPR(r31)(r1)
+
+	/* Return to kvm_vcpu_run(). */
+	lwz	r4, HOST_STACK_LR(r1)
+	addi	r1, r1, HOST_STACK_SIZE
+	mtlr	r4
+	/* r3 still contains the return code from kvm_handle_exit(). */
+	blr
+
+
+/* Registers:
+ *  r3: kvm_run pointer
+ *  r4: vcpu pointer
+ */
+_GLOBAL(__vcpu_run)
+	stwu	r1, -HOST_STACK_SIZE(r1)
+	stw	r1, VCPU_HOST_STACK(r4)	/* Save stack pointer to vcpu. */
+
+	/* Save host state to stack. */
+	stw	r3, HOST_RUN(r1)
+	mflr	r3
+	stw	r3, HOST_STACK_LR(r1)
+
+	/* Save host non-volatile register state to stack. */
+	stw	r14, HOST_NV_GPR(r14)(r1)
+	stw	r15, HOST_NV_GPR(r15)(r1)
+	stw	r16, HOST_NV_GPR(r16)(r1)
+	stw	r17, HOST_NV_GPR(r17)(r1)
+	stw	r18, HOST_NV_GPR(r18)(r1)
+	stw	r19, HOST_NV_GPR(r19)(r1)
+	stw	r20, HOST_NV_GPR(r20)(r1)
+	stw	r21, HOST_NV_GPR(r21)(r1)
+	stw	r22, HOST_NV_GPR(r22)(r1)
+	stw	r23, HOST_NV_GPR(r23)(r1)
+	stw	r24, HOST_NV_GPR(r24)(r1)
+	stw	r25, HOST_NV_GPR(r25)(r1)
+	stw	r26, HOST_NV_GPR(r26)(r1)
+	stw	r27, HOST_NV_GPR(r27)(r1)
+	stw	r28, HOST_NV_GPR(r28)(r1)
+	stw	r29, HOST_NV_GPR(r29)(r1)
+	stw	r30, HOST_NV_GPR(r30)(r1)
+	stw	r31, HOST_NV_GPR(r31)(r1)
+
+	/* XXX all guest SPRS */
+
+	/* Load guest non-volatiles. */
+	lwz	r14, VCPU_GPR(r14)(r4)
+	lwz	r15, VCPU_GPR(r15)(r4)
+	lwz	r16, VCPU_GPR(r16)(r4)
+	lwz	r17, VCPU_GPR(r17)(r4)
+	lwz	r18, VCPU_GPR(r18)(r4)
+	lwz	r19, VCPU_GPR(r19)(r4)
+	lwz	r20, VCPU_GPR(r20)(r4)
+	lwz	r21, VCPU_GPR(r21)(r4)
+	lwz	r22, VCPU_GPR(r22)(r4)
+	lwz	r23, VCPU_GPR(r23)(r4)
+	lwz	r24, VCPU_GPR(r24)(r4)
+	lwz	r25, VCPU_GPR(r25)(r4)
+	lwz	r26, VCPU_GPR(r26)(r4)
+	lwz	r27, VCPU_GPR(r27)(r4)
+	lwz	r28, VCPU_GPR(r28)(r4)
+	lwz	r29, VCPU_GPR(r29)(r4)
+	lwz	r30, VCPU_GPR(r30)(r4)
+	lwz	r31, VCPU_GPR(r31)(r4)
+
+lightweight_exit:
+	stw	r2, HOST_R2(r1)
+
+	/* Load some guest volatiles. */
+	lwz	r0, VCPU_GPR(r0)(r4)
+	lwz	r2, VCPU_GPR(r2)(r4)
+	lwz	r9, VCPU_GPR(r9)(r4)
+	lwz	r10, VCPU_GPR(r10)(r4)
+	lwz	r11, VCPU_GPR(r11)(r4)
+	lwz	r12, VCPU_GPR(r12)(r4)
+	lwz	r13, VCPU_GPR(r13)(r4)
+	lwz	r3, VCPU_LR(r4)
+	mtlr	r3
+	lwz	r3, VCPU_XER(r4)
+	mtxer	r3
+
+	/* Disable interrupts to prevent any TLB updates. */
+	mfmsr	r5
+	lis	r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@h
+	ori	r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
+	andc	r6,r5,r6
+	mtmsr	r6
+
+	/* Save all the host TLB mappings. */
+	addi	r3, r4, VCPU_HOST_TLB - 4
+	li	r6, 0
+1:
+	tlbre	r7, r6, PPC44x_TLB_PAGEID
+	mfspr	r5, SPRN_MMUCR
+	stwu	r5, 4(r3)
+	stwu	r7, 4(r3)
+	tlbre	r7, r6, PPC44x_TLB_XLAT
+	stwu	r7, 4(r3)
+	tlbre	r7, r6, PPC44x_TLB_ATTRIB
+	stwu	r7, 4(r3)
+	addi	r6, r6, 1
+	cmpwi	r6, PPC44x_TLB_SIZE
+	blt	1b
+
+	/* Create the trampoline mapping. */
+	lwz	r6, VCPU_TRAMPOLINE_TLBE(r4)
+	mulli	r3, r6, TLBE_BYTES
+	add	r3, r3, r4
+	lwz	r7, (VCPU_SHADOW_TLB + 0)(r3)
+	mtspr	SPRN_MMUCR, r7
+	lwz	r7, (VCPU_SHADOW_TLB + 4)(r3)
+	tlbwe	r7, r6, PPC44x_TLB_PAGEID
+	lwz	r7, (VCPU_SHADOW_TLB + 8)(r3)
+	tlbwe	r7, r6, PPC44x_TLB_XLAT
+	lwz	r7, (VCPU_SHADOW_TLB + 12)(r3)
+	tlbwe	r7, r6, PPC44x_TLB_ATTRIB
+
+	/* Switch the IVPR to the trampoline. */
+	/* XXX If we take a TLB miss after this we're screwed! */
+	lwz	r8, VCPU_TRAMPOLINE(r4)
+	mtspr	SPRN_IVPR, r8
+
+	/* Transpose the vcpu pointer into the trampoline mapping. */
+	rlwimi	r8, r4, 0, 32 - VCPU_SIZE_LOG, 31
+	mr	r4, r8
+	/* Save vcpu pointer for the exception handlers. */
+	mtspr	SPRN_SPRG1, r4
+
+	/* Can't switch the stack pointer until after IVPR is switched,
+	 * because host interrupt handlers would get confused. */
+	lwz	r1, VCPU_GPR(r1)(r4)
+
+	/* Similarly, host interrupt handlers could clobber the SPRGs. In
+	 * fact, they may have already, which is why we need to reload the
+	 * registers here. */
+	lwz	r3, VCPU_SPRG4(r4)
+	mtspr	SPRN_SPRG4, r3
+	lwz	r3, VCPU_SPRG5(r4)
+	mtspr	SPRN_SPRG5, r3
+	lwz	r3, VCPU_SPRG6(r4)
+	mtspr	SPRN_SPRG6, r3
+	lwz	r3, VCPU_SPRG7(r4)
+	mtspr	SPRN_SPRG7, r3
+
+	/* Need absolute branch to reach kvm_trampoline_resume_guest() in the
+	 * trampoline. */
+	lwz	r3, VCPU_RESUME_GUEST(r4)
+	mtctr	r3
+
+	iccci	0, 0 /* XXX hack */
+
+	bctr
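Note on the rlwimi step in lightweight_exit above: a minimal C sketch of how the vcpu pointer is transposed into the trampoline mapping, assuming the 64KB region size that kvm.h defines (VCPU_SIZE_LOG = 16). The function name transpose_vcpu_ptr and the example addresses are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define VCPU_SIZE_LOG 16 /* 64KB region, as in kvm.h */

/* Keep the trampoline base (high bits) and take the offset within the
 * 64KB region (low bits) from the kernel linear-map pointer. This mirrors
 * "rlwimi r8, r4, 0, 32 - VCPU_SIZE_LOG, 31" followed by "mr r4, r8". */
static uint32_t transpose_vcpu_ptr(uint32_t trampoline, uint32_t linear)
{
	const uint32_t mask = (1u << VCPU_SIZE_LOG) - 1;

	/* Assumption: the trampoline base is itself 64KB-aligned. */
	assert((trampoline & mask) == 0);
	return (trampoline & ~mask) | (linear & mask);
}
```

With a trampoline base of 0xfd000000 and a linear vcpu pointer of 0xc1234560 (both made up), the result is 0xfd004560: the same offset into the region, reached through the trampoline TLB entry instead of the kernel linear mapping.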
diff --git a/drivers/kvm/powerpc/hack.c b/drivers/kvm/powerpc/hack.c
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/hack.c
@@ -0,0 +1,673 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/debugfs.h>
+#include <linux/vmalloc.h>
+#include <linux/miscdevice.h>
+
+#include <asm/uaccess.h>
+#include <asm/cache.h>
+#include <asm/cacheflush.h>
+#include <asm/time.h>
+
+#include "kvm.h"
+#include "tlb.h"
+
+
+/* TODO: use vcpu_printf() */
+
+void kvm_dump_vcpu(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+	printk("pc:   %08x msr:  %08x\n", vcpu->pc, vcpu->guest_msr);
+	printk("lr:   %08x ctr:  %08x\n", vcpu->lr, vcpu->ctr);
+	printk("srr0: %08x srr1: %08x\n", vcpu->srr0, vcpu->srr1);
+	for (i = 0; i < 32; i += 4) {
+		printk("gpr%02d: %08x %08x %08x %08x\n", i,
+		       vcpu->gpr[i],
+		       vcpu->gpr[i+1],
+		       vcpu->gpr[i+2],
+		       vcpu->gpr[i+3]);
+	}
+}
+
+/* Assumes 4KB pages on the host. */
+hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	struct page *page;
+	struct vm_area_struct *vma;
+	unsigned long hva = gpa + vcpu->kvm->ram_base;
+	hpa_t hpa;
+
+	/* XXX double-check (lack of) locking */
+	vma = find_extend_vma(current->mm, hva);
+	BUG_ON(!vma);
+	if (!vma)
+		return ~0UL;
+
+	page = follow_page(vma, hva, 0);
+	BUG_ON(!page);
+	if (!page)
+		return ~0UL;
+
+	hpa = (page_to_pfn(page) << PAGE_SHIFT) | (gpa & ~PAGE_MASK);
+	return hpa;
+}
+
+void kvm_decrementer_func(unsigned long data)
+{
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
+
+	vcpu->tsr |= TSR_DIS;
+	queue_exception(vcpu, PPC44x_INTERRUPT_DECREMENTER);
+}
+
+static struct kvm_vcpu *alloc_vcpu(struct kvm *kvm)
+{
+	unsigned long ivorlist[16];
+	struct kvm_vcpu *vcpu;
+	unsigned long base;
+	void *handlers;
+	void *resume_host;
+	void *resume_guest;
+	struct vm_struct *area;
+	struct tlbe *tlbe;
+	int i;
+
+	/* IVPR must be 64KB-aligned, so we need a 64KB allocation. This
+	 * must be physically contiguous so that a single TLB entry maps the
+	 * whole thing. */
+	base = __get_free_pages(GFP_KERNEL, VCPU_SIZE_ORDER);
+	printk("base: %lx\n", base);
+	if (!base)
+		return NULL;
+
+	/* Our trampoline cannot be mapped by the kernel linear mapping,
+	 * because
+	 * a) for performance and simplicity we create a mapping for it in TLB
+	 * entry 0, and
+	 * b) when swapping the TLB, it cannot be mapped by two entries
+	 * simultaneously.
+	 *
+	 * Further, by reserving a virtual address area from all other uses, we
+	 * can avoid frequent icache flushes. We manually handle the mapping
+	 * instead of letting the kernel do it to avoid the simultaneous
+	 * mapping issue.
+	 *
+	 * We must use VM_IOREMAP to ensure we get an area with the required
+	 * alignment.
+	 */
+	area = get_vm_area(VCPU_SIZE_BYTES, VM_IOREMAP);
+	if (!area) {
+		free_pages(base, VCPU_SIZE_ORDER);
+		return NULL;
+	}
+	printk("trampoline: %p\n", area->addr);
+
+	handlers = (void *)base;
+	clear_pages(handlers, VCPU_SIZE_ORDER);
+
+	/* XXX make sure our handlers are smaller than Linux's */
+	/* XXX do we need to check ordering? IVOR15 is always greatest? */
+
+	/* Copy our interrupt handlers to match host IVORs. That way we don't
+	 * have to swap the IVORs on every guest/host transition. */
+	ivorlist[0] = mfspr(SPRN_IVOR0);
+	ivorlist[1] = mfspr(SPRN_IVOR1);
+	ivorlist[2] = mfspr(SPRN_IVOR2);
+	ivorlist[3] = mfspr(SPRN_IVOR3);
+	ivorlist[4] = mfspr(SPRN_IVOR4);
+	ivorlist[5] = mfspr(SPRN_IVOR5);
+	ivorlist[6] = mfspr(SPRN_IVOR6);
+	ivorlist[7] = mfspr(SPRN_IVOR7);
+	ivorlist[8] = mfspr(SPRN_IVOR8);
+	ivorlist[9] = mfspr(SPRN_IVOR9);
+	ivorlist[10] = mfspr(SPRN_IVOR10);
+	ivorlist[11] = mfspr(SPRN_IVOR11);
+	ivorlist[12] = mfspr(SPRN_IVOR12);
+	ivorlist[13] = mfspr(SPRN_IVOR13);
+	ivorlist[14] = mfspr(SPRN_IVOR14);
+	ivorlist[15] = mfspr(SPRN_IVOR15);
+	for (i = 0; i < 16; i++) {
+		memcpy(handlers + ivorlist[i],
+		       kvm_trampoline_start + i * kvm_trampoline_handler_len,
+		       kvm_trampoline_handler_len);
+	}
+
+	/* Copy in the trampoline code which is shared by all handlers. */
+	resume_host = handlers + ivorlist[15] + kvm_trampoline_handler_len;
+	memcpy(resume_host, kvm_trampoline_resume_host,
+	       kvm_trampoline_resume_host_len);
+
+	resume_guest = resume_host + kvm_trampoline_resume_host_len;
+	memcpy(resume_guest, kvm_trampoline_resume_guest,
+	       kvm_trampoline_resume_guest_len);
+
+	/* Manually fix up the handler branches, since we moved the code away
+	 * from its link address. */
+	for (i = 0; i < 16; i++) {
+		unsigned long *branch;
+		branch = handlers + ivorlist[i] + kvm_trampoline_handler_len
+		         - 4;
+		*branch |= resume_host - (void *)branch;
+	}
+
+	/* Place vcpu data structure after the trampoline code. */
+	vcpu = resume_guest + kvm_trampoline_resume_guest_len;
+	vcpu->linear = vcpu;
+	vcpu->trampoline = area->addr;
+	vcpu->resume_guest = resume_guest - (void *)base + vcpu->trampoline;
+	vcpu->kvm = kvm;
+
+	/* Insert mapping for the trampoline. */
+	vcpu->trampoline_tlbe = 0;
+	tlbe = &vcpu->shadow_tlb[vcpu->trampoline_tlbe];
+	tlbe->mmucr = 0;
+	tlbe->word0 =
+		(unsigned long)vcpu->trampoline|VCPU_TLB_PGSZ|PPC44x_TLB_VALID;
+	tlbe->word1 = __pa(base);
+	tlbe->word2 = PPC44x_TLB_SX|PPC44x_TLB_SW|PPC44x_TLB_SR;
+	printk("tlb[%d]: %x %x %x %x\n", vcpu->trampoline_tlbe,
+		tlbe->mmucr,
+		tlbe->word0,
+		tlbe->word1,
+		tlbe->word2);
+
+	/* Flush any stale code from the icache. */
+	flush_icache_range(base, (unsigned long)vcpu);
+
+	setup_timer(&vcpu->dec_timer, kvm_decrementer_func,
+	            (unsigned long)vcpu);
+
+	return vcpu;
+}
+
+static int kvm_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
+{
+	enum emulation_result er;
+
+	er = kvm_emulate_instruction(run, vcpu);
+	switch (er) {
+	case EMULATE_DONE:
+		/* Future optimization: only reload non-volatiles if they were
+		 * actually modified by emulation. */
+		return RESUME_GUEST_NV;
+	case EMULATE_MMIO_ASSIST:
+		run->exit_reason = KVM_EXIT_MMIO;
+		return RESUME_HOST;
+	case EMULATE_FAIL:
+		/* XXX Deliver Program interrupt to guest. */
+		printk("%s: emulation failed (%08x)\n", __func__,
+		       vcpu->last_inst);
+		return RESUME_HOST;
+	default:
+		BUG();
+	}
+}
+
+static const u32 interrupt_msr_mask[16] = {
+	[PPC44x_INTERRUPT_CRITICAL]      = MSR_ME,
+	[PPC44x_INTERRUPT_MACHINE_CHECK] = 0,
+	[PPC44x_INTERRUPT_DATA_STORAGE]  = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_INST_STORAGE]  = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_EXTERNAL]      = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_ALIGNMENT]     = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_PROGRAM]       = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_FP_UNAVAIL]    = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_SYSCALL]       = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_AP_UNAVAIL]    = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_DECREMENTER]   = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_FIT]           = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_WATCHDOG]      = MSR_ME,
+	[PPC44x_INTERRUPT_DTLB_MISS]     = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_ITLB_MISS]     = MSR_CE|MSR_ME|MSR_DE,
+	[PPC44x_INTERRUPT_DEBUG]         = MSR_ME,
+};
+
+/* Note: callers are responsible for setting registers like ESR or DEAR. */
+void queue_exception(struct kvm_vcpu *vcpu, int exception)
+{
+	//printk("exception %d\n", exception);
+	set_bit(exception, &vcpu->pending_exceptions);
+}
+
+/* XXX ordering */
+/* XXX other registers */
+static void deliver_interrupt(struct kvm_vcpu *vcpu)
+{
+	int exception = ffs(vcpu->pending_exceptions) - 1;
+
+	if (exception != -1) {
+		//printk("interrupt %d\n", exception);
+		BUG_ON(exception >= 16);
+
+		clear_bit(exception, &vcpu->pending_exceptions);
+
+		vcpu->srr0 = vcpu->pc;
+		vcpu->srr1 = vcpu->guest_msr;
+		vcpu->pc = vcpu->ivpr | vcpu->ivor[exception];
+		vcpu->guest_msr &= interrupt_msr_mask[exception];
+		//printk("pc %x msr %x\n", vcpu->pc, vcpu->guest_msr);
+	}
+}
+
+static int tlb_index = 1;
+
+static void satisfy_fault(struct kvm_vcpu *vcpu, gva_t eaddr, gpa_t paddr,
+                          struct tlbe *gtlbe)
+{
+	struct tlbe *stlbe;
+	hpa_t hpa;
+	unsigned int epn = eaddr & PAGE_MASK;
+	unsigned int rpn = paddr & PAGE_MASK;
+
+	stlbe = &vcpu->shadow_tlb[tlb_index++];
+	if (tlb_index >= PPC44x_TLB_SIZE)
+		/* XXX hardcodes trampoline in 0 */
+		tlb_index = 1;
+	/* Future optimization: don't overwrite the TLB entry containing the
+	 * current PC. */
+
+	hpa = gpa_to_hpa(vcpu, rpn);
+	if (hpa == ~0UL) {
+		/* XXX somebody forgot to mlock; kill guest. */
+		/* Ultimately we should have the host fault in the page. */
+		printk("gpa not in host memory!\n");
+		return;
+	}
+
+	stlbe->mmucr = vcpu->mmucr;
+	stlbe->word0 = epn | PPC44x_TLB_VALID | PPC44x_TLB_TS | PPC44x_TLB_4K;
+	stlbe->word1 = hpa;
+	stlbe->word2 = kvm_tlb_shadow_attrib(gtlbe);
+}
+
+int kvm_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                    unsigned int exit_nr)
+{
+	enum emulation_result er;
+	int r = RESUME_HOST;
+
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+
+	/* XXX need to enable interrupts for all this code */
+
+	switch (exit_nr) {
+	case PPC44x_INTERRUPT_EXTERNAL:
+	case PPC44x_INTERRUPT_DECREMENTER:
+		/* We've already reset IVPR, so let the host handle this
+		 * interrupt (maybe even descheduling the guest). */
+		printk("pc %x\n", vcpu->pc);
+		local_irq_enable();
+		local_irq_disable();
+		r = RESUME_GUEST;
+		break;
+
+	case PPC44x_INTERRUPT_PROGRAM:
+		//printk("Program @ %x (%x)\n", vcpu->pc, vcpu->last_inst);
+		er = kvm_emulate_instruction(run, vcpu);
+		switch (er) {
+		case EMULATE_DONE:
+			/* Future optimization: only reload non-volatiles if
+			 * they were actually modified by emulation. */
+			r = RESUME_GUEST_NV;
+			break;
+		case EMULATE_MMIO_ASSIST:
+			r = RESUME_HOST;
+			break;
+		case EMULATE_DCR_ASSIST:
+			run->exit_reason = KVM_EXIT_DCR;
+			r = RESUME_HOST;
+			break;
+		case EMULATE_FAIL:
+			/* XXX Deliver Program interrupt to guest. */
+			printk("%s: emulation at %x failed (%08x)\n",
+			       __func__, vcpu->pc, vcpu->last_inst);
+			r = RESUME_HOST;
+			break;
+		default:
+			BUG();
+		}
+		break;
+
+	case PPC44x_INTERRUPT_SYSCALL:
+		queue_exception(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+
+	case PPC44x_INTERRUPT_DTLB_MISS: {
+		struct tlbe *gtlbe;
+		unsigned long eaddr = vcpu->fault_dear;
+
+		//printk("DTLB miss @ %x (eaddr %lx)\n", vcpu->pc, eaddr);
+
+		/* Check the guest TLB. */
+		dtlb_search(vcpu, eaddr, &gtlbe);
+		if (!gtlbe) {
+			/* The guest didn't have a mapping for it. */
+			queue_exception(vcpu, exit_nr);
+			vcpu->dear = vcpu->fault_dear;
+			vcpu->esr = vcpu->fault_esr;
+			r = RESUME_GUEST;
+			break;
+		}
+		vcpu->paddr_accessed = tlb_xlate(gtlbe, eaddr);
+
+		if (kvm_is_mmio(vcpu->kvm, vcpu->paddr_accessed)) {
+			r = kvm_emulate_mmio(run, vcpu);
+		} else {
+			/* The guest TLB had a mapping, but the shadow TLB
+			 * didn't, and it's not an IO mapping. This could be
+			 * because:
+			 * a) the trampoline mapping was using that entry, or
+			 * b) the guest used a large mapping which we're faking
+			 * Either way, we need to satisfy the fault without
+			 * invoking the guest. */
+			satisfy_fault(vcpu, eaddr, vcpu->paddr_accessed, gtlbe);
+			r = RESUME_GUEST;
+		}
+
+		}
+		break;
+
+	case PPC44x_INTERRUPT_ITLB_MISS: {
+		struct tlbe *gtlbe;
+		unsigned long eaddr = vcpu->pc;
+
+		//printk("ITLB miss @ %x\n", vcpu->pc);
+
+		/* Check the guest TLB. */
+		itlb_search(vcpu, eaddr, &gtlbe);
+		if (!gtlbe) {
+			/* The guest didn't have a mapping for it. */
+			queue_exception(vcpu, exit_nr);
+			r = RESUME_GUEST;
+			break;
+		}
+		vcpu->paddr_accessed = tlb_xlate(gtlbe, eaddr);
+
+		/* The guest TLB had a mapping, but the shadow TLB
+		 * didn't. This could be because:
+		 * a) the trampoline mapping was using that entry, or
+		 * b) the guest used a large mapping which we're faking
+		 * Either way, we need to satisfy the fault without
+		 * invoking the guest. */
+		satisfy_fault(vcpu, eaddr, vcpu->paddr_accessed, gtlbe);
+		r = RESUME_GUEST;
+
+		}
+		break;
+	}
+
+	if (vcpu->guest_msr & MSR_EE)
+		deliver_interrupt(vcpu);
+
+	if (signal_pending(current)) {
+		run->exit_reason = KVM_EXIT_INTR;
+		r = -EINTR;
+	} else
+		cond_resched();
+
+	return r;
+}
+
+static int load_guest(struct kvm_vcpu *vcpu, unsigned long guestaddr,
+                      unsigned long guestlen)
+{
+	struct tlbe *tlbe;
+	unsigned long hpa;
+	const unsigned long entry = 0x0040035c;
+	const unsigned long epn = entry & PAGE_MASK;
+
+	hpa = gpa_to_hpa(vcpu, epn);
+	if (hpa == ~0UL) {
+		printk("couldn't determine HPA\n");
+		return -1;
+	}
+
+	vcpu->pc = entry;
+	vcpu->guest_msr = 0;
+	vcpu->shadow_msr = MSR_PR|MSR_EE|MSR_IS|MSR_DS;
+	vcpu->gpr[1] = (16<<20) - 8; /* -8 for the callee-save LR slot */
+
+	/* Insert large initial mapping for guest. */
+	tlbe = &vcpu->guest_tlb[1];
+	tlbe->mmucr = 0;
+	tlbe->word0 = 0 | PPC44x_TLB_16M | PPC44x_TLB_VALID;
+	tlbe->word1 = 0;
+	tlbe->word2 = PPC44x_TLB_SX|PPC44x_TLB_SW|PPC44x_TLB_SR;
+	printk("gtlb[1]: %x %x %x %x\n", tlbe->mmucr, tlbe->word0,
+		tlbe->word1, tlbe->word2);
+
+	/* Insert UART0 mapping (as specified in bamboo.dts). */
+	tlbe = &vcpu->guest_tlb[2];
+	tlbe->mmucr = 0;
+	tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID;
+	tlbe->word1 = 0xef600000;
+	tlbe->word2 = PPC44x_TLB_SX|PPC44x_TLB_SW|PPC44x_TLB_SR |
+	              PPC44x_TLB_I|PPC44x_TLB_G;
+	printk("gtlb[2]: %x %x %x %x\n", tlbe->mmucr, tlbe->word0,
+		tlbe->word1, tlbe->word2);
+
+	return 0;
+}
+
+static void complete_dcr_load(struct kvm_run *run, struct kvm_vcpu *vcpu)
+{
+	u32 *gpr = &vcpu->gpr[vcpu->pending_io_gpr];
+	
+	*gpr = run->dcr.data;
+}
+
+static void complete_mmio_load(struct kvm_run *run, struct kvm_vcpu *vcpu)
+{
+	u32 *gpr = &vcpu->gpr[vcpu->pending_io_gpr];
+	void *data = run->mmio.data;
+
+	if (vcpu->pending_mmio_be) {
+		switch (run->mmio.len) {
+		case 4: *gpr = *(u32 *)data; break;
+		case 2: *gpr = *(u16 *)data; break;
+		case 1: *gpr = *(u8 *)data; break;
+		}
+	} else {
+		switch (run->mmio.len) {
+		case 4: *gpr = le32_to_cpup(data); break;
+		case 2: *gpr = le16_to_cpup(data); break;
+		case 1: *gpr = *(u8 *)data; break;
+		}
+	}
+}
+
+static long kvm_dev_ioctl(struct file *filp,
+			  unsigned int ioctl, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct kvm *kvm;
+	struct kvm_vcpu *vcpu;
+	long r = -EINVAL;
+	struct create {
+		u32 addr;
+		u32 len;
+	} create;
+
+	switch (ioctl) {
+	case KVM_PPC_IOCTL_CREATE:
+		r = copy_from_user(&create, argp, sizeof(create));
+		if (r)
+			return -EFAULT;
+
+		kvm = kmalloc(sizeof(struct kvm), GFP_KERNEL);
+		if (!kvm)
+			return -ENOMEM;
+
+		kvm->iobase = 0x80000000;
+		kvm->ram_base = create.addr;
+		kvm->ram_size = create.len;
+		printk("guest RAM base: %lx\n", kvm->ram_base);
+
+		vcpu = alloc_vcpu(kvm);
+		if (!vcpu) {
+			printk("alloc_vcpu returned NULL\n");
+			kfree(kvm);
+			return -ENOMEM;
+		}
+		vcpu->pvr = 0x422218D4;
+
+		r = load_guest(vcpu, create.addr, create.len);
+		if (r) {
+			printk("load_guest returned %ld\n", r);
+			kfree(kvm);
+			/* XXX should free vcpu too */
+			return r;
+		}
+
+		/* 1 VM, 1 vcpu, 1 fd */
+		filp->private_data = vcpu;
+
+		break;
+
+	case KVM_PPC_IOCTL_RUN: {
+		struct kvm_run run;
+		vcpu = filp->private_data;
+
+		r = copy_from_user(&run, argp, sizeof(run));
+		if (r)
+			return -EFAULT;
+
+		if (run.exit_reason == KVM_EXIT_MMIO) {
+			/* Userspace just emulated the pending IO. */
+			if (!run.mmio.is_write)
+				complete_mmio_load(&run, vcpu);
+			vcpu->pc += 4;
+		}
+
+		if (run.exit_reason == KVM_EXIT_DCR) {
+			/* Userspace just emulated the DCR IO. */
+			if (!run.dcr.is_write)
+				complete_dcr_load(&run, vcpu);
+			vcpu->pc += 4;
+		}
+
+		r = __vcpu_run(&run, vcpu);
+		local_irq_enable();
+
+		if (copy_to_user(argp, &run, sizeof(run)))
+			r = -EFAULT;
+		}
+		break;
+
+	case KVM_PPC_IOCTL_GETREGS: {
+		struct kvm_regs regs;
+		vcpu = filp->private_data;
+
+		regs.pc = vcpu->pc;
+		regs.cr = vcpu->cr;
+		regs.ctr = vcpu->ctr;
+		regs.lr = vcpu->lr;
+		regs.xer = vcpu->xer;
+		regs.msr = vcpu->guest_msr;
+		regs.srr0 = vcpu->srr0;
+		regs.srr1 = vcpu->srr1;
+		regs.sprg0 = vcpu->sprg0;
+		regs.sprg1 = vcpu->sprg1;
+		regs.sprg2 = vcpu->sprg2;
+		regs.sprg3 = vcpu->sprg3;
+		regs.sprg4 = vcpu->sprg4;
+		regs.sprg5 = vcpu->sprg5;
+		regs.sprg6 = vcpu->sprg6;
+		regs.sprg7 = vcpu->sprg7;
+		memcpy(regs.gpr, vcpu->gpr, sizeof(regs.gpr));
+		memcpy(regs.fpr, vcpu->fpr, sizeof(regs.fpr));
+
+		r = copy_to_user(argp, &regs, sizeof(regs));
+		if (r)
+			return -EFAULT;
+
+		}
+		break;
+
+	case KVM_PPC_IOCTL_SETREGS: {
+		struct kvm_regs regs;
+		vcpu = filp->private_data;
+
+		r = copy_from_user(&regs, argp, sizeof(regs));
+		if (r)
+			return -EFAULT;
+
+		vcpu->pc = regs.pc;
+		vcpu->cr = regs.cr;
+		vcpu->ctr = regs.ctr;
+		vcpu->lr = regs.lr;
+		vcpu->xer = regs.xer;
+		vcpu->guest_msr = regs.msr;
+		vcpu->srr0 = regs.srr0;
+		vcpu->srr1 = regs.srr1;
+		vcpu->sprg0 = regs.sprg0;
+		vcpu->sprg1 = regs.sprg1;
+		vcpu->sprg2 = regs.sprg2;
+		vcpu->sprg3 = regs.sprg3;
+		vcpu->sprg4 = regs.sprg4;
+		vcpu->sprg5 = regs.sprg5;
+		vcpu->sprg6 = regs.sprg6;
+		vcpu->sprg7 = regs.sprg7;
+		memcpy(vcpu->gpr, regs.gpr, sizeof(vcpu->gpr));
+		memcpy(vcpu->fpr, regs.fpr, sizeof(vcpu->fpr));
+
+		}
+		break;
+	}
+
+	return r;
+}
+
+static struct file_operations kvm_chardev_ops = {
+	.owner          = THIS_MODULE,
+	.unlocked_ioctl = kvm_dev_ioctl,
+	.compat_ioctl   = kvm_dev_ioctl,
+};
+
+static struct miscdevice kvm_dev = {
+	KVM_MINOR,
+	"kvm",
+	&kvm_chardev_ops,
+};
+
+int hack_init(void)
+{
+	int r;
+
+	r = misc_register(&kvm_dev);
+	if (r) {
+		printk(KERN_ERR "kvm: misc device register failed\n");
+	}
+
+	return r;
+}
+
+void hack_exit(void)
+{
+}
+
+module_init(hack_init);
+module_exit(hack_exit);
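Note on queue_exception() and deliver_interrupt() in hack.c above: a minimal user-space sketch of the pending-exception bitmap, assuming (as the patch does) 16 interrupt types and lowest-numbered-first delivery. All toy_* names are hypothetical, and lowest_set_bit() is a portable stand-in for the kernel's ffs(); none of this is the patch's own code.

```c
#include <assert.h>
#include <stdint.h>

#define NR_EXCEPTIONS 16

struct toy_vcpu {
	uint32_t pc, msr, srr0, srr1;
	uint32_t ivpr;
	uint32_t ivor[NR_EXCEPTIONS];
	unsigned long pending; /* one bit per exception number */
};

/* Index of the lowest set bit, or -1 if none (like ffs(word) - 1). */
static int lowest_set_bit(unsigned long word)
{
	for (int i = 0; i < NR_EXCEPTIONS; i++)
		if (word & (1UL << i))
			return i;
	return -1;
}

static void toy_queue(struct toy_vcpu *v, int exception)
{
	/* Valid indices are 0..15, hence a strict upper bound. */
	assert(exception >= 0 && exception < NR_EXCEPTIONS);
	v->pending |= 1UL << exception;
}

/* Deliver one pending exception; returns its number, or -1 if none. */
static int toy_deliver(struct toy_vcpu *v, const uint32_t *msr_mask)
{
	int exception = lowest_set_bit(v->pending);

	if (exception < 0)
		return -1;

	v->pending &= ~(1UL << exception);
	v->srr0 = v->pc;                      /* return address */
	v->srr1 = v->msr;                     /* MSR at interrupt time */
	v->pc = v->ivpr | v->ivor[exception]; /* vector into the guest */
	v->msr &= msr_mask[exception];        /* keep only the allowed bits */
	return exception;
}
```

Queuing exceptions 10 and then 8 delivers 8 first, because selection depends only on bit position and not on arrival order; that is the behaviour the "XXX ordering" comment in the patch flags as unfinished.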
diff --git a/drivers/kvm/powerpc/kvm-offsets.c b/drivers/kvm/powerpc/kvm-offsets.c
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/kvm-offsets.c
@@ -0,0 +1,56 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#include <linux/stddef.h>
+#include <linux/types.h>
+#include "kvm.h"
+
+#define DEFINE(sym, val) \
+	asm volatile("\n->" #sym " %0 " #val : : "i" (val))
+
+int main(void)
+{
+	DEFINE(TLBE_BYTES, sizeof(struct tlbe));
+
+	DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, host_stack));
+	DEFINE(VCPU_HOST_TLB, offsetof(struct kvm_vcpu, host_tlb));
+	DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, shadow_tlb));
+	DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, gpr));
+	DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, lr));
+	DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, cr));
+	DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, xer));
+	DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, ctr));
+	DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, pc));
+	DEFINE(VCPU_GUEST_MSR, offsetof(struct kvm_vcpu, guest_msr));
+	DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, shadow_msr));
+	DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, sprg4));
+	DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, sprg5));
+	DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, sprg6));
+	DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, sprg7));
+
+	DEFINE(VCPU_TRAMPOLINE, offsetof(struct kvm_vcpu, trampoline));
+	DEFINE(VCPU_TRAMPOLINE_TLBE, offsetof(struct kvm_vcpu, trampoline_tlbe));
+	DEFINE(VCPU_LINEAR, offsetof(struct kvm_vcpu, linear));
+	DEFINE(VCPU_RESUME_GUEST, offsetof(struct kvm_vcpu, resume_guest));
+	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, last_inst));
+	DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, fault_dear));
+	DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, fault_esr));
+	return 0;
+}
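Note on kvm-offsets.c above: a minimal user-space sketch of the idea behind the DEFINE() lines, which is to compute structure offsets in C so that assembly can use them as plain constants. The demo_* structures are simplified stand-ins for struct tlbe and struct kvm_vcpu, the 64-entry TLB size is an assumption, and demo_vcpu_gpr_offset() only guesses at what a VCPU_GPR(rN) macro would need to compute; its definition is not shown in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct demo_tlbe {
	uint32_t mmucr, word0, word1, word2;
};

struct demo_vcpu {
	struct demo_tlbe shadow_tlb[64]; /* assumed TLB size */
	uint32_t host_stack;
	uint32_t gpr[32];
};

/* Byte offset of GPR n inside the vcpu structure. */
static size_t demo_vcpu_gpr_offset(unsigned int n)
{
	assert(n < 32);
	return offsetof(struct demo_vcpu, gpr) + n * sizeof(uint32_t);
}

/* Format one "#define SYM value" line, as a generated header would
 * contain. Returns the number of characters snprintf() reports. */
static int demo_emit_define(char *buf, size_t len, const char *sym,
                            size_t val)
{
	return snprintf(buf, len, "#define %s %zu", sym, val);
}
```

The kernel build does the equivalent at compile time: the "->" markers emitted by the inline asm are extracted from the compiler's assembly output and turned into #define lines, so the offsets always track the C structure layout.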
diff --git a/drivers/kvm/powerpc/kvm.h b/drivers/kvm/powerpc/kvm.h
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/kvm.h
@@ -0,0 +1,251 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#ifndef __KVM_POWERPC_KVM_H__
+#define __KVM_POWERPC_KVM_H__
+
+#include <asm/mmu-44x.h>
+
+/* IVPR must be 64KiB-aligned. */
+#define VCPU_SIZE_ORDER	4
+#define VCPU_SIZE_LOG	(VCPU_SIZE_ORDER + 12)
+#define VCPU_TLB_PGSZ	PPC44x_TLB_64K
+#define VCPU_SIZE_BYTES	(1<<VCPU_SIZE_LOG)
+
+#define PPC44x_INTERRUPT_CRITICAL 0
+#define PPC44x_INTERRUPT_MACHINE_CHECK 1
+#define PPC44x_INTERRUPT_DATA_STORAGE 2
+#define PPC44x_INTERRUPT_INST_STORAGE 3
+#define PPC44x_INTERRUPT_EXTERNAL 4
+#define PPC44x_INTERRUPT_ALIGNMENT 5
+#define PPC44x_INTERRUPT_PROGRAM 6
+#define PPC44x_INTERRUPT_FP_UNAVAIL 7
+#define PPC44x_INTERRUPT_SYSCALL 8
+#define PPC44x_INTERRUPT_AP_UNAVAIL 9
+#define PPC44x_INTERRUPT_DECREMENTER 10
+#define PPC44x_INTERRUPT_FIT 11
+#define PPC44x_INTERRUPT_WATCHDOG 12
+#define PPC44x_INTERRUPT_DTLB_MISS 13
+#define PPC44x_INTERRUPT_ITLB_MISS 14
+#define PPC44x_INTERRUPT_DEBUG 15
+
+/* MSR bits the guest is allowed to control. */
+#define GUEST_MSR_MASK (MSR_FP|MSR_FE0|MSR_SE|MSR_BE|MSR_FE1|MSR_PMM|MSR_LE)
+
+#define RESUME_HOST         0
+#define RESUME_GUEST        1
+#define RESUME_GUEST_NV     2
+
+#define KVM_PPC_IOCTL_CREATE 1
+#define KVM_PPC_IOCTL_RUN 2
+#define KVM_PPC_IOCTL_GETREGS 3
+#define KVM_PPC_IOCTL_SETREGS 4
+
+#ifndef __ASSEMBLY__
+
+#include <linux/mutex.h>
+#include <linux/timer.h>
+
+typedef u32 gva_t;
+typedef u64 gpa_t;
+typedef u64 hpa_t;
+
+enum kvm_exit_reason {
+	KVM_EXIT_UNKNOWN          = 0,
+	KVM_EXIT_EXCEPTION        = 1,
+	KVM_EXIT_IO               = 2,
+	KVM_EXIT_HYPERCALL        = 3,
+	KVM_EXIT_DEBUG            = 4,
+	KVM_EXIT_HLT              = 5,
+	KVM_EXIT_MMIO             = 6,
+	KVM_EXIT_IRQ_WINDOW_OPEN  = 7,
+	KVM_EXIT_SHUTDOWN         = 8,
+	KVM_EXIT_FAIL_ENTRY       = 9,
+	KVM_EXIT_INTR             = 10,
+	KVM_EXIT_DCR              = 11,
+};
+
+enum emulation_result {
+	EMULATE_DONE,
+	EMULATE_MMIO_ASSIST,
+	EMULATE_DCR_ASSIST,
+	EMULATE_FAIL,
+};
+
+
+/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
+struct kvm_run {
+	__u32 exit_reason;
+
+	union {
+		/* KVM_EXIT_MMIO */
+		struct {
+			__u64 phys_addr;
+			__u8  data[8];
+			__u32 len;
+			__u8  is_write;
+		} mmio;
+		/* KVM_EXIT_DCR */
+		struct {
+			__u32 dcrn;
+			__u32 data;
+			__u8  is_write;
+		} dcr;
+		/* Fix the size of the union. */
+		char padding[256];
+	};
+};
+
+struct kvm_stat {
+	u32 exits;
+	u32 mmio_exits;
+	u32 signal_exits;
+	u32 light_exits;
+};
+
+struct tlbe {
+	u32 mmucr;
+	u32 word0;
+	u32 word1;
+	u32 word2;
+};
+
+struct kvm {
+	gpa_t iobase;
+	unsigned long ram_base;
+	unsigned long ram_size;
+};
+
+extern void kvm_decrementer_func(unsigned long);
+
+struct kvm_vcpu {
+	/* This is an unmodified copy of the guest's TLB. */
+	struct tlbe guest_tlb[PPC44x_TLB_SIZE];
+	/* This is the TLB that's actually used when the guest is running. */
+	struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
+	/* This is a copy of the host's TLB. */
+	struct tlbe host_tlb[PPC44x_TLB_SIZE];
+
+	u32 host_stack;
+
+	u64 fpr[32];
+	u32 gpr[32];
+
+	u32 pc;
+	u32 cr;
+	u32 ctr;
+	u32 lr;
+	u32 xer;
+
+	u32 guest_msr;
+	u32 shadow_msr; /* XXX this could be replaced with assembly */
+	u32 mmucr;
+	u32 sprg0;
+	u32 sprg1;
+	u32 sprg2;
+	u32 sprg3;
+	u32 sprg4;
+	u32 sprg5;
+	u32 sprg6;
+	u32 sprg7;
+	u32 srr0;
+	u32 srr1;
+	u32 csrr0;
+	u32 csrr1;
+	u32 dsrr0;
+	u32 dsrr1;
+	u32 dear;
+	u32 esr;
+	u32 dec;
+	u32 decar;
+	u32 tbl;
+	u32 tbu;
+	u32 tcr;
+	u32 tsr;
+	u32 ivor[16];
+	u32 ivpr;
+	u32 pir;
+	u32 pid;
+	u32 pvr;
+	u32 ccr0;
+	u32 ccr1;
+	u32 dbcr0;
+	u32 dbcr1;
+
+	struct kvm *kvm;
+	struct kvm_stat stat;
+	struct mutex mutex;
+	void *linear;           /* Virtual address used by the kernel. */
+	void *trampoline;       /* Virtual address used for the trampoline. */
+	void *resume_guest;     /* Trampoline address of resume_guest(). */
+	unsigned int trampoline_tlbe;
+	u32 last_inst;
+	u32 fault_dear;
+	u32 fault_esr;
+	gpa_t paddr_accessed;
+	unsigned int pending_io_gpr; /* GPR used as IO source/target */
+	u8 pending_mmio_be;          /* big-endian access? */
+	struct timer_list dec_timer;
+	unsigned long pending_exceptions;
+};
+
+struct kvm_regs {
+	__u32 pc;
+	__u32 cr;
+	__u32 ctr;
+	__u32 lr;
+	__u32 xer;
+
+	__u32 msr;
+
+	__u32 srr0;
+	__u32 srr1;
+
+	__u32 sprg0;
+	__u32 sprg1;
+	__u32 sprg2;
+	__u32 sprg3;
+	__u32 sprg4;
+	__u32 sprg5;
+	__u32 sprg6;
+	__u32 sprg7;
+
+	__u64 fpr[32];
+	__u32 gpr[32];
+};
+
+extern int __vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+
+extern char kvm_trampoline_start[];
+extern void kvm_trampoline_resume_host(void);
+extern unsigned long kvm_trampoline_resume_host_len;
+extern void kvm_trampoline_resume_guest(void);
+extern unsigned long kvm_trampoline_resume_guest_len;
+extern unsigned long kvm_trampoline_handler_len;
+
+extern hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa);
+extern void kvm_dump_vcpu(struct kvm_vcpu *vcpu);
+extern void queue_exception(struct kvm_vcpu *vcpu, int exception);
+
+extern int kvm_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __KVM_POWERPC_KVM_H__ */
diff --git a/drivers/kvm/powerpc/tlb.c b/drivers/kvm/powerpc/tlb.c
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/tlb.c
@@ -0,0 +1,108 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+
+#include "kvm.h"
+#include "tlb.h"
+
+#define PPC44x_TLB_USER_PERM_MASK (PPC44x_TLB_UX|PPC44x_TLB_UR|PPC44x_TLB_UW)
+#define PPC44x_TLB_SUPER_PERM_MASK (PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW)
+
+u32 kvm_tlb_shadow_attrib(struct tlbe *tlbe)
+{
+	u32 attrib = tlbe->word2;
+	u8 user_perms = attrib & PPC44x_TLB_USER_PERM_MASK;
+	u8 super_perms = attrib & PPC44x_TLB_SUPER_PERM_MASK;
+
+	/* Clear all permissions. */
+	attrib &= ~(PPC44x_TLB_USER_PERM_MASK|
+		    PPC44x_TLB_SUPER_PERM_MASK);
+
+	/* Since the guest kernel runs in user mode, we must
+	 * translate guest supervisor to host user
+	 * permissions. */
+	attrib |= super_perms << 3;
+
+	/* Make sure host can always access this memory. */
+	attrib |= PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW;
+
+	if (user_perms == 0) {
+		/* Since we can no longer rely on the user
+		 * permission bits to provide guest user/kernel
+		 * protection, this mapping must be invalidated
+		 * in the TLB whenever the guest userspace is
+		 * running. */
+
+		/* XXX scoreboard */
+	}
+
+	return attrib;
+}
+
+/* Search the guest TLB for a matching entry. */
+int tlb_search(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
+               unsigned int as, struct tlbe **match)
+{
+	int i;
+
+	/* XXX Replace loop with fancy data structures. */
+	for (i = 0; i < PPC44x_TLB_SIZE; i++) {
+		struct tlbe *tlbe = &vcpu->guest_tlb[i];
+		unsigned int tid;
+
+		if (eaddr < get_tlb_eaddr(tlbe))
+			continue;
+
+		if (eaddr > get_tlb_end(tlbe))
+			continue;
+
+		tid = get_tlb_tid(tlbe);
+		if (tid && (tid != pid))
+			continue;
+
+		if (!get_tlb_v(tlbe))
+			continue;
+
+		if (get_tlb_ts(tlbe) != as)
+			continue;
+
+		if (match)
+			*match = tlbe;
+		return i;
+	}
+
+	if (match)
+		*match = NULL;
+	return -1;
+}
+
+int itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr, struct tlbe **match)
+{
+	unsigned int as = !!(vcpu->guest_msr & MSR_IS);
+	return tlb_search(vcpu, eaddr, vcpu->pid, as, match);
+}
+
+int dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr, struct tlbe **match)
+{
+	unsigned int as = !!(vcpu->guest_msr & MSR_DS);
+	return tlb_search(vcpu, eaddr, vcpu->pid, as, match);
+}
diff --git a/drivers/kvm/powerpc/tlb.h b/drivers/kvm/powerpc/tlb.h
new file mode 100644
--- /dev/null
+++ b/drivers/kvm/powerpc/tlb.h
@@ -0,0 +1,103 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#ifndef __KVM_POWERPC_TLB_H__
+#define __KVM_POWERPC_TLB_H__
+
+#include <asm/mmu-44x.h>
+
+#include "kvm.h"
+
+struct kvm;
+struct kvm_vcpu;
+
+extern int tlb_search(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
+                      unsigned int as, struct tlbe **match);
+extern int itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr, struct tlbe **match);
+extern int dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr, struct tlbe **match);
+
+extern u32 kvm_tlb_shadow_attrib(struct tlbe *tlbe);
+
+static inline int kvm_is_mmio(const struct kvm *kvm, gpa_t paddr)
+{
+	return paddr >= kvm->iobase;
+}
+
+/* TLB helper functions */
+static inline unsigned int get_tlb_size(const struct tlbe *tlbe)
+{
+	return (tlbe->word0 >> 4) & 0xf;
+}
+
+static inline gva_t get_tlb_eaddr(const struct tlbe *tlbe)
+{
+	return tlbe->word0 & 0xfffffc00;
+}
+
+static inline gva_t get_tlb_bytes(const struct tlbe *tlbe)
+{
+	unsigned int pgsize = get_tlb_size(tlbe);
+	return 1 << 10 << (pgsize << 1); /* 1KB * 4^pgsize */
+}
+
+static inline gva_t get_tlb_end(const struct tlbe *tlbe)
+{
+	return get_tlb_eaddr(tlbe) + get_tlb_bytes(tlbe) - 1;
+}
+
+static inline u64 get_tlb_raddr(const struct tlbe *tlbe)
+{
+	u64 word1 = tlbe->word1;
+	return ((word1 & 0xf) << 32) | (word1 & 0xfffffc00);
+}
+
+static inline unsigned int get_tlb_tid(const struct tlbe *tlbe)
+{
+	return tlbe->mmucr & 0xff;
+}
+
+static inline unsigned int get_tlb_ts(const struct tlbe *tlbe)
+{
+	return (tlbe->word0 >> 8) & 0x1;
+}
+
+static inline unsigned int get_tlb_v(const struct tlbe *tlbe)
+{
+	return (tlbe->word0 >> 9) & 0x1;
+}
+
+static inline unsigned int get_mmucr_stid(const struct kvm_vcpu *vcpu)
+{
+	return vcpu->mmucr & 0xff;
+}
+
+static inline unsigned int get_mmucr_sts(const struct kvm_vcpu *vcpu)
+{
+	return (vcpu->mmucr >> 16) & 0x1;
+}
+
+static inline gpa_t tlb_xlate(struct tlbe *tlbe, gva_t eaddr)
+{
+	unsigned int pgmask = get_tlb_bytes(tlbe) - 1;
+
+	return get_tlb_raddr(tlbe) | (eaddr & pgmask);
+}
+
+#endif /* __KVM_POWERPC_TLB_H__ */

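For readers following the tlb.h helpers above, here is a minimal
standalone sketch of how a 44x TLB entry is decoded and used to translate
an effective address, together with the supervisor-to-user permission
shift done in kvm_tlb_shadow_attrib(). It is not part of the patch. The
names tlbe_sketch, sk_* and SK_TLB_* are hypothetical, and the permission
bit values are assumed to match asm/mmu-44x.h. Unlike the patch, it
computes the page size in 64 bits so that an out-of-range SIZE field
cannot overflow the shift.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patch's struct tlbe (same field layout). */
struct tlbe_sketch {
	uint32_t mmucr;
	uint32_t word0;
	uint32_t word1;
	uint32_t word2;
};

/* Permission bits in word2. Values are an assumption (asm/mmu-44x.h). */
#define SK_TLB_UX 0x20u
#define SK_TLB_UW 0x10u
#define SK_TLB_UR 0x08u
#define SK_TLB_SX 0x04u
#define SK_TLB_SW 0x02u
#define SK_TLB_SR 0x01u
#define SK_USER_MASK  (SK_TLB_UX | SK_TLB_UW | SK_TLB_UR)
#define SK_SUPER_MASK (SK_TLB_SX | SK_TLB_SW | SK_TLB_SR)

/* Page size in bytes: 1KB * 4^SIZE, where SIZE is word0 bits 4..7. */
static inline uint64_t sk_page_bytes(const struct tlbe_sketch *e)
{
	uint32_t size = (e->word0 >> 4) & 0xf;

	return (uint64_t)1 << (10 + 2 * size);
}

/* Nonzero if the entry is valid and its page contains eaddr. */
static inline int sk_covers(const struct tlbe_sketch *e, uint32_t eaddr)
{
	uint32_t start = e->word0 & 0xfffffc00u;
	uint64_t end = (uint64_t)start + sk_page_bytes(e) - 1;

	return ((e->word0 >> 9) & 0x1) && eaddr >= start && eaddr <= end;
}

/* Real address: ERPN (low 4 bits of word1) supplies bits 32..35,
 * RPN supplies the page frame, and eaddr supplies the page offset. */
static inline uint64_t sk_xlate(const struct tlbe_sketch *e, uint32_t eaddr)
{
	uint64_t word1 = e->word1;
	uint64_t raddr = ((word1 & 0xf) << 32) | (word1 & 0xfffffc00u);
	uint64_t pgmask = sk_page_bytes(e) - 1;

	return raddr | (eaddr & pgmask);
}

/* Shadow attributes: guest supervisor permissions become host user
 * permissions (shift left by 3), and the host keeps full supervisor
 * access. All other word2 bits are passed through unchanged. */
static inline uint32_t sk_shadow_attrib(uint32_t word2)
{
	uint32_t super_perms = word2 & SK_SUPER_MASK;

	word2 &= ~(SK_USER_MASK | SK_SUPER_MASK);
	word2 |= super_perms << 3;
	word2 |= SK_SUPER_MASK;
	return word2;
}
```

For example, an entry with word0 = 0xc0000290 (EPN 0xc0000000, valid,
SIZE 9, i.e. 256MB) and word1 = 0 maps effective address 0xc0123456 to
real address 0x00123456.
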
[-- Attachment #4: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

* Re: PowerPC 440 progress
  2007-09-18 22:42 PowerPC 440 progress Hollis Blanchard
@ 2007-09-19  0:32 ` Tim Anderson
  0 siblings, 0 replies; 2+ messages in thread
From: Tim Anderson @ 2007-09-19  0:32 UTC (permalink / raw)
  To: 'Hollis Blanchard', 'kvm-ppc-devel'; +Cc: 'kvm-devel'

Great job Hollis! 

> -----Original Message-----
> From: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org 
> [mailto:kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org] On Behalf Of 
> Hollis Blanchard
> Sent: Tuesday, September 18, 2007 3:42 PM
> To: kvm-ppc-devel
> Cc: kvm-devel
> Subject: [kvm-devel] PowerPC 440 progress
> 
> With the attached patch, we can now execute a 440 Linux guest on a 440
> host through many initcalls:
> 
>         CPU clock-frequency <- 0x27bc86ae (667MHz)
>         CPU timebase-frequency <- 0x27bc86ae (667MHz)
>         /plb: clock-frequency <- 9ef21ab (167MHz)
>         /plb/opb: clock-frequency <- 4f790d5 (83MHz)
>         /plb/opb/ebc: clock-frequency <- 34fb5e3 (56MHz)
>         /plb/opb/serial@ef600300: clock-frequency <- a8c000 (11MHz)
>         /plb/opb/serial@ef600400: clock-frequency <- a8c000 (11MHz)
>         /plb/opb/serial@ef600500: clock-frequency <- a8c000 (11MHz)
>         /plb/opb/serial@ef600600: clock-frequency <- a8c000 (11MHz)
>         Memory <- <0x0 0x0 0x9000000> (144MB)
>         ENET0: local-mac-address <- 00:00:00:00:00:00
>         ENET1: local-mac-address <- 00:00:00:00:00:00
>         
>         zImage starting: loaded at 0x00400000 (sp: 0x00fffe98)
>         Allocating 0x263c5c bytes for kernel ...
>         gunzipping (0x00000000 <- 0x0040b000:0x00661acc)...done 0x243a9c bytes
>         
>         Linux/PowerPC load: 
>         Finalizing device tree... flat tree at 0x66e3a0
>         id mach(): done
>         MMU:enter
>         MMU:hw init
>         MMU:mapin
>         MMU:setio
>         MMU:exit
>         Using Bamboo machine description
>         Linux version 2.6.23-rc1 (hollisb@basalt) (gcc version 3.4.2) #88 Tue Sep 18 17:18:36 CDT 2007
>         console [udbg0] enabled
>         setup_arch: bootmem
>         arch: exit
>         Zone PFN ranges:
>           DMA             0 ->    36864
>           Normal      36864 ->    36864
>         Movable zone start PFN for each node
>         early_node_map[1] active PFN ranges
>             0:        0 ->    36864
>         Built 1 zonelists in Zone order.  Total pages: 36576
>         Kernel command line: console=ttyS0 debug
>         UIC0 (32 IRQ sources) at DCR 0xc0
>         UIC1 (32 IRQ sources) at DCR 0xd0
>         PID hash table entries: 1024 (order: 10, 4096 bytes)
>         time_init: decrementer frequency = 666.666670 MHz
>         time_init: processor frequency   = 666.666670 MHz
>         Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
>         Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
>         Memory: 143500k/147456k available (2192k kernel code, 3816k reserved, 100k data, 127k bss, 124k init)
>         Calibrating delay loop... 1167.36 BogoMIPS (lpj=2334720)
>         Mount-cache hash table entries: 512
>         NET: Registered protocol family 16
>         
>         PCI: Probing PCI hardware
>         NET: Registered protocol family 2
>         IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
>         TCP established hash table entries: 8192 (order: 4, 65536 bytes)
>         TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
>         TCP: Hash tables configured (established 8192 bind 8192)
>         TCP reno registered
>         io scheduler noop registered
>         io scheduler anticipatory registered (default)
>         io scheduler deadline registered
>         io scheduler cfq registered
>         Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
> 
> The guest currently seems to be stuck in the serial driver reading IER.
> Qemu doesn't seem to be getting the accesses though, so more debugging
> is required.
> 
> Also, signal delivery and scheduling other host tasks are now working,
> which makes for a nicer development environment. If you run "gdb qemu"
> on the host, you can at least do a post-mortem of guest memory.
> 
> Interesting note (at least, I thought it was interesting): since the
> guest can read the timebase without trapping, we must always report the
> real timebase frequency to the guest.
> 
> The easiest way to do this right now was to implement DCR-read
> passthrough, since that's where the Linux bootwrapper gets the
> frequencies for the device tree. Long-term, we may want to have qemu
> supply a device tree itself (but it still must report the real
> frequency).
> 
> Another interesting note: since the guest can read SPRG4-7 without
> trapping, we must context-switch those registers.
> 
> Signed-off-by: Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> 
> -- 
> Hollis Blanchard
> IBM Linux Technology Center
> 

