[PATCH V4 0/7] Add initial SPAPR PPC64 architecture support

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support
@ 2012-01-31  6:34 Matt Evans
  2012-01-31  6:34 ` [PATCH V4 1/7] kvm tools: PPC64, add HPT/SDR1 for -PR KVM Matt Evans
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

This series adds support for a PPC64 platform, SPAPR, on top of the previous
more general PPC64 CPU support.  This platform is paravirtualised, with most
of the MMU hypercalls being dealt with in the kernel.  Userland needs to
describe the environment to the machine with a device tree, cope with some
hypercalls (and RTAS calls) for hvconsole, PCI config and IRQs, and emulate
the interrupt controller (XICS) and PCI Host Bridge of the SPAPR-esque
machine.

With this series, normal pSeries kernels boot fine (including RH kernels that
include virtio support out of the box).  There is no support for PPC32 or
'bare metal' PPC64 guests as yet.

Processor state is set up as a guest kernel would expect (both primary and
secondaries), and SMP is fully supported.

The kernels are loaded as flat binaries and booted directly.  The intention
is to later support loading firmware such as SLOF.

Some of the code within is borrowed/based upon code in QEMU, particularly
the XICS emulation, device tree construction and PCI setup.

-- Changes since V3:
   - Gained another patch to provide PPC KVM-PR support (widens userbase :p )
     This allocates a page table, sets up a few more regs etc.  Also,
     removes restriction that mandates use of hugepages.
     Requires KVM-PR to implement H_BULK_REMOVE (separate patch)
   - Cleaned up FDT generation to detect host CPU & generate based
     on its properties; now supports PPC970 as well as POWER7
   - Removed redefinition of hypercall #defines, using <asm/hvcall.h> instead
   - More sensible MMIO layering; CPU passes MMIO to PHB, PHB then decides
     which PCI window is accessed.
   - Die if kernel-handled hypercalls ever get to userland
   - Build in libfdt as sugggested -- it's small and present in our sourcetree,
     and removes external dependency on 64bit libfdt (which isn't in some
     distros).
   - Spit & polish

Thanks to David & Alex for the PPC-related reviews!

Matt Evans (7):
  kvm tools: PPC64, add HPT/SDR1 for -PR KVM
  kvm tools: Generate SPAPR PPC64 guest device tree
  kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  kvm tools: Add SPAPR PPC64 HV console
  kvm tools: Add PPC64 XICS interrupt controller support
  kvm tools: Add PPC64 PCI Host Bridge
  kvm tools: Add PPC64 kvm_cpu__emulate_io()

 tools/kvm/Makefile                           |   31 ++-
 tools/kvm/powerpc/cpu_info.c                 |   83 ++++
 tools/kvm/powerpc/cpu_info.h                 |   43 +++
 tools/kvm/powerpc/include/kvm/kvm-arch.h     |   17 +
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   12 +-
 tools/kvm/powerpc/irq.c                      |   43 ++-
 tools/kvm/powerpc/kvm-cpu.c                  |   64 +++-
 tools/kvm/powerpc/kvm.c                      |  290 ++++++++++++++-
 tools/kvm/powerpc/spapr.h                    |   93 +++++
 tools/kvm/powerpc/spapr_hcall.c              |  134 +++++++
 tools/kvm/powerpc/spapr_hvcons.c             |  102 +++++
 tools/kvm/powerpc/spapr_hvcons.h             |   19 +
 tools/kvm/powerpc/spapr_pci.c                |  423 +++++++++++++++++++++
 tools/kvm/powerpc/spapr_pci.h                |   57 +++
 tools/kvm/powerpc/spapr_rtas.c               |  229 ++++++++++++
 tools/kvm/powerpc/xics.c                     |  514 ++++++++++++++++++++++++++
 tools/kvm/powerpc/xics.h                     |   23 ++
 17 files changed, 2150 insertions(+), 27 deletions(-)
 create mode 100644 tools/kvm/powerpc/cpu_info.c
 create mode 100644 tools/kvm/powerpc/cpu_info.h
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH V4 1/7] kvm tools: PPC64, add HPT/SDR1 for -PR KVM
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
@ 2012-01-31  6:34 ` Matt Evans
  2012-01-31  6:34 ` [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree Matt Evans
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

Allocate a page table and point SDR1 to it in order to support the -PR
PPC64 KVM mode.  (The alternative, -HV mode, is available only on a small
set of machines.)

This patch also removes the previous dependency on mapping guest RAM with
huge pages; PR KVM doesn't require them so the user isn't forced to use them.

A new option, '--hugetlbfs default', uses a default path for 16M pages for
HV mode, if required.

Signed-off-by: Matt Evans <matt@ozlabs.org>
---
 tools/kvm/powerpc/include/kvm/kvm-arch.h |    2 ++
 tools/kvm/powerpc/kvm-cpu.c              |   24 +++++++++++++++++++++---
 tools/kvm/powerpc/kvm.c                  |   25 ++++++++++++++++++-------
 3 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index c4b493c..8653871 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -52,6 +52,8 @@ struct kvm {
 	u64			ram_size;
 	void			*ram_start;
 
+	u64			sdr1;
+
 	bool			nmi_disabled;
 
 	bool			single_step;
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index ea99666..60379d0 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -76,7 +76,8 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id)
 	if (vcpu->kvm_run == MAP_FAILED)
 		die("unable to mmap vcpu fd");
 
-	ioctl(vcpu->vcpu_fd, KVM_ENABLE_CAP, &papr_cap);
+	if (ioctl(vcpu->vcpu_fd, KVM_ENABLE_CAP, &papr_cap) < 0)
+		die("unable to enable PAPR capability");
 
 	/*
 	 * We start all CPUs, directing non-primary threads into the kernel's
@@ -121,9 +122,26 @@ static void kvm_cpu__setup_regs(struct kvm_cpu *vcpu)
 static void kvm_cpu__setup_sregs(struct kvm_cpu *vcpu)
 {
 	/*
-	 * No sregs setup is required on PPC64/SPAPR (but there may be setup
-	 * required for non-paravirtualised platforms, e.g. TLB/SLB setup).
+	 * Some sregs setup to initialise SDR1/PVR/HIOR on PPC64 SPAPR
+	 * platforms using PR KVM.  (Technically, this is all ignored on
+	 * SPAPR HV KVM.)  Different setup is required for non-PV non-SPAPR
+	 * platforms!  (FIXME.)
 	 */
+	struct kvm_sregs sregs;
+	struct kvm_one_reg reg = {};
+
+	if (ioctl(vcpu->vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
+		die("KVM_GET_SREGS failed");
+
+	sregs.u.s.sdr1 = vcpu->kvm->sdr1;
+
+	if (ioctl(vcpu->vcpu_fd, KVM_SET_SREGS, &sregs) < 0)
+		die("KVM_SET_SREGS failed");
+
+	reg.id = KVM_ONE_REG_PPC_HIOR;
+	reg.u.reg64 = 0;
+	if (ioctl(vcpu->vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)
+		die("KVM_SET_ONE_REG failed");
 }
 
 /**
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 58982ff..8bd1fe2 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -72,19 +72,24 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
 void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 {
 	int cap_ppc_rma;
+	unsigned long hpt;
 
 	kvm->ram_size		= ram_size;
 
 	/*
-	 * Currently, we must map from hugetlbfs; if --hugetlbfs not specified,
-	 * try a default path:
+	 * Currently, HV-mode PPC64 SPAPR requires that we map from hugetlfs.
+	 * Allow a 'default' option to assist.
+	 * PR-mode does not require this.
 	 */
-	if (!hugetlbfs_path) {
-		hugetlbfs_path = HUGETLBFS_PATH;
-		pr_info("Using default %s for memory", hugetlbfs_path);
+	if (hugetlbfs_path) {
+		if (!strcmp(hugetlbfs_path, "default"))
+			hugetlbfs_path = HUGETLBFS_PATH;
+		kvm->ram_start = mmap_hugetlbfs(hugetlbfs_path, kvm->ram_size);
+	} else {
+		kvm->ram_start = mmap(0, kvm->ram_size, PROT_READ | PROT_WRITE,
+				      MAP_ANON | MAP_PRIVATE,
+				      -1, 0);
 	}
-
-	kvm->ram_start = mmap_hugetlbfs(hugetlbfs_path, kvm->ram_size);
 	if (kvm->ram_start == MAP_FAILED)
 		die("Couldn't map %lld bytes for RAM (%d)\n",
 		    kvm->ram_size, errno);
@@ -95,6 +100,12 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 	kvm->rtas_gra = kvm->fdt_gra - RTAS_MAX_SIZE;
 	madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE);
 
+	/* FIXME:  SPAPR-PR specific; allocate a guest HPT. */
+	if (posix_memalign((void **)&hpt, (1<<HPT_ORDER), (1<<HPT_ORDER)))
+		die("Can't allocate %d bytes for HPT\n", (1<<HPT_ORDER));
+
+	kvm->sdr1 = ((hpt + 0x3ffffULL) & ~0x3ffffULL) | (HPT_ORDER-18);
+
 	/* FIXME: This is book3s-specific */
 	cap_ppc_rma = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RMA);
 	if (cap_ppc_rma == 2)
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
  2012-01-31  6:34 ` [PATCH V4 1/7] kvm tools: PPC64, add HPT/SDR1 for -PR KVM Matt Evans
@ 2012-01-31  6:34 ` Matt Evans
  2012-01-31  7:59   ` Pekka Enberg
  2012-02-01  3:39   ` David Gibson
  2012-01-31  6:34 ` [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall & rtascall structure Matt Evans
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

The generated DT is the bare minimum structure required for SPAPR (on which
subsequent patches for VIO, XICS, PCI etc. will build); root node, cpus, memory.

The DT contains CPU-specific configuration; a very simple 'cpu info' mechanism
is added to recognise/differentiate DT entries for POWER7 and PPC970 host CPUs.
Future support of more CPUs is possible.

libfdt is included from scripts/dtc/libfdt.

Signed-off-by: Matt Evans <matt@ozlabs.org>
---
 tools/kvm/Makefile                       |   26 ++++-
 tools/kvm/powerpc/cpu_info.c             |   83 ++++++++++++++++
 tools/kvm/powerpc/cpu_info.h             |   43 ++++++++
 tools/kvm/powerpc/include/kvm/kvm-arch.h |   11 ++
 tools/kvm/powerpc/kvm-cpu.c              |    1 +
 tools/kvm/powerpc/kvm.c                  |  154 +++++++++++++++++++++++++++++-
 6 files changed, 312 insertions(+), 6 deletions(-)
 create mode 100644 tools/kvm/powerpc/cpu_info.c
 create mode 100644 tools/kvm/powerpc/cpu_info.h

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index e3250a9..917fe97 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -105,6 +105,8 @@ ifeq ($(uname_M),x86_64)
 	DEFINES      += -DCONFIG_X86_64
 endif
 
+LIBFDT_SRC = fdt.o fdt_ro.o fdt_wip.o fdt_sw.o fdt_rw.o fdt_strerror.o
+LIBFDT_OBJS = $(patsubst %,../../scripts/dtc/libfdt/%,$(LIBFDT_SRC))
 
 ### Arch-specific stuff
 
@@ -130,9 +132,14 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/ioport.o
 	OBJS	+= powerpc/irq.o
 	OBJS	+= powerpc/kvm.o
+	OBJS	+= powerpc/cpu_info.o
 	OBJS	+= powerpc/kvm-cpu.o
+# We use libfdt, but it's sometimes not packaged 64bit.  It's small too,
+# so just build it in:
+	CFLAGS 	+= -I../../scripts/dtc/libfdt
+	OBJS	+= $(LIBFDT_OBJS)
 	ARCH_INCLUDE := powerpc/include
-	CFLAGS += -m64
+	CFLAGS 	+= -m64
 endif
 
 ###
@@ -198,10 +205,6 @@ DEFINES	+= -DBUILD_ARCH='"$(ARCH)"'
 KVM_INCLUDE := include
 CFLAGS	+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -I$(KINCL_PATH)/include -I$(KINCL_PATH)/arch/$(ARCH)/include/ -O2 -fno-strict-aliasing -g
 
-ifneq ($(WERROR),0)
-	WARNINGS += -Werror
-endif
-
 WARNINGS += -Wall
 WARNINGS += -Wcast-align
 WARNINGS += -Wformat=2
@@ -220,6 +223,13 @@ WARNINGS += -Wwrite-strings
 
 CFLAGS	+= $(WARNINGS)
 
+# Some targets may use 'external' sources that don't build totally cleanly.
+CFLAGS_EASYGOING := $(CFLAGS)
+
+ifneq ($(WERROR),0)
+	CFLAGS += -Werror
+endif
+
 all: arch_support_check $(PROGRAM) $(PROGRAM_ALIAS) $(GUEST_INIT) $(GUEST_INIT_S2)
 
 arch_support_check:
@@ -262,6 +272,12 @@ builtin-help.d: $(KVM_INCLUDE)/common-cmds.h
 
 $(OBJS):
 
+# This rule relaxes the -Werror on libfdt, since for now it still has
+# a bunch of warnings. :(
+../../scripts/dtc/libfdt/%.o: ../../scripts/dtc/libfdt/%.c
+	$(E) "  CC      " $@
+	$(Q) $(CC) -c $(CFLAGS_EASYGOING) $< -o $@
+
 util/rbtree.o: ../../lib/rbtree.c
 	$(E) "  CC      " $@
 	$(Q) $(CC) -c $(CFLAGS) $< -o $@
diff --git a/tools/kvm/powerpc/cpu_info.c b/tools/kvm/powerpc/cpu_info.c
new file mode 100644
index 0000000..c364b74
--- /dev/null
+++ b/tools/kvm/powerpc/cpu_info.c
@@ -0,0 +1,83 @@
+/*
+ * PPC CPU identification
+ *
+ * This is a very simple "host CPU info" struct to get us going.
+ * For the little host information we need, I don't want to grub about
+ * parsing stuff in /proc/device-tree so just match host PVR to differentiate
+ * PPC970 and POWER7 (which is all that's currently supported).
+ *
+ * Qemu does something similar but this is MUCH simpler!
+ *
+ * Copyright 2012 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "cpu_info.h"
+#include "kvm/util.h"
+
+/* POWER7 */
+
+/*
+ * Basic set of pages for POWER7.  It actually supports more but there were some
+ * limitations as to which may be advertised to the guest.  FIXME when this
+ * settles down -- for now use basic set:
+ */
+static u32 power7_page_sizes_prop[] = {0xc, 0x0, 0x1, 0xc, 0x0, 0x18, 0x100, 0x1, 0x18, 0x0};
+/* POWER7 has 1T segments, so advertise these */
+static u32 power7_segment_sizes_prop[] = {0x1c, 0x28, 0xffffffff, 0xffffffff};
+
+static struct cpu_info cpu_power7_info = {
+	"POWER7",
+	power7_page_sizes_prop, sizeof(power7_page_sizes_prop),
+	power7_segment_sizes_prop, sizeof(power7_segment_sizes_prop),
+	32, 		/* SLB size */
+	512000000, 	/* TB frequency */
+	128,		/* d-cache block size */
+	128,		/* i-cache block size */
+	CPUINFO_FLAG_DFP | CPUINFO_FLAG_VSX | CPUINFO_FLAG_VMX
+};
+
+/* PPC970/G5 */
+
+static u32 g5_page_sizes_prop[] = {0xc, 0x0, 0x1, 0xc, 0x0, 0x18, 0x100, 0x1, 0x18, 0x0};
+
+static struct cpu_info cpu_970_info = {
+	"G5",
+	g5_page_sizes_prop, sizeof(g5_page_sizes_prop),
+	0 /* Null = no segment sizes prop, use defaults */, 0,
+	0, 		/* SLB size default */
+	33333333, 	/* TB frequency */
+	128,		/* d-cache block size */
+	128,		/* i-cache block size */
+	CPUINFO_FLAG_VMX
+};
+
+/* This is a default catchall for 'no match' on PVR: */
+static struct cpu_info cpu_dummy_info = { "unknown", 0, 0, 0, 0, 0, 0, 0, 0 };
+
+static struct pvr_info host_pvr_info[] = {
+	{ 0xffffffff, 0x0f000003, &cpu_power7_info },
+	{ 0xffff0000, 0x003f0000, &cpu_power7_info },
+	{ 0xffff0000, 0x004a0000, &cpu_power7_info },
+	{ 0xffff0000, 0x00390000, &cpu_970_info },
+	{ 0xffff0000, 0x003c0000, &cpu_970_info },
+        { 0xffff0000, 0x00440000, &cpu_970_info },
+        { 0xffff0000, 0x00450000, &cpu_970_info },
+};
+
+struct cpu_info *find_cpu_info(u32 pvr)
+{
+	unsigned int i;
+	for (i = 0; i < sizeof(host_pvr_info)/sizeof(struct pvr_info); i++) {
+		if ((pvr & host_pvr_info[i].pvr_mask) ==
+		    host_pvr_info[i].pvr) {
+			return host_pvr_info[i].cpu_info;
+		}
+	}
+	/* Didn't find anything? Rut-ro. */
+	pr_warning("Host CPU unsupported by kvmtool\n");
+	return &cpu_dummy_info;
+}
diff --git a/tools/kvm/powerpc/cpu_info.h b/tools/kvm/powerpc/cpu_info.h
new file mode 100644
index 0000000..4a43ed5
--- /dev/null
+++ b/tools/kvm/powerpc/cpu_info.h
@@ -0,0 +1,43 @@
+/*
+ * PPC CPU identification
+ *
+ * Copyright 2012 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef CPU_INFO_H
+#define CPU_INFO_H
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+
+struct cpu_info {
+	const char	*name;
+	u32 		*page_sizes_prop;
+	u32		page_sizes_prop_len;
+	u32 		*segment_sizes_prop;
+	u32		segment_sizes_prop_len;
+	u32		slb_size;
+	u32		tb_freq;
+	u32		d_bsize;
+	u32		i_bsize;
+	u32		flags;
+};
+
+struct pvr_info {
+	u32		pvr_mask;
+	u32		pvr;
+	struct cpu_info *cpu_info;
+};
+
+/* Misc capabilities/CPU properties */
+#define CPUINFO_FLAG_DFP	0x00000001
+#define CPUINFO_FLAG_VMX	0x00000002
+#define CPUINFO_FLAG_VSX	0x00000004
+
+struct cpu_info *find_cpu_info(u32 pvr);
+
+#endif
diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index 8653871..ca15500 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -53,6 +53,7 @@ struct kvm {
 	void			*ram_start;
 
 	u64			sdr1;
+	u32			pvr;
 
 	bool			nmi_disabled;
 
@@ -70,4 +71,14 @@ struct kvm {
 	int			vm_state;
 };
 
+/* Helper for the various bits of code that generate FDT nodes */
+#define _FDT(exp)							\
+	do {								\
+		int ret = (exp);					\
+		if (ret < 0) {						\
+			die("Error creating device tree: %s: %s\n",	\
+			    #exp, fdt_strerror(ret));			\
+		}							\
+	} while (0)
+
 #endif /* KVM__KVM_ARCH_H */
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 60379d0..a2beaac 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -134,6 +134,7 @@ static void kvm_cpu__setup_sregs(struct kvm_cpu *vcpu)
 		die("KVM_GET_SREGS failed");
 
 	sregs.u.s.sdr1 = vcpu->kvm->sdr1;
+	sregs.pvr = vcpu->kvm->pvr;
 
 	if (ioctl(vcpu->vcpu_fd, KVM_SET_SREGS, &sregs) < 0)
 		die("KVM_SET_SREGS failed");
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 8bd1fe2..842af37 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -3,6 +3,9 @@
  *
  * Copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
  *
+ * Portions of FDT setup borrowed from QEMU, copyright 2010 David Gibson, IBM
+ * Corporation.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
  * by the Free Software Foundation.
@@ -10,6 +13,8 @@
 
 #include "kvm/kvm.h"
 #include "kvm/util.h"
+#include "libfdt.h"
+#include "cpu_info.h"
 
 #include <linux/kvm.h>
 
@@ -26,7 +31,8 @@
 #include <errno.h>
 
 #include <linux/byteorder.h>
-#include <libfdt.h>
+
+#define HPT_ORDER 24
 
 #define HUGETLBFS_PATH "/var/lib/hugetlbfs/global/pagesize-16MB/"
 
@@ -36,6 +42,13 @@ struct kvm_ext kvm_req_ext[] = {
 	{ 0, 0 }
 };
 
+static uint32_t mfpvr(void)
+{
+	uint32_t r;
+	asm volatile ("mfpvr %0" : "=r"(r));
+	return r;
+}
+
 bool kvm__arch_cpu_supports_vm(void)
 {
 	return true;
@@ -106,6 +119,8 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 
 	kvm->sdr1 = ((hpt + 0x3ffffULL) & ~0x3ffffULL) | (HPT_ORDER-18);
 
+	kvm->pvr = mfpvr();
+
 	/* FIXME: This is book3s-specific */
 	cap_ppc_rma = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RMA);
 	if (cap_ppc_rma == 2)
@@ -183,9 +198,146 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel,
 	return false;
 }
 
+#define SMT_THREADS 4
+
+/*
+ * Set up the FDT for the kernel: This function is currently fairly SPAPR-heavy,
+ * and whilst most PPC targets will require CPU/memory nodes, others like RTAS
+ * should eventually be added separately.
+ */
 static void setup_fdt(struct kvm *kvm)
 {
+	uint64_t 	mem_reg_property[] = { 0, cpu_to_be64(kvm->ram_size) };
+	int 		smp_cpus = kvm->nrcpus;
+	char 		hypertas_prop_kvm[] = "hcall-pft\0hcall-term\0"
+		"hcall-dabr\0hcall-interrupt\0hcall-tce\0hcall-vio\0"
+		"hcall-splpar\0hcall-bulk";
+	int 		i, j;
+	char 		cpu_name[30];
+	u8		staging_fdt[FDT_MAX_SIZE];
+	struct cpu_info *cpu_info = find_cpu_info(kvm->pvr);
+
+	/* Generate an appropriate DT at kvm->fdt_gra */
+	void *fdt_dest = guest_flat_to_host(kvm, kvm->fdt_gra);
+	void *fdt = staging_fdt;
+
+	_FDT(fdt_create(fdt, FDT_MAX_SIZE));
+	_FDT(fdt_finish_reservemap(fdt));
+
+	_FDT(fdt_begin_node(fdt, ""));
+
+	_FDT(fdt_property_string(fdt, "device_type", "chrp"));
+	_FDT(fdt_property_string(fdt, "model", "IBM pSeries (kvmtool)"));
+	_FDT(fdt_property_cell(fdt, "#address-cells", 0x2));
+	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
+
+	/* /chosen */
+	_FDT(fdt_begin_node(fdt, "chosen"));
+	/* cmdline */
+	_FDT(fdt_property_string(fdt, "bootargs", kern_cmdline));
+	/* Initrd */
+	if (kvm->initrd_size != 0) {
+		uint32_t ird_st_prop = cpu_to_be32(kvm->initrd_gra);
+		uint32_t ird_end_prop = cpu_to_be32(kvm->initrd_gra +
+						    kvm->initrd_size);
+		_FDT(fdt_property(fdt, "linux,initrd-start",
+				   &ird_st_prop, sizeof(ird_st_prop)));
+		_FDT(fdt_property(fdt, "linux,initrd-end",
+				   &ird_end_prop, sizeof(ird_end_prop)));
+	}
+	_FDT(fdt_end_node(fdt));
+
+	/*
+	 * Memory: We don't alloc. a separate RMA yet.  If we ever need to
+	 * (CAP_PPC_RMA == 2) then have one memory node for 0->RMAsize, and
+	 * another RMAsize->endOfMem.
+	 */
+	_FDT(fdt_begin_node(fdt, "memory@0"));
+	_FDT(fdt_property_string(fdt, "device_type", "memory"));
+	_FDT(fdt_property(fdt, "reg", mem_reg_property,
+			  sizeof(mem_reg_property)));
+	_FDT(fdt_end_node(fdt));
+
+	/* CPUs */
+	_FDT(fdt_begin_node(fdt, "cpus"));
+	_FDT(fdt_property_cell(fdt, "#address-cells", 0x1));
+	_FDT(fdt_property_cell(fdt, "#size-cells", 0x0));
+
+	for (i = 0; i < smp_cpus; i += SMT_THREADS) {
+		int32_t pft_size_prop[] = { 0, HPT_ORDER };
+		uint32_t servers_prop[SMT_THREADS];
+		uint32_t gservers_prop[SMT_THREADS * 2];
+		int threads = (smp_cpus - i) >= SMT_THREADS ? SMT_THREADS :
+			smp_cpus - i;
+
+		sprintf(cpu_name, "PowerPC,%s@%d", cpu_info->name, i);
+		_FDT(fdt_begin_node(fdt, cpu_name));
+		sprintf(cpu_name, "PowerPC,%s", cpu_info->name);
+		_FDT(fdt_property_string(fdt, "name", cpu_name));
+		_FDT(fdt_property_string(fdt, "device_type", "cpu"));
+
+		_FDT(fdt_property_cell(fdt, "reg", i));
+		_FDT(fdt_property_cell(fdt, "cpu-version", kvm->pvr));
+
+		_FDT(fdt_property_cell(fdt, "dcache-block-size", cpu_info->d_bsize));
+		_FDT(fdt_property_cell(fdt, "icache-block-size", cpu_info->i_bsize));
+
+		_FDT(fdt_property_cell(fdt, "timebase-frequency", cpu_info->tb_freq));
+		/* Lies, but safeish lies! */
+		_FDT(fdt_property_cell(fdt, "clock-frequency", 0xddbab200));
+
+		if (cpu_info->slb_size)
+			_FDT(fdt_property_cell(fdt, "ibm,slb-size", cpu_info->slb_size));
+		/*
+		 * HPT size is hardwired; KVM currently fixes it at 16MB but the
+		 * moment that changes we'll need to read it out of the kernel.
+		 */
+		_FDT(fdt_property(fdt, "ibm,pft-size", pft_size_prop,
+				  sizeof(pft_size_prop)));
+
+		_FDT(fdt_property_string(fdt, "status", "okay"));
+		_FDT(fdt_property(fdt, "64-bit", NULL, 0));
+		/* A server for each thread in this core */
+		for (j = 0; j < SMT_THREADS; j++) {
+			servers_prop[j] = cpu_to_be32(i+j);
+			/*
+			 * Hack borrowed from QEMU, direct the group queues back
+			 * to cpu 0:
+			 */
+			gservers_prop[j*2] = cpu_to_be32(i+j);
+			gservers_prop[j*2 + 1] = 0;
+		}
+		_FDT(fdt_property(fdt, "ibm,ppc-interrupt-server#s",
+				   servers_prop, threads * sizeof(uint32_t)));
+		_FDT(fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
+				  gservers_prop,
+				  threads * 2 * sizeof(uint32_t)));
+		if (cpu_info->page_sizes_prop)
+			_FDT(fdt_property(fdt, "ibm,segment-page-sizes",
+					  cpu_info->page_sizes_prop,
+					  cpu_info->page_sizes_prop_len));
+		if (cpu_info->segment_sizes_prop)
+			_FDT(fdt_property(fdt, "ibm,processor-segment-sizes",
+					  cpu_info->segment_sizes_prop,
+					  cpu_info->segment_sizes_prop_len));
+		/* VSX / DFP options: */
+		if (cpu_info->flags & CPUINFO_FLAG_VMX)
+			_FDT(fdt_property_cell(fdt, "ibm,vmx",
+					       (cpu_info->flags &
+						CPUINFO_FLAG_VSX) ? 2 : 1));
+		if (cpu_info->flags & CPUINFO_FLAG_DFP)
+			_FDT(fdt_property_cell(fdt, "ibm,dfp", 0x1));
+		_FDT(fdt_end_node(fdt));
+	}
+	_FDT(fdt_end_node(fdt));
+
+	/* Finalise: */
+	_FDT(fdt_end_node(fdt)); /* Root node */
+	_FDT(fdt_finish(fdt));
 
+	_FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE));
+	_FDT(fdt_add_mem_rsv(fdt_dest, kvm->rtas_gra, kvm->rtas_size));
+	_FDT(fdt_pack(fdt_dest));
 }
 
 /**
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
  2012-01-31  6:34 ` [PATCH V4 1/7] kvm tools: PPC64, add HPT/SDR1 for -PR KVM Matt Evans
  2012-01-31  6:34 ` [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree Matt Evans
@ 2012-01-31  6:34 ` Matt Evans
  2012-01-31  8:11   ` Pekka Enberg
  2012-01-31  6:34 ` [PATCH V4 4/7] kvm tools: Add SPAPR PPC64 HV console Matt Evans
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

This patch adds the basic structure for HV calls, their registration and some of
the simpler calls.  A similar layout for RTAS calls is also added, again with
some of the simpler RTAS calls used by the guest.  The SPAPR RTAS stub is
generated inline.  Also, nodes for RTAS are added to the device tree.

Signed-off-by: Matt Evans <matt@ozlabs.org>
---
 tools/kvm/Makefile              |    2 +
 tools/kvm/powerpc/kvm-cpu.c     |    6 +
 tools/kvm/powerpc/kvm.c         |   41 +++++++-
 tools/kvm/powerpc/spapr.h       |   84 ++++++++++++++
 tools/kvm/powerpc/spapr_hcall.c |  134 +++++++++++++++++++++++
 tools/kvm/powerpc/spapr_rtas.c  |  229 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 495 insertions(+), 1 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 917fe97..f6a8ac2 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -134,6 +134,8 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/kvm.o
 	OBJS	+= powerpc/cpu_info.o
 	OBJS	+= powerpc/kvm-cpu.o
+	OBJS	+= powerpc/spapr_hcall.o
+	OBJS	+= powerpc/spapr_rtas.o
 # We use libfdt, but it's sometimes not packaged 64bit.  It's small too,
 # so just build it in:
 	CFLAGS 	+= -I../../scripts/dtc/libfdt
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index a2beaac..51bb718 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -14,6 +14,8 @@
 #include "kvm/util.h"
 #include "kvm/kvm.h"
 
+#include "spapr.h"
+
 #include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <signal.h>
@@ -169,6 +171,10 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
 	bool ret = true;
 	struct kvm_run *run = vcpu->kvm_run;
 	switch(run->exit_reason) {
+	case KVM_EXIT_PAPR_HCALL:
+		run->papr_hcall.ret = spapr_hypercall(vcpu, run->papr_hcall.nr,
+						      (target_ulong*)run->papr_hcall.args);
+		break;
 	default:
 		ret = false;
 	}
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 842af37..891fa65 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -16,6 +16,8 @@
 #include "libfdt.h"
 #include "cpu_info.h"
 
+#include "spapr.h"
+
 #include <linux/kvm.h>
 
 #include <sys/types.h>
@@ -126,6 +128,11 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 	if (cap_ppc_rma == 2)
 		die("Need contiguous RMA allocation on this hardware, "
 		    "which is not yet supported.");
+
+	/* Do these before FDT setup, IRQ setup, etc. */
+	/* FIXME: SPAPR-specific */
+	hypercall_init();
+	register_core_rtas();
 }
 
 void kvm__arch_delete_ram(struct kvm *kvm)
@@ -231,6 +238,20 @@ static void setup_fdt(struct kvm *kvm)
 	_FDT(fdt_property_cell(fdt, "#address-cells", 0x2));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 
+	/* RTAS */
+	_FDT(fdt_begin_node(fdt, "rtas"));
+	/* This is what the kernel uses to switch 'We're an LPAR'! */
+        _FDT(fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop_kvm,
+                           sizeof(hypertas_prop_kvm)));
+	_FDT(fdt_property_cell(fdt, "linux,rtas-base", kvm->rtas_gra));
+	_FDT(fdt_property_cell(fdt, "linux,rtas-entry", kvm->rtas_gra));
+	_FDT(fdt_property_cell(fdt, "rtas-size", kvm->rtas_size));
+	/* Now add properties for all RTAS tokens: */
+	if (spapr_rtas_fdt_setup(kvm, fdt))
+		die("Couldn't create RTAS FDT properties\n");
+
+	_FDT(fdt_end_node(fdt));
+
 	/* /chosen */
 	_FDT(fdt_begin_node(fdt, "chosen"));
 	/* cmdline */
@@ -345,7 +366,25 @@ static void setup_fdt(struct kvm *kvm)
  */
 int kvm__arch_setup_firmware(struct kvm *kvm)
 {
-	/* Load RTAS */
+	/*
+	 * Set up RTAS stub.  All it is is a single hypercall:
+	 *  0:   7c 64 1b 78     mr      r4,r3
+	 *  4:   3c 60 00 00     lis     r3,0
+	 *  8:   60 63 f0 00     ori     r3,r3,61440
+	 *  c:   44 00 00 22     sc      1
+	 * 10:   4e 80 00 20     blr
+	 */
+	uint32_t *rtas = guest_flat_to_host(kvm, kvm->rtas_gra);
+
+	rtas[0] = 0x7c641b78;
+	rtas[1] = 0x3c600000;
+	rtas[2] = 0x6063f000;
+	rtas[3] = 0x44000022;
+	rtas[4] = 0x4e800020;
+	kvm->rtas_size = 20;
+
+	pr_info("Set up %ld bytes of RTAS at 0x%lx\n",
+		kvm->rtas_size, kvm->rtas_gra);
 
 	/* Load SLOF */
 
diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h
new file mode 100644
index 0000000..1271daa
--- /dev/null
+++ b/tools/kvm/powerpc/spapr.h
@@ -0,0 +1,84 @@
+/*
+ * SPAPR definitions and declarations
+ *
+ * Borrowed heavily from QEMU's spapr.h,
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * Modifications by Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#if !defined(__HW_SPAPR_H__)
+#define __HW_SPAPR_H__
+
+#include <inttypes.h>
+
+/* We need some of the H_ hcall defs, but they're __KERNEL__ only. */
+#define __KERNEL__
+#include <asm/hvcall.h>
+#undef __KERNEL__
+
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+
+typedef unsigned long target_ulong;
+typedef uintptr_t target_phys_addr_t;
+
+/*
+ * The hcalls above are standardized in PAPR and implemented by pHyp
+ * as well.
+ *
+ * We also need some hcalls which are specific to qemu / KVM-on-POWER.
+ * So far we just need one for H_RTAS, but in future we'll need more
+ * for extensions like virtio.  We put those into the 0xf000-0xfffc
+ * range which is reserved by PAPR for "platform-specific" hcalls.
+ */
+#define KVMPPC_HCALL_BASE       0xf000
+#define KVMPPC_H_RTAS           (KVMPPC_HCALL_BASE + 0x0)
+#define KVMPPC_HCALL_MAX        KVMPPC_H_RTAS
+
+#define DEBUG_SPAPR_HCALLS
+
+#ifdef DEBUG_SPAPR_HCALLS
+#define hcall_dprintf(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define hcall_dprintf(fmt, ...) \
+    do { } while (0)
+#endif
+
+typedef target_ulong (*spapr_hcall_fn)(struct kvm_cpu *vcpu,
+				       target_ulong opcode,
+                                       target_ulong *args);
+
+void hypercall_init(void);
+void register_core_rtas(void);
+
+void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
+target_ulong spapr_hypercall(struct kvm_cpu *vcpu, target_ulong opcode,
+                             target_ulong *args);
+
+int spapr_rtas_fdt_setup(struct kvm *kvm, void *fdt);
+
+static inline uint32_t rtas_ld(struct kvm *kvm, target_ulong phys, int n)
+{
+	return *((uint32_t *)guest_flat_to_host(kvm, phys + 4*n));
+}
+
+static inline void rtas_st(struct kvm *kvm, target_ulong phys, int n, uint32_t val)
+{
+	*((uint32_t *)guest_flat_to_host(kvm, phys + 4*n)) = val;
+}
+
+typedef void (*spapr_rtas_fn)(struct kvm_cpu *vcpu, uint32_t token,
+                              uint32_t nargs, target_ulong args,
+                              uint32_t nret, target_ulong rets);
+void spapr_rtas_register(const char *name, spapr_rtas_fn fn);
+target_ulong spapr_rtas_call(struct kvm_cpu *vcpu,
+                             uint32_t token, uint32_t nargs, target_ulong args,
+                             uint32_t nret, target_ulong rets);
+
+#endif /* !defined (__HW_SPAPR_H__) */
diff --git a/tools/kvm/powerpc/spapr_hcall.c b/tools/kvm/powerpc/spapr_hcall.c
new file mode 100644
index 0000000..ff1d63a
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_hcall.c
@@ -0,0 +1,134 @@
+/*
+ * SPAPR hypercalls
+ *
+ * Borrowed heavily from QEMU's spapr_hcall.c,
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * Copyright (c) 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "spapr.h"
+#include "kvm/util.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+
+#include <stdio.h>
+#include <assert.h>
+
+static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
+static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX -
+					     KVMPPC_HCALL_BASE + 1];
+
+static target_ulong h_set_dabr(struct kvm_cpu *vcpu, target_ulong opcode, target_ulong *args)
+{
+	/* FIXME:  Implement this for -PR.  (-HV does this in kernel.) */
+	return H_HARDWARE;
+}
+
+static target_ulong h_rtas(struct kvm_cpu *vcpu, target_ulong opcode, target_ulong *args)
+{
+	target_ulong rtas_r3 = args[0];
+	/*
+	 * Pointer read from phys mem; these ptrs cannot be MMIO (!) so just
+	 * reference guest RAM directly.
+	 */
+	uint32_t token, nargs, nret;
+
+	token = rtas_ld(vcpu->kvm, rtas_r3, 0);
+	nargs = rtas_ld(vcpu->kvm, rtas_r3, 1);
+	nret  = rtas_ld(vcpu->kvm, rtas_r3, 2);
+
+	return spapr_rtas_call(vcpu, token, nargs, rtas_r3 + 12,
+			       nret, rtas_r3 + 12 + 4*nargs);
+}
+
+static target_ulong h_logical_load(struct kvm_cpu *vcpu, target_ulong opcode, target_ulong *args)
+{
+	/* SLOF will require these, though kernel doesn't. */
+	die(__PRETTY_FUNCTION__);
+	return H_PARAMETER;
+}
+
+static target_ulong h_logical_store(struct kvm_cpu *vcpu, target_ulong opcode, target_ulong *args)
+{
+	/* SLOF will require these, though kernel doesn't. */
+	die(__PRETTY_FUNCTION__);
+	return H_PARAMETER;
+}
+
+static target_ulong h_logical_icbi(struct kvm_cpu *vcpu, target_ulong opcode, target_ulong *args)
+{
+	/* KVM will trap this in the kernel.  Die if it misses. */
+	die(__PRETTY_FUNCTION__);
+	return H_SUCCESS;
+}
+
+static target_ulong h_logical_dcbf(struct kvm_cpu *vcpu, target_ulong opcode, target_ulong *args)
+{
+	/* KVM will trap this in the kernel.  Die if it misses. */
+	die(__PRETTY_FUNCTION__);
+	return H_SUCCESS;
+}
+
+void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
+{
+	spapr_hcall_fn *slot;
+
+	if (opcode <= MAX_HCALL_OPCODE) {
+		assert((opcode & 0x3) == 0);
+
+		slot = &papr_hypercall_table[opcode / 4];
+	} else {
+		assert((opcode >= KVMPPC_HCALL_BASE) &&
+		       (opcode <= KVMPPC_HCALL_MAX));
+
+		slot = &kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
+	}
+
+	assert(!(*slot) || (fn == *slot));
+	*slot = fn;
+}
+
+target_ulong spapr_hypercall(struct kvm_cpu *vcpu, target_ulong opcode,
+			     target_ulong *args)
+{
+	if ((opcode <= MAX_HCALL_OPCODE)
+	    && ((opcode & 0x3) == 0)) {
+		spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
+
+		if (fn) {
+			return fn(vcpu, opcode, args);
+		}
+	} else if ((opcode >= KVMPPC_HCALL_BASE) &&
+		   (opcode <= KVMPPC_HCALL_MAX)) {
+		spapr_hcall_fn fn = kvmppc_hypercall_table[opcode -
+							   KVMPPC_HCALL_BASE];
+
+		if (fn) {
+			return fn(vcpu, opcode, args);
+		}
+	}
+
+	hcall_dprintf("Unimplemented hcall 0x%lx\n", opcode);
+	return H_FUNCTION;
+}
+
+void hypercall_init(void)
+{
+	/* hcall-dabr */
+	spapr_register_hypercall(H_SET_DABR, h_set_dabr);
+
+	spapr_register_hypercall(H_LOGICAL_CI_LOAD, h_logical_load);
+	spapr_register_hypercall(H_LOGICAL_CI_STORE, h_logical_store);
+	spapr_register_hypercall(H_LOGICAL_CACHE_LOAD, h_logical_load);
+	spapr_register_hypercall(H_LOGICAL_CACHE_STORE, h_logical_store);
+	spapr_register_hypercall(H_LOGICAL_ICBI, h_logical_icbi);
+	spapr_register_hypercall(H_LOGICAL_DCBF, h_logical_dcbf);
+
+	/* KVM-PPC specific hcalls */
+	spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
+}
diff --git a/tools/kvm/powerpc/spapr_rtas.c b/tools/kvm/powerpc/spapr_rtas.c
new file mode 100644
index 0000000..14a3462
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_rtas.c
@@ -0,0 +1,229 @@
+/*
+ * SPAPR base RTAS calls
+ *
+ * Borrowed heavily from QEMU's spapr_rtas.c
+ * Copyright (c) 2010-2011 David Gibson, IBM Corporation.
+ *
+ * Modifications copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+#include "kvm/util.h"
+#include "kvm/term.h"
+#include "libfdt.h"
+
+#include "spapr.h"
+
+#include <stdio.h>
+#include <assert.h>
+
+#define TOKEN_BASE      0x2000
+#define TOKEN_MAX       0x100
+
+#define RTAS_CONSOLE
+
+static struct rtas_call {
+	const char *name;
+	spapr_rtas_fn fn;
+} rtas_table[TOKEN_MAX];
+
+struct rtas_call *rtas_next = rtas_table;
+
+
+static void rtas_display_character(struct kvm_cpu *vcpu,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+	char c = rtas_ld(vcpu->kvm, args, 0);
+	term_putc(CONSOLE_HV, &c, 1, 0);
+	rtas_st(vcpu->kvm, rets, 0, 0);
+}
+
+#ifdef RTAS_CONSOLE
+static void rtas_put_term_char(struct kvm_cpu *vcpu,
+			       uint32_t token, uint32_t nargs,
+			       target_ulong args,
+			       uint32_t nret, target_ulong rets)
+{
+	char c = rtas_ld(vcpu->kvm, args, 0);
+	term_putc(CONSOLE_HV, &c, 1, 0);
+	rtas_st(vcpu->kvm, rets, 0, 0);
+}
+
+static void rtas_get_term_char(struct kvm_cpu *vcpu,
+			       uint32_t token, uint32_t nargs,
+			       target_ulong args,
+			       uint32_t nret, target_ulong rets)
+{
+	int c;
+	if (term_readable(CONSOLE_HV, 0) &&
+	    (c = term_getc(CONSOLE_HV, 0)) >= 0) {
+		rtas_st(vcpu->kvm, rets, 0, 0);
+		rtas_st(vcpu->kvm, rets, 1, c);
+	} else {
+		rtas_st(vcpu->kvm, rets, 0, -2);
+	}
+}
+#endif
+
+static void rtas_get_time_of_day(struct kvm_cpu *vcpu,
+                                 uint32_t token, uint32_t nargs,
+                                 target_ulong args,
+                                 uint32_t nret, target_ulong rets)
+{
+	struct tm tm;
+	time_t tnow;
+
+	if (nret != 8) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	tnow = time(NULL);
+	/* Guest time is currently not offset in any way. */
+	gmtime_r(&tnow, &tm);
+
+	rtas_st(vcpu->kvm, rets, 0, 0); /* Success */
+	rtas_st(vcpu->kvm, rets, 1, tm.tm_year + 1900);
+	rtas_st(vcpu->kvm, rets, 2, tm.tm_mon + 1);
+	rtas_st(vcpu->kvm, rets, 3, tm.tm_mday);
+	rtas_st(vcpu->kvm, rets, 4, tm.tm_hour);
+	rtas_st(vcpu->kvm, rets, 5, tm.tm_min);
+	rtas_st(vcpu->kvm, rets, 6, tm.tm_sec);
+	rtas_st(vcpu->kvm, rets, 7, 0);
+}
+
+static void rtas_set_time_of_day(struct kvm_cpu *vcpu,
+                                 uint32_t token, uint32_t nargs,
+                                 target_ulong args,
+                                 uint32_t nret, target_ulong rets)
+{
+	pr_warning("%s called; TOD set ignored.\n", __FUNCTION__);
+}
+
+static void rtas_power_off(struct kvm_cpu *vcpu,
+                           uint32_t token, uint32_t nargs, target_ulong args,
+                           uint32_t nret, target_ulong rets)
+{
+	if (nargs != 2 || nret != 1) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+	kvm_cpu__reboot();
+}
+
+static void rtas_query_cpu_stopped_state(struct kvm_cpu *vcpu,
+                                         uint32_t token, uint32_t nargs,
+                                         target_ulong args,
+                                         uint32_t nret, target_ulong rets)
+{
+	if (nargs != 1 || nret != 2) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	/*
+	 * Can read id = rtas_ld(vcpu->kvm, args, 0), but
+	 * we currently start all CPUs.  So just return true.
+	 */
+	rtas_st(vcpu->kvm, rets, 0, 0);
+	rtas_st(vcpu->kvm, rets, 1, 2);
+}
+
+static void rtas_start_cpu(struct kvm_cpu *vcpu,
+                           uint32_t token, uint32_t nargs,
+                           target_ulong args,
+                           uint32_t nret, target_ulong rets)
+{
+	die(__FUNCTION__);
+}
+
+target_ulong spapr_rtas_call(struct kvm_cpu *vcpu,
+                             uint32_t token, uint32_t nargs, target_ulong args,
+                             uint32_t nret, target_ulong rets)
+{
+	if ((token >= TOKEN_BASE)
+	    && ((token - TOKEN_BASE) < TOKEN_MAX)) {
+		struct rtas_call *call = rtas_table + (token - TOKEN_BASE);
+
+		if (call->fn) {
+			call->fn(vcpu, token, nargs, args, nret, rets);
+			return H_SUCCESS;
+		}
+	}
+
+	/*
+	 * HACK: Some Linux early debug code uses RTAS display-character,
+	 * but assumes the token value is 0xa (which it is on some real
+	 * machines) without looking it up in the device tree.  This
+	 * special case makes this work
+	 */
+	if (token == 0xa) {
+		rtas_display_character(vcpu, 0xa, nargs, args, nret, rets);
+		return H_SUCCESS;
+	}
+
+	hcall_dprintf("Unknown RTAS token 0x%x\n", token);
+	rtas_st(vcpu->kvm, rets, 0, -3);
+	return H_PARAMETER;
+}
+
+void spapr_rtas_register(const char *name, spapr_rtas_fn fn)
+{
+	assert(rtas_next < (rtas_table + TOKEN_MAX));
+
+	rtas_next->name = name;
+	rtas_next->fn = fn;
+
+	rtas_next++;
+}
+
+/*
+ * This is called from the context of an open /rtas node, in order to add
+ * properties for the rtas call tokens.
+ */
+int spapr_rtas_fdt_setup(struct kvm *kvm, void *fdt)
+{
+	int ret;
+	int i;
+
+	for (i = 0; i < TOKEN_MAX; i++) {
+		struct rtas_call *call = &rtas_table[i];
+
+		if (!call->fn) {
+			continue;
+		}
+
+		ret = fdt_property_cell(fdt, call->name, i + TOKEN_BASE);
+
+		if (ret < 0) {
+			pr_warning("Couldn't add rtas token for %s: %s\n",
+				   call->name, fdt_strerror(ret));
+			return ret;
+		}
+
+	}
+	return 0;
+}
+
+void register_core_rtas(void)
+{
+	spapr_rtas_register("display-character", rtas_display_character);
+	spapr_rtas_register("get-time-of-day", rtas_get_time_of_day);
+	spapr_rtas_register("set-time-of-day", rtas_set_time_of_day);
+	spapr_rtas_register("power-off", rtas_power_off);
+	spapr_rtas_register("query-cpu-stopped-state",
+			    rtas_query_cpu_stopped_state);
+	spapr_rtas_register("start-cpu", rtas_start_cpu);
+#ifdef RTAS_CONSOLE
+	/* These are unused: We do console I/O via hcalls, not rtas. */
+	spapr_rtas_register("put-term-char", rtas_put_term_char);
+	spapr_rtas_register("get-term-char", rtas_get_term_char);
+#endif
+}
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V4 4/7] kvm tools: Add SPAPR PPC64 HV console
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
                   ` (2 preceding siblings ...)
  2012-01-31  6:34 ` [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall & rtascall structure Matt Evans
@ 2012-01-31  6:34 ` Matt Evans
  2012-01-31  6:34 ` [PATCH V4 5/7] kvm tools: Add PPC64 XICS interrupt controller support Matt Evans
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

This adds the console code, plus VIO HV terminal nodes are added to
the device tree so the guest kernel will pick it up.

Signed-off-by: Matt Evans <matt@ozlabs.org>
---
 tools/kvm/Makefile               |    1 +
 tools/kvm/powerpc/kvm.c          |   33 ++++++++++++
 tools/kvm/powerpc/spapr_hvcons.c |  102 ++++++++++++++++++++++++++++++++++++++
 tools/kvm/powerpc/spapr_hvcons.h |   19 +++++++
 4 files changed, 155 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f6a8ac2..648550a 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -136,6 +136,7 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/kvm-cpu.o
 	OBJS	+= powerpc/spapr_hcall.o
 	OBJS	+= powerpc/spapr_rtas.o
+	OBJS	+= powerpc/spapr_hvcons.o
 # We use libfdt, but it's sometimes not packaged 64bit.  It's small too,
 # so just build it in:
 	CFLAGS 	+= -I../../scripts/dtc/libfdt
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 891fa65..bb98c10 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -17,6 +17,7 @@
 #include "cpu_info.h"
 
 #include "spapr.h"
+#include "spapr_hvcons.h"
 
 #include <linux/kvm.h>
 
@@ -133,6 +134,8 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 	/* FIXME: SPAPR-specific */
 	hypercall_init();
 	register_core_rtas();
+	/* Now that hypercalls are initialised, register a couple for the console: */
+	spapr_hvcons_init();
 }
 
 void kvm__arch_delete_ram(struct kvm *kvm)
@@ -151,6 +154,12 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 	kvm__irq_line(kvm, irq, 0);
 }
 
+void kvm__arch_periodic_poll(struct kvm *kvm)
+{
+	/* FIXME: Should register callbacks to platform-specific polls */
+	spapr_hvcons_poll(kvm);
+}
+
 int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
 {
 	void *p;
@@ -266,6 +275,13 @@ static void setup_fdt(struct kvm *kvm)
 		_FDT(fdt_property(fdt, "linux,initrd-end",
 				   &ird_end_prop, sizeof(ird_end_prop)));
 	}
+
+	/*
+	 * stdout-path: This is assuming we're using the HV console.  Also, the
+	 * address is hardwired until we do a VIO bus.
+	 */
+	_FDT(fdt_property_string(fdt, "linux,stdout-path",
+				 "/vdevice/vty@30000000"));
 	_FDT(fdt_end_node(fdt));
 
 	/*
@@ -352,6 +368,23 @@ static void setup_fdt(struct kvm *kvm)
 	}
 	_FDT(fdt_end_node(fdt));
 
+	/*
+	 * VIO: See comment in linux,stdout-path; we don't yet represent a VIO
+	 * bus/address allocation so addresses are hardwired here.
+	 */
+	_FDT(fdt_begin_node(fdt, "vdevice"));
+	_FDT(fdt_property_cell(fdt, "#address-cells", 0x1));
+	_FDT(fdt_property_cell(fdt, "#size-cells", 0x0));
+	_FDT(fdt_property_string(fdt, "device_type", "vdevice"));
+	_FDT(fdt_property_string(fdt, "compatible", "IBM,vdevice"));
+	_FDT(fdt_begin_node(fdt, "vty@30000000"));
+	_FDT(fdt_property_string(fdt, "name", "vty"));
+	_FDT(fdt_property_string(fdt, "device_type", "serial"));
+	_FDT(fdt_property_string(fdt, "compatible", "hvterm1"));
+	_FDT(fdt_property_cell(fdt, "reg", 0x30000000));
+	_FDT(fdt_end_node(fdt));
+	_FDT(fdt_end_node(fdt));
+
 	/* Finalise: */
 	_FDT(fdt_end_node(fdt)); /* Root node */
 	_FDT(fdt_finish(fdt));
diff --git a/tools/kvm/powerpc/spapr_hvcons.c b/tools/kvm/powerpc/spapr_hvcons.c
new file mode 100644
index 0000000..511dbe1
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_hvcons.c
@@ -0,0 +1,102 @@
+/*
+ * SPAPR HV console
+ *
+ * Borrowed lightly from QEMU's spapr_vty.c, Copyright (c) 2010 David Gibson,
+ * IBM Corporation.
+ *
+ * Copyright (c) 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "kvm/term.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+#include "kvm/util.h"
+#include "spapr.h"
+#include "spapr_hvcons.h"
+
+#include <stdio.h>
+#include <sys/uio.h>
+#include <errno.h>
+
+#include <linux/byteorder.h>
+
+union hv_chario {
+	struct {
+		uint64_t char0_7;
+		uint64_t char8_15;
+	} a;
+	uint8_t buf[16];
+};
+
+static unsigned long h_put_term_char(struct kvm_cpu *vcpu, unsigned long opcode, unsigned long *args)
+{
+	/* To do: Read register from args[0], and check it. */
+	unsigned long len = args[1];
+	union hv_chario data;
+	struct iovec iov;
+
+	if (len > 16) {
+		return H_PARAMETER;
+	}
+	data.a.char0_7 = cpu_to_be64(args[2]);
+	data.a.char8_15 = cpu_to_be64(args[3]);
+
+	iov.iov_base = data.buf;
+	iov.iov_len = len;
+	do {
+		int ret;
+
+		ret = term_putc_iov(CONSOLE_HV, &iov, 1, 0);
+		if (ret < 0) {
+			die("term_putc_iov error %d!\n", errno);
+		}
+		iov.iov_base += ret;
+		iov.iov_len -= ret;
+	} while (iov.iov_len > 0);
+
+	return H_SUCCESS;
+}
+
+
+static unsigned long h_get_term_char(struct kvm_cpu *vcpu, unsigned long opcode, unsigned long *args)
+{
+	/* To do: Read register from args[0], and check it. */
+	unsigned long *len = args + 0;
+	unsigned long *char0_7 = args + 1;
+	unsigned long *char8_15 = args + 2;
+	union hv_chario data;
+	struct iovec iov;
+
+	if (term_readable(CONSOLE_HV, 0)) {
+		iov.iov_base = data.buf;
+		iov.iov_len = 16;
+
+		*len = term_getc_iov(CONSOLE_HV, &iov, 1, 0);
+		*char0_7 = be64_to_cpu(data.a.char0_7);
+		*char8_15 = be64_to_cpu(data.a.char8_15);
+	} else {
+		*len = 0;
+	}
+
+	return H_SUCCESS;
+}
+
+void spapr_hvcons_poll(struct kvm *kvm)
+{
+	if (term_readable(CONSOLE_HV, 0)) {
+		/*
+		 * We can inject an IRQ to guest here if we want.  The guest
+		 * will happily poll, though, so not required.
+		 */
+	}
+}
+
+void spapr_hvcons_init(void)
+{
+	spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
+	spapr_register_hypercall(H_GET_TERM_CHAR, h_get_term_char);
+}
diff --git a/tools/kvm/powerpc/spapr_hvcons.h b/tools/kvm/powerpc/spapr_hvcons.h
new file mode 100644
index 0000000..d3e4414
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_hvcons.h
@@ -0,0 +1,19 @@
+/*
+ * SPAPR HV console
+ *
+ * Copyright (c) 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef spapr_hvcons_H
+#define spapr_hvcons_H
+
+#include "kvm/kvm.h"
+
+void spapr_hvcons_init(void);
+void spapr_hvcons_poll(struct kvm *kvm);
+
+#endif
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V4 5/7] kvm tools: Add PPC64 XICS interrupt controller support
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
                   ` (3 preceding siblings ...)
  2012-01-31  6:34 ` [PATCH V4 4/7] kvm tools: Add SPAPR PPC64 HV console Matt Evans
@ 2012-01-31  6:34 ` Matt Evans
  2012-01-31  6:34 ` [PATCH V4 6/7] kvm tools: Add PPC64 PCI Host Bridge Matt Evans
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

This patch adds XICS emulation code (heavily borrowed from QEMU), and wires
this into kvm_cpu__irq() to fire a CPU IRQ via KVM.  A device tree entry is
also added.  IPIs work, xics_alloc_irqnum() is added to allocate an external
IRQ (which will later be used by the PHB PCI code) and finally, kvm__irq_line()
can be called to raise an IRQ on XICS.

Signed-off-by: Matt Evans <matt@ozlabs.org>
---
 tools/kvm/Makefile                           |    1 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h     |    1 +
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |    2 +
 tools/kvm/powerpc/irq.c                      |   25 ++-
 tools/kvm/powerpc/kvm-cpu.c                  |   11 +
 tools/kvm/powerpc/kvm.c                      |   26 +-
 tools/kvm/powerpc/xics.c                     |  514 ++++++++++++++++++++++++++
 tools/kvm/powerpc/xics.h                     |   23 ++
 8 files changed, 596 insertions(+), 7 deletions(-)
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 648550a..2d97b66 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -137,6 +137,7 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/spapr_hcall.o
 	OBJS	+= powerpc/spapr_rtas.o
 	OBJS	+= powerpc/spapr_hvcons.o
+	OBJS	+= powerpc/xics.o
 # We use libfdt, but it's sometimes not packaged 64bit.  It's small too,
 # so just build it in:
 	CFLAGS 	+= -I../../scripts/dtc/libfdt
diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index ca15500..7235067 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -69,6 +69,7 @@ struct kvm {
 	unsigned long		initrd_size;
 	const char		*name;
 	int			vm_state;
+	struct icp_state	*icp;
 };
 
 /* Helper for the various bits of code that generate FDT nodes */
diff --git a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
index 64e4510..c1c6539 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
@@ -36,6 +36,8 @@
 #define MSR_RI		(1UL<<1)
 #define MSR_LE		(1UL<<0)
 
+#define POWER7_EXT_IRQ	0
+
 struct kvm;
 
 struct kvm_cpu {
diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index 46aa64f..f8f12e5 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -21,6 +21,15 @@
 #include <stddef.h>
 #include <stdlib.h>
 
+#include "xics.h"
+
+#define XICS_IRQS               1024
+
+/*
+ * FIXME: The code in this file assumes an SPAPR guest, using XICS.  Make
+ * generic & cope with multiple PPC platform types.
+ */
+
 int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 {
 	fprintf(stderr, "irq__register_device(%d, [%d], [%d], [%d]\n",
@@ -28,9 +37,21 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 	return 0;
 }
 
-void irq__init(struct kvm *kvm)
+int irq__init(struct kvm *kvm)
 {
-	fprintf(stderr, __func__);
+	/*
+	 * kvm->nr_cpus is now valid; for /now/, pass
+	 * this to xics_system_init(), which assumes servers
+	 * are numbered 0..nrcpus.  This may not really be true,
+	 * but it is OK currently.
+	 */
+	kvm->icp = xics_system_init(XICS_IRQS, kvm->nrcpus);
+	return 0;
+}
+
+int irq__exit(struct kvm *kvm)
+{
+	return 0;
 }
 
 int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg)
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 51bb718..7dd1679 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -15,6 +15,7 @@
 #include "kvm/kvm.h"
 
 #include "spapr.h"
+#include "xics.h"
 
 #include <sys/ioctl.h>
 #include <sys/mman.h>
@@ -90,6 +91,9 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id)
 	 */
 	vcpu->is_running = true;
 
+	/* Register with IRQ controller (FIXME, assumes XICS) */
+	xics_cpu_register(vcpu);
+
 	return vcpu;
 }
 
@@ -160,6 +164,13 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
 /* kvm_cpu__irq - set KVM's IRQ flag on this vcpu */
 void kvm_cpu__irq(struct kvm_cpu *vcpu, int pin, int level)
 {
+	unsigned int virq = level ? KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET;
+
+	/* FIXME: POWER-specific */
+	if (pin != POWER7_EXT_IRQ)
+		return;
+	if (ioctl(vcpu->vcpu_fd, KVM_INTERRUPT, &virq) < 0)
+		pr_warning("Could not KVM_INTERRUPT.");
 }
 
 void kvm_cpu__arch_nmi(struct kvm_cpu *cpu)
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index bb98c10..815108c 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -39,9 +39,13 @@
 
 #define HUGETLBFS_PATH "/var/lib/hugetlbfs/global/pagesize-16MB/"
 
+#define PHANDLE_XICP		0x00001111
+
 static char kern_cmdline[2048];
 
 struct kvm_ext kvm_req_ext[] = {
+	{ DEFINE_KVM_EXT(KVM_CAP_PPC_UNSET_IRQ) },
+	{ DEFINE_KVM_EXT(KVM_CAP_PPC_IRQ_LEVEL) },
 	{ 0, 0 }
 };
 
@@ -143,11 +147,6 @@ void kvm__arch_delete_ram(struct kvm *kvm)
 	munmap(kvm->ram_start, kvm->ram_size);
 }
 
-void kvm__irq_line(struct kvm *kvm, int irq, int level)
-{
-	fprintf(stderr, "irq_line(%d, %d)\n", irq, level);
-}
-
 void kvm__irq_trigger(struct kvm *kvm, int irq)
 {
 	kvm__irq_line(kvm, irq, 1);
@@ -225,6 +224,7 @@ static void setup_fdt(struct kvm *kvm)
 {
 	uint64_t 	mem_reg_property[] = { 0, cpu_to_be64(kvm->ram_size) };
 	int 		smp_cpus = kvm->nrcpus;
+	uint32_t	int_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
 	char 		hypertas_prop_kvm[] = "hcall-pft\0hcall-term\0"
 		"hcall-dabr\0hcall-interrupt\0hcall-tce\0hcall-vio\0"
 		"hcall-splpar\0hcall-bulk";
@@ -368,6 +368,22 @@ static void setup_fdt(struct kvm *kvm)
 	}
 	_FDT(fdt_end_node(fdt));
 
+	/* IRQ controller */
+	_FDT(fdt_begin_node(fdt, "interrupt-controller@0"));
+
+	_FDT(fdt_property_string(fdt, "device_type",
+				 "PowerPC-External-Interrupt-Presentation"));
+	_FDT(fdt_property_string(fdt, "compatible", "IBM,ppc-xicp"));
+	_FDT(fdt_property_cell(fdt, "reg", 0));
+	_FDT(fdt_property(fdt, "interrupt-controller", NULL, 0));
+	_FDT(fdt_property(fdt, "ibm,interrupt-server-ranges",
+			   int_server_ranges_prop,
+			   sizeof(int_server_ranges_prop)));
+	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 2));
+	_FDT(fdt_property_cell(fdt, "linux,phandle", PHANDLE_XICP));
+	_FDT(fdt_property_cell(fdt, "phandle", PHANDLE_XICP));
+	_FDT(fdt_end_node(fdt));
+
 	/*
 	 * VIO: See comment in linux,stdout-path; we don't yet represent a VIO
 	 * bus/address allocation so addresses are hardwired here.
diff --git a/tools/kvm/powerpc/xics.c b/tools/kvm/powerpc/xics.c
new file mode 100644
index 0000000..2d70d3c
--- /dev/null
+++ b/tools/kvm/powerpc/xics.c
@@ -0,0 +1,514 @@
+/*
+ * PAPR Virtualized Interrupt System, aka ICS/ICP aka xics
+ *
+ * Borrowed heavily from QEMU's xics.c,
+ * Copyright (c) 2010,2011 David Gibson, IBM Corporation.
+ *
+ * Modifications copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "spapr.h"
+#include "xics.h"
+#include "kvm/util.h"
+
+#include <stdio.h>
+#include <malloc.h>
+
+
+/* #define DEBUG_XICS yes */
+#ifdef DEBUG_XICS
+#define xics_dprintf(fmt, ...)					\
+	do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define xics_dprintf(fmt, ...)			\
+	do { } while (0)
+#endif
+
+/*
+ * ICP: Presentation layer
+ */
+
+struct icp_server_state {
+	uint32_t xirr;
+	uint8_t pending_priority;
+	uint8_t mfrr;
+	struct kvm_cpu *cpu;
+};
+
+#define XICS_IRQ_OFFSET 16
+#define XISR_MASK	0x00ffffff
+#define CPPR_MASK	0xff000000
+
+#define XISR(ss)   (((ss)->xirr) & XISR_MASK)
+#define CPPR(ss)   (((ss)->xirr) >> 24)
+
+struct ics_state;
+
+struct icp_state {
+	unsigned long nr_servers;
+	struct icp_server_state *ss;
+	struct ics_state *ics;
+};
+
+static void ics_reject(struct ics_state *ics, int nr);
+static void ics_resend(struct ics_state *ics);
+static void ics_eoi(struct ics_state *ics, int nr);
+
+static inline void cpu_irq_raise(struct kvm_cpu *vcpu)
+{
+	xics_dprintf("INT1[%p]\n", vcpu);
+	kvm_cpu__irq(vcpu, POWER7_EXT_IRQ, 1);
+}
+
+static inline void cpu_irq_lower(struct kvm_cpu *vcpu)
+{
+	xics_dprintf("INT0[%p]\n", vcpu);
+	kvm_cpu__irq(vcpu, POWER7_EXT_IRQ, 0);
+}
+
+static void icp_check_ipi(struct icp_state *icp, int server)
+{
+	struct icp_server_state *ss = icp->ss + server;
+
+	if (XISR(ss) && (ss->pending_priority <= ss->mfrr)) {
+		return;
+	}
+
+	if (XISR(ss)) {
+		ics_reject(icp->ics, XISR(ss));
+	}
+
+	ss->xirr = (ss->xirr & ~XISR_MASK) | XICS_IPI;
+	ss->pending_priority = ss->mfrr;
+	cpu_irq_raise(ss->cpu);
+}
+
+static void icp_resend(struct icp_state *icp, int server)
+{
+	struct icp_server_state *ss = icp->ss + server;
+
+	if (ss->mfrr < CPPR(ss)) {
+		icp_check_ipi(icp, server);
+	}
+	ics_resend(icp->ics);
+}
+
+static void icp_set_cppr(struct icp_state *icp, int server, uint8_t cppr)
+{
+	struct icp_server_state *ss = icp->ss + server;
+	uint8_t old_cppr;
+	uint32_t old_xisr;
+
+	old_cppr = CPPR(ss);
+	ss->xirr = (ss->xirr & ~CPPR_MASK) | (cppr << 24);
+
+	if (cppr < old_cppr) {
+		if (XISR(ss) && (cppr <= ss->pending_priority)) {
+			old_xisr = XISR(ss);
+			ss->xirr &= ~XISR_MASK; /* Clear XISR */
+			cpu_irq_lower(ss->cpu);
+			ics_reject(icp->ics, old_xisr);
+		}
+	} else {
+		if (!XISR(ss)) {
+			icp_resend(icp, server);
+		}
+	}
+}
+
+static void icp_set_mfrr(struct icp_state *icp, int nr, uint8_t mfrr)
+{
+	struct icp_server_state *ss = icp->ss + nr;
+
+	ss->mfrr = mfrr;
+	if (mfrr < CPPR(ss)) {
+		icp_check_ipi(icp, nr);
+	}
+}
+
+static uint32_t icp_accept(struct icp_server_state *ss)
+{
+	uint32_t xirr;
+
+	cpu_irq_lower(ss->cpu);
+	xirr = ss->xirr;
+	ss->xirr = ss->pending_priority << 24;
+	return xirr;
+}
+
+static void icp_eoi(struct icp_state *icp, int server, uint32_t xirr)
+{
+	struct icp_server_state *ss = icp->ss + server;
+
+	ics_eoi(icp->ics, xirr & XISR_MASK);
+	/* Send EOI -> ICS */
+	ss->xirr = (ss->xirr & ~CPPR_MASK) | (xirr & CPPR_MASK);
+	if (!XISR(ss)) {
+		icp_resend(icp, server);
+	}
+}
+
+static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
+{
+	struct icp_server_state *ss = icp->ss + server;
+	xics_dprintf("icp_irq(nr %d, server %d, prio 0x%x)\n", nr, server, priority);
+	if ((priority >= CPPR(ss))
+	    || (XISR(ss) && (ss->pending_priority <= priority))) {
+		xics_dprintf("reject %d, CPPR 0x%x, XISR 0x%x, pprio 0x%x, prio 0x%x\n",
+			     nr, CPPR(ss), XISR(ss), ss->pending_priority, priority);
+		ics_reject(icp->ics, nr);
+	} else {
+		if (XISR(ss)) {
+			xics_dprintf("reject %d, CPPR 0x%x, XISR 0x%x, pprio 0x%x, prio 0x%x\n",
+				     nr, CPPR(ss), XISR(ss), ss->pending_priority, priority);
+			ics_reject(icp->ics, XISR(ss));
+		}
+		ss->xirr = (ss->xirr & ~XISR_MASK) | (nr & XISR_MASK);
+		ss->pending_priority = priority;
+		cpu_irq_raise(ss->cpu);
+	}
+}
+
+/*
+ * ICS: Source layer
+ */
+
+struct ics_irq_state {
+	int server;
+	uint8_t priority;
+	uint8_t saved_priority;
+	int rejected:1;
+	int masked_pending:1;
+};
+
+struct ics_state {
+	unsigned int nr_irqs;
+	unsigned int offset;
+	struct ics_irq_state *irqs;
+	struct icp_state *icp;
+};
+
+static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
+{
+	return (nr >= ics->offset)
+		&& (nr < (ics->offset + ics->nr_irqs));
+}
+
+static void ics_set_irq_msi(struct ics_state *ics, int srcno, int val)
+{
+	struct ics_irq_state *irq = ics->irqs + srcno;
+
+	if (val) {
+		if (irq->priority == 0xff) {
+			xics_dprintf(" irq pri ff, masked pending\n");
+			irq->masked_pending = 1;
+		} else	{
+			icp_irq(ics->icp, irq->server, srcno + ics->offset, irq->priority);
+		}
+	}
+}
+
+static void ics_reject_msi(struct ics_state *ics, int nr)
+{
+	struct ics_irq_state *irq = ics->irqs + nr - ics->offset;
+
+	irq->rejected = 1;
+}
+
+static void ics_resend_msi(struct ics_state *ics)
+{
+	unsigned int i;
+
+	for (i = 0; i < ics->nr_irqs; i++) {
+		struct ics_irq_state *irq = ics->irqs + i;
+
+		/* FIXME: filter by server#? */
+		if (irq->rejected) {
+			irq->rejected = 0;
+			if (irq->priority != 0xff) {
+				icp_irq(ics->icp, irq->server, i + ics->offset, irq->priority);
+			}
+		}
+	}
+}
+
+static void ics_write_xive_msi(struct ics_state *ics, int nr, int server,
+			       uint8_t priority)
+{
+	struct ics_irq_state *irq = ics->irqs + nr - ics->offset;
+
+	irq->server = server;
+	irq->priority = priority;
+	xics_dprintf("ics_write_xive_msi(nr %d, server %d, pri 0x%x)\n", nr, server, priority);
+
+	if (!irq->masked_pending || (priority == 0xff)) {
+		return;
+	}
+
+	irq->masked_pending = 0;
+	icp_irq(ics->icp, server, nr, priority);
+}
+
+static void ics_reject(struct ics_state *ics, int nr)
+{
+	ics_reject_msi(ics, nr);
+}
+
+static void ics_resend(struct ics_state *ics)
+{
+	ics_resend_msi(ics);
+}
+
+static void ics_eoi(struct ics_state *ics, int nr)
+{
+}
+
+/*
+ * Exported functions
+ */
+
+static int allocated_irqnum = XICS_IRQ_OFFSET;
+
+/*
+ * xics_alloc_irqnum(): This is hacky.  The problem boils down to the PCI device
+ * code which just calls kvm__irq_line( .. pcidev->pci_hdr.irq_line ..) at will.
+ * Each PCI device's IRQ line is allocated by irq__register_device() (which
+ * allocates an IRQ AND allocates a.. PCI device num..).
+ *
+ * In future I'd like to at least mimic some kind of 'upstream IRQ controller'
+ * whereby PCI devices let their PHB know when they want to IRQ, and that
+ * percolates up.
+ *
+ * For now, allocate a REAL xics irq number and (via irq__register_device) push
+ * that into the config space.	8 bits only though!
+ */
+int xics_alloc_irqnum(void)
+{
+	int irq = allocated_irqnum++;
+
+	if (irq > 255)
+		die("Huge numbers of IRQs aren't supported with the daft kvmtool IRQ system.");
+
+	return irq;
+}
+
+static target_ulong h_cppr(struct kvm_cpu *vcpu,
+			   target_ulong opcode, target_ulong *args)
+{
+	target_ulong cppr = args[0];
+
+	xics_dprintf("h_cppr(%lx)\n", cppr);
+	icp_set_cppr(vcpu->kvm->icp, vcpu->cpu_id, cppr);
+	return H_SUCCESS;
+}
+
+static target_ulong h_ipi(struct kvm_cpu *vcpu,
+			  target_ulong opcode, target_ulong *args)
+{
+	target_ulong server = args[0];
+	target_ulong mfrr = args[1];
+
+	xics_dprintf("h_ipi(%lx, %lx)\n", server, mfrr);
+	if (server >= vcpu->kvm->icp->nr_servers) {
+		return H_PARAMETER;
+	}
+
+	icp_set_mfrr(vcpu->kvm->icp, server, mfrr);
+	return H_SUCCESS;
+}
+
+static target_ulong h_xirr(struct kvm_cpu *vcpu,
+			   target_ulong opcode, target_ulong *args)
+{
+	uint32_t xirr = icp_accept(vcpu->kvm->icp->ss + vcpu->cpu_id);
+
+	xics_dprintf("h_xirr() = %x\n", xirr);
+	args[0] = xirr;
+	return H_SUCCESS;
+}
+
+static target_ulong h_eoi(struct kvm_cpu *vcpu,
+			  target_ulong opcode, target_ulong *args)
+{
+	target_ulong xirr = args[0];
+
+	xics_dprintf("h_eoi(%lx)\n", xirr);
+	icp_eoi(vcpu->kvm->icp, vcpu->cpu_id, xirr);
+	return H_SUCCESS;
+}
+
+static void rtas_set_xive(struct kvm_cpu *vcpu, uint32_t token,
+			  uint32_t nargs, target_ulong args,
+			  uint32_t nret, target_ulong rets)
+{
+	struct ics_state *ics = vcpu->kvm->icp->ics;
+	uint32_t nr, server, priority;
+
+	if ((nargs != 3) || (nret != 1)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	nr = rtas_ld(vcpu->kvm, args, 0);
+	server = rtas_ld(vcpu->kvm, args, 1);
+	priority = rtas_ld(vcpu->kvm, args, 2);
+
+	xics_dprintf("rtas_set_xive(%x,%x,%x)\n", nr, server, priority);
+	if (!ics_valid_irq(ics, nr) || (server >= ics->icp->nr_servers)
+	    || (priority > 0xff)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	ics_write_xive_msi(ics, nr, server, priority);
+
+	rtas_st(vcpu->kvm, rets, 0, 0); /* Success */
+}
+
+static void rtas_get_xive(struct kvm_cpu *vcpu, uint32_t token,
+			  uint32_t nargs, target_ulong args,
+			  uint32_t nret, target_ulong rets)
+{
+	struct ics_state *ics = vcpu->kvm->icp->ics;
+	uint32_t nr;
+
+	if ((nargs != 1) || (nret != 3)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	nr = rtas_ld(vcpu->kvm, args, 0);
+
+	if (!ics_valid_irq(ics, nr)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	rtas_st(vcpu->kvm, rets, 0, 0); /* Success */
+	rtas_st(vcpu->kvm, rets, 1, ics->irqs[nr - ics->offset].server);
+	rtas_st(vcpu->kvm, rets, 2, ics->irqs[nr - ics->offset].priority);
+}
+
+static void rtas_int_off(struct kvm_cpu *vcpu, uint32_t token,
+			 uint32_t nargs, target_ulong args,
+			 uint32_t nret, target_ulong rets)
+{
+	struct ics_state *ics = vcpu->kvm->icp->ics;
+	uint32_t nr;
+
+	if ((nargs != 1) || (nret != 1)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	nr = rtas_ld(vcpu->kvm, args, 0);
+
+	if (!ics_valid_irq(ics, nr)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	/* ME: QEMU wrote xive_msi here, in #if 0.  Deleted. */
+
+	rtas_st(vcpu->kvm, rets, 0, 0); /* Success */
+}
+
+static void rtas_int_on(struct kvm_cpu *vcpu, uint32_t token,
+			uint32_t nargs, target_ulong args,
+			uint32_t nret, target_ulong rets)
+{
+	struct ics_state *ics = vcpu->kvm->icp->ics;
+	uint32_t nr;
+
+	if ((nargs != 1) || (nret != 1)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	nr = rtas_ld(vcpu->kvm, args, 0);
+
+	if (!ics_valid_irq(ics, nr)) {
+		rtas_st(vcpu->kvm, rets, 0, -3);
+		return;
+	}
+
+	/* ME: QEMU wrote xive_msi here, in #if 0.  Deleted. */
+
+	rtas_st(vcpu->kvm, rets, 0, 0); /* Success */
+}
+
+void xics_cpu_register(struct kvm_cpu *vcpu)
+{
+	if (vcpu->cpu_id < vcpu->kvm->icp->nr_servers)
+		vcpu->kvm->icp->ss[vcpu->cpu_id].cpu = vcpu;
+	else
+		die("Setting invalid server for cpuid %ld\n", vcpu->cpu_id);
+}
+
+struct icp_state *xics_system_init(unsigned int nr_irqs, unsigned int nr_cpus)
+{
+	int max_server_num;
+	unsigned int i;
+	struct icp_state *icp;
+	struct ics_state *ics;
+
+	max_server_num = nr_cpus;
+
+	icp = malloc(sizeof(*icp));
+	icp->nr_servers = max_server_num + 1;
+	icp->ss = malloc(icp->nr_servers*sizeof(struct icp_server_state));
+
+	for (i = 0; i < icp->nr_servers; i++) {
+		icp->ss[i].xirr = 0;
+		icp->ss[i].pending_priority = 0;
+		icp->ss[i].cpu = 0;
+		icp->ss[i].mfrr = 0xff;
+	}
+
+	/*
+	 * icp->ss[env->cpu_index].cpu is set by CPUs calling in to
+	 * xics_cpu_register().
+	 */
+
+	ics = malloc(sizeof(*ics));
+	ics->nr_irqs = nr_irqs;
+	ics->offset = XICS_IRQ_OFFSET;
+	ics->irqs = malloc(nr_irqs * sizeof(struct ics_irq_state));
+
+	icp->ics = ics;
+	ics->icp = icp;
+
+	for (i = 0; i < nr_irqs; i++) {
+		ics->irqs[i].server = 0;
+		ics->irqs[i].priority = 0xff;
+		ics->irqs[i].saved_priority = 0xff;
+		ics->irqs[i].rejected = 0;
+		ics->irqs[i].masked_pending = 0;
+	}
+
+	spapr_register_hypercall(H_CPPR, h_cppr);
+	spapr_register_hypercall(H_IPI, h_ipi);
+	spapr_register_hypercall(H_XIRR, h_xirr);
+	spapr_register_hypercall(H_EOI, h_eoi);
+
+	spapr_rtas_register("ibm,set-xive", rtas_set_xive);
+	spapr_rtas_register("ibm,get-xive", rtas_get_xive);
+	spapr_rtas_register("ibm,int-off", rtas_int_off);
+	spapr_rtas_register("ibm,int-on", rtas_int_on);
+
+	return icp;
+}
+
+void kvm__irq_line(struct kvm *kvm, int irq, int level)
+{
+	/*
+	 * Route event to ICS, which routes to ICP, which eventually does a
+	 * kvm_cpu__irq(vcpu, POWER7_EXT_IRQ, 1)
+	 */
+	xics_dprintf("Raising IRQ %d -> %d\n", irq, level);
+	ics_set_irq_msi(kvm->icp->ics, irq - kvm->icp->ics->offset, level);
+}
diff --git a/tools/kvm/powerpc/xics.h b/tools/kvm/powerpc/xics.h
new file mode 100644
index 0000000..144915b
--- /dev/null
+++ b/tools/kvm/powerpc/xics.h
@@ -0,0 +1,23 @@
+/*
+ * PAPR Virtualized Interrupt System, aka ICS/ICP aka xics
+ *
+ * Copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef XICS_H
+#define XICS_H
+
+#define XICS_IPI        0x2
+
+struct kvm_cpu;
+struct icp_state;
+
+struct icp_state *xics_system_init(unsigned int nr_irqs, unsigned int nr_cpus);
+void xics_cpu_register(struct kvm_cpu *vcpu);
+int xics_alloc_irqnum(void);
+
+#endif
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V4 6/7] kvm tools: Add PPC64 PCI Host Bridge
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
                   ` (4 preceding siblings ...)
  2012-01-31  6:34 ` [PATCH V4 5/7] kvm tools: Add PPC64 XICS interrupt controller support Matt Evans
@ 2012-01-31  6:34 ` Matt Evans
  2012-02-01  3:40   ` David Gibson
  2012-01-31  6:34 ` [PATCH V4 7/7] kvm tools: Add PPC64 kvm_cpu__emulate_io() Matt Evans
  2012-01-31  8:29 ` [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Pekka Enberg
  7 siblings, 1 reply; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

This provides the PCI bridge, definitions for the address layout of the windows
and wires in IRQs.  Once PCI devices are all registered, they are enumerated and
DT nodes generated for each.

Signed-off-by: Matt Evans <matt@ozlabs.org>
---
 tools/kvm/powerpc/include/kvm/kvm-arch.h |    3 +
 tools/kvm/powerpc/irq.c                  |   18 ++-
 tools/kvm/powerpc/kvm.c                  |   11 +
 tools/kvm/powerpc/spapr.h                |    9 +
 tools/kvm/powerpc/spapr_pci.c            |  423 ++++++++++++++++++++++++++++++
 tools/kvm/powerpc/spapr_pci.h            |   57 ++++
 6 files changed, 519 insertions(+), 2 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h

diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index 7235067..6dd7082 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -40,6 +40,8 @@
  */
 #define KVM_PCI_MMIO_AREA		0x1000000
 
+struct spapr_phb;
+
 struct kvm {
 	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
 	int			vm_fd;		/* For VM ioctls() */
@@ -70,6 +72,7 @@ struct kvm {
 	const char		*name;
 	int			vm_state;
 	struct icp_state	*icp;
+	struct spapr_phb	*phb;
 };
 
 /* Helper for the various bits of code that generate FDT nodes */
diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index f8f12e5..7da4012 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -21,7 +21,10 @@
 #include <stddef.h>
 #include <stdlib.h>
 
+#include "kvm/pci.h"
+
 #include "xics.h"
+#include "spapr_pci.h"
 
 #define XICS_IRQS               1024
 
@@ -30,10 +33,21 @@
  * generic & cope with multiple PPC platform types.
  */
 
+static int pci_devs = 0;
+
 int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 {
-	fprintf(stderr, "irq__register_device(%d, [%d], [%d], [%d]\n",
-		dev, *num, *pin, *line);
+	if (pci_devs >= PCI_MAX_DEVICES)
+		die("Hit PCI device limit!\n");
+
+	*num = pci_devs++;
+
+	*pin = 1;
+	/*
+	 * Have I said how nasty I find this?  Line should be dontcare... PHB
+	 * should determine which CPU/XICS IRQ to fire.
+	 */
+	*line = xics_alloc_irqnum();
 	return 0;
 }
 
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 815108c..cbc0d8f 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -18,6 +18,7 @@
 
 #include "spapr.h"
 #include "spapr_hvcons.h"
+#include "spapr_pci.h"
 
 #include <linux/kvm.h>
 
@@ -140,6 +141,11 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 	register_core_rtas();
 	/* Now that hypercalls are initialised, register a couple for the console: */
 	spapr_hvcons_init();
+	spapr_create_phb(kvm, "pci", SPAPR_PCI_BUID,
+			 SPAPR_PCI_MEM_WIN_ADDR,
+			 SPAPR_PCI_MEM_WIN_SIZE,
+			 SPAPR_PCI_IO_WIN_ADDR,
+			 SPAPR_PCI_IO_WIN_SIZE);
 }
 
 void kvm__arch_delete_ram(struct kvm *kvm)
@@ -406,6 +412,11 @@ static void setup_fdt(struct kvm *kvm)
 	_FDT(fdt_finish(fdt));
 
 	_FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE));
+
+	/* PCI */
+	if (spapr_populate_pci_devices(kvm, PHANDLE_XICP, fdt_dest))
+		die("Fail populating PCI device nodes");
+
 	_FDT(fdt_add_mem_rsv(fdt_dest, kvm->rtas_gra, kvm->rtas_size));
 	_FDT(fdt_pack(fdt_dest));
 }
diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h
index 1271daa..0537f88 100644
--- a/tools/kvm/powerpc/spapr.h
+++ b/tools/kvm/powerpc/spapr.h
@@ -81,4 +81,13 @@ target_ulong spapr_rtas_call(struct kvm_cpu *vcpu,
                              uint32_t token, uint32_t nargs, target_ulong args,
                              uint32_t nret, target_ulong rets);
 
+#define SPAPR_PCI_BUID          0x800000020000001ULL
+#define SPAPR_PCI_MEM_WIN_ADDR  (KVM_MMIO_START + 0xA0000000)
+#define SPAPR_PCI_MEM_WIN_SIZE  0x20000000
+#define SPAPR_PCI_IO_WIN_ADDR   (SPAPR_PCI_MEM_WIN_ADDR + SPAPR_PCI_MEM_WIN_SIZE)
+#define SPAPR_PCI_IO_WIN_SIZE	0x2000000
+
+#define SPAPR_PCI_WIN_START	SPAPR_PCI_MEM_WIN_ADDR
+#define SPAPR_PCI_WIN_END	(SPAPR_PCI_IO_WIN_ADDR + SPAPR_PCI_IO_WIN_SIZE)
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/tools/kvm/powerpc/spapr_pci.c b/tools/kvm/powerpc/spapr_pci.c
new file mode 100644
index 0000000..f9d29f0
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_pci.c
@@ -0,0 +1,423 @@
+/*
+ * SPAPR PHB emulation, RTAS interface to PCI config space, device tree nodes
+ * for enumerated devices.
+ *
+ * Borrowed heavily from QEMU's spapr_pci.c,
+ * Copyright (c) 2011 Alexey Kardashevskiy, IBM Corporation.
+ * Copyright (c) 2011 David Gibson, IBM Corporation.
+ *
+ * Modifications copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "spapr.h"
+#include "spapr_pci.h"
+#include "kvm/util.h"
+#include "kvm/pci.h"
+#include "libfdt.h"
+
+#include <linux/pci_regs.h>
+#include <linux/byteorder.h>
+
+
+/* #define DEBUG_PHB yes */
+#ifdef DEBUG_PHB
+#define phb_dprintf(fmt, ...)					\
+	do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define phb_dprintf(fmt, ...)			\
+	do { } while (0)
+#endif
+
+static const uint32_t bars[] = {
+	PCI_BASE_ADDRESS_0, PCI_BASE_ADDRESS_1,
+	PCI_BASE_ADDRESS_2, PCI_BASE_ADDRESS_3,
+	PCI_BASE_ADDRESS_4, PCI_BASE_ADDRESS_5
+	/*, PCI_ROM_ADDRESS*/
+};
+
+#define PCI_NUM_REGIONS		7
+
+/* Macros to operate with address in OF binding to PCI */
+#define b_x(x, p, l)	(((x) & ((1<<(l))-1)) << (p))
+#define b_n(x)		b_x((x), 31, 1) /* 0 if relocatable */
+#define b_p(x)		b_x((x), 30, 1) /* 1 if prefetchable */
+#define b_t(x)		b_x((x), 29, 1) /* 1 if the address is aliased */
+#define b_ss(x)		b_x((x), 24, 2) /* the space code */
+#define b_bbbbbbbb(x)	b_x((x), 16, 8) /* bus number */
+#define b_ddddd(x)	b_x((x), 11, 5) /* device number */
+#define b_fff(x)	b_x((x), 8, 3)	/* function number */
+#define b_rrrrrrrr(x)	b_x((x), 0, 8)	/* register number */
+
+#define SS_M64		3
+#define SS_M32		2
+#define SS_IO		1
+#define SS_CONFIG	0
+
+
+static struct spapr_phb phb;
+
+
+static void rtas_ibm_read_pci_config(struct kvm_cpu *vcpu,
+				     uint32_t token, uint32_t nargs,
+				     target_ulong args,
+				     uint32_t nret, target_ulong rets)
+{
+	uint32_t val = 0;
+	uint64_t buid = ((uint64_t)rtas_ld(vcpu->kvm, args, 1) << 32) | rtas_ld(vcpu->kvm, args, 2);
+	union pci_config_address addr = { .w = rtas_ld(vcpu->kvm, args, 0) };
+	struct pci_device_header *dev = pci__find_dev(addr.device_number);
+	uint32_t size = rtas_ld(vcpu->kvm, args, 3);
+
+	if (buid != phb.buid || !dev || (size > 4)) {
+		phb_dprintf("- cfgRd buid 0x%lx cfg addr 0x%x size %d not found\n",
+			    buid, addr.w, size);
+
+		rtas_st(vcpu->kvm, rets, 0, -1);
+		return;
+	}
+	pci__config_rd(vcpu->kvm, addr, &val, size);
+	/* It appears this wants a byteswapped result... */
+	switch (size) {
+	case 4:
+		val = le32_to_cpu(val);
+		break;
+	case 2:
+		val = le16_to_cpu(val>>16);
+		break;
+	case 1:
+		val = val >> 24;
+		break;
+	}
+	phb_dprintf("- cfgRd buid 0x%lx addr 0x%x (/%d): b%d,d%d,f%d,r0x%x, val 0x%x\n",
+		    buid, addr.w, size, addr.bus_number, addr.device_number, addr.function_number,
+		    addr.register_number, val);
+
+	rtas_st(vcpu->kvm, rets, 0, 0);
+	rtas_st(vcpu->kvm, rets, 1, val);
+}
+
+static void rtas_read_pci_config(struct kvm_cpu *vcpu,
+				 uint32_t token, uint32_t nargs,
+				 target_ulong args,
+				 uint32_t nret, target_ulong rets)
+{
+	uint32_t val;
+	union pci_config_address addr = { .w = rtas_ld(vcpu->kvm, args, 0) };
+	struct pci_device_header *dev = pci__find_dev(addr.device_number);
+	uint32_t size = rtas_ld(vcpu->kvm, args, 1);
+
+	if (!dev || (size > 4)) {
+		rtas_st(vcpu->kvm, rets, 0, -1);
+		return;
+	}
+	pci__config_rd(vcpu->kvm, addr, &val, size);
+	switch (size) {
+	case 4:
+		val = le32_to_cpu(val);
+		break;
+	case 2:
+		val = le16_to_cpu(val>>16); /* We're yuck-endian. */
+		break;
+	case 1:
+		val = val >> 24;
+		break;
+	}
+	phb_dprintf("- cfgRd addr 0x%x size %d, val 0x%x\n", addr.w, size, val);
+	rtas_st(vcpu->kvm, rets, 0, 0);
+	rtas_st(vcpu->kvm, rets, 1, val);
+}
+
+static void rtas_ibm_write_pci_config(struct kvm_cpu *vcpu,
+				      uint32_t token, uint32_t nargs,
+				      target_ulong args,
+				      uint32_t nret, target_ulong rets)
+{
+	uint64_t buid = ((uint64_t)rtas_ld(vcpu->kvm, args, 1) << 32) | rtas_ld(vcpu->kvm, args, 2);
+	union pci_config_address addr = { .w = rtas_ld(vcpu->kvm, args, 0) };
+	struct pci_device_header *dev = pci__find_dev(addr.device_number);
+	uint32_t size = rtas_ld(vcpu->kvm, args, 3);
+	uint32_t val = rtas_ld(vcpu->kvm, args, 4);
+
+	if (buid != phb.buid || !dev || (size > 4)) {
+		phb_dprintf("- cfgWr buid 0x%lx cfg addr 0x%x/%d error (val 0x%x)\n",
+			    buid, addr.w, size, val);
+
+		rtas_st(vcpu->kvm, rets, 0, -1);
+		return;
+	}
+	phb_dprintf("- cfgWr buid 0x%lx addr 0x%x (/%d): b%d,d%d,f%d,r0x%x, val 0x%x\n",
+		    buid, addr.w, size, addr.bus_number, addr.device_number, addr.function_number,
+		    addr.register_number, val);
+	switch (size) {
+	case 4:
+		val = le32_to_cpu(val);
+		break;
+	case 2:
+		val = le16_to_cpu(val) << 16;
+		break;
+	case 1:
+		val = val >> 24;
+		break;
+	}
+	pci__config_wr(vcpu->kvm, addr, &val, size);
+	rtas_st(vcpu->kvm, rets, 0, 0);
+}
+
+static void rtas_write_pci_config(struct kvm_cpu *vcpu,
+				  uint32_t token, uint32_t nargs,
+				  target_ulong args,
+				  uint32_t nret, target_ulong rets)
+{
+	union pci_config_address addr = { .w = rtas_ld(vcpu->kvm, args, 0) };
+	struct pci_device_header *dev = pci__find_dev(addr.device_number);
+	uint32_t size = rtas_ld(vcpu->kvm, args, 1);
+	uint32_t val = rtas_ld(vcpu->kvm, args, 2);
+
+	if (!dev || (size > 4)) {
+		rtas_st(vcpu->kvm, rets, 0, -1);
+		return;
+	}
+
+	phb_dprintf("- cfgWr addr 0x%x (/%d): b%d,d%d,f%d,r0x%x, val 0x%x\n",
+		    addr.w, size, addr.bus_number, addr.device_number, addr.function_number,
+		    addr.register_number, val);
+	switch (size) {
+	case 4:
+		val = le32_to_cpu(val);
+		break;
+	case 2:
+		val = le16_to_cpu(val) << 16;
+		break;
+	case 1:
+		val = val >> 24;
+		break;
+	}
+	pci__config_wr(vcpu->kvm, addr, &val, size);
+	rtas_st(vcpu->kvm, rets, 0, 0);
+}
+
+void spapr_create_phb(struct kvm *kvm,
+		      const char *busname, uint64_t buid,
+		      uint64_t mem_win_addr, uint64_t mem_win_size,
+		      uint64_t io_win_addr, uint64_t io_win_size)
+{
+	/*
+	 * Since kvmtool doesn't really have any concept of buses etc.,
+	 * there's nothing to register here.  Just register RTAS.
+	 */
+	spapr_rtas_register("read-pci-config", rtas_read_pci_config);
+	spapr_rtas_register("write-pci-config", rtas_write_pci_config);
+	spapr_rtas_register("ibm,read-pci-config", rtas_ibm_read_pci_config);
+	spapr_rtas_register("ibm,write-pci-config", rtas_ibm_write_pci_config);
+
+	phb.buid = buid;
+	phb.mem_addr = mem_win_addr;
+	phb.mem_size = mem_win_size;
+	phb.io_addr  = io_win_addr;
+	phb.io_size  = io_win_size;
+
+	kvm->phb = &phb;
+}
+
+static uint32_t bar_to_ss(unsigned long bar)
+{
+	if ((bar & PCI_BASE_ADDRESS_SPACE) ==
+	    PCI_BASE_ADDRESS_SPACE_IO)
+		return SS_IO;
+	else if (bar & PCI_BASE_ADDRESS_MEM_TYPE_64)
+		return SS_M64;
+	else
+		return SS_M32;
+}
+
+static unsigned long bar_to_addr(unsigned long bar)
+{
+	if ((bar & PCI_BASE_ADDRESS_SPACE) ==
+	    PCI_BASE_ADDRESS_SPACE_IO)
+		return bar & PCI_BASE_ADDRESS_IO_MASK;
+	else
+		return bar & PCI_BASE_ADDRESS_MEM_MASK;
+}
+
+int spapr_populate_pci_devices(struct kvm *kvm,
+			       uint32_t xics_phandle,
+			       void *fdt)
+{
+	int bus_off, node_off = 0, devid, fn, i, n, devices;
+	char nodename[256];
+	struct {
+		uint32_t hi;
+		uint64_t addr;
+		uint64_t size;
+	} __attribute__((packed)) reg[PCI_NUM_REGIONS + 1],
+		  assigned_addresses[PCI_NUM_REGIONS];
+	uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
+	struct {
+		uint32_t hi;
+		uint64_t child;
+		uint64_t parent;
+		uint64_t size;
+	} __attribute__((packed)) ranges[] = {
+		{
+			cpu_to_be32(b_ss(1)), cpu_to_be64(0),
+			cpu_to_be64(phb.io_addr),
+			cpu_to_be64(phb.io_size),
+		},
+		{
+			cpu_to_be32(b_ss(2)), cpu_to_be64(0),
+			cpu_to_be64(phb.mem_addr),
+			cpu_to_be64(phb.mem_size),
+		},
+	};
+	uint64_t bus_reg[] = { cpu_to_be64(phb.buid), 0 };
+	uint32_t interrupt_map_mask[] = {
+		cpu_to_be32(b_ddddd(-1)|b_fff(-1)), 0x0, 0x0, 0x0};
+	uint32_t interrupt_map[SPAPR_PCI_NUM_LSI][7];
+
+	/* Start populating the FDT */
+	sprintf(nodename, "pci@%" PRIx64, phb.buid);
+	bus_off = fdt_add_subnode(fdt, 0, nodename);
+	if (bus_off < 0) {
+		die("error making bus subnode, %s\n", fdt_strerror(bus_off));
+		return bus_off;
+	}
+
+	/* Write PHB properties */
+	_FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
+	_FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
+	_FDT(fdt_setprop_cell(fdt, bus_off, "#address-cells", 0x3));
+	_FDT(fdt_setprop_cell(fdt, bus_off, "#size-cells", 0x2));
+	_FDT(fdt_setprop_cell(fdt, bus_off, "#interrupt-cells", 0x1));
+	_FDT(fdt_setprop(fdt, bus_off, "used-by-rtas", NULL, 0));
+	_FDT(fdt_setprop(fdt, bus_off, "bus-range", &bus_range, sizeof(bus_range)));
+	_FDT(fdt_setprop(fdt, bus_off, "ranges", &ranges, sizeof(ranges)));
+	_FDT(fdt_setprop(fdt, bus_off, "reg", &bus_reg, sizeof(bus_reg)));
+	_FDT(fdt_setprop(fdt, bus_off, "interrupt-map-mask",
+			 &interrupt_map_mask, sizeof(interrupt_map_mask)));
+
+	/* Populate PCI devices and allocate IRQs */
+	devices = 0;
+
+	for (devid = 0; devid < PCI_MAX_DEVICES; devid++) {
+		uint32_t *irqmap = interrupt_map[devices];
+		struct pci_device_header *hdr = pci__find_dev(devid);
+
+		if (!hdr)
+			continue;
+
+		fn = 0; /* kvmtool doesn't yet do multifunction devices */
+
+		sprintf(nodename, "pci@%u,%u", devid, fn);
+
+		/* Allocate interrupt from the map */
+		if (devid > SPAPR_PCI_NUM_LSI)	{
+			die("Unexpected behaviour in spapr_populate_pci_devices,"
+			    "wrong devid %u\n", devid);
+		}
+		irqmap[0] = cpu_to_be32(b_ddddd(devid)|b_fff(fn));
+		irqmap[1] = 0;
+		irqmap[2] = 0;
+		irqmap[3] = 0;
+		irqmap[4] = cpu_to_be32(xics_phandle);
+		/*
+		 * This is nasty; the PCI devs are set up such that their own
+		 * header's irq_line indicates the direct XICS IRQ number to
+		 * use.  There REALLY needs to be a hierarchical system in place
+		 * to 'raise' an IRQ on the bridge which indexes/looks up which
+		 * XICS IRQ to fire.
+		 */
+		irqmap[5] = cpu_to_be32(hdr->irq_line);
+		irqmap[6] = cpu_to_be32(0x8);
+
+		/* Add node to FDT */
+		node_off = fdt_add_subnode(fdt, bus_off, nodename);
+		if (node_off < 0) {
+			die("error making node subnode, %s\n", fdt_strerror(bus_off));
+			return node_off;
+		}
+
+		_FDT(fdt_setprop_cell(fdt, node_off, "vendor-id",
+				      le16_to_cpu(hdr->vendor_id)));
+		_FDT(fdt_setprop_cell(fdt, node_off, "device-id",
+				      le16_to_cpu(hdr->device_id)));
+		_FDT(fdt_setprop_cell(fdt, node_off, "revision-id",
+				      hdr->revision_id));
+		_FDT(fdt_setprop_cell(fdt, node_off, "class-code",
+				      hdr->class[0] | (hdr->class[1] << 8) | (hdr->class[2] << 16)));
+		_FDT(fdt_setprop_cell(fdt, node_off, "subsystem-id",
+				      le16_to_cpu(hdr->subsys_id)));
+		_FDT(fdt_setprop_cell(fdt, node_off, "subsystem-vendor-id",
+				      le16_to_cpu(hdr->subsys_vendor_id)));
+
+		/* Config space region comes first */
+		reg[0].hi = cpu_to_be32(
+			b_n(0) |
+			b_p(0) |
+			b_t(0) |
+			b_ss(SS_CONFIG) |
+			b_bbbbbbbb(0) |
+			b_ddddd(devid) |
+			b_fff(fn));
+		reg[0].addr = 0;
+		reg[0].size = 0;
+
+		n = 0;
+		/* Six BARs, no ROM supported, addresses are 32bit */
+		for (i = 0; i < 6; ++i) {
+			if (0 == hdr->bar[i]) {
+				continue;
+			}
+
+			reg[n+1].hi = cpu_to_be32(
+				b_n(0) |
+				b_p(0) |
+				b_t(0) |
+				b_ss(bar_to_ss(le32_to_cpu(hdr->bar[i]))) |
+				b_bbbbbbbb(0) |
+				b_ddddd(devid) |
+				b_fff(fn) |
+				b_rrrrrrrr(bars[i]));
+			reg[n+1].addr = 0;
+			reg[n+1].size = cpu_to_be64(hdr->bar_size[i]);
+
+			assigned_addresses[n].hi = cpu_to_be32(
+				b_n(1) |
+				b_p(0) |
+				b_t(0) |
+				b_ss(bar_to_ss(le32_to_cpu(hdr->bar[i]))) |
+				b_bbbbbbbb(0) |
+				b_ddddd(devid) |
+				b_fff(fn) |
+				b_rrrrrrrr(bars[i]));
+
+			/*
+			 * Writing zeroes to assigned_addresses causes the guest kernel to
+			 * reassign BARs
+			 */
+			assigned_addresses[n].addr = cpu_to_be64(bar_to_addr(le32_to_cpu(hdr->bar[i])));
+			assigned_addresses[n].size = reg[n+1].size;
+
+			++n;
+		}
+		_FDT(fdt_setprop(fdt, node_off, "reg", reg, sizeof(reg[0])*(n+1)));
+		_FDT(fdt_setprop(fdt, node_off, "assigned-addresses",
+				 assigned_addresses,
+				 sizeof(assigned_addresses[0])*(n)));
+		_FDT(fdt_setprop_cell(fdt, node_off, "interrupts",
+				      hdr->irq_pin));
+
+		/* We don't set ibm,dma-window property as we don't have an IOMMU. */
+
+		++devices;
+	}
+
+	/* Write interrupt map */
+	_FDT(fdt_setprop(fdt, bus_off, "interrupt-map", &interrupt_map,
+			 devices * sizeof(interrupt_map[0])));
+
+	return 0;
+}
diff --git a/tools/kvm/powerpc/spapr_pci.h b/tools/kvm/powerpc/spapr_pci.h
new file mode 100644
index 0000000..48b221c
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_pci.h
@@ -0,0 +1,57 @@
+/*
+ * SPAPR PHB definitions
+ *
+ * Modifications by Matt Evans <matt@ozlabs.org>, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef SPAPR_PCI_H
+#define SPAPR_PCI_H
+
+#include "kvm/kvm.h"
+#include "spapr.h"
+#include <inttypes.h>
+
+/* With XICS, we can easily accomodate 1 IRQ per PCI device. */
+
+#define SPAPR_PCI_NUM_LSI 256
+
+struct spapr_phb {
+	uint64_t buid;
+	uint64_t mem_addr;
+	uint64_t mem_size;
+	uint64_t io_addr;
+	uint64_t io_size;
+};
+
+void spapr_create_phb(struct kvm *kvm,
+                      const char *busname, uint64_t buid,
+                      uint64_t mem_win_addr, uint64_t mem_win_size,
+                      uint64_t io_win_addr, uint64_t io_win_size);
+
+int spapr_populate_pci_devices(struct kvm *kvm,
+                               uint32_t xics_phandle,
+                               void *fdt);
+
+static inline bool spapr_phb_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write)
+{
+	if ((phys_addr >= SPAPR_PCI_IO_WIN_ADDR) &&
+	    (phys_addr < SPAPR_PCI_IO_WIN_ADDR +
+	     SPAPR_PCI_IO_WIN_SIZE)) {
+		return kvm__emulate_io(kvm, phys_addr - SPAPR_PCI_IO_WIN_ADDR,
+				       data, is_write ? KVM_EXIT_IO_OUT :
+				       KVM_EXIT_IO_IN,
+				       len, 1);
+	} else if ((phys_addr >= SPAPR_PCI_MEM_WIN_ADDR) &&
+		   (phys_addr < SPAPR_PCI_MEM_WIN_ADDR +
+		    SPAPR_PCI_MEM_WIN_SIZE)) {
+		return kvm__emulate_mmio(kvm, phys_addr - SPAPR_PCI_MEM_WIN_ADDR,
+					 data, len, is_write);
+	}
+	return false;
+}
+
+#endif
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V4 7/7] kvm tools: Add PPC64 kvm_cpu__emulate_io()
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
                   ` (5 preceding siblings ...)
  2012-01-31  6:34 ` [PATCH V4 6/7] kvm tools: Add PPC64 PCI Host Bridge Matt Evans
@ 2012-01-31  6:34 ` Matt Evans
  2012-01-31  8:29 ` [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Pekka Enberg
  7 siblings, 0 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31  6:34 UTC (permalink / raw)
  To: kvm, kvm-ppc; +Cc: penberg, asias.hejun, levinsasha928, gorcunov, david, aik

This is the final piece of the puzzle for PPC SPAPR PCI; this
function splits MMIO accesses into the two PHB windows & directs
things to MMIO/IO emulation as appropriate.

Signed-off-by: Matt Evans <matt@ozlabs.org>
---
 tools/kvm/Makefile                           |    1 +
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   10 +++++++++-
 tools/kvm/powerpc/kvm-cpu.c                  |   22 ++++++++++++++++++++++
 3 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 2d97b66..332b374 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -137,6 +137,7 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/spapr_hcall.o
 	OBJS	+= powerpc/spapr_rtas.o
 	OBJS	+= powerpc/spapr_hvcons.o
+	OBJS	+= powerpc/spapr_pci.o
 	OBJS	+= powerpc/xics.o
 # We use libfdt, but it's sometimes not packaged 64bit.  It's small too,
 # so just build it in:
diff --git a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
index c1c6539..7520c04 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
@@ -14,7 +14,7 @@
 /* Architecture-specific kvm_cpu definitions. */
 
 #include <linux/kvm.h>	/* for struct kvm_regs */
-
+#include <stdbool.h>
 #include <pthread.h>
 
 #define MSR_SF		(1UL<<63)
@@ -65,4 +65,12 @@ struct kvm_cpu {
 
 void kvm_cpu__irq(struct kvm_cpu *vcpu, int pin, int level);
 
+/* This is never actually called on PPC. */
+static inline bool kvm_cpu__emulate_io(struct kvm *kvm, u16 port, void *data, int direction, int size, u32 count)
+{
+	return false;
+}
+
+bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write);
+
 #endif /* KVM__KVM_CPU_ARCH_H */
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 7dd1679..2505c69 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -15,6 +15,7 @@
 #include "kvm/kvm.h"
 
 #include "spapr.h"
+#include "spapr_pci.h"
 #include "xics.h"
 
 #include <sys/ioctl.h>
@@ -24,6 +25,7 @@
 #include <string.h>
 #include <errno.h>
 #include <stdio.h>
+#include <assert.h>
 
 static int debug_fd;
 
@@ -192,6 +194,26 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
 	return ret;
 }
 
+bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write)
+{
+	/*
+	 * FIXME: This function will need to be split in order to support
+	 * various PowerPC platforms/PHB types, etc.  It currently assumes SPAPR
+	 * PPC64 guest.
+	 */
+	bool ret = false;
+
+	if ((phys_addr >= SPAPR_PCI_WIN_START) &&
+	    (phys_addr < SPAPR_PCI_WIN_END)) {
+		ret = spapr_phb_mmio(kvm, phys_addr, data, len, is_write);
+	} else {
+		pr_warning("MMIO %s unknown address %llx (size %d)!\n",
+			   is_write ? "write to" : "read from",
+			   phys_addr, len);
+	}
+	return ret;
+}
+
 #define CONDSTR_BIT(m, b) (((m) & MSR_##b) ? #b" " : "")
 
 void kvm_cpu__show_registers(struct kvm_cpu *vcpu)
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree
  2012-01-31  6:34 ` [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree Matt Evans
@ 2012-01-31  7:59   ` Pekka Enberg
  2012-01-31 10:00     ` Matt Evans
  2012-02-01  3:39   ` David Gibson
  1 sibling, 1 reply; 17+ messages in thread
From: Pekka Enberg @ 2012-01-31  7:59 UTC (permalink / raw)
  To: Matt Evans; +Cc: kvm, kvm-ppc, asias.hejun, levinsasha928, gorcunov, david, aik

On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans <matt@ozlabs.org> wrote:
> +static struct cpu_info cpu_power7_info = {
> +       "POWER7",
> +       power7_page_sizes_prop, sizeof(power7_page_sizes_prop),
> +       power7_segment_sizes_prop, sizeof(power7_segment_sizes_prop),
> +       32,             /* SLB size */
> +       512000000,      /* TB frequency */
> +       128,            /* d-cache block size */
> +       128,            /* i-cache block size */
> +       CPUINFO_FLAG_DFP | CPUINFO_FLAG_VSX | CPUINFO_FLAG_VMX
> +};

Why don't you use C99 initializers here?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  2012-01-31  6:34 ` [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall & rtascall structure Matt Evans
@ 2012-01-31  8:11   ` Pekka Enberg
  2012-01-31 10:22     ` Matt Evans
  0 siblings, 1 reply; 17+ messages in thread
From: Pekka Enberg @ 2012-01-31  8:11 UTC (permalink / raw)
  To: Matt Evans
  Cc: kvm, kvm-ppc, asias.hejun, levinsasha928, gorcunov, david, aik,
	Ingo Molnar

On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans <matt@ozlabs.org> wrote:
> +#define DEBUG_SPAPR_HCALLS

I suppose this shouldn't be defined by default?

> +#ifdef DEBUG_SPAPR_HCALLS
> +#define hcall_dprintf(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define hcall_dprintf(fmt, ...) \
> +    do { } while (0)
> +#endif

I don't expect you to fix this but this sort of thing really cries out
for userspace tracepoints.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support
  2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
                   ` (6 preceding siblings ...)
  2012-01-31  6:34 ` [PATCH V4 7/7] kvm tools: Add PPC64 kvm_cpu__emulate_io() Matt Evans
@ 2012-01-31  8:29 ` Pekka Enberg
  7 siblings, 0 replies; 17+ messages in thread
From: Pekka Enberg @ 2012-01-31  8:29 UTC (permalink / raw)
  To: Matt Evans
  Cc: kvm, kvm-ppc, asias.hejun, levinsasha928, gorcunov, david, aik,
	Ingo Molnar

On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans <matt@ozlabs.org> wrote:
> This series adds support for a PPC64 platform, SPAPR, on top of the previous
> more general PPC64 CPU support.  This platform is paravirtualised, with most
> of the MMU hypercalls being dealt with in the kernel.  Userland needs to
> describe the environment to the machine with a device tree, cope with some
> hypercalls (and RTAS calls) for hvconsole, PCI config and IRQs, and emulate
> the interrupt controller (XICS) and PCI Host Bridge of the SPAPR-esque
> machine.

I think the patches are good to go once we've sorted out the minor
nits I pointed out.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree
  2012-01-31  7:59   ` Pekka Enberg
@ 2012-01-31 10:00     ` Matt Evans
  0 siblings, 0 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31 10:00 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: kvm, kvm-ppc, asias.hejun, levinsasha928, gorcunov, david, aik


On 31 Jan 2012, at 18:59, Pekka Enberg wrote:

> On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans <matt@ozlabs.org> wrote:
>> +static struct cpu_info cpu_power7_info = {
>> +       "POWER7",
>> +       power7_page_sizes_prop, sizeof(power7_page_sizes_prop),
>> +       power7_segment_sizes_prop, sizeof(power7_segment_sizes_prop),
>> +       32,             /* SLB size */
>> +       512000000,      /* TB frequency */
>> +       128,            /* d-cache block size */
>> +       128,            /* i-cache block size */
>> +       CPUINFO_FLAG_DFP | CPUINFO_FLAG_VSX | CPUINFO_FLAG_VMX
>> +};
> 
> Why don't you use C99 initializers here?

...just out of bad bad habit :)  Initialisers'd be cleaner, I'll convert them.


Thanks!


Matt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  2012-01-31  8:11   ` Pekka Enberg
@ 2012-01-31 10:22     ` Matt Evans
  0 siblings, 0 replies; 17+ messages in thread
From: Matt Evans @ 2012-01-31 10:22 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: kvm, kvm-ppc, asias.hejun, levinsasha928, gorcunov, david, aik,
	Ingo Molnar

On 31 Jan 2012, at 19:11, Pekka Enberg wrote:

> On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans <matt@ozlabs.org> wrote:
>> +#define DEBUG_SPAPR_HCALLS
> 
> I suppose this shouldn't be defined by default?

Well, I had a bit of a debate about it.  I left it on as it is actually interesting whilst experimenting with different SPAPR guests.  For instance (and this is really out there, there are other reasons why this won't work) if someone tries an AIX guest it'd be good to see lots of "Unsupported!" spew.  (The 'unsupported' message could be a warning but that might also get annoying, OTOH.)  I would prefer to leave it on for the moment.

> 
>> +#ifdef DEBUG_SPAPR_HCALLS
>> +#define hcall_dprintf(fmt, ...) \
>> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define hcall_dprintf(fmt, ...) \
>> +    do { } while (0)
>> +#endif
> 
> I don't expect you to fix this but this sort of thing really cries out
> for userspace tracepoints.

Ha!  Yes...

Cheers,

Matt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree
  2012-01-31  6:34 ` [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree Matt Evans
  2012-01-31  7:59   ` Pekka Enberg
@ 2012-02-01  3:39   ` David Gibson
  2012-02-01  4:13     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 17+ messages in thread
From: David Gibson @ 2012-02-01  3:39 UTC (permalink / raw)
  To: Matt Evans
  Cc: kvm, kvm-ppc, penberg, asias.hejun, levinsasha928, gorcunov, aik

On Tue, Jan 31, 2012 at 05:34:37PM +1100, Matt Evans wrote:
> The generated DT is the bare minimum structure required for SPAPR (on which
> subsequent patches for VIO, XICS, PCI etc. will build); root node, cpus, memory.
> 
> The DT contains CPU-specific configuration; a very simple 'cpu info' mechanism
> is added to recognise/differentiate DT entries for POWER7 and PPC970 host CPUs.
> Future support of more CPUs is possible.
> 
> libfdt is included from scripts/dtc/libfdt.
> 
> Signed-off-by: Matt Evans <matt@ozlabs.org>

For the bits derived from my code:

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 6/7] kvm tools: Add PPC64 PCI Host Bridge
  2012-01-31  6:34 ` [PATCH V4 6/7] kvm tools: Add PPC64 PCI Host Bridge Matt Evans
@ 2012-02-01  3:40   ` David Gibson
  2012-02-01  4:14     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2012-02-01  3:40 UTC (permalink / raw)
  To: Matt Evans
  Cc: kvm, kvm-ppc, penberg, asias.hejun, levinsasha928, gorcunov, aik

On Tue, Jan 31, 2012 at 05:34:41PM +1100, Matt Evans wrote:
> This provides the PCI bridge, definitions for the address layout of the windows
> and wires in IRQs.  Once PCI devices are all registered, they are enumerated and
> DT nodes generated for each.
> 
> Signed-off-by: Matt Evans <matt@ozlabs.org>

For the bits derived from my qemu code:

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree
  2012-02-01  3:39   ` David Gibson
@ 2012-02-01  4:13     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2012-02-01  4:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Matt Evans, kvm, kvm-ppc, penberg, asias.hejun, levinsasha928,
	gorcunov

On 01/02/12 14:39, David Gibson wrote:
> On Tue, Jan 31, 2012 at 05:34:37PM +1100, Matt Evans wrote:
>> The generated DT is the bare minimum structure required for SPAPR (on which
>> subsequent patches for VIO, XICS, PCI etc. will build); root node, cpus, memory.
>>
>> The DT contains CPU-specific configuration; a very simple 'cpu info' mechanism
>> is added to recognise/differentiate DT entries for POWER7 and PPC970 host CPUs.
>> Future support of more CPUs is possible.
>>
>> libfdt is included from scripts/dtc/libfdt.
>>
>> Signed-off-by: Matt Evans <matt@ozlabs.org>
> 
> For the bits derived from my code:
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> 

For the bits derived from my code:

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>



-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V4 6/7] kvm tools: Add PPC64 PCI Host Bridge
  2012-02-01  3:40   ` David Gibson
@ 2012-02-01  4:14     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2012-02-01  4:14 UTC (permalink / raw)
  To: David Gibson
  Cc: Matt Evans, kvm, kvm-ppc, penberg, asias.hejun, levinsasha928,
	gorcunov

On 01/02/12 14:40, David Gibson wrote:
> On Tue, Jan 31, 2012 at 05:34:41PM +1100, Matt Evans wrote:
>> This provides the PCI bridge, definitions for the address layout of the windows
>> and wires in IRQs.  Once PCI devices are all registered, they are enumerated and
>> DT nodes generated for each.
>>
>> Signed-off-by: Matt Evans <matt@ozlabs.org>
> 
> For the bits derived from my qemu code:
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> 

For the bits derived from my qemu code:

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>


-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-02-01  4:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-01-31  6:34 [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Matt Evans
2012-01-31  6:34 ` [PATCH V4 1/7] kvm tools: PPC64, add HPT/SDR1 for -PR KVM Matt Evans
2012-01-31  6:34 ` [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree Matt Evans
2012-01-31  7:59   ` Pekka Enberg
2012-01-31 10:00     ` Matt Evans
2012-02-01  3:39   ` David Gibson
2012-02-01  4:13     ` Alexey Kardashevskiy
2012-01-31  6:34 ` [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall & rtascall structure Matt Evans
2012-01-31  8:11   ` Pekka Enberg
2012-01-31 10:22     ` Matt Evans
2012-01-31  6:34 ` [PATCH V4 4/7] kvm tools: Add SPAPR PPC64 HV console Matt Evans
2012-01-31  6:34 ` [PATCH V4 5/7] kvm tools: Add PPC64 XICS interrupt controller support Matt Evans
2012-01-31  6:34 ` [PATCH V4 6/7] kvm tools: Add PPC64 PCI Host Bridge Matt Evans
2012-02-01  3:40   ` David Gibson
2012-02-01  4:14     ` Alexey Kardashevskiy
2012-01-31  6:34 ` [PATCH V4 7/7] kvm tools: Add PPC64 kvm_cpu__emulate_io() Matt Evans
2012-01-31  8:29 ` [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support Pekka Enberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).