[Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)
@ 2011-09-30  7:39 David Gibson
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 1/4] pseries: Support SMT systems for KVM Book3S-HV David Gibson
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: David Gibson @ 2011-09-30  7:39 UTC
  To: agraf; +Cc: qemu-devel

Alex Graf has added support for KVM acceleration of the pseries
machine, using his Book3S-PR KVM variant, which runs the guest in
userspace, emulating supervisor operations.  Recent kernels now have
the Book3S-HV KVM variant which uses the hardware hypervisor features
of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
working roughly with Book3S-HV, but taking full advantage of this mode
needs more work.  This patch series makes a start on better exploiting
Book3S-HV.

Even with these patches, qemu won't quite be able to run on a current
Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
memory to be backed by hugepages, but qemu refuses to use hugepages
for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
does not currently do.  We're working on improvements to the KVM code
which will implement CAP_SYNC_MMU and allow smallpage backing of
guests, but they're not there yet.  So, in order to test Book3S-HV for
now you need to either:

 * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
   it doesn't really implement it.

or

 * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
   option is used.

Both approaches are ugly and unsafe, but it seems we can generally get
away with it in practice.  Obviously this is only an interim hack
until the proper CAP_SYNC_MMU support is ready.

Changes in v2:
 * Bug fixes and style fixes based on feedback.
 * Added a new, fixed SLOF update.


* [Qemu-devel] [PATCH 1/4] pseries: Support SMT systems for KVM Book3S-HV
  2011-09-30  7:39 [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) David Gibson
@ 2011-09-30  7:39 ` David Gibson
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 2/4] pseries: Allow KVM Book3S-HV on PPC970 CPUS David Gibson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2011-09-30  7:39 UTC
  To: agraf; +Cc: qemu-devel

Alex Graf has already made qemu support KVM for the pseries machine
when using the Book3S-PR KVM variant (which runs the guest in
usermode, emulating supervisor operations).  This code gets us
very close to also working with KVM Book3S-HV (using the hypervisor
capabilities of recent POWER CPUs).

This patch moves us another step towards Book3S-HV support by
correctly handling SMT (multithreaded) POWER CPUs.  There are two
parts to this:

 * Querying KVM to check SMT capability, and if present, adjusting the
   cpu numbers that qemu assigns to cause KVM to assign guest threads
   to cores in the right way (this isn't automatic, because the POWER
   HV support has a limitation that different threads on a single core
   cannot be in different guests at the same time).

 * Correctly informing the guest OS of the SMT thread to core mappings
   via the device tree.
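The renumbering and the per-core device tree properties can be illustrated with a small standalone sketch. It mirrors the arithmetic in the cpu_ppc_init() and spapr_create_fdt_skel() hunks of this patch, but the helper names (smt_adjust_cpu_index, is_core_leader, core_servers) are hypothetical, the cpu_to_be32() byte swapping is omitted, and it assumes the guest's threads per core (smp_threads) does not exceed the host SMT width reported by kvmppc_smt_threads().

```c
/*
 * Hypothetical helper: renumber a guest cpu_index so that each guest
 * core occupies a block of host_smt consecutive ids, as the
 * cpu_ppc_init() hunk does when KVM is enabled.
 * guest_threads plays the role of smp_threads, host_smt that of
 * kvmppc_smt_threads(). Assumes guest_threads <= host_smt.
 */
static int smt_adjust_cpu_index(int cpu_index, int guest_threads, int host_smt)
{
    return (cpu_index / guest_threads) * host_smt
        + (cpu_index % guest_threads);
}

/* Only the first thread of each core gets a cpu node in the device tree. */
static int is_core_leader(int adjusted_index, int host_smt)
{
    return (adjusted_index % host_smt) == 0;
}

/*
 * Fill servers[] with the interrupt server numbers for the core whose
 * first thread is 'leader', like "ibm,ppc-interrupt-server#s"
 * (big-endian conversion omitted in this sketch).
 */
static void core_servers(int leader, int guest_threads, unsigned servers[])
{
    for (int i = 0; i < guest_threads; i++) {
        servers[i] = (unsigned)(leader + i);
    }
}
```

For example, with 2 guest threads per core on a host with SMT 4, guest cpus 0-3 become ids 0, 1, 4 and 5, and only ids 0 and 4 get a cpu node listing servers {0, 1} and {4, 5} respectively.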

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c           |   24 +++++++++++++++++++++---
 target-ppc/helper.c  |   11 +++++++++++
 target-ppc/kvm.c     |    8 ++++++++
 target-ppc/kvm_ppc.h |    6 ++++++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index b118975..8d6d76e 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -29,6 +29,9 @@
 #include "elf.h"
 #include "net.h"
 #include "blockdev.h"
+#include "cpus.h"
+#include "kvm.h"
+#include "kvm_ppc.h"
 
 #include "hw/boards.h"
 #include "hw/ppc.h"
@@ -103,6 +106,7 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i;
     char *modelname;
+    int smt = kvmppc_smt_threads();
 
 #define _FDT(exp) \
     do { \
@@ -162,13 +166,18 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         int index = env->cpu_index;
-        uint32_t gserver_prop[] = {cpu_to_be32(index), 0}; /* HACK! */
+        uint32_t servers_prop[smp_threads];
+        uint32_t gservers_prop[smp_threads * 2];
         char *nodename;
         uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
                            0xffffffff, 0xffffffff};
         uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ;
         uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 1000000000;
 
+        if ((index % smt) != 0) {
+            continue;
+        }
+
         if (asprintf(&nodename, "%s@%x", modelname, index) < 0) {
             fprintf(stderr, "Allocation failure\n");
             exit(1);
@@ -193,9 +202,18 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
                            pft_size_prop, sizeof(pft_size_prop))));
         _FDT((fdt_property_string(fdt, "status", "okay")));
         _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
-        _FDT((fdt_property_cell(fdt, "ibm,ppc-interrupt-server#s", index)));
+
+        /* Build interrupt servers and gservers properties */
+        for (i = 0; i < smp_threads; i++) {
+            servers_prop[i] = cpu_to_be32(index + i);
+            /* Hack, direct the group queues back to cpu 0 */
+            gservers_prop[i*2] = cpu_to_be32(index + i);
+            gservers_prop[i*2 + 1] = 0;
+        }
+        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s",
+                           servers_prop, sizeof(servers_prop))));
         _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
-                           gserver_prop, sizeof(gserver_prop))));
+                           gservers_prop, sizeof(gservers_prop))));
 
         if (env->mmu_model & POWERPC_MMU_1TSEG) {
             _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 6339be3..137a494 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -26,6 +26,8 @@
 #include "helper_regs.h"
 #include "qemu-common.h"
 #include "kvm.h"
+#include "kvm_ppc.h"
+#include "cpus.h"
 
 //#define DEBUG_MMU
 //#define DEBUG_BATS
@@ -3189,6 +3191,15 @@ CPUPPCState *cpu_ppc_init (const char *cpu_model)
     if (tcg_enabled()) {
         ppc_translate_init();
     }
+    /* Adjust cpu index for SMT */
+#if !defined(CONFIG_USER_ONLY)
+    if (kvm_enabled()) {
+        int smt = kvmppc_smt_threads();
+
+        env->cpu_index = (env->cpu_index / smp_threads)*smt
+            + (env->cpu_index % smp_threads);
+    }
+#endif /* !CONFIG_USER_ONLY */
     env->cpu_model_str = cpu_model;
     cpu_ppc_register_internal(env, def);
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 75832d8..6c7ca6f 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -28,6 +28,7 @@
 #include "kvm_ppc.h"
 #include "cpu.h"
 #include "device_tree.h"
+#include "hw/spapr.h"
 
 #include "hw/sysbus.h"
 #include "hw/spapr.h"
@@ -53,6 +54,7 @@ static int cap_interrupt_unset = false;
 static int cap_interrupt_level = false;
 static int cap_segstate;
 static int cap_booke_sregs;
+static int cap_ppc_smt;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -76,6 +78,7 @@ int kvm_arch_init(KVMState *s)
     cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
     cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE);
     cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
+    cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -750,6 +753,11 @@ fail:
     cpu_abort(env, "This KVM version does not support PAPR\n");
 }
 
+int kvmppc_smt_threads(void)
+{
+    return cap_ppc_smt ? cap_ppc_smt : 1;
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index c484e60..c298411 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -18,6 +18,7 @@ uint64_t kvmppc_get_clockfreq(void);
 int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 void kvmppc_set_papr(CPUState *env);
+int kvmppc_smt_threads(void);
 
 #else
 
@@ -45,6 +46,11 @@ static inline void kvmppc_set_papr(CPUState *env)
 {
 }
 
+static inline int kvmppc_smt_threads(void)
+{
+    return 1;
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.6.3


* [Qemu-devel] [PATCH 2/4] pseries: Allow KVM Book3S-HV on PPC970 CPUS
  2011-09-30  7:39 [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) David Gibson
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 1/4] pseries: Support SMT systems for KVM Book3S-HV David Gibson
@ 2011-09-30  7:39 ` David Gibson
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 3/4] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2011-09-30  7:39 UTC
  To: agraf; +Cc: qemu-devel

At present, using the hypervisor aware Book3S-HV KVM will only work
with qemu on POWER7 CPUs.  PPC970 CPUs also have hypervisor
capability, but they lack the VRMA feature which makes assigning guest
memory easier.

In order to allow KVM Book3S-HV on PPC970, we need to specially
allocate the first chunk of guest memory (the "Real Mode Area" or
RMA), so that it is physically contiguous.
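The resulting guest memory layout can be sketched as follows. It mirrors the ppc_spapr_init() and spapr_create_fdt_skel() changes in this patch: the RMA becomes the memory@0 node, any remaining RAM becomes a second memory node, and the device tree is placed below the lower of the RMA top and 2GB. The struct and function names are hypothetical, and the FDT/RTAS sizes are assumed placeholders rather than the values from hw/spapr.h.

```c
#include <stdint.h>

/* Assumed placeholder sizes; not necessarily the values in hw/spapr.h. */
#define FDT_MAX_SIZE_ASSUMED  0x10000ULL
#define RTAS_MAX_SIZE_ASSUMED 0x10000ULL
#define TWO_GIB               0x80000000ULL

struct spapr_layout {
    uint64_t rma_size;     /* size of the memory@0 node */
    uint64_t nonrma_base;  /* start of the second memory node */
    uint64_t nonrma_size;  /* 0 means no second memory node */
    uint64_t fdt_addr;     /* device tree load address */
    uint64_t rtas_addr;    /* RTAS blob load address */
};

/*
 * Hypothetical helper. rma_alloc_size is what the RMA allocator
 * returned; 0 means no contiguous RMA was allocated, in which case
 * all of RAM is treated as the RMA.
 */
static struct spapr_layout compute_spapr_layout(uint64_t ram_size,
                                                uint64_t rma_alloc_size)
{
    struct spapr_layout l;

    if (rma_alloc_size && rma_alloc_size < ram_size) {
        l.rma_size = rma_alloc_size;
    } else {
        l.rma_size = ram_size;
    }
    l.nonrma_base = l.rma_size;
    l.nonrma_size = ram_size - l.rma_size;

    /* Keep the device tree reachable by 32-bit real mode code. */
    l.fdt_addr = (l.rma_size < TWO_GIB ? l.rma_size : TWO_GIB)
        - FDT_MAX_SIZE_ASSUMED;
    l.rtas_addr = l.fdt_addr - RTAS_MAX_SIZE_ASSUMED;
    return l;
}
```

With 1 GiB of guest RAM and a 256 MiB RMA, this gives a memory@0 node covering 256 MiB and a memory@10000000 node covering the remaining 768 MiB, with the device tree placed just below the 256 MiB boundary.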

Sufficiently recent host kernels allow such contiguous RMAs to be
allocated, with a kvm capability advertising whether the feature is
available and/or necessary on this hardware.  This patch enables qemu
to use this support, thus allowing kvm acceleration of pseries qemu
machines on PPC970 hardware.
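How the KVM_CAP_PPC_RMA value is interpreted can be summarised in a few lines. This sketch follows the comment in kvmppc_alloc_rma() (0: unsupported, 1: supported but not needed, 2: needed) and its MIN(ret.rma_size, 256ul << 20) clamp. The function name is hypothetical, and the sketch does not issue the KVM_ALLOCATE_RMA ioctl or mmap() the result.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of contiguous RMA qemu would map,
 * given the capability value and the size the kernel reported.
 * Returns 0 when no contiguous RMA should be allocated.
 */
static uint64_t rma_bytes_to_map(int cap_ppc_rma, uint64_t kernel_rma_size)
{
    const uint64_t limit = 256ULL << 20;   /* the patch clamps to 256 MiB */

    if (cap_ppc_rma < 2) {
        /* 0: not supported; 1: supported but not required here */
        return 0;
    }
    return kernel_rma_size < limit ? kernel_rma_size : limit;
}
```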

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c           |   56 +++++++++++++++++++++++++++++++++++++++----------
 target-ppc/kvm.c     |   43 ++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h |    6 +++++
 3 files changed, 93 insertions(+), 12 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 8d6d76e..9a3a1ea 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -89,6 +89,7 @@ qemu_irq spapr_allocate_irq(uint32_t hint, uint32_t *irq_num)
 }
 
 static void *spapr_create_fdt_skel(const char *cpu_model,
+                                   target_phys_addr_t rma_size,
                                    target_phys_addr_t initrd_base,
                                    target_phys_addr_t initrd_size,
                                    const char *boot_device,
@@ -97,7 +98,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 {
     void *fdt;
     CPUState *env;
-    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ram_size) };
+    uint64_t mem_reg_property_rma[] = { 0, cpu_to_be64(rma_size) };
+    uint64_t mem_reg_property_nonrma[] = { cpu_to_be64(rma_size),
+                                           cpu_to_be64(ram_size - rma_size) };
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
@@ -143,15 +146,25 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 
     _FDT((fdt_end_node(fdt)));
 
-    /* memory node */
+    /* memory node(s) */
     _FDT((fdt_begin_node(fdt, "memory@0")));
 
     _FDT((fdt_property_string(fdt, "device_type", "memory")));
-    _FDT((fdt_property(fdt, "reg",
-                       mem_reg_property, sizeof(mem_reg_property))));
-
+    _FDT((fdt_property(fdt, "reg", mem_reg_property_rma,
+                       sizeof(mem_reg_property_rma))));
     _FDT((fdt_end_node(fdt)));
 
+    if (ram_size > rma_size) {
+        char mem_name[32];
+
+        sprintf(mem_name, "memory@%" PRIx64, (uint64_t)rma_size);
+        _FDT((fdt_begin_node(fdt, mem_name)));
+        _FDT((fdt_property_string(fdt, "device_type", "memory")));
+        _FDT((fdt_property(fdt, "reg", mem_reg_property_nonrma,
+                           sizeof(mem_reg_property_nonrma))));
+        _FDT((fdt_end_node(fdt)));
+    }
+
     /* cpus */
     _FDT((fdt_begin_node(fdt, "cpus")));
 
@@ -342,6 +355,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 {
     CPUState *env;
     int i;
+    target_phys_addr_t rma_alloc_size, rma_size;
     ram_addr_t ram_offset;
     uint32_t initrd_base;
     long kernel_size, initrd_size, fw_size;
@@ -351,10 +365,23 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     spapr = g_malloc(sizeof(*spapr));
     cpu_ppc_hypercall = emulate_spapr_hypercall;
 
-    /* We place the device tree just below either the top of RAM, or
-     * 2GB, so that it can be processed with 32-bit code if
-     * necessary */
-    spapr->fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
+    /* Allocate RMA if necessary */
+    rma_alloc_size = kvmppc_alloc_rma("ppc_spapr.rma");
+
+    if (rma_alloc_size == -1) {
+        hw_error("qemu: Unable to create RMA\n");
+        exit(1);
+    }
+    if (rma_alloc_size && (rma_alloc_size < ram_size)) {
+        rma_size = rma_alloc_size;
+    } else {
+        rma_size = ram_size;
+    }
+
+    /* We place the device tree just below either the top of the RMA,
+     * or just below 2GB, whichever is lower, so that it can be
+     * processed with 32-bit real mode code if necessary */
+    spapr->fdt_addr = MIN(rma_size, 0x80000000) - FDT_MAX_SIZE;
     spapr->rtas_addr = spapr->fdt_addr - RTAS_MAX_SIZE;
 
     /* init CPUs */
@@ -379,8 +406,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
     /* allocate RAM */
     spapr->ram_limit = ram_size;
-    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", spapr->ram_limit);
-    cpu_register_physical_memory(0, ram_size, ram_offset);
+    if (spapr->ram_limit > rma_alloc_size) {
+        ram_addr_t nonrma_base = rma_alloc_size;
+        ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
+
+        ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", nonrma_size);
+        cpu_register_physical_memory(nonrma_base, nonrma_size, ram_offset);
+    }
 
     /* allocate hash page table.  For now we always make this 16mb,
      * later we should probably make it scale to the size of guest
@@ -504,7 +536,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     }
 
     /* Prepare the device tree */
-    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
+    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size,
                                             initrd_base, initrd_size,
                                             boot_device, kernel_cmdline,
                                             pteg_shift + 7);
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 6c7ca6f..cee767b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -55,6 +55,7 @@ static int cap_interrupt_level = false;
 static int cap_segstate;
 static int cap_booke_sregs;
 static int cap_ppc_smt;
+static int cap_ppc_rma;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -79,6 +80,7 @@ int kvm_arch_init(KVMState *s)
     cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE);
     cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
     cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
+    cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -758,6 +760,47 @@ int kvmppc_smt_threads(void)
     return cap_ppc_smt ? cap_ppc_smt : 1;
 }
 
+off_t kvmppc_alloc_rma(const char *name)
+{
+    void *rma;
+    ram_addr_t rma_offset;
+    off_t size;
+    int fd;
+    struct kvm_allocate_rma ret;
+
+    /* If cap_ppc_rma == 0, contiguous RMA allocation is not supported
+     * if cap_ppc_rma == 1, contiguous RMA allocation is supported, but
+     *                      not necessary on this hardware
+     * if cap_ppc_rma == 2, contiguous RMA allocation is needed on this hardware
+     *
+     * FIXME: We should allow the user to force contiguous RMA
+     * allocation in the cap_ppc_rma==1 case.
+     */
+    if (cap_ppc_rma < 2) {
+        return 0;
+    }
+
+    fd = kvm_vm_ioctl(kvm_state, KVM_ALLOCATE_RMA, &ret);
+    if (fd < 0) {
+        fprintf(stderr, "KVM: Error on KVM_ALLOCATE_RMA: %s\n",
+                strerror(errno));
+        return -1;
+    }
+
+    size = MIN(ret.rma_size, 256ul << 20);
+
+    rma = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+    if (rma == MAP_FAILED) {
+        fprintf(stderr, "KVM: Error mapping RMA: %s\n", strerror(errno));
+        return -1;
+    }
+
+    rma_offset = qemu_ram_alloc_from_ptr(NULL, name, size, rma);
+    cpu_register_physical_memory(0, size, rma_offset);
+
+    return size;
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index c298411..ad9903c 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -19,6 +19,7 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 void kvmppc_set_papr(CPUState *env);
 int kvmppc_smt_threads(void);
+off_t kvmppc_alloc_rma(const char *name);
 
 #else
 
@@ -51,6 +52,11 @@ static inline int kvmppc_smt_threads(void)
     return 1;
 }
 
+static inline off_t kvmppc_alloc_rma(const char *name)
+{
+    return 0;
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.6.3


* [Qemu-devel] [PATCH 3/4] pseries: Use Book3S-HV TCE acceleration capabilities
  2011-09-30  7:39 [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) David Gibson
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 1/4] pseries: Support SMT systems for KVM Book3S-HV David Gibson
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 2/4] pseries: Allow KVM Book3S-HV on PPC970 CPUS David Gibson
@ 2011-09-30  7:39 ` David Gibson
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 4/4] pseries: Update SLOF firmware image David Gibson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2011-09-30  7:39 UTC
  To: agraf; +Cc: qemu-devel

The pseries machine of qemu implements the TCE mechanism used as a
virtual IOMMU for the PAPR defined virtual IO devices.  Because the
PAPR spec only defines a small DMA address space, the guest VIO
drivers need to update TCE mappings very frequently - the virtual
network device is particularly bad.  This means many slow exits to
qemu to emulate the H_PUT_TCE hypercall.

Sufficiently recent kernels allow this to be mitigated by implementing
H_PUT_TCE in the host kernel.  To make use of this, however, qemu
needs to initialize the necessary TCE tables, and map them into itself
so that the VIO device implementations can retrieve the mappings when
they access guest memory (which is treated as a virtual DMA
operation).
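To make the "virtual DMA" step concrete, a VIO device model must translate each I/O bus address through the TCE table before touching guest memory. The following is a sketch only. It assumes 4 KiB TCE pages and a TCE entry holding the page address in its upper bits with read and write permission bits in its low two bits (in the style of Linux's TCE_PCI_READ/TCE_PCI_WRITE). The function and macro names are hypothetical and this is not qemu's spapr_vio.c code.

```c
#include <stddef.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12                        /* assumed 4 KiB TCE pages */
#define TCE_PAGE_SIZE  (1ULL << TCE_PAGE_SHIFT)
#define TCE_PAGE_MASK  (TCE_PAGE_SIZE - 1)
#define TCE_READ       0x1ULL                    /* assumed permission bits */
#define TCE_WRITE      0x2ULL

/*
 * Hypothetical helper: translate an I/O bus address through a TCE table
 * with nentries entries. On success stores the guest physical address
 * in *gpa and returns 0; returns -1 if the address is outside the DMA
 * window or the entry lacks the required permission.
 */
static int tce_translate(const uint64_t *table, size_t nentries,
                         uint64_t ioba, int is_write, uint64_t *gpa)
{
    uint64_t idx = ioba >> TCE_PAGE_SHIFT;
    uint64_t need = is_write ? TCE_WRITE : TCE_READ;
    uint64_t tce;

    if (idx >= nentries) {
        return -1;                 /* outside the DMA window */
    }
    tce = table[idx];
    if (!(tce & need)) {
        return -1;                 /* unmapped, or wrong permission */
    }
    *gpa = (tce & ~TCE_PAGE_MASK) | (ioba & TCE_PAGE_MASK);
    return 0;
}
```

Every H_PUT_TCE changes one such entry, which is why handling that hypercall in the kernel, while qemu only reads a shared mapping of the table, avoids most of the exits.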

This patch adds the necessary calls to use the KVM TCE acceleration.
If the kernel does not support acceleration, or there is some other
error creating the accelerated TCE table, then it will still fall back
to the full userspace TCE implementation.
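The fallback described above, together with the page-size rounding left as a FIXME in kvmppc_create_spapr_tce(), can be sketched as follows. The accelerator callback stands in for kvmppc_create_spapr_tce(). The 4096-byte TCE page size is an assumption, and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round len up to a multiple of page_size (which must be a power of two). */
static size_t round_up_to_page(size_t len, size_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Stand-in for kvmppc_create_spapr_tce(): returns NULL on failure. */
typedef void *(*accel_table_fn)(uint32_t liobn, uint32_t window_size, int *fd);

/*
 * Hypothetical helper: try the kernel-backed table first; otherwise fall
 * back to a zeroed userspace table, as rtce_init() does with g_malloc0().
 */
static void *get_tce_table(accel_table_fn try_accel, uint32_t liobn,
                           uint32_t window_size, size_t entry_size,
                           int *fd, int *accelerated)
{
    /* 4096: assumed TCE page size */
    size_t len = (window_size / 4096u) * entry_size;
    void *t = try_accel ? try_accel(liobn, window_size, fd) : NULL;

    *accelerated = (t != NULL);
    if (!t) {
        *fd = -1;              /* mark "no kernel table" for later teardown */
        t = calloc(1, len);
    }
    return t;
}
```

Setting the fd to -1 in the fallback path is a choice made in this sketch, so that a later teardown can distinguish a kernel-backed table (to be munmap()ed and closed) from a calloc()ed one; kvmppc_remove_spapr_tce() in this patch likewise returns early when fd < 0.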

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr_vio.c       |    8 ++++++-
 hw/spapr_vio.h       |    1 +
 target-ppc/kvm.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h |   14 +++++++++++++
 4 files changed, 76 insertions(+), 1 deletions(-)

diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 35818e1..1da3032 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -165,7 +165,13 @@ static void rtce_init(VIOsPAPRDevice *dev)
         * sizeof(VIOsPAPR_RTCE);
 
     if (size) {
-        dev->rtce_table = g_malloc0(size);
+        dev->rtce_table = kvmppc_create_spapr_tce(dev->reg,
+                                                  dev->rtce_window_size,
+                                                  &dev->kvmtce_fd);
+
+        if (!dev->rtce_table) {
+            dev->rtce_table = g_malloc0(size);
+        }
     }
 }
 
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index 4fe5f74..a325a5f 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -57,6 +57,7 @@ typedef struct VIOsPAPRDevice {
     target_ulong signal_state;
     uint32_t rtce_window_size;
     VIOsPAPR_RTCE *rtce_table;
+    int kvmtce_fd;
     VIOsPAPR_CRQ crq;
 } VIOsPAPRDevice;
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index cee767b..26165b6 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -28,6 +28,7 @@
 #include "kvm_ppc.h"
 #include "cpu.h"
 #include "device_tree.h"
+#include "hw/sysbus.h"
 #include "hw/spapr.h"
 
 #include "hw/sysbus.h"
@@ -56,6 +57,7 @@ static int cap_segstate;
 static int cap_booke_sregs;
 static int cap_ppc_smt;
 static int cap_ppc_rma;
+static int cap_spapr_tce;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -81,6 +83,7 @@ int kvm_arch_init(KVMState *s)
     cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
     cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
     cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
+    cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -801,6 +804,57 @@ off_t kvmppc_alloc_rma(const char *name)
     return size;
 }
 
+void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd)
+{
+    struct kvm_create_spapr_tce args = {
+        .liobn = liobn,
+        .window_size = window_size,
+    };
+    long len;
+    int fd;
+    void *table;
+
+    if (!cap_spapr_tce) {
+        return NULL;
+    }
+
+    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE, &args);
+    if (fd < 0) {
+        return NULL;
+    }
+
+    len = (window_size / SPAPR_VIO_TCE_PAGE_SIZE) * sizeof(VIOsPAPR_RTCE);
+    /* FIXME: round this up to page size */
+
+    table = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
+    if (table == MAP_FAILED) {
+        close(fd);
+        return NULL;
+    }
+
+    *pfd = fd;
+    return table;
+}
+
+int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)
+{
+    long len;
+
+    if (fd < 0) {
+        return -1;
+    }
+
+    len = (window_size / SPAPR_VIO_TCE_PAGE_SIZE)*sizeof(VIOsPAPR_RTCE);
+    if ((munmap(table, len) < 0) ||
+        (close(fd) < 0)) {
+        fprintf(stderr, "KVM: Unexpected error removing KVM SPAPR TCE "
+                "table: %s", strerror(errno));
+        /* Leak the table */
+    }
+
+    return 0;
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index ad9903c..9e8a7b5 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -20,6 +20,8 @@ int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 void kvmppc_set_papr(CPUState *env);
 int kvmppc_smt_threads(void);
 off_t kvmppc_alloc_rma(const char *name);
+void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd);
+int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 
 #else
 
@@ -57,6 +59,18 @@ static inline off_t kvmppc_alloc_rma(const char *name)
     return 0;
 }
 
+static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
+                                            uint32_t window_size, int *fd)
+{
+    return NULL;
+}
+
+static inline int kvmppc_remove_spapr_tce(void *table, int pfd,
+                                          uint32_t window_size)
+{
+    return -1;
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.6.3


* [Qemu-devel] [PATCH 4/4] pseries: Update SLOF firmware image
  2011-09-30  7:39 [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) David Gibson
                   ` (2 preceding siblings ...)
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 3/4] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
@ 2011-09-30  7:39 ` David Gibson
  2011-10-07  6:57 ` [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) Alexander Graf
  2011-10-07  7:06 ` Alexander Graf
  5 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2011-09-30  7:39 UTC
  To: agraf; +Cc: qemu-devel

This patch updates the SLOF submodule and precompiled image.  The new
SLOF version contains two changes of note:

 * The previous SLOF has a bug in SCSI condition handling that was
   exposed by recent updates to qemu's SCSI emulation.  This update
   fixes the bug.

 * The previous SLOF has a bug in its addressing of SCSI devices,
   which can be exposed under certain conditions.  The new SLOF also
   fixes this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 pc-bios/README   |    2 +-
 pc-bios/slof.bin |  Bin 579072 -> 578968 bytes
 roms/SLOF        |    2 +-
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pc-bios/README b/pc-bios/README
index 02651fe..aada33b 100644
--- a/pc-bios/README
+++ b/pc-bios/README
@@ -17,7 +17,7 @@
 - SLOF (Slimline Open Firmware) is a free IEEE 1275 Open Firmware
   implementation for certain IBM POWER hardware.  The sources are at
   https://github.com/dgibson/SLOF, and the image currently in qemu is
-  built from git tag qemu-slof-20110323.
+  built from git tag qemu-slof-20110930.
 
 - The PXE roms come from the iPXE project. Built with BANNER_TIME 0.
   Sources available at http://ipxe.org.  Vendor:Device ID -> ROM mapping:
diff --git a/pc-bios/slof.bin b/pc-bios/slof.bin
index 22c4c7f5c448e3002aefecf3438f5d080586d666..b9d4f35053be2cd6f599bffddf0ee99627eb00a9 100644
GIT binary patch
delta 2772
zcma);e{2)i9l+o74|2XG5K@<<1)3MqK~gx|fsh^n6Pqj{At51z6uKcbm%Fol<9v7c
z&W*h$vgy(>`o~IQX_r4TNT;pcw2HB%n%7A!-o^m!M_aZb(fu*CV^hiKB&JqXO0;?J
z{Ns`F&vw$g@8^Br_rAa0JD=09HO~H^@obyz#_TlP>Tvno!_%7Eag*KpntAa$wJwu#
z2B!bK6SA4<&_^@2kS(}0Jye%TmubdMRAriR5LKCGoR$i^n4MWgsDcnh$VG@I^f^M6
zgsKRwCbWi7HKDbHY6v|<$W7>BLhA_C5~?Hg2%$#_ttYgB&_+U!5vnJI2sIGeM5vL_
z<Agp>XfvTLgqjF76MBNsRzlkdF@!vXycR9sJ*u7wG~yi}1ywx&e1;{B<v+0;#&QwM
zek?!0auCb+u<XY2Iv#Iijs+51hD8nS@S*L#mKM~~+P-r~drMmtjk=Gbr^J{psS5HV
zVLY5!4s39KmmTe`%xnlY*v7*PoxwDOyB03*yR-@}W~O?NR`vJg@4W{~SLXfRTk!jt
z%Llf=Q<;wrY=t8Wb$yp0OlEHOUxNOH*@Jf=JeipqJOKZld1o+RV*?wmfa-~>+uiIu
z7-G{Poq7Mzm!U1wH<ZP9`{5mrl}x_2$-O5Bz~~hK5?3m&2v}lHFC00%z6SO$Xu_8B
zF0GJ;U0}_5I~;^RHSgNtlkj-9$pNW7ql>omERFzhH^AsrKEG!kz{$$a1fBwGU_g^i
z;DYxI00++jw5t(7HU~hm8KBgeCJ;OasEOb9Twrej+hhb>U;$X2_z<88m;IHA#odYQ
z*#Nv))!#8MjirA1abgJqgMTx+4Tl^`y>KhR=IZZeUItoU{*xcw;XwMfEq&3_Teg-B
zo2w|SThBfB96;FajCpndzO?a?+uu&-N@g0HTXVNB|MB1Y@}I6{8$iws3`3vUGvlZ_
z9Up{v*zmAt<AaV@A9n5U>}*F(`v*p{BckICX#4CFvpVeXu04*+1;EY~B>-#9Ls(tz
zAE!+Z)tcwSjwV;p;H#H+nL{DRqvqekj*U<=sfc6T6)E_DKGytt#IYZ;r{9e@vhPNm
zKVeqmfX%4Y)9Tr2*<|Ow={x~pA;x;Man|_|%`6okX7Dc+judiKn*6NuOEzXHo11n1
z!)A)pj@nd5It`Xz&fi}CdHyaxM}g%J@=MDN`D;rJ`QV+ur0Y`uT27?vfqN7?E04tv
zP<N^xpaj_b*U4U+dr|_PJ5K;=1RJE%0T4I`tCOR(w|3_M2tE&vrC$V&)YQ#D@LS-T
z_d5VgoU3>pZ*+j2%DeN)6AqM^!lxqnMK?PIV5;{I=FvIlW@iI1GN;Vv=bT=bA0U5o
z8ch4mpUpWtp#Su}IVT>g@Tkb%o1++;>k!TlU)yT3zVp;Ks;iK79Y=Cpp;@1KGP=rb
z`rdR>`*cwjc>~F6SmHSu#TCgwoFK$RU6+(Fii$==6=*%Cp}qLV@KJ%mJ`mqfaA@7K
zfBGRsJ0I^I7(Nml=<GsI1^Y+35Fd}lM8#kvf%c*DTK3b|sckOv>9?qQ_k9Cos3mjr
z#kZ)JJUUKDjiDAF5@Vb$(lL<}7@pHOUNVwMQ4Q4Fjh*^X*RkFsv?*PpB3XNmI_b>5
z@+;~jG{=5H)ixoY-!k)TLSK;d6NRTNKgotoh>y`bgqW()M|!#jvg@u=KZa|3NJ6@3
zFs0I8y?u?cby+dOSgELjh*V8fiqs**wW9k%2FP+lN0SV0WeDw!(bh69siZMK{592b
zke5|mEIdxf_O4i^chk@6ye=^kZal*&0;BVsLKinLU%HAj$9zdXlkNEp^@7cWm8;g9
zO>a}x=GEU(fv2Nff*F%l{sg1MqgH;XIMb!Ch}LxQaH%2PoFt2al>{yh308Z`-7R*f
zJJ>(mRqSq|MNhs>b---m9csX4vv-@<u2aE)SJezJA5(SR8|HbY@Z#b6WND1|YU;Qc
z)A+VlMwXO#!aF7@59%1x^?S;bDx1)qsuGgI@fgl)Trwi<$Qx0kqSsKBq<1VX$pR<K
zUOmQJ!MuFg7%_~&#c&n=2SVZnQQ^j9v9zxH@$*=z#l(=5@RyyZR(dZK<4Le>&$PAv
z@0#vMkU1qB=fa}y=dseto>9bc+=$XljT=k^A3gk$E3fNG-4LUUoRY@k%DAKmcmeN2
zAz?+f%*rz(Vx8KR8a={UUAk|ABSqsvNanQvR_gi#im)18sS!k;mZF@@gk+U7%<Lbj
zjj2NRQ7sfBEn3+jtN%E4rr0AKtzaIGYX6^gxu!8yHFP7!;TNz_9g5=jOKwNH)UJeO
zk;Yq$o-O?)D4Z+&HR<eZMl}A3N3u}(wxE6ZWEqmIH>2&I?LJS7dHM#`<ZNrzG~WE)
Q4T`A&VCCzx5WKqkKRH*`cmMzZ

delta 2943
zcmb`Ie{2)i9l+nSeaYEtN}A?Zfs~hK<j0cpkB~5v#1KORB!PrLXwlTFKHr_s2N&DB
zJA*UA?6NRAX<D0D>SbNVoT_TKbz@XfXKh-WOiNQV+JP!DQBA8t=|n14DO0xzHQT&*
zKEH<2{j+Jd-hDss`~AM}d++<c?{4bly3|kVp4n>oJ(%#e*{pRHhbLm?_jvjqU&s}H
z>90})6YsUduJp{p&4~atX$ird6GIQAQw5l`5=8+fZA4LkNeV0Q@b69*5hx}=6R;C-
z5cmRtdkB;exR<~>0;L4*BTz=*egfqLDhN~(s3K5J-~j^b32Y#+kw6WB2MKH<fC$tQ
zc!)q9fiDu+OrV}X1A#^YO$5F~pqW4m0fvB+fXjrN=?e?j{!i`nWOrRV<qd$6y#V|Z
zO9z&J!}1W8A7Qx<%RgZ`h~*!#+>PZ`++Ir`=}zq1&nl>O3u<+HJjm1D(ca$C<}Goc
z-H)P21Vxpi5#&R>C<WDMq`NVuNXJ=CK*6{q^GziV0Kiz-^5R}V{m2{WEl*ebYl}?Z
z*-n27LeK24cFq;S>*=xHN2%fd+$Zm*mwIo&SJDf84e(p(Kle4m$7iei=OBDCeQn<y
z?3ta~e+$CV^!Q*OTuNUa%#|UqaS@b`z1dps&43|qiUR4S14kiB_YY<9(|T~rg9Vpw
zK2+W{4?tiMfW%_)B9CQYkpi>B2RD?#;n^79@B@1WuNVL;AJ#KgI0%>Z6)XHQOl2Ew
zknRe2QviHuNloD-fxvh!*OdVfBOuLd1<*T>!vlblsYc*G2k5bj;CvzgkoN|#ByIvU
z_F-q@R2ZE10&wq`-TERKYsl;~-S9%m0lZ4MBO?WGL!fc5jkE+7b1TW;e4N1352a^7
zEP3WeqCfZ7pWb3YYT1&yVDyRIJ3qK$_!whPo&*T{ozcHN0H3a@Sbjb=UvP_g=hv<D
z=RUsPpZj}>w-(In-G`uCe@nI5&&CHKZWi3E+4!LCrHXBMWj5T(akK6jI1Dm$+4$Vi
z#TVACmhAs=<-7AW?|)FgXV<O{)cCLvl-LMrbGA9#8IL==S+M;AT0W`PmqIqzeJLCO
zhhR+!K#h(>n~UU}%}fkc=~2<vXg3|cv9L|A4cXS~zZ7jXFr!}+Z71w8BLsl%Zv9Bu
zwg+~f&4g{)OqlvPQ+e0qP3AJP{Ik?C2y65!I_1>2Pf;{<P7GD+o2Dqy;sn{Jrl^lB
z+iG(cSN*vwW1|}qW6#wkaP~@9+1#}qffQJMGPkrklUwFa(qQ#^?)K`hb1S!NbN*ZJ
zr>c|x(9eI5s_J|KpaiJ@;FR4`eo6w)!vO%2sRN+<94sFVRNd&D2f%+CJeGP2*plN{
zy8Yh=Z@FFo;MNZT{p>VV;W+h`a_=~R$)-#CrfI64ngLpx)%&I?m%R%hUsDQ9bm^z3
zsU5{#04xVnd-c?F6gs;+P2pCATXA-Inl853tBvdc(EMKZ_Br}Hr5o*!Avqp#s7efN
z#}9@Z;TfzzT&>-n{maj22UYfnf8XIA#Kjdwh-i$&JKWh#uhK2{N;m2qI5g}Z*wuqf
zAB|N+K`(uku5{#4GeYTouhXck(3g0mMUm<E_1EdA4}3<yHizkwA8!mG24ks;!14^o
z##m0$Mo}cHq2ArdjsHDIdWVtTIY);*|8>;t^K-O9{SV{re}h(F_KijQ6^Nuemx_1l
zzO#>){?eI?v_-%4TiWgJbR1VXRbr%wq%mxSXH<@j;I|?9NSmi*!!KFmXQQ%vFVSZ#
z`oG?$JM=f-rajro-_hG41lh}%=|PLdR-~hMXum%54(&`bvJ{CY7%>ue?21N0k{DM6
z9-WZ1a0hZaU8=&lIHE_)al|l&jfou-kDD)SU&CEDrGfYa<2ab?3rVuj;c~g+s^U_^
ztRlGhDCZJ6j=>i@={nfse|Ue7b7x#s@jxNM24#Vf*@zftMM3p(EnIVRODL*n3L9yO
z29I*agTYGd9aDsml<*m$^S!93up^8b4;fk`{945P%=}t^Bra<bt0cMbnhvMpn4mEE
z&kZhJY{>6b9aS}9g!!5*1+fZ0g@nNIl9Hd-Woiw_XdD-Ogp=dEz%!ztX+lC<3$BHY
zT2y_ps468$Xfc^ckTJ}sAqzoVc#|%36L^wOt>%P<L`CE~vc}0#j+F%;KFqawvAe4A
z9G-_4@QJJ$j9fBFkz3QK0=_yzOg2`{OKSck)M!Wx$x-&sofsEaGj?)9P-2`}K`n*e
z);6<`L*$dk#o|6*2(dWbTUv=yjz&a=kH&*?QeZr--nAugVSzg)MMR$>$O5Yh*pfev
z&rW8PE4GBAs`eiZ=8gWm$uXQi1T+%m1-S#^qu4AZ9Jmxwp_t8={O7(rddXF~YCXbL
zh|FSdLaokLx6{LzwV}6Mr5p9YRl2?zAtcD5t!=Rw=P-Fo>l11$e%E!K{KL!He*^Sp
BHl6?g

diff --git a/roms/SLOF b/roms/SLOF
index d1d6b53..b94bde0 160000
--- a/roms/SLOF
+++ b/roms/SLOF
@@ -1 +1 @@
-Subproject commit d1d6b53b713a2b7c2c25685268fa932d28a4b4c0
+Subproject commit b94bde008b0d49ec4bfe933e110d0952d032ac28
-- 
1.7.6.3


* Re: [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)
  2011-09-30  7:39 [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) David Gibson
                   ` (3 preceding siblings ...)
  2011-09-30  7:39 ` [Qemu-devel] [PATCH 4/4] pseries: Update SLOF firmware image David Gibson
@ 2011-10-07  6:57 ` Alexander Graf
  2011-10-10 23:39   ` David Gibson
  2011-10-07  7:06 ` Alexander Graf
  5 siblings, 1 reply; 10+ messages in thread
From: Alexander Graf @ 2011-10-07  6:57 UTC (permalink / raw)
  To: David Gibson; +Cc: QEMU Developers, Avi Kivity


On 30.09.2011, at 09:39, David Gibson wrote:

> Alex Graf has added support for KVM acceleration of the pseries
> machine, using his Book3S-PR KVM variant, which runs the guest in
> userspace, emulating supervisor operations.  Recent kernels now have
> the Book3S-HV KVM variant which uses the hardware hypervisor features
> of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
> working roughly with Book3S-HV, but taking full advantage of this mode
> needs more work.  This patch series makes a start on better exploiting
> Book3S-HV.
> 
> Even with these patches, qemu won't quite be able to run on a current
> Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
> memory to be backed by hugepages, but qemu refuses to use hugepages
> for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
> does not currently do.  We're working on improvements to the KVM code
> which will implement CAP_SYNC_MMU and allow smallpage backing of
> guests, but they're not there yet.  So, in order to test Book3S-HV for
> now you need to either:
> 
> * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
>   it doesn't really implement it.
> 
> or
> 
> * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
>   option is used.
> 
> Both approaches are ugly and unsafe, but it seems we can generally get
> away with it in practice.  Obviously this is only an interim hack
> until the proper CAP_SYNC_MMU support is ready.

I would prefer the latter. We could even #ifdef it for TARGET_PPC.


Alex
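
As an illustration of the second option restricted to PPC builds, here is a minimal sketch of how the -mem-path capability gate could be bypassed under #ifdef TARGET_PPC. It is a standalone model, not QEMU's actual code: the function mem_path_allowed() and its two boolean inputs are hypothetical stand-ins for the KVM queries QEMU performs before backing guest RAM with a file.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the -mem-path gate (not QEMU's real code).
 * kvm_active:   KVM acceleration is in use.
 * has_sync_mmu: the host KVM advertises CAP_SYNC_MMU.
 */
static bool mem_path_allowed(bool kvm_active, bool has_sync_mmu)
{
    if (!kvm_active || has_sync_mmu) {
        /* No KVM, or KVM with MMU notifiers: file-backed RAM is safe. */
        return true;
    }
#ifdef TARGET_PPC
    /* Interim hack: Book3S-HV needs hugepage-backed guest RAM but does
     * not yet advertise CAP_SYNC_MMU, so skip the check on PPC builds. */
    return true;
#else
    fprintf(stderr, "-mem-path requires KVM CAP_SYNC_MMU on this host\n");
    return false;
#endif
}
```

Built without TARGET_PPC, mem_path_allowed(true, false) refuses and prints an error; built with -DTARGET_PPC, the same call returns true, which is exactly the unsafe behaviour the cover letter describes.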


* Re: [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)
  2011-09-30  7:39 [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) David Gibson
                   ` (4 preceding siblings ...)
  2011-10-07  6:57 ` [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) Alexander Graf
@ 2011-10-07  7:06 ` Alexander Graf
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Graf @ 2011-10-07  7:06 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel


On 30.09.2011, at 09:39, David Gibson wrote:

> Alex Graf has added support for KVM acceleration of the pseries
> machine, using his Book3S-PR KVM variant, which runs the guest in
> userspace, emulating supervisor operations.  Recent kernels now have
> the Book3S-HV KVM variant which uses the hardware hypervisor features
> of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
> working roughly with Book3S-HV, but taking full advantage of this mode
> needs more work.  This patch series makes a start on better exploiting
> Book3S-HV.
> 
> Even with these patches, qemu won't quite be able to run on a current
> Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
> memory to be backed by hugepages, but qemu refuses to use hugepages
> for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
> does not currently do.  We're working on improvements to the KVM code
> which will implement CAP_SYNC_MMU and allow smallpage backing of
> guests, but they're not there yet.  So, in order to test Book3S-HV for
> now you need to either:
> 
> * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
>   it doesn't really implement it.
> 
> or
> 
> * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
>   option is used.
> 
> Both approaches are ugly and unsafe, but it seems we can generally get
> away with it in practice.  Obviously this is only an interim hack
> until the proper CAP_SYNC_MMU support is ready.

Thanks, applied all to my local ppc-next tree. Will push to repo.or.cz when Blue pulls the current ppc-next tree from there.


Alex


* Re: [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)
  2011-10-07  6:57 ` [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) Alexander Graf
@ 2011-10-10 23:39   ` David Gibson
  2011-10-11  0:20     ` Alexander Graf
  0 siblings, 1 reply; 10+ messages in thread
From: David Gibson @ 2011-10-10 23:39 UTC (permalink / raw)
  To: Alexander Graf; +Cc: QEMU Developers, Avi Kivity

On Fri, Oct 07, 2011 at 08:57:49AM +0200, Alexander Graf wrote:
> 
> On 30.09.2011, at 09:39, David Gibson wrote:
> 
> > Alex Graf has added support for KVM acceleration of the pseries
> > machine, using his Book3S-PR KVM variant, which runs the guest in
> > userspace, emulating supervisor operations.  Recent kernels now have
> > the Book3S-HV KVM variant which uses the hardware hypervisor features
> > of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
> > working roughly with Book3S-HV, but taking full advantage of this mode
> > needs more work.  This patch series makes a start on better exploiting
> > Book3S-HV.
> > 
> > Even with these patches, qemu won't quite be able to run on a current
> > Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
> > memory to be backed by hugepages, but qemu refuses to use hugepages
> > for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
> > does not currently do.  We're working on improvements to the KVM code
> > which will implement CAP_SYNC_MMU and allow smallpage backing of
> > guests, but they're not there yet.  So, in order to test Book3S-HV for
> > now you need to either:
> > 
> > * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
> >   it doesn't really implement it.
> > 
> > or
> > 
> > * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
> >   option is used.
> > 
> > Both approaches are ugly and unsafe, but it seems we can generally get
> > away with it in practice.  Obviously this is only an interim hack
> > until the proper CAP_SYNC_MMU support is ready.
> 
> I would prefer the latter. We could even #ifdef it for TARGET_PPC.

Well, I don't see either approach as being remotely mergeable.  So it's
really up to each individual person playing with it which hack is
easier for them to apply temporarily while waiting for the proper
solution to come along.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)
  2011-10-10 23:39   ` David Gibson
@ 2011-10-11  0:20     ` Alexander Graf
  2011-10-11  0:39       ` David Gibson
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Graf @ 2011-10-11  0:20 UTC (permalink / raw)
  To: David Gibson; +Cc: QEMU Developers, Avi Kivity


On 11.10.2011, at 01:39, David Gibson wrote:

> On Fri, Oct 07, 2011 at 08:57:49AM +0200, Alexander Graf wrote:
>> 
>> On 30.09.2011, at 09:39, David Gibson wrote:
>> 
>>> Alex Graf has added support for KVM acceleration of the pseries
>>> machine, using his Book3S-PR KVM variant, which runs the guest in
>>> userspace, emulating supervisor operations.  Recent kernels now have
>>> the Book3S-HV KVM variant which uses the hardware hypervisor features
>>> of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
>>> working roughly with Book3S-HV, but taking full advantage of this mode
>>> needs more work.  This patch series makes a start on better exploiting
>>> Book3S-HV.
>>> 
>>> Even with these patches, qemu won't quite be able to run on a current
>>> Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
>>> memory to be backed by hugepages, but qemu refuses to use hugepages
>>> for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
>>> does not currently do.  We're working on improvements to the KVM code
>>> which will implement CAP_SYNC_MMU and allow smallpage backing of
>>> guests, but they're not there yet.  So, in order to test Book3S-HV for
>>> now you need to either:
>>> 
>>> * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
>>>  it doesn't really implement it.
>>> 
>>> or
>>> 
>>> * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
>>>  option is used.
>>> 
>>> Both approaches are ugly and unsafe, but it seems we can generally get
>>> away with it in practice.  Obviously this is only an interim hack
>>> until the proper CAP_SYNC_MMU support is ready.
>> 
>> I would prefer the latter. We could even #ifdef it for TARGET_PPC.
> 
> Well, I don't see either approach as being remotely mergeable.  So it's
> really up to each individual person playing with it which hack is
> easier for them to apply temporarily while waiting for the proper
> solution to come along.

Not sure. Why not make it a warning instead of failure on PPC and give people at least the chance to play with it?


Alex
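
The suggestion to downgrade the refusal to a warning on PPC could be modeled as a three-way verdict. This is a minimal sketch under stated assumptions: the enum and the functions check_mem_path() and report_mem_path() are hypothetical names, and the target_is_ppc parameter replaces a compile-time TARGET_PPC test only so that the whole policy can be exercised in one binary.

```c
#include <stdbool.h>
#include <stdio.h>

/* Possible outcomes of the hypothetical -mem-path capability check. */
typedef enum {
    MEM_PATH_OK,    /* safe: no KVM, or KVM advertises CAP_SYNC_MMU */
    MEM_PATH_WARN,  /* unsafe but allowed: warn and continue */
    MEM_PATH_FAIL,  /* unsafe and refused: stop RAM allocation */
} MemPathVerdict;

/* target_is_ppc stands in for a TARGET_PPC build (hypothetical). */
static MemPathVerdict check_mem_path(bool kvm_active, bool has_sync_mmu,
                                     bool target_is_ppc)
{
    if (!kvm_active || has_sync_mmu) {
        return MEM_PATH_OK;
    }
    return target_is_ppc ? MEM_PATH_WARN : MEM_PATH_FAIL;
}

/* Caller side: report the verdict; returns false if allocation must stop. */
static bool report_mem_path(MemPathVerdict v)
{
    switch (v) {
    case MEM_PATH_OK:
        return true;
    case MEM_PATH_WARN:
        fprintf(stderr, "warning: KVM lacks CAP_SYNC_MMU; "
                        "-mem-path backing may be unsafe\n");
        return true;
    case MEM_PATH_FAIL:
    default:
        fprintf(stderr, "error: -mem-path requires KVM CAP_SYNC_MMU\n");
        return false;
    }
}
```

A warning keeps the hugepage-backed Book3S-HV path usable for testing, but it does nothing about the underlying kernel race that a missing CAP_SYNC_MMU implies, which is the objection raised in the following reply.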


* Re: [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)
  2011-10-11  0:20     ` Alexander Graf
@ 2011-10-11  0:39       ` David Gibson
  0 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2011-10-11  0:39 UTC (permalink / raw)
  To: Alexander Graf; +Cc: QEMU Developers, Avi Kivity

On Tue, Oct 11, 2011 at 02:20:48AM +0200, Alexander Graf wrote:
> 
> On 11.10.2011, at 01:39, David Gibson wrote:
> 
> > On Fri, Oct 07, 2011 at 08:57:49AM +0200, Alexander Graf wrote:
> >> 
> >> On 30.09.2011, at 09:39, David Gibson wrote:
> >> 
> >>> Alex Graf has added support for KVM acceleration of the pseries
> >>> machine, using his Book3S-PR KVM variant, which runs the guest in
> >>> userspace, emulating supervisor operations.  Recent kernels now have
> >>> the Book3S-HV KVM variant which uses the hardware hypervisor features
> >>> of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
> >>> working roughly with Book3S-HV, but taking full advantage of this mode
> >>> needs more work.  This patch series makes a start on better exploiting
> >>> Book3S-HV.
> >>> 
> >>> Even with these patches, qemu won't quite be able to run on a current
> >>> Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
> >>> memory to be backed by hugepages, but qemu refuses to use hugepages
> >>> for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
> >>> does not currently do.  We're working on improvements to the KVM code
> >>> which will implement CAP_SYNC_MMU and allow smallpage backing of
> >>> guests, but they're not there yet.  So, in order to test Book3S-HV for
> >>> now you need to either:
> >>> 
> >>> * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
> >>>  it doesn't really implement it.
> >>> 
> >>> or
> >>> 
> >>> * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
> >>>  option is used.
> >>> 
> >>> Both approaches are ugly and unsafe, but it seems we can generally get
> >>> away with it in practice.  Obviously this is only an interim hack
> >>> until the proper CAP_SYNC_MMU support is ready.
> >> 
> >> I would prefer the latter. We could even #ifdef it for TARGET_PPC.
> > 
> > Well, I don't see either approach as being remotely mergeable.  So it's
> > really up to each individual person playing with it which hack is
> > easier for them to apply temporarily while waiting for the proper
> > solution to come along.
> 
> Not sure. Why not make it a warning instead of failure on PPC and
> give people at least the chance to play with it?

Because it can trigger a serious kernel bug...

As far as I can tell, the fact that we haven't hit the crash/race on
PPC so far is pure luck, not a sign that we're actually less
vulnerable to it.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
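
Whichever hack a tester applies, it helps to confirm what the host kernel actually reports. A minimal sketch using the standard KVM ioctl interface (KVM_CHECK_EXTENSION with KVM_CAP_SYNC_MMU on the /dev/kvm system descriptor), assuming a Linux host with <linux/kvm.h> installed; the helper name host_has_sync_mmu() is hypothetical.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Returns 1 if the host KVM advertises KVM_CAP_SYNC_MMU, 0 if it does
 * not, and -1 if the KVM device cannot be opened or queried.
 */
static int host_has_sync_mmu(const char *kvm_dev)
{
    int fd = open(kvm_dev, O_RDWR);
    if (fd < 0) {
        return -1;  /* no KVM device, or no permission to open it */
    }
    /* System-level capability query: 0 = unsupported, >0 = supported. */
    int ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_SYNC_MMU);
    close(fd);
    if (ret < 0) {
        return -1;
    }
    return ret > 0 ? 1 : 0;
}
```

A host kernel patched to advertise CAP_SYNC_MMU without implementing it will also report 1 here, so a positive answer only means the capability is advertised, not that the race is fixed.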


end of thread [2011-10-11  0:39 UTC]

Thread overview: 10+ messages
2011-09-30  7:39 [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) David Gibson
2011-09-30  7:39 ` [Qemu-devel] [PATCH 1/4] pseries: Support SMT systems for KVM Book3S-HV David Gibson
2011-09-30  7:39 ` [Qemu-devel] [PATCH 2/4] pseries: Allow KVM Book3S-HV on PPC970 CPUS David Gibson
2011-09-30  7:39 ` [Qemu-devel] [PATCH 3/4] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
2011-09-30  7:39 ` [Qemu-devel] [PATCH 4/4] pseries: Update SLOF firmware image David Gibson
2011-10-07  6:57 ` [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2) Alexander Graf
2011-10-10 23:39   ` David Gibson
2011-10-11  0:20     ` Alexander Graf
2011-10-11  0:39       ` David Gibson
2011-10-07  7:06 ` Alexander Graf
