[Qemu-devel] [0/7] pseries: Patches to fix system reset


* [Qemu-devel] [0/7] pseries: Patches to fix system reset
@ 2012-08-15  4:33 David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 1/7] Allow QEMUMachine to override reset sequencing David Gibson
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus

Hi Alex,

Here is a string of patches which fix most of the many problems with
system reset on the pseries machine.  They apply on top of my other
string of pseries patches which you already merged.  They apply before
Li Zhang's usb and vga patches, since it looks like those will go
another iteration; I can easily rebase after those if that would be
more convenient.

1/7 is a generic patch which I have already sent to Anthony, but it
hasn't gone into mainline yet.  The rest of the series is dependent on
it, though, so it's included here.  It's also dependent on newer
kernel headers than are in mainline, but which I think you already
have in your tree.

This does quite a bit of rework to the pseries reset sequence, with
some influence on the ppc target at large.  It fixes both general and
kvm specific bugs, although a number of the general bugs were very
difficult to actually trigger without kvm anyway (because full emu SMP
is so achingly slow, and I think has some other bugs I haven't had
time to investigate yet).  There are some known reset problems still
remaining, specifically:
	* We need to reset the VPA registration in KVM as well.  I
have a tentative patch for that, but I'm waiting for Paul to send the
necessary KVM bits upstream.
	* We should reset the TCE table on the emulated PCI host
bridge as well as VIO devices.  I just haven't had a chance to
figure out the right place to wire in that reset yet.


* [Qemu-devel] [PATCH 1/7] Allow QEMUMachine to override reset sequencing
  2012-08-15  4:33 [Qemu-devel] [0/7] pseries: Patches to fix system reset David Gibson
@ 2012-08-15  4:33 ` David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 2/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus, David Gibson

The qemu_system_reset() function always performs the same basic actions
on all machines.  This includes running all the reset handler hooks;
however, the order in which these will run is not always easily predictable.

This patch splits the core of qemu_system_reset() - the invocation of
the reset handlers - out into a new qemu_devices_reset() function.
qemu_system_reset() will usually call qemu_devices_reset(), but that
can now be overridden by a new reset method in the QEMUMachine
structure.

Individual machines can use this reset method, if necessary, to
perform any extra, machine specific initializations which have to
occur before or after the bulk of the reset handlers.  It's expected
that the method will call qemu_devices_reset() at some point, but if
the machine has really strange ordering requirements between device
resets it could even override that with its own reset sequence (with
great care, obviously).
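
As a rough illustration of the dispatch described above, here is a
minimal sketch outside of qemu.  The Machine type, the counters and the
function names are hypothetical stand-ins, not the real QEMUMachine,
qemu_devices_reset() or qemu_system_reset() definitions; only the shape
of the "use the machine hook if present, else the default" logic is
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for qemu's types; not the real definitions. */
typedef void MachineResetFunc(void);

typedef struct Machine {
    const char *name;
    MachineResetFunc *reset;    /* optional; NULL means "use the default" */
} Machine;

/* Call counters, present only so the behaviour can be observed. */
static int devices_reset_count;
static int machine_reset_count;

/* Stand-in for qemu_devices_reset(): would run every registered handler. */
static void devices_reset(void)
{
    devices_reset_count++;
}

/* A machine-specific hook that wraps the generic device reset. */
static void custom_machine_reset(void)
{
    machine_reset_count++;
    /* machine-specific work that must precede the device resets */
    devices_reset();
    /* machine-specific work that must follow the device resets */
}

/* Stand-in for qemu_system_reset(): prefer the machine hook if set. */
static void system_reset(const Machine *m)
{
    assert(m != NULL);
    if (m->reset) {
        m->reset();
    } else {
        devices_reset();
    }
}
```

A machine that leaves .reset as NULL keeps today's behaviour, which is
why the hook can be added without touching any existing machine.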

For a specific example of when this might be needed: a number of
machines (but not PC) load images specified with -kernel or -initrd
directly into the machine RAM before booting the guest.  This mostly
works at the moment, but to make this actually safe requires that this
load occurs after peripheral devices are reset - otherwise they could
have active DMAs in progress which would clobber the in memory images.
Some machines (notably pseries) also have other entry conditions which
need to be set up as the last thing before executing in guest space -
some of this could be considered "emulated firmware" in the sense that
the actions of the firmware are emulated directly by qemu rather than
by executing a firmware image within the guest.  When the platform's
firmware to OS interface is sufficiently well specified, this saves
time both in implementing the "firmware" and executing it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/boards.h |    3 +++
 sysemu.h    |    1 +
 vl.c        |   11 ++++++++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/boards.h b/hw/boards.h
index 59c01d0..a2e0a54 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -12,11 +12,14 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size,
                                  const char *initrd_filename,
                                  const char *cpu_model);

+typedef void QEMUMachineResetFunc(void);
+
 typedef struct QEMUMachine {
     const char *name;
     const char *alias;
     const char *desc;
     QEMUMachineInitFunc *init;
+    QEMUMachineResetFunc *reset;
     int use_scsi;
     int max_cpus;
     unsigned int no_serial:1,
diff --git a/sysemu.h b/sysemu.h
index 4669348..65552ac 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -62,6 +62,7 @@ int qemu_powerdown_requested(void);
 void qemu_system_killed(int signal, pid_t pid);
 void qemu_kill_report(void);
 extern qemu_irq qemu_system_powerdown;
+void qemu_devices_reset(void);
 void qemu_system_reset(bool report);

 void qemu_add_exit_notifier(Notifier *notify);
diff --git a/vl.c b/vl.c
index d01256a..757d84a 100644
--- a/vl.c
+++ b/vl.c
@@ -1439,7 +1439,7 @@ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque)
     }
 }

-void qemu_system_reset(bool report)
+void qemu_devices_reset(void)
 {
     QEMUResetEntry *re, *nre;

@@ -1447,6 +1447,15 @@ void qemu_system_reset(bool report)
     QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
         re->func(re->opaque);
     }
+}
+
+void qemu_system_reset(bool report)
+{
+    if (current_machine->reset) {
+        current_machine->reset();
+    } else {
+        qemu_devices_reset();
+    }
     if (report) {
         monitor_protocol_event(QEVENT_RESET, NULL);
     }
-- 
1.7.10.4


* [Qemu-devel] [PATCH 2/7] ppc: Make kvm_arch_put_registers() put *all* the registers
  2012-08-15  4:33 [Qemu-devel] [0/7] pseries: Patches to fix system reset David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 1/7] Allow QEMUMachine to override reset sequencing David Gibson
@ 2012-08-15  4:33 ` David Gibson
  2012-08-17 13:58   ` Alexander Graf
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 3/7] pseries: Fix and cleanup CPU initialization and reset David Gibson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus, David Gibson

At least when invoked with high enough 'level' arguments,
kvm_arch_put_registers() is supposed to copy essentially all the cpu state
as encoded in qemu's internal structures into the kvm state.  Currently
the ppc version does not do this - it never calls KVM_SET_SREGS, for
example, and therefore never sets the SDR1 and various other important
though rarely changed registers.

Instead, the code paths which need to set these registers need to
explicitly make (conditional) kvm calls which transfer the changes to kvm.
This breaks the usual model of handling state updates in qemu, where code
just changes the internal model and has it flushed out to kvm automatically
at some later point.
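
To make the "usual model" concrete, here is a minimal sketch of the
lazy flush pattern, assuming a single register and a single dirty
flag.  The VCpu type and the vcpu_*() helpers are hypothetical and only
model the idea; they are not qemu's actual synchronization code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vcpu model: an emulator-side copy and a "kernel" copy. */
typedef struct VCpu {
    uint64_t sdr1;          /* emulator-side value */
    uint64_t kernel_sdr1;   /* what the hypervisor currently holds */
    bool dirty;             /* emulator copy is newer than kernel copy */
} VCpu;

/* Pull kernel state into the model once, then mark the model dirty so
 * that any later edits are written back. */
static void vcpu_synchronize_state(VCpu *cpu)
{
    assert(cpu != NULL);
    if (!cpu->dirty) {
        cpu->sdr1 = cpu->kernel_sdr1;
        cpu->dirty = true;
    }
}

/* Write the model back before the guest runs again. */
static void vcpu_flush_state(VCpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->dirty) {
        cpu->kernel_sdr1 = cpu->sdr1;
        cpu->dirty = false;
    }
}

/* Reset or device code only ever touches the model. */
static void vcpu_set_sdr1(VCpu *cpu, uint64_t value)
{
    vcpu_synchronize_state(cpu);
    cpu->sdr1 = value;
}
```

With this shape, callers never issue their own kernel calls; the write
only reaches the kernel copy when the flush runs.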

This patch fixes this for Book S ppc CPUs by adding a suitable call to
KVM_SET_SREGS and also to KVM_SET_ONE_REG to set the HIOR (the only register
that is set with that call so far).  This lets us remove the hacks to
explicitly set these registers from the kvmppc_set_papr() function.

The problem still exists for Book E CPUs (which use a different version of
the kvm_sregs structure).  But fixing that has some complications of its
own so can be left to another day.

Likewise, there is still some ugly code for setting the PVR through special
calls to SET_SREGS which is left in for now.  The PVR needs to be set
especially early because it can affect what other features are available
on the CPU, so I need to do more thinking to see if it can be integrated
into the normal paths or not.
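
For readers following the BAT handling in the diff, the packing it uses
can be sketched in isolation.  This assumes each half fits in 32 bits;
the helper names are hypothetical and the sketch says nothing about
which architectural BAT half lands in which word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine two 32-bit words: high_word goes to bits 63..32 and
 * low_word to bits 31..0, the same shape as the expression
 * ((uint64_t)x << 32) | y used in the patch. */
static uint64_t pack_bat_pair(uint32_t high_word, uint32_t low_word)
{
    return ((uint64_t)high_word << 32) | low_word;
}

/* Inverse of pack_bat_pair(), useful when reading the state back. */
static void split_bat_pair(uint64_t packed, uint32_t *high_word,
                           uint32_t *low_word)
{
    assert(high_word != NULL && low_word != NULL);
    *high_word = (uint32_t)(packed >> 32);
    *low_word = (uint32_t)packed;
}
```

The cast before the shift matters: shifting a 32-bit value left by 32
would be undefined behaviour in C.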

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 target-ppc/kvm.c |   89 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 50 insertions(+), 39 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index a31d278..1a7489b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -60,6 +60,7 @@ static int cap_booke_sregs;
 static int cap_ppc_smt;
 static int cap_ppc_rma;
 static int cap_spapr_tce;
+static int cap_hior;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -86,6 +87,7 @@ int kvm_arch_init(KVMState *s)
     cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
     cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
     cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
+    cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -469,6 +471,53 @@ int kvm_arch_put_registers(CPUPPCState *env, int level)
         env->tlb_dirty = false;
     }
 
+    if (cap_segstate && (level >= KVM_PUT_RESET_STATE)) {
+        struct kvm_sregs sregs;
+
+        sregs.pvr = env->spr[SPR_PVR];
+
+        sregs.u.s.sdr1 = env->spr[SPR_SDR1];
+
+        /* Sync SLB */
+#ifdef TARGET_PPC64
+        for (i = 0; i < 64; i++) {
+            sregs.u.s.ppc64.slb[i].slbe = env->slb[i].esid;
+            sregs.u.s.ppc64.slb[i].slbv = env->slb[i].vsid;
+        }
+#endif
+
+        /* Sync SRs */
+        for (i = 0; i < 16; i++) {
+            sregs.u.s.ppc32.sr[i] = env->sr[i];
+        }
+
+        /* Sync BATs */
+        for (i = 0; i < 8; i++) {
+            sregs.u.s.ppc32.dbat[i] = ((uint64_t)env->DBAT[1][i] << 32)
+                | env->DBAT[0][i];
+            sregs.u.s.ppc32.ibat[i] = ((uint64_t)env->IBAT[1][i] << 32)
+                | env->IBAT[0][i];
+        }
+
+        ret = kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    if (cap_hior && (level >= KVM_PUT_RESET_STATE)) {
+        uint64_t hior = env->spr[SPR_HIOR];
+        struct kvm_one_reg reg = {
+            .id = KVM_REG_PPC_HIOR,
+            .addr = (uintptr_t) &hior,
+        };
+
+        ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, &reg);
+        if (ret) {
+            return ret;
+        }
+    }
+
     return ret;
 }
 
@@ -946,52 +995,14 @@ int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len)
 void kvmppc_set_papr(CPUPPCState *env)
 {
     struct kvm_enable_cap cap = {};
-    struct kvm_one_reg reg = {};
-    struct kvm_sregs sregs = {};
     int ret;
-    uint64_t hior = env->spr[SPR_HIOR];
 
     cap.cap = KVM_CAP_PPC_PAPR;
     ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, &cap);
 
     if (ret) {
-        goto fail;
-    }
-
-    /*
-     * XXX We set HIOR here. It really should be a qdev property of
-     *     the CPU node, but we don't have CPUs converted to qdev yet.
-     *
-     *     Once we have qdev CPUs, move HIOR to a qdev property and
-     *     remove this chunk.
-     */
-    reg.id = KVM_REG_PPC_HIOR;
-    reg.addr = (uintptr_t)&hior;
-    ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, &reg);
-    if (ret) {
-        fprintf(stderr, "Couldn't set HIOR. Maybe you're running an old \n"
-                        "kernel with support for HV KVM but no PAPR PR \n"
-                        "KVM in which case things will work. If they don't \n"
-                        "please update your host kernel!\n");
-    }
-
-    /* Set SDR1 so kernel space finds the HTAB */
-    ret = kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs);
-    if (ret) {
-        goto fail;
-    }
-
-    sregs.u.s.sdr1 = env->spr[SPR_SDR1];
-
-    ret = kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
-    if (ret) {
-        goto fail;
+        cpu_abort(env, "This KVM version does not support PAPR\n");
     }
-
-    return;
-
-fail:
-    cpu_abort(env, "This KVM version does not support PAPR\n");
 }
 
 int kvmppc_smt_threads(void)
-- 
1.7.10.4


* Re: [Qemu-devel] [PATCH 2/7] ppc: Make kvm_arch_put_registers() put *all* the registers
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 2/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson
@ 2012-08-17 13:58   ` Alexander Graf
  2012-08-17 16:24     ` [Qemu-devel] [Qemu-ppc] " David Gibson
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Graf @ 2012-08-17 13:58 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, paulus

On 08/15/2012 06:33 AM, David Gibson wrote:
> At least when invoked with high enough 'level' arguments,
> kvm_arch_put_registers() is supposed to copy essentially all the cpu state
> as encoded in qemu's internal structures into the kvm state.  Currently
> the ppc version does not do this - it never calls KVM_SET_SREGS, for
> example, and therefore never sets the SDR1 and various other important
> though rarely changed registers.
>
> Instead, the code paths which need to set these registers need to
> explicitly make (conditional) kvm calls which transfer the changes to kvm.
> This breaks the usual model of handling state updates in qemu, where code
> just changes the internal model and has it flushed out to kvm automatically
> at some later point.
>
> This patch fixes this for Book S ppc CPUs by adding a suitable call to
> KVM_SET_SREGS and also to KVM_SET_ONE_REG to set the HIOR (the only register
> that is set with that call so far).  This lets us remove the hacks to
> explicitly set these registers from the kvmppc_set_papr() function.

HIOR is a read-only register from the guest's point of view when running 
in PAPR mode, so we don't need to sync it back again. The same goes for 
SDR1, though resetting that is valid for non-PAPR guests.

Overall, does a normal system reset on PPC guarantee that the SRs and 
SLBs are reset? At least OpenBIOS boots up in real mode and overwrites 
all SR/SLB entries while still in real mode.


Alex

>
> The problem still exists for Book E CPUs (which use a different version of
> the kvm_sregs structure).  But fixing that has some complications of its
> own so can be left to another day.
>
> Likewise, there is still some ugly code for setting the PVR through special
> calls to SET_SREGS which is left in for now.  The PVR needs to be set
> especially early because it can affect what other features are available
> on the CPU, so I need to do more thinking to see if it can be integrated
> into the normal paths or not.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>   target-ppc/kvm.c |   89 ++++++++++++++++++++++++++++++------------------------
>   1 file changed, 50 insertions(+), 39 deletions(-)
>
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index a31d278..1a7489b 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -60,6 +60,7 @@ static int cap_booke_sregs;
>   static int cap_ppc_smt;
>   static int cap_ppc_rma;
>   static int cap_spapr_tce;
> +static int cap_hior;
>   
>   /* XXX We have a race condition where we actually have a level triggered
>    *     interrupt, but the infrastructure can't expose that yet, so the guest
> @@ -86,6 +87,7 @@ int kvm_arch_init(KVMState *s)
>       cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
>       cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
>       cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
> +    cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
>   
>       if (!cap_interrupt_level) {
>           fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
> @@ -469,6 +471,53 @@ int kvm_arch_put_registers(CPUPPCState *env, int level)
>           env->tlb_dirty = false;
>       }
>   
> +    if (cap_segstate && (level >= KVM_PUT_RESET_STATE)) {
> +        struct kvm_sregs sregs;
> +
> +        sregs.pvr = env->spr[SPR_PVR];
> +
> +        sregs.u.s.sdr1 = env->spr[SPR_SDR1];
> +
> +        /* Sync SLB */
> +#ifdef TARGET_PPC64
> +        for (i = 0; i < 64; i++) {
> +            sregs.u.s.ppc64.slb[i].slbe = env->slb[i].esid;
> +            sregs.u.s.ppc64.slb[i].slbv = env->slb[i].vsid;
> +        }
> +#endif
> +
> +        /* Sync SRs */
> +        for (i = 0; i < 16; i++) {
> +            sregs.u.s.ppc32.sr[i] = env->sr[i];
> +        }
> +
> +        /* Sync BATs */
> +        for (i = 0; i < 8; i++) {
> +            sregs.u.s.ppc32.dbat[i] = ((uint64_t)env->DBAT[1][i] << 32)
> +                | env->DBAT[0][i];
> +            sregs.u.s.ppc32.ibat[i] = ((uint64_t)env->IBAT[1][i] << 32)
> +                | env->IBAT[0][i];
> +        }
> +
> +        ret = kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
> +        if (ret) {
> +            return ret;
> +        }
> +    }
> +
> +    if (cap_hior && (level >= KVM_PUT_RESET_STATE)) {
> +        uint64_t hior = env->spr[SPR_HIOR];
> +        struct kvm_one_reg reg = {
> +            .id = KVM_REG_PPC_HIOR,
> +            .addr = (uintptr_t) &hior,
> +        };
> +
> +        ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, &reg);
> +        if (ret) {
> +            return ret;
> +        }
> +    }
> +
>       return ret;
>   }
>   
> @@ -946,52 +995,14 @@ int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len)
>   void kvmppc_set_papr(CPUPPCState *env)
>   {
>       struct kvm_enable_cap cap = {};
> -    struct kvm_one_reg reg = {};
> -    struct kvm_sregs sregs = {};
>       int ret;
> -    uint64_t hior = env->spr[SPR_HIOR];
>   
>       cap.cap = KVM_CAP_PPC_PAPR;
>       ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, &cap);
>   
>       if (ret) {
> -        goto fail;
> -    }
> -
> -    /*
> -     * XXX We set HIOR here. It really should be a qdev property of
> -     *     the CPU node, but we don't have CPUs converted to qdev yet.
> -     *
> -     *     Once we have qdev CPUs, move HIOR to a qdev property and
> -     *     remove this chunk.
> -     */
> -    reg.id = KVM_REG_PPC_HIOR;
> -    reg.addr = (uintptr_t)&hior;
> -    ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, &reg);
> -    if (ret) {
> -        fprintf(stderr, "Couldn't set HIOR. Maybe you're running an old \n"
> -                        "kernel with support for HV KVM but no PAPR PR \n"
> -                        "KVM in which case things will work. If they don't \n"
> -                        "please update your host kernel!\n");
> -    }
> -
> -    /* Set SDR1 so kernel space finds the HTAB */
> -    ret = kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs);
> -    if (ret) {
> -        goto fail;
> -    }
> -
> -    sregs.u.s.sdr1 = env->spr[SPR_SDR1];
> -
> -    ret = kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
> -    if (ret) {
> -        goto fail;
> +        cpu_abort(env, "This KVM version does not support PAPR\n");
>       }
> -
> -    return;
> -
> -fail:
> -    cpu_abort(env, "This KVM version does not support PAPR\n");
>   }
>   
>   int kvmppc_smt_threads(void)


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/7] ppc: Make kvm_arch_put_registers() put *all* the registers
  2012-08-17 13:58   ` Alexander Graf
@ 2012-08-17 16:24     ` David Gibson
  0 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2012-08-17 16:24 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-ppc, qemu-devel

On Fri, Aug 17, 2012 at 03:58:08PM +0200, Alexander Graf wrote:
> On 08/15/2012 06:33 AM, David Gibson wrote:
> >At least when invoked with high enough 'level' arguments,
> >kvm_arch_put_registers() is supposed to copy essentially all the cpu state
> >as encoded in qemu's internal structures into the kvm state.  Currently
> >the ppc version does not do this - it never calls KVM_SET_SREGS, for
> >example, and therefore never sets the SDR1 and various other important
> >though rarely changed registers.
> >
> >Instead, the code paths which need to set these registers need to
> >explicitly make (conditional) kvm calls which transfer the changes to kvm.
> >This breaks the usual model of handling state updates in qemu, where code
> >just changes the internal model and has it flushed out to kvm automatically
> >at some later point.
> >
> >This patch fixes this for Book S ppc CPUs by adding a suitable call to
> >KVM_SET_SREGS and also to KVM_SET_ONE_REG to set the HIOR (the only register
> >that is set with that call so far).  This lets us remove the hacks to
> >explicitly set these registers from the kvmppc_set_papr() function.
> 
> HIOR is a read-only register from the guest's point of view when
> running in PAPR mode, so we don't need to sync it back again. The
> same goes for SDR1, though resetting that is valid for non-PAPR
> guests.

*When running in PAPR mode*, which we aren't always.  System resets
are so rare that there's really no point optimizing register sets out
of it, so we might as well set HIOR and SDR1 on every reset; then it's
correct both for PAPR and non-PAPR mode.

> Overall, does a normal system reset on PPC guarantee that the SRs
> and SLBs are reset? At least OpenBIOS boots up in real mode and
> overwrites all SR/SLB entries while still in real mode.

I don't know, but it's not really relevant here.  What this does is
make sure that KVM state is synced with qemu state on reset.  That
means that whatever reset handlers do to the qemu state - and that's
what they usually operate on - will get reflected in KVM state when
the guest executes again.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* [Qemu-devel] [PATCH 3/7] pseries: Fix and cleanup CPU initialization and reset
  2012-08-15  4:33 [Qemu-devel] [0/7] pseries: Patches to fix system reset David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 1/7] Allow QEMUMachine to override reset sequencing David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 2/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson
@ 2012-08-15  4:33 ` David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 4/7] pseries: Use new method to correct reset sequence David Gibson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus, David Gibson

The current pseries machine init function iterates over the CPUs at several
points, doing various bits of initialization.  This is messy; these can
and should be merged into a single iteration doing all the necessary per
cpu initialization.  Worse, some of these initializations were setting up
state which should be set on every reset, not just at machine init time.
A few of the initializations simply weren't necessary at all.

This patch, therefore, moves the initializations that must happen on
every reset into the per-cpu reset handler, and combines the remainder
into two loops over the cpus (the first of which also creates them).
The second loop is for setting up hash table information, and will be
removed in a subsequent patch also making other fixes to the hash
table setup.

This exposes a bug in our start-cpu RTAS routine (called by the guest to
start up CPUs other than CPU0) under kvm.  Previously, this function did
not make a call to ensure that its changes to the new cpu's state were
pushed into KVM in-kernel state.  We sort-of got away with this because
some of the initializations had already placed the secondary CPUs into the
right starting state for the sorts of Linux guests we've been running.

Nonetheless the start-cpu RTAS call's behaviour was not correct and could
easily have been broken by guest changes.  This patch also fixes it.
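
The split between "set once at init" and "set on every reset" can be
sketched as follows.  The Cpu type and both functions are hypothetical
simplifications: the halted and HIOR handling mirrors the reset handler
in the diff, while clearing nip and gpr3 on reset and un-halting in
start_cpu() are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced cpu state. */
typedef struct Cpu {
    int index;
    int halted;
    uint64_t hior;
    uint64_t nip;
    uint64_t gpr3;
} Cpu;

/* Runs on every reset, so state the guest may have changed is put
 * back each time rather than only at machine init. */
static void cpu_reset_handler(Cpu *cpu)
{
    assert(cpu != NULL);
    cpu->halted = 1;    /* secondaries wait for a start-cpu request */
    cpu->hior = 0;
    cpu->nip = 0;       /* assumption: cleared here for the sketch */
    cpu->gpr3 = 0;      /* assumption: cleared here for the sketch */
}

/* Models the start-cpu request: all entry state is set explicitly
 * instead of relying on whatever init happened to leave behind. */
static void start_cpu(Cpu *cpu, uint64_t start, uint64_t r3)
{
    assert(cpu != NULL);
    cpu->nip = start;
    cpu->gpr3 = r3;
    cpu->halted = 0;    /* assumption: the request also un-halts */
}
```

Because the handler runs on every reset, a second reset returns a
started secondary cpu to the halted state without any init-time code.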

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c      |   33 +++++++++++++++++++--------------
 hw/spapr_rtas.c |    5 +++++
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 9efed0e..603674d 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -574,8 +574,15 @@ static void spapr_reset(void *opaque)
 static void spapr_cpu_reset(void *opaque)
 {
     PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
 
     cpu_reset(CPU(cpu));
+
+    /* All CPUs start halted, except CPU0, the rest are explicitly
+     * started up by the guest using an RTAS call */
+    env->halted = 1;
+
+    env->spr[SPR_HIOR] = 0;
 }
 
 /* pSeries LPAR / sPAPR hardware init */
@@ -640,11 +647,16 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
         /* Set time-base frequency to 512 MHz */
         cpu_ppc_tb_init(env, TIMEBASE_FREQ);
-        qemu_register_reset(spapr_cpu_reset, cpu);
 
-        env->hreset_vector = 0x60;
+        /* PAPR always has exception vectors in RAM not ROM */
         env->hreset_excp_prefix = 0;
-        env->gpr[3] = env->cpu_index;
+
+        /* Tell KVM that we're in PAPR mode */
+        if (kvm_enabled()) {
+            kvmppc_set_papr(env);
+        }
+
+        qemu_register_reset(spapr_cpu_reset, cpu);
     }
 
     /* allocate RAM */
@@ -660,7 +672,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
     /* allocate hash page table.  For now we always make this 16mb,
      * later we should probably make it scale to the size of guest
-     * RAM */
+     * RAM.  FIXME: setting the htab information in the CPU env really
+     * belongs at CPU reset time, but we can get away with it for now
+     * because the PAPR guest is not permitted to write SDR1 so in
+     * fact these settings will never change during the run */
     spapr->htab_size = 1ULL << (pteg_shift + 7);
     spapr->htab = qemu_memalign(spapr->htab_size, spapr->htab_size);
 
@@ -672,11 +687,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
         /* Tell KVM that we're in PAPR mode */
         env->spr[SPR_SDR1] = (unsigned long)spapr->htab |
                              ((pteg_shift + 7) - 18);
-        env->spr[SPR_HIOR] = 0;
-
-        if (kvm_enabled()) {
-            kvmppc_set_papr(env);
-        }
     }
 
     filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
@@ -788,11 +798,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
     spapr->entry_point = 0x100;
 
-    /* SLOF will startup the secondary CPUs using RTAS */
-    for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        env->halted = 1;
-    }
-
     /* Prepare the device tree */
     spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size,
                                             initrd_base, initrd_size,
diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c
index ae18595..b808f80 100644
--- a/hw/spapr_rtas.c
+++ b/hw/spapr_rtas.c
@@ -184,6 +184,11 @@ static void rtas_start_cpu(sPAPREnvironment *spapr,
             return;
         }
 
+        /* This will make sure qemu state is up to date with kvm, and
+         * mark it dirty so our changes get flushed back before the
+         * new cpu enters */
+        kvm_cpu_synchronize_state(env);
+
         env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
         env->nip = start;
         env->gpr[3] = r3;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 4/7] pseries: Use new method to correct reset sequence
  2012-08-15  4:33 [Qemu-devel] [0/7] pseries: Patches to fix system reset David Gibson
                   ` (2 preceding siblings ...)
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 3/7] pseries: Fix and cleanup CPU initialization and reset David Gibson
@ 2012-08-15  4:33 ` David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 5/7] pseries: Add support for new KVM hash table control call David Gibson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus, David Gibson

A number of things need to occur during reset of the PAPR
paravirtualized platform in a specific order.  For example, the hash
table needs to be cleared before the CPUs are reset, so that they
initialize their register state correctly, and the CPUs need to have
their main reset called before we set up the entry point state on the
boot cpu.  We also need to have the main qdev reset happen before the
creation and installation of the device tree for the new boot, because
we need the state of the devices settled to correctly construct the
device tree.

We currently do the pseries once-per-reset initializations from a
reset handler.  However we can't adequately control when this handler
is called during the reset - in particular we can't guarantee it
happens after all the qdev resets (since qdevs might be registered
after the machine init function has executed).

This patch uses the new QEMUMachine reset method to fix this
problem, ensuring the various order dependent reset steps happen in
the correct order.
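
The ordering constraints listed above can be written down as one
explicit sequence.  This is only a sketch: the step names, the
ResetLog type and the helper functions are hypothetical, and each step
just records that it ran so the order can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the order-dependent reset steps. */
typedef enum ResetStep {
    STEP_CLEAR_HTAB,
    STEP_DEVICES_RESET,
    STEP_BUILD_FDT,
    STEP_BOOT_CPU_ENTRY
} ResetStep;

enum { MAX_STEPS = 8 };

typedef struct ResetLog {
    ResetStep steps[MAX_STEPS];
    size_t count;
} ResetLog;

static void log_step(ResetLog *log, ResetStep step)
{
    assert(log != NULL && log->count < MAX_STEPS);
    log->steps[log->count++] = step;
}

/* One machine-level reset function owns the whole sequence, so the
 * order no longer depends on handler registration order. */
static void machine_reset(ResetLog *log)
{
    log_step(log, STEP_CLEAR_HTAB);      /* before the cpus are reset */
    log_step(log, STEP_DEVICES_RESET);   /* cpu and device handlers */
    log_step(log, STEP_BUILD_FDT);       /* needs settled device state */
    log_step(log, STEP_BOOT_CPU_ENTRY);  /* after the main cpu reset */
}
```

Keeping the sequence in a single function makes each ordering
requirement visible at the call site instead of implicit in
registration order.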

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 603674d..f515e02 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -552,13 +552,13 @@ static void emulate_spapr_hypercall(CPUPPCState *env)
     env->gpr[3] = spapr_hypercall(env, env->gpr[3], &env->gpr[4]);
 }

-static void spapr_reset(void *opaque)
+static void ppc_spapr_reset(void)
 {
-    sPAPREnvironment *spapr = (sPAPREnvironment *)opaque;
-
     /* flush out the hash table */
     memset(spapr->htab, 0, spapr->htab_size);

+    qemu_devices_reset();
+
     /* Load the fdt */
     spapr_finalize_fdt(spapr, spapr->fdt_addr, spapr->rtas_addr,
                        spapr->rtas_size);
@@ -805,14 +805,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
                                             boot_device, kernel_cmdline,
                                             pteg_shift + 7);
     assert(spapr->fdt_skel != NULL);
-
-    qemu_register_reset(spapr_reset, spapr);
 }

 static QEMUMachine spapr_machine = {
     .name = "pseries",
     .desc = "pSeries Logical Partition (PAPR compliant)",
     .init = ppc_spapr_init,
+    .reset = ppc_spapr_reset,
     .max_cpus = MAX_CPUS,
     .no_parallel = 1,
     .use_scsi = 1,
-- 
1.7.10.4


* [Qemu-devel] [PATCH 5/7] pseries: Add support for new KVM hash table control call
  2012-08-15  4:33 [Qemu-devel] [0/7] pseries: Patches to fix system reset David Gibson
                   ` (3 preceding siblings ...)
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 4/7] pseries: Use new method to correct reset sequence David Gibson
@ 2012-08-15  4:33 ` David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 6/7] pseries: Clear TCE state when resetting PAPR VIO devices David Gibson
  2012-08-15  4:33 ` [Qemu-devel] [PATCH 7/7] ppc/pseries: Reset VPA registration on CPU reset David Gibson
  6 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus, David Gibson

From: Ben Herrenschmidt <benh@kernel.crashing.org>

This adds support for the new "reset htab" ioctl which allows qemu
to properly clean up the MMU hash table when the guest is reset. With
the corresponding kernel support, reset of a guest now works properly.

This also paves the way for indicating a different size hash table
to the kernel and for the kernel to be able to impose limits on
the requested size.
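
Since the table size is now carried as a shift, a few quantities follow
from it by simple arithmetic.  The sketch below restates the expressions
used in the diff (1ULL << shift for the size, shift - 18 for the low bits
of SDR1, and page size << (shift - 7) for the VRMA bound); the helper
names are made up for illustration, and the case where the kernel reports
an unbounded RMA is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hash table size in bytes for a given shift. */
static uint64_t htab_size_bytes(unsigned shift)
{
    return 1ULL << shift;
}

/* Size value OR'd into the low bits of SDR1 by the patch. */
static unsigned sdr1_size_field(unsigned shift)
{
    assert(shift >= 18);
    return shift - 18;
}

/*
 * Upper bound on the RMA when it is limited by the hash table size:
 * the smaller of the current size and page_size << (shift - 7).
 */
static uint64_t vrma_limit(uint64_t current_size, uint64_t page_size,
                           unsigned shift)
{
    uint64_t cap;

    assert(shift >= 7);
    cap = page_size << (shift - 7);
    return current_size < cap ? current_size : cap;
}
```

With the default shift of 24 this gives a 16MB table and an SDR1 size
value of 6.  For 4kB backing pages the bound is 512MB, so a 4GB guest would
see its RMA clamped to 512MB; with 64kB pages the bound is 8GB and a 1GB
guest is unaffected.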

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c           |  274 +++++++++++++++++++++++++++++---------------------
 hw/spapr.h           |    4 +-
 target-ppc/kvm.c     |   29 ++++++
 target-ppc/kvm_ppc.h |   19 ++++
 4 files changed, 213 insertions(+), 113 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index f515e02..2d8303b 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -83,6 +83,8 @@
 
 #define PHANDLE_XICP            0x00001111
 
+#define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
+
 sPAPREnvironment *spapr;
 
 int spapr_allocate_irq(int hint, enum xics_irq_type type)
@@ -132,12 +134,13 @@ int spapr_allocate_irq_block(int num, enum xics_irq_type type)
     return first;
 }
 
-static int spapr_set_associativity(void *fdt, sPAPREnvironment *spapr)
+static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
 {
     int ret = 0, offset;
     CPUPPCState *env;
     char cpu_model[32];
     int smt = kvmppc_smt_threads();
+    uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
 
     assert(spapr->cpu_model);
 
@@ -161,8 +164,16 @@ static int spapr_set_associativity(void *fdt, sPAPREnvironment *spapr)
             return offset;
         }
 
-        ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,
-                          sizeof(associativity));
+        if (nb_numa_nodes > 1) {
+            ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,
+                              sizeof(associativity));
+            if (ret < 0) {
+                return ret;
+            }
+        }
+
+        ret = fdt_setprop(fdt, offset, "ibm,pft-size",
+                          pft_size_prop, sizeof(pft_size_prop));
         if (ret < 0) {
             return ret;
         }
@@ -204,45 +215,36 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
     return (p - prop) * sizeof(uint32_t);
 }
 
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            fprintf(stderr, "qemu: error creating device tree: %s: %s\n", \
+                    #exp, fdt_strerror(ret));                      \
+            exit(1);                                               \
+        }                                                          \
+    } while (0)
+
+
 static void *spapr_create_fdt_skel(const char *cpu_model,
-                                   target_phys_addr_t rma_size,
                                    target_phys_addr_t initrd_base,
                                    target_phys_addr_t initrd_size,
                                    target_phys_addr_t kernel_size,
                                    const char *boot_device,
-                                   const char *kernel_cmdline,
-                                   long hash_shift)
+                                   const char *kernel_cmdline)
 {
     void *fdt;
     CPUPPCState *env;
-    uint64_t mem_reg_property[2];
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
-    uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
     char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
         "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk";
     char qemu_hypertas_prop[] = "hcall-memop1";
+    uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
-    int i;
     char *modelname;
-    int smt = kvmppc_smt_threads();
+    int i, smt = kvmppc_smt_threads();
     unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
-    uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
-    uint32_t associativity[] = {cpu_to_be32(0x4), cpu_to_be32(0x0),
-                                cpu_to_be32(0x0), cpu_to_be32(0x0),
-                                cpu_to_be32(0x0)};
-    char mem_name[32];
-    target_phys_addr_t node0_size, mem_start;
-
-#define _FDT(exp) \
-    do { \
-        int ret = (exp);                                           \
-        if (ret < 0) {                                             \
-            fprintf(stderr, "qemu: error creating device tree: %s: %s\n", \
-                    #exp, fdt_strerror(ret));                      \
-            exit(1);                                               \
-        }                                                          \
-    } while (0)
 
     fdt = g_malloc0(FDT_MAX_SIZE);
     _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
@@ -284,55 +286,6 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 
     _FDT((fdt_end_node(fdt)));
 
-    /* memory node(s) */
-    node0_size = (nb_numa_nodes > 1) ? node_mem[0] : ram_size;
-    if (rma_size > node0_size) {
-        rma_size = node0_size;
-    }
-
-    /* RMA */
-    mem_reg_property[0] = 0;
-    mem_reg_property[1] = cpu_to_be64(rma_size);
-    _FDT((fdt_begin_node(fdt, "memory@0")));
-    _FDT((fdt_property_string(fdt, "device_type", "memory")));
-    _FDT((fdt_property(fdt, "reg", mem_reg_property,
-        sizeof(mem_reg_property))));
-    _FDT((fdt_property(fdt, "ibm,associativity", associativity,
-        sizeof(associativity))));
-    _FDT((fdt_end_node(fdt)));
-
-    /* RAM: Node 0 */
-    if (node0_size > rma_size) {
-        mem_reg_property[0] = cpu_to_be64(rma_size);
-        mem_reg_property[1] = cpu_to_be64(node0_size - rma_size);
-
-        sprintf(mem_name, "memory@" TARGET_FMT_lx, rma_size);
-        _FDT((fdt_begin_node(fdt, mem_name)));
-        _FDT((fdt_property_string(fdt, "device_type", "memory")));
-        _FDT((fdt_property(fdt, "reg", mem_reg_property,
-                           sizeof(mem_reg_property))));
-        _FDT((fdt_property(fdt, "ibm,associativity", associativity,
-                           sizeof(associativity))));
-        _FDT((fdt_end_node(fdt)));
-    }
-
-    /* RAM: Node 1 and beyond */
-    mem_start = node0_size;
-    for (i = 1; i < nb_numa_nodes; i++) {
-        mem_reg_property[0] = cpu_to_be64(mem_start);
-        mem_reg_property[1] = cpu_to_be64(node_mem[i]);
-        associativity[3] = associativity[4] = cpu_to_be32(i);
-        sprintf(mem_name, "memory@" TARGET_FMT_lx, mem_start);
-        _FDT((fdt_begin_node(fdt, mem_name)));
-        _FDT((fdt_property_string(fdt, "device_type", "memory")));
-        _FDT((fdt_property(fdt, "reg", mem_reg_property,
-            sizeof(mem_reg_property))));
-        _FDT((fdt_property(fdt, "ibm,associativity", associativity,
-            sizeof(associativity))));
-        _FDT((fdt_end_node(fdt)));
-        mem_start += node_mem[i];
-    }
-
     /* cpus */
     _FDT((fdt_begin_node(fdt, "cpus")));
 
@@ -384,8 +337,6 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
         _FDT((fdt_property_cell(fdt, "timebase-frequency", tbfreq)));
         _FDT((fdt_property_cell(fdt, "clock-frequency", cpufreq)));
         _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
-        _FDT((fdt_property(fdt, "ibm,pft-size",
-                           pft_size_prop, sizeof(pft_size_prop))));
         _FDT((fdt_property_string(fdt, "status", "okay")));
         _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
 
@@ -484,6 +435,68 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
     return fdt;
 }
 
+static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt)
+{
+    uint32_t associativity[] = {cpu_to_be32(0x4), cpu_to_be32(0x0),
+                                cpu_to_be32(0x0), cpu_to_be32(0x0),
+                                cpu_to_be32(0x0)};
+    char mem_name[32];
+    target_phys_addr_t node0_size, mem_start;
+    uint64_t mem_reg_property[2];
+    int i, off;
+
+    /* memory node(s) */
+    node0_size = (nb_numa_nodes > 1) ? node_mem[0] : ram_size;
+    if (spapr->rma_size > node0_size) {
+        spapr->rma_size = node0_size;
+    }
+
+    /* RMA */
+    mem_reg_property[0] = 0;
+    mem_reg_property[1] = cpu_to_be64(spapr->rma_size);
+    off = fdt_add_subnode(fdt, 0, "memory@0");
+    _FDT(off);
+    _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
+    _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property,
+                      sizeof(mem_reg_property))));
+    _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity,
+                      sizeof(associativity))));
+
+    /* RAM: Node 0 */
+    if (node0_size > spapr->rma_size) {
+        mem_reg_property[0] = cpu_to_be64(spapr->rma_size);
+        mem_reg_property[1] = cpu_to_be64(node0_size - spapr->rma_size);
+
+        sprintf(mem_name, "memory@" TARGET_FMT_lx, spapr->rma_size);
+        off = fdt_add_subnode(fdt, 0, mem_name);
+        _FDT(off);
+        _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
+        _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property,
+                          sizeof(mem_reg_property))));
+        _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity,
+                          sizeof(associativity))));
+    }
+
+    /* RAM: Node 1 and beyond */
+    mem_start = node0_size;
+    for (i = 1; i < nb_numa_nodes; i++) {
+        mem_reg_property[0] = cpu_to_be64(mem_start);
+        mem_reg_property[1] = cpu_to_be64(node_mem[i]);
+        associativity[3] = associativity[4] = cpu_to_be32(i);
+        sprintf(mem_name, "memory@" TARGET_FMT_lx, mem_start);
+        off = fdt_add_subnode(fdt, 0, mem_name);
+        _FDT(off);
+        _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
+        _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property,
+                          sizeof(mem_reg_property))));
+        _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity,
+                          sizeof(associativity))));
+        mem_start += node_mem[i];
+    }
+
+    return 0;
+}
+
 static void spapr_finalize_fdt(sPAPREnvironment *spapr,
                                target_phys_addr_t fdt_addr,
                                target_phys_addr_t rtas_addr,
@@ -498,6 +511,12 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
     /* open out the base tree into a temp buffer for the final tweaks */
     _FDT((fdt_open_into(spapr->fdt_skel, fdt, FDT_MAX_SIZE)));
 
+    ret = spapr_populate_memory(spapr, fdt);
+    if (ret < 0) {
+        fprintf(stderr, "couldn't setup memory nodes in fdt\n");
+        exit(1);
+    }
+
     ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
     if (ret < 0) {
         fprintf(stderr, "couldn't setup vio devices in fdt\n");
@@ -520,11 +539,9 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
     }
 
     /* Advertise NUMA via ibm,associativity */
-    if (nb_numa_nodes > 1) {
-        ret = spapr_set_associativity(fdt, spapr);
-        if (ret < 0) {
-            fprintf(stderr, "Couldn't set up NUMA device tree properties\n");
-        }
+    ret = spapr_fixup_cpu_dt(fdt, spapr);
+    if (ret < 0) {
+        fprintf(stderr, "Couldn't finalize CPU device tree properties\n");
     }
 
     spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
@@ -552,10 +569,39 @@ static void emulate_spapr_hypercall(CPUPPCState *env)
     env->gpr[3] = spapr_hypercall(env, env->gpr[3], &env->gpr[4]);
 }
 
+static void spapr_reset_htab(sPAPREnvironment *spapr)
+{
+    long shift;
+
+    /* allocate hash page table.  For now we always make this 16mb,
+     * later we should probably make it scale to the size of guest
+     * RAM */
+
+    shift = kvmppc_reset_htab(spapr->htab_shift);
+
+    if (shift > 0) {
+        /* Kernel handles htab, we don't need to allocate one */
+        spapr->htab_shift = shift;
+    } else {
+        if (!spapr->htab) {
+            /* Allocate an htab if we don't yet have one */
+            spapr->htab = qemu_memalign(HTAB_SIZE(spapr), HTAB_SIZE(spapr));
+        }
+
+        /* And clear it */
+        memset(spapr->htab, 0, HTAB_SIZE(spapr));
+    }
+
+    /* Update the RMA size if necessary */
+    if (spapr->vrma_adjust) {
+        spapr->rma_size = kvmppc_rma_size(ram_size, spapr->htab_shift);
+    }
+}
+
 static void ppc_spapr_reset(void)
 {
-    /* flush out the hash table */
-    memset(spapr->htab, 0, spapr->htab_size);
+    /* Reset the hash table & recalc the RMA */
+    spapr_reset_htab(spapr);
 
     qemu_devices_reset();
 
@@ -583,6 +629,12 @@ static void spapr_cpu_reset(void *opaque)
     env->halted = 1;
 
     env->spr[SPR_HIOR] = 0;
+
+    env->external_htab = spapr->htab;
+    env->htab_base = -1;
+    env->htab_mask = HTAB_SIZE(spapr) - 1;
+    env->spr[SPR_SDR1] = (unsigned long)spapr->htab |
+        (spapr->htab_shift - 18);
 }
 
 /* pSeries LPAR / sPAPR hardware init */
@@ -598,11 +650,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     int i;
     MemoryRegion *sysmem = get_system_memory();
     MemoryRegion *ram = g_new(MemoryRegion, 1);
-    target_phys_addr_t rma_alloc_size, rma_size;
+    target_phys_addr_t rma_alloc_size;
     uint32_t initrd_base = 0;
     long kernel_size = 0, initrd_size = 0;
     long load_limit, rtas_limit, fw_size;
-    long pteg_shift = 17;
     char *filename;
 
     msi_supported = true;
@@ -619,20 +670,39 @@ static void ppc_spapr_init(ram_addr_t ram_size,
         hw_error("qemu: Unable to create RMA\n");
         exit(1);
     }
+
     if (rma_alloc_size && (rma_alloc_size < ram_size)) {
-        rma_size = rma_alloc_size;
+        spapr->rma_size = rma_alloc_size;
     } else {
-        rma_size = ram_size;
+        spapr->rma_size = ram_size;
+
+        /* With KVM, we don't actually know whether KVM supports an
+         * unbounded RMA (PR KVM) or is limited by the hash table size
+         * (HV KVM using VRMA), so we always assume the latter
+         *
+         * In that case, we also limit the initial allocations for RTAS
+         * etc... to 256M since we have no way to know what the VRMA size
+         * is going to be, as it depends on the size of the hash table,
+         * which isn't determined yet.
+         */
+        if (kvm_enabled()) {
+            spapr->vrma_adjust = 1;
+            spapr->rma_size = MIN(spapr->rma_size, 0x10000000);
+        }
     }
 
     /* We place the device tree and RTAS just below either the top of the RMA,
      * or just below 2GB, whichever is lowere, so that it can be
      * processed with 32-bit real mode code if necessary */
-    rtas_limit = MIN(rma_size, 0x80000000);
+    rtas_limit = MIN(spapr->rma_size, 0x80000000);
     spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE;
     spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
     load_limit = spapr->fdt_addr - FW_OVERHEAD;
 
+    /* For now, always aim for a 16MB hash table */
+    /* FIXME: we should change this default based on RAM size */
+    spapr->htab_shift = 24;
+
     /* init CPUs */
     if (cpu_model == NULL) {
         cpu_model = kvm_enabled() ? "host" : "POWER7";
@@ -670,25 +740,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
         memory_region_add_subregion(sysmem, nonrma_base, ram);
     }
 
-    /* allocate hash page table.  For now we always make this 16mb,
-     * later we should probably make it scale to the size of guest
-     * RAM.  FIXME: setting the htab information in the CPU env really
-     * belongs at CPU reset time, but we can get away with it for now
-     * because the PAPR guest is not permitted to write SDR1 so in
-     * fact these settings will never change during the run */
-    spapr->htab_size = 1ULL << (pteg_shift + 7);
-    spapr->htab = qemu_memalign(spapr->htab_size, spapr->htab_size);
-
-    for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        env->external_htab = spapr->htab;
-        env->htab_base = -1;
-        env->htab_mask = spapr->htab_size - 1;
-
-        /* Tell KVM that we're in PAPR mode */
-        env->spr[SPR_SDR1] = (unsigned long)spapr->htab |
-                             ((pteg_shift + 7) - 18);
-    }
-
     filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
     spapr->rtas_size = load_image_targphys(filename, spapr->rtas_addr,
                                            rtas_limit - spapr->rtas_addr);
@@ -747,7 +798,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
         spapr_vscsi_create(spapr->vio_bus);
     }
 
-    if (rma_size < (MIN_RMA_SLOF << 20)) {
+    if (spapr->rma_size < (MIN_RMA_SLOF << 20)) {
         fprintf(stderr, "qemu: pSeries SLOF firmware requires >= "
                 "%ldM guest RMA (Real Mode Area memory)\n", MIN_RMA_SLOF);
         exit(1);
@@ -799,11 +850,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     spapr->entry_point = 0x100;
 
     /* Prepare the device tree */
-    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size,
+    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
                                             initrd_base, initrd_size,
                                             kernel_size,
-                                            boot_device, kernel_cmdline,
-                                            pteg_shift + 7);
+                                            boot_device, kernel_cmdline);
     assert(spapr->fdt_skel != NULL);
 }
 
diff --git a/hw/spapr.h b/hw/spapr.h
index 5d76c5e..4a155ac 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -15,7 +15,9 @@ typedef struct sPAPREnvironment {
 
     target_phys_addr_t ram_limit;
     void *htab;
-    long htab_size;
+    long htab_shift;
+    target_phys_addr_t rma_size;
+    int vrma_adjust;
     target_phys_addr_t fdt_addr, rtas_addr;
     long rtas_size;
     void *fdt_skel;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 1a7489b..546c116 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1010,6 +1010,7 @@ int kvmppc_smt_threads(void)
     return cap_ppc_smt ? cap_ppc_smt : 1;
 }
 
+#ifdef TARGET_PPC64
 off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
 {
     void *rma;
@@ -1053,6 +1054,16 @@ off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
     return size;
 }
 
+uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
+{
+    if (cap_ppc_rma >= 2) {
+        return current_size;
+    }
+    return MIN(current_size,
+               getrampagesize() << (hash_shift - 7));
+}
+#endif
+
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd)
 {
     struct kvm_create_spapr_tce args = {
@@ -1112,6 +1123,24 @@ int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)
     return 0;
 }
 
+int kvmppc_reset_htab(int shift_hint)
+{
+    uint32_t shift = shift_hint;
+
+    if (kvm_enabled() &&
+        kvm_check_extension(kvm_state, KVM_CAP_PPC_ALLOC_HTAB)) {
+        int ret;
+        ret = kvm_vm_ioctl(kvm_state, KVM_PPC_ALLOCATE_HTAB, &shift);
+        if (ret < 0) {
+            return ret;
+        }
+        return shift;
+    }
+
+    /* For now.. */
+    return 0;
+}
+
 static inline uint32_t mfpvr(void)
 {
     uint32_t pvr;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index e2f8703..baad6eb 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -27,6 +27,8 @@ int kvmppc_smt_threads(void);
 off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem);
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
+int kvmppc_reset_htab(int shift_hint);
+uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
 #endif /* !CONFIG_USER_ONLY */
 const ppc_def_t *kvmppc_host_cpu_def(void);
 int kvmppc_fixup_cpu(CPUPPCState *env);
@@ -94,6 +96,23 @@ static inline int kvmppc_remove_spapr_tce(void *table, int pfd,
 {
     return -1;
 }
+
+static inline int kvmppc_reset_htab(int shift_hint)
+{
+    return -1;
+}
+
+static inline uint64_t kvmppc_rma_size(uint64_t current_size,
+                                       unsigned int hash_shift)
+{
+    return ram_size;
+}
+
+static inline int kvmppc_update_sdr1(CPUPPCState *env)
+{
+    return 0;
+}
+
 #endif /* !CONFIG_USER_ONLY */
 
 static inline const ppc_def_t *kvmppc_host_cpu_def(void)
-- 
1.7.10.4

* [Qemu-devel] [PATCH 6/7] pseries: Clear TCE state when resetting PAPR VIO devices
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus, David Gibson

When we reset the system, the reset method for VIO bus devices resets the
state of their request queue (if present), as it should.  However, it was
not resetting the state of their TCE table (DMA translation), if present.
This patch corrects that bug, and also removes some small code duplication
in the reset paths.
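
As a rough illustration of the intent, the sketch below routes device
reset through a single quiesce helper that clears both the request queue
and the DMA translation table.  The struct layout, the fixed table size
and the function names are hypothetical and much simpler than the real
VIOsPAPRDevice code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TCE_ENTRIES 8  /* illustrative size only */

/* Hypothetical, simplified view of a VIO device. */
struct vio_dev {
    uint64_t crq_addr;          /* request queue address, 0 if none */
    uint64_t crq_size;
    uint64_t crq_next;
    int has_tce;                /* device has a DMA translation table */
    uint64_t tce[TCE_ENTRIES];  /* translation entries */
};

/* Drop the request queue and any DMA translations. */
static void vio_quiesce(struct vio_dev *d)
{
    if (d->has_tce) {
        memset(d->tce, 0, sizeof(d->tce));
    }
    d->crq_addr = 0;
    d->crq_size = 0;
    d->crq_next = 0;
}

/* Bus-level reset: quiesce first, then run the device's own hook. */
static void vio_reset(struct vio_dev *d, void (*class_reset)(struct vio_dev *))
{
    vio_quiesce(d);
    if (class_reset) {
        class_reset(d);
    }
}
```

Having one helper for both the guest-requested quiesce and the system
reset path is what removes the duplicated queue-clearing code.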

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr_vio.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 7ca4452..298e239 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -324,9 +324,7 @@ static void spapr_vio_quiesce_one(VIOsPAPRDevice *dev)
     }
     dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
 
-    dev->crq.qladdr = 0;
-    dev->crq.qsize = 0;
-    dev->crq.qnext = 0;
+    free_crq(dev);
 }
 
 static void rtas_set_tce_bypass(sPAPREnvironment *spapr, uint32_t token,
@@ -409,9 +407,8 @@ static void spapr_vio_busdev_reset(DeviceState *qdev)
     VIOsPAPRDevice *dev = DO_UPCAST(VIOsPAPRDevice, qdev, qdev);
     VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
 
-    if (dev->crq.qsize) {
-        free_crq(dev);
-    }
+    /* Shut down the request queue and TCEs if necessary */
+    spapr_vio_quiesce_one(dev);
 
     if (pc->reset) {
         pc->reset(dev);
-- 
1.7.10.4

* [Qemu-devel] [PATCH 7/7] ppc/pseries: Reset VPA registration on CPU reset
From: David Gibson @ 2012-08-15  4:33 UTC (permalink / raw)
  To: agraf; +Cc: qemu-ppc, qemu-devel, paulus, David Gibson

The ppc specific CPU state contains several variables which track the
VPA, SLB shadow and dispatch trace log.  These are structures shared
between OS and hypervisor that are used on the pseries machine to track
various per-CPU quantities.

The addresses of these structures need to be registered by the guest on
each boot; however, this registration is currently not cleared when we
reset the cpu.  This patch corrects this bug.  Well, it corrects it for the
full emu case anyway.  To fix the KVM case, we need some KVM extensions to
actually make the KVM internal VPA registration accessible to qemu.
Patches for qemu will follow once the kernel updates are merged.
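
A small sketch of the idea, using a hypothetical struct that borrows the
field names touched by this patch: reset clears every registration, so a
freshly booted guest has to register its areas again before the
hypervisor will use them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-CPU state; field names follow the patch. */
struct cpu_shared_areas {
    uint64_t vpa;                 /* guest address of the VPA, 0 if none */
    uint64_t slb_shadow;          /* guest address of the SLB shadow */
    uint64_t dispatch_trace_log;  /* guest address of the DTL */
    uint64_t dtl_size;            /* size of the DTL in bytes */
};

/* Forget everything the previous boot of the guest registered. */
static void clear_registrations(struct cpu_shared_areas *c)
{
    c->vpa = 0;
    c->slb_shadow = 0;
    c->dispatch_trace_log = 0;
    c->dtl_size = 0;
}

/* Nonzero while the guest has a VPA registered. */
static int vpa_registered(const struct cpu_shared_areas *c)
{
    return c->vpa != 0;
}
```

Without the clearing step, a stale address left over from the previous
boot would still look like a valid registration after reset.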

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 target-ppc/translate_init.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 6fe4168..2bc4a3f 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -10423,6 +10423,14 @@ static void ppc_cpu_reset(CPUState *s)
     env->pending_interrupts = 0;
     env->exception_index = POWERPC_EXCP_NONE;
     env->error_code = 0;
+
+#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
+    env->vpa = 0;
+    env->slb_shadow = 0;
+    env->dispatch_trace_log = 0;
+    env->dtl_size = 0;
+#endif /* TARGET_PPC64 */
+
     /* Flush all TLBs */
     tlb_flush(env, 1);
 }
-- 
1.7.10.4

Thread overview: 10+ messages
2012-08-15  4:33 [Qemu-devel] [0/7] pseries: Patches to fix system reset David Gibson
2012-08-15  4:33 ` [Qemu-devel] [PATCH 1/7] Allow QEMUMachine to override reset sequencing David Gibson
2012-08-15  4:33 ` [Qemu-devel] [PATCH 2/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson
2012-08-17 13:58   ` Alexander Graf
2012-08-17 16:24     ` [Qemu-devel] [Qemu-ppc] " David Gibson
2012-08-15  4:33 ` [Qemu-devel] [PATCH 3/7] pseries: Fix and cleanup CPU initialization and reset David Gibson
2012-08-15  4:33 ` [Qemu-devel] [PATCH 4/7] pseries: Use new method to correct reset sequence David Gibson
2012-08-15  4:33 ` [Qemu-devel] [PATCH 5/7] pseries: Add support for new KVM hash table control call David Gibson
2012-08-15  4:33 ` [Qemu-devel] [PATCH 6/7] pseries: Clear TCE state when resetting PAPR VIO devices David Gibson
2012-08-15  4:33 ` [Qemu-devel] [PATCH 7/7] ppc/pseries: Reset VPA registration on CPU reset David Gibson