* [Qemu-devel] [0/7] System reset fixes for pseries @ 2012-09-10 6:38 David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 1/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson ` (6 more replies) 0 siblings, 7 replies; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel Hi Alex, Now that 1.3 development is open, here's my pending series of patches to fix system reset for the pseries machine. This has most of the outstanding fixes - there is another for resetting the registered VPA address which is not here yet, because the kernel support to sync it with KVM is still on its way upstream. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 1/7] ppc: Make kvm_arch_put_registers() put *all* the registers 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson @ 2012-09-10 6:38 ` David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 2/7] pseries: Fix and cleanup CPU initialization and reset David Gibson ` (5 subsequent siblings) 6 siblings, 0 replies; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel, David Gibson At least when invoked with high enough 'level' arguments, kvm_arch_put_registers() is supposed to copy essentially all the cpu state as encoded in qemu's internal structures into the kvm state. Currently the ppc version does not do this - it never calls KVM_SET_SREGS, for example, and therefore never sets the SDR1 and various other important though rarely changed registers. Instead, the code paths which need to set these registers need to explicitly make (conditional) kvm calls which transfer the changes to kvm. This breaks the usual model of handling state updates in qemu, where code just changes the internal model and has it flushed out to kvm automatically at some later point. This patch fixes this for Book S ppc CPUs by adding a suitable call to KVM_SET_SREGS and als to KVM_SET_ONE_REG to set the HIOR (the only register that is set with that call so far). This lets us remove the hacks to explicitly set these registers from the kvmppc_set_papr() function. The problem still exists for Book E CPUs (which use a different version of the kvm_sregs structure). But fixing that has some complications of its own so can be left to another day. Lkewise, there is still some ugly code for setting the PVR through special calls to SET_SREGS which is left in for now. The PVR needs to be set especially early because it can affect what other features are available on the CPU, so I need to do more thinking to see if it can be integrated into the normal paths or not. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- target-ppc/kvm.c | 89 ++++++++++++++++++++++++++++++------------------------ 1 file changed, 50 insertions(+), 39 deletions(-) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index a31d278..1a7489b 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -60,6 +60,7 @@ static int cap_booke_sregs; static int cap_ppc_smt; static int cap_ppc_rma; static int cap_spapr_tce; +static int cap_hior; /* XXX We have a race condition where we actually have a level triggered * interrupt, but the infrastructure can't expose that yet, so the guest @@ -86,6 +87,7 @@ int kvm_arch_init(KVMState *s) cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT); cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA); cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE); + cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR); if (!cap_interrupt_level) { fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the " @@ -469,6 +471,53 @@ int kvm_arch_put_registers(CPUPPCState *env, int level) env->tlb_dirty = false; } + if (cap_segstate && (level >= KVM_PUT_RESET_STATE)) { + struct kvm_sregs sregs; + + sregs.pvr = env->spr[SPR_PVR]; + + sregs.u.s.sdr1 = env->spr[SPR_SDR1]; + + /* Sync SLB */ +#ifdef TARGET_PPC64 + for (i = 0; i < 64; i++) { + sregs.u.s.ppc64.slb[i].slbe = env->slb[i].esid; + sregs.u.s.ppc64.slb[i].slbv = env->slb[i].vsid; + } +#endif + + /* Sync SRs */ + for (i = 0; i < 16; i++) { + sregs.u.s.ppc32.sr[i] = env->sr[i]; + } + + /* Sync BATs */ + for (i = 0; i < 8; i++) { + sregs.u.s.ppc32.dbat[i] = ((uint64_t)env->DBAT[1][i] << 32) + | env->DBAT[0][i]; + sregs.u.s.ppc32.ibat[i] = ((uint64_t)env->IBAT[1][i] << 32) + | env->IBAT[0][i]; + } + + ret = kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs); + if (ret) { + return ret; + } + } + + if (cap_hior && (level >= KVM_PUT_RESET_STATE)) { + uint64_t hior = env->spr[SPR_HIOR]; + struct kvm_one_reg reg = { + .id = KVM_REG_PPC_HIOR, + .addr = (uintptr_t) &hior, + }; + + ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, ®); + if (ret) { + return ret; + } + } + return ret; } @@ -946,52 +995,14 @@ int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len) void kvmppc_set_papr(CPUPPCState *env) { struct kvm_enable_cap cap = {}; - struct kvm_one_reg reg = {}; - struct kvm_sregs sregs = {}; int ret; - uint64_t hior = env->spr[SPR_HIOR]; cap.cap = KVM_CAP_PPC_PAPR; ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, &cap); if (ret) { - goto fail; - } - - /* - * XXX We set HIOR here. It really should be a qdev property of - * the CPU node, but we don't have CPUs converted to qdev yet. - * - * Once we have qdev CPUs, move HIOR to a qdev property and - * remove this chunk. - */ - reg.id = KVM_REG_PPC_HIOR; - reg.addr = (uintptr_t)&hior; - ret = kvm_vcpu_ioctl(env, KVM_SET_ONE_REG, ®); - if (ret) { - fprintf(stderr, "Couldn't set HIOR. Maybe you're running an old \n" - "kernel with support for HV KVM but no PAPR PR \n" - "KVM in which case things will work. If they don't \n" - "please update your host kernel!\n"); - } - - /* Set SDR1 so kernel space finds the HTAB */ - ret = kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs); - if (ret) { - goto fail; - } - - sregs.u.s.sdr1 = env->spr[SPR_SDR1]; - - ret = kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs); - if (ret) { - goto fail; + cpu_abort(env, "This KVM version does not support PAPR\n"); } - - return; - -fail: - cpu_abort(env, "This KVM version does not support PAPR\n"); } int kvmppc_smt_threads(void) -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 2/7] pseries: Fix and cleanup CPU initialization and reset 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 1/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson @ 2012-09-10 6:38 ` David Gibson 2012-09-10 13:45 ` Andreas Färber 2012-09-10 6:38 ` [Qemu-devel] [PATCH 3/7] pseries: Use new method to correct reset sequence David Gibson ` (4 subsequent siblings) 6 siblings, 1 reply; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel, David Gibson The current pseries machine init function iterates over the CPUs at several points, doing various bits of initialization. This is messy; these can and should be merged into a single iteration doing all the necessary per cpu initialization. Worse, some of these initializations were setting up state which should be set on every reset, not just at machine init time. A few of the initializations simply weren't necessary at all. This patch, therefore, moves those things that need to be to the per-cpu reset handler, and combines the remainder into two loops over the cpus (which also creates them). The second loop is for setting up hash table information, and will be removed in a subsequent patch also making other fixes to the hash table setup. This exposes a bug in our start-cpu RTAS routine (called by the guest to start up CPUs other than CPU0) under kvm. Previously, this function did not make a call to ensure that it's changes to the new cpu's state were pushed into KVM in-kernel state. We sort-of got away with this because some of the initializations had already placed the secondary CPUs into the right starting state for the sorts of Linux guests we've been running. Nonetheless the start-cpu RTAS call's behaviour was not correct and could easily have been broken by guest changes. This patch also fixes it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/spapr.c | 33 +++++++++++++++++++-------------- hw/spapr_rtas.c | 5 +++++ 2 files changed, 24 insertions(+), 14 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index c34b767..be9092a 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -581,8 +581,15 @@ static void spapr_reset(void *opaque) static void spapr_cpu_reset(void *opaque) { PowerPCCPU *cpu = opaque; + CPUPPCState *env = &cpu->env; cpu_reset(CPU(cpu)); + + /* All CPUs start halted, except CPU0, the rest are explicitly + * started up by the guest using an RTAS call */ + env->halted = 1; + + env->spr[SPR_HIOR] = 0; } /* Returns whether we want to use VGA or not */ @@ -665,11 +672,16 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* Set time-base frequency to 512 MHz */ cpu_ppc_tb_init(env, TIMEBASE_FREQ); - qemu_register_reset(spapr_cpu_reset, cpu); - env->hreset_vector = 0x60; + /* PAPR always has exception vectors in RAM not ROM */ env->hreset_excp_prefix = 0; - env->gpr[3] = env->cpu_index; + + /* Tell KVM that we're in PAPR mode */ + if (kvm_enabled()) { + kvmppc_set_papr(env); + } + + qemu_register_reset(spapr_cpu_reset, cpu); } /* allocate RAM */ @@ -685,7 +697,10 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* allocate hash page table. For now we always make this 16mb, * later we should probably make it scale to the size of guest - * RAM */ + * RAM. FIXME: setting the htab information in the CPU env really + * belongs at CPU reset time, but we can get away with it for now + * because the PAPR guest is not permitted to write SDR1 so in + * fact these settings will never change during the run */ spapr->htab_size = 1ULL << (pteg_shift + 7); spapr->htab = qemu_memalign(spapr->htab_size, spapr->htab_size); @@ -697,11 +712,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* Tell KVM that we're in PAPR mode */ env->spr[SPR_SDR1] = (unsigned long)spapr->htab | ((pteg_shift + 7) - 18); - env->spr[SPR_HIOR] = 0; - - if (kvm_enabled()) { - kvmppc_set_papr(env); - } } filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin"); @@ -827,11 +837,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, spapr->entry_point = 0x100; - /* SLOF will startup the secondary CPUs using RTAS */ - for (env = first_cpu; env != NULL; env = env->next_cpu) { - env->halted = 1; - } - /* Prepare the device tree */ spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size, initrd_base, initrd_size, diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c index ae18595..b808f80 100644 --- a/hw/spapr_rtas.c +++ b/hw/spapr_rtas.c @@ -184,6 +184,11 @@ static void rtas_start_cpu(sPAPREnvironment *spapr, return; } + /* This will make sure qemu state is up to date with kvm, and + * mark it dirty so our changes get flushed back before the + * new cpu enters */ + kvm_cpu_synchronize_state(env); + env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME); env->nip = start; env->gpr[3] = r3; -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 2/7] pseries: Fix and cleanup CPU initialization and reset 2012-09-10 6:38 ` [Qemu-devel] [PATCH 2/7] pseries: Fix and cleanup CPU initialization and reset David Gibson @ 2012-09-10 13:45 ` Andreas Färber 2012-09-10 23:45 ` [Qemu-devel] [Qemu-ppc] " David Gibson 0 siblings, 1 reply; 12+ messages in thread From: Andreas Färber @ 2012-09-10 13:45 UTC (permalink / raw) To: David Gibson; +Cc: qemu-ppc, agraf, qemu-devel Am 10.09.2012 08:38, schrieb David Gibson: > The current pseries machine init function iterates over the CPUs at several > points, doing various bits of initialization. This is messy; these can > and should be merged into a single iteration doing all the necessary per > cpu initialization. Worse, some of these initializations were setting up > state which should be set on every reset, not just at machine init time. > A few of the initializations simply weren't necessary at all. > > This patch, therefore, moves those things that need to be to the > per-cpu reset handler, and combines the remainder into two loops over > the cpus (which also creates them). The second loop is for setting up > hash table information, and will be removed in a subsequent patch also > making other fixes to the hash table setup. > > This exposes a bug in our start-cpu RTAS routine (called by the guest to > start up CPUs other than CPU0) under kvm. Previously, this function did > not make a call to ensure that it's changes to the new cpu's state were > pushed into KVM in-kernel state. We sort-of got away with this because > some of the initializations had already placed the secondary CPUs into the > right starting state for the sorts of Linux guests we've been running. > > Nonetheless the start-cpu RTAS call's behaviour was not correct and could > easily have been broken by guest changes. This patch also fixes it. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > hw/spapr.c | 33 +++++++++++++++++++-------------- > hw/spapr_rtas.c | 5 +++++ > 2 files changed, 24 insertions(+), 14 deletions(-) > > diff --git a/hw/spapr.c b/hw/spapr.c > index c34b767..be9092a 100644 > --- a/hw/spapr.c > +++ b/hw/spapr.c > @@ -581,8 +581,15 @@ static void spapr_reset(void *opaque) > static void spapr_cpu_reset(void *opaque) > { > PowerPCCPU *cpu = opaque; > + CPUPPCState *env = &cpu->env; > > cpu_reset(CPU(cpu)); > + > + /* All CPUs start halted, except CPU0, the rest are explicitly > + * started up by the guest using an RTAS call */ > + env->halted = 1; That comment does not make sense to me. There is no differentiation between CPU0 and other CPUs here, is an if (env->cpu_index != 0) missing or is ->halted overwritten for CPU0 elsewhere? (then the comment should probably go there instead?) Andreas > + > + env->spr[SPR_HIOR] = 0; > } > > /* Returns whether we want to use VGA or not */ > @@ -665,11 +672,16 @@ static void ppc_spapr_init(ram_addr_t ram_size, > > /* Set time-base frequency to 512 MHz */ > cpu_ppc_tb_init(env, TIMEBASE_FREQ); > - qemu_register_reset(spapr_cpu_reset, cpu); > > - env->hreset_vector = 0x60; > + /* PAPR always has exception vectors in RAM not ROM */ > env->hreset_excp_prefix = 0; > - env->gpr[3] = env->cpu_index; > + > + /* Tell KVM that we're in PAPR mode */ > + if (kvm_enabled()) { > + kvmppc_set_papr(env); > + } > + > + qemu_register_reset(spapr_cpu_reset, cpu); > } > > /* allocate RAM */ > @@ -685,7 +697,10 @@ static void ppc_spapr_init(ram_addr_t ram_size, > > /* allocate hash page table. For now we always make this 16mb, > * later we should probably make it scale to the size of guest > - * RAM */ > + * RAM. FIXME: setting the htab information in the CPU env really > + * belongs at CPU reset time, but we can get away with it for now > + * because the PAPR guest is not permitted to write SDR1 so in > + * fact these settings will never change during the run */ > spapr->htab_size = 1ULL << (pteg_shift + 7); > spapr->htab = qemu_memalign(spapr->htab_size, spapr->htab_size); > > @@ -697,11 +712,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, > /* Tell KVM that we're in PAPR mode */ > env->spr[SPR_SDR1] = (unsigned long)spapr->htab | > ((pteg_shift + 7) - 18); > - env->spr[SPR_HIOR] = 0; > - > - if (kvm_enabled()) { > - kvmppc_set_papr(env); > - } > } > > filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin"); > @@ -827,11 +837,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, > > spapr->entry_point = 0x100; > > - /* SLOF will startup the secondary CPUs using RTAS */ > - for (env = first_cpu; env != NULL; env = env->next_cpu) { > - env->halted = 1; > - } > - > /* Prepare the device tree */ > spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size, > initrd_base, initrd_size, > diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c > index ae18595..b808f80 100644 > --- a/hw/spapr_rtas.c > +++ b/hw/spapr_rtas.c > @@ -184,6 +184,11 @@ static void rtas_start_cpu(sPAPREnvironment *spapr, > return; > } > > + /* This will make sure qemu state is up to date with kvm, and > + * mark it dirty so our changes get flushed back before the > + * new cpu enters */ > + kvm_cpu_synchronize_state(env); > + > env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME); > env->nip = start; > env->gpr[3] = r3; > -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/7] pseries: Fix and cleanup CPU initialization and reset 2012-09-10 13:45 ` Andreas Färber @ 2012-09-10 23:45 ` David Gibson 0 siblings, 0 replies; 12+ messages in thread From: David Gibson @ 2012-09-10 23:45 UTC (permalink / raw) To: Andreas Färber; +Cc: qemu-ppc, qemu-devel On Mon, Sep 10, 2012 at 03:45:31PM +0200, Andreas Färber wrote: > Am 10.09.2012 08:38, schrieb David Gibson: > > The current pseries machine init function iterates over the CPUs at several > > points, doing various bits of initialization. This is messy; these can > > and should be merged into a single iteration doing all the necessary per > > cpu initialization. Worse, some of these initializations were setting up > > state which should be set on every reset, not just at machine init time. > > A few of the initializations simply weren't necessary at all. > > > > This patch, therefore, moves those things that need to be to the > > per-cpu reset handler, and combines the remainder into two loops over > > the cpus (which also creates them). The second loop is for setting up > > hash table information, and will be removed in a subsequent patch also > > making other fixes to the hash table setup. > > > > This exposes a bug in our start-cpu RTAS routine (called by the guest to > > start up CPUs other than CPU0) under kvm. Previously, this function did > > not make a call to ensure that it's changes to the new cpu's state were > > pushed into KVM in-kernel state. We sort-of got away with this because > > some of the initializations had already placed the secondary CPUs into the > > right starting state for the sorts of Linux guests we've been running. > > > > Nonetheless the start-cpu RTAS call's behaviour was not correct and could > > easily have been broken by guest changes. This patch also fixes it. > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > --- > > hw/spapr.c | 33 +++++++++++++++++++-------------- > > hw/spapr_rtas.c | 5 +++++ > > 2 files changed, 24 insertions(+), 14 deletions(-) > > > > diff --git a/hw/spapr.c b/hw/spapr.c > > index c34b767..be9092a 100644 > > --- a/hw/spapr.c > > +++ b/hw/spapr.c > > @@ -581,8 +581,15 @@ static void spapr_reset(void *opaque) > > static void spapr_cpu_reset(void *opaque) > > { > > PowerPCCPU *cpu = opaque; > > + CPUPPCState *env = &cpu->env; > > > > cpu_reset(CPU(cpu)); > > + > > + /* All CPUs start halted, except CPU0, the rest are explicitly > > + * started up by the guest using an RTAS call */ > > + env->halted = 1; > > That comment does not make sense to me. There is no differentiation > between CPU0 and other CPUs here, is an if (env->cpu_index != 0) missing > or is ->halted overwritten for CPU0 elsewhere? (then the comment should > probably go there instead?) Ah, I guess it's not entirely clear here. Yes, ->halted is overwritten for CPU0 elsewhere, from spapr_reset(). Below is a revised version of the patch with a clearer comment. >From c5b36b49f51bda02e6b012d6a004ef9862de856f Mon Sep 17 00:00:00 2001 From: David Gibson <david@gibson.dropbear.id.au> Date: Tue, 11 Sep 2012 09:44:17 +1000 Subject: [PATCH] pseries: Fix and cleanup CPU initialization and reset The current pseries machine init function iterates over the CPUs at several points, doing various bits of initialization. This is messy; these can and should be merged into a single iteration doing all the necessary per cpu initialization. Worse, some of these initializations were setting up state which should be set on every reset, not just at machine init time. A few of the initializations simply weren't necessary at all. This patch, therefore, moves those things that need to be to the per-cpu reset handler, and combines the remainder into two loops over the cpus (which also creates them). The second loop is for setting up hash table information, and will be removed in a subsequent patch also making other fixes to the hash table setup. This exposes a bug in our start-cpu RTAS routine (called by the guest to start up CPUs other than CPU0) under kvm. Previously, this function did not make a call to ensure that it's changes to the new cpu's state were pushed into KVM in-kernel state. We sort-of got away with this because some of the initializations had already placed the secondary CPUs into the right starting state for the sorts of Linux guests we've been running. Nonetheless the start-cpu RTAS call's behaviour was not correct and could easily have been broken by guest changes. This patch also fixes it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/spapr.c | 34 ++++++++++++++++++++-------------- hw/spapr_rtas.c | 5 +++++ 2 files changed, 25 insertions(+), 14 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index c34b767..d88525a 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -581,8 +581,16 @@ static void spapr_reset(void *opaque) static void spapr_cpu_reset(void *opaque) { PowerPCCPU *cpu = opaque; + CPUPPCState *env = &cpu->env; cpu_reset(CPU(cpu)); + + /* All CPUs start halted. CPU0 is unhalted from the machine level + * reset code and the rest are explicitly started up by the guest + * using an RTAS call */ + env->halted = 1; + + env->spr[SPR_HIOR] = 0; } /* Returns whether we want to use VGA or not */ @@ -665,11 +673,16 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* Set time-base frequency to 512 MHz */ cpu_ppc_tb_init(env, TIMEBASE_FREQ); - qemu_register_reset(spapr_cpu_reset, cpu); - env->hreset_vector = 0x60; + /* PAPR always has exception vectors in RAM not ROM */ env->hreset_excp_prefix = 0; - env->gpr[3] = env->cpu_index; + + /* Tell KVM that we're in PAPR mode */ + if (kvm_enabled()) { + kvmppc_set_papr(env); + } + + qemu_register_reset(spapr_cpu_reset, cpu); } /* allocate RAM */ @@ -685,7 +698,10 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* allocate hash page table. For now we always make this 16mb, * later we should probably make it scale to the size of guest - * RAM */ + * RAM. FIXME: setting the htab information in the CPU env really + * belongs at CPU reset time, but we can get away with it for now + * because the PAPR guest is not permitted to write SDR1 so in + * fact these settings will never change during the run */ spapr->htab_size = 1ULL << (pteg_shift + 7); spapr->htab = qemu_memalign(spapr->htab_size, spapr->htab_size); @@ -697,11 +713,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, /* Tell KVM that we're in PAPR mode */ env->spr[SPR_SDR1] = (unsigned long)spapr->htab | ((pteg_shift + 7) - 18); - env->spr[SPR_HIOR] = 0; - - if (kvm_enabled()) { - kvmppc_set_papr(env); - } } filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin"); @@ -827,11 +838,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, spapr->entry_point = 0x100; - /* SLOF will startup the secondary CPUs using RTAS */ - for (env = first_cpu; env != NULL; env = env->next_cpu) { - env->halted = 1; - } - /* Prepare the device tree */ spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size, initrd_base, initrd_size, diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c index ae18595..b808f80 100644 --- a/hw/spapr_rtas.c +++ b/hw/spapr_rtas.c @@ -184,6 +184,11 @@ static void rtas_start_cpu(sPAPREnvironment *spapr, return; } + /* This will make sure qemu state is up to date with kvm, and + * mark it dirty so our changes get flushed back before the + * new cpu enters */ + kvm_cpu_synchronize_state(env); + env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME); env->nip = start; env->gpr[3] = r3; -- 1.7.10.4 -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 3/7] pseries: Use new method to correct reset sequence 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 1/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 2/7] pseries: Fix and cleanup CPU initialization and reset David Gibson @ 2012-09-10 6:38 ` David Gibson 2012-09-10 13:47 ` Andreas Färber 2012-09-10 6:38 ` [Qemu-devel] [PATCH 4/7] pseries: Add support for new KVM hash table control call David Gibson ` (3 subsequent siblings) 6 siblings, 1 reply; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel, David Gibson A number of things need to occur during reset of the PAPR paravirtualized platform in a specific order. For example, the hash table needs to be cleared before the CPUs are reset, so that they initialize their register state correctly, and the CPUs need to have their main reset called before we set up the entry point state on the boot cpu. We also need to have the main qdev reset happen before the creation and installation of the device tree for the new boot, because we need the state of the devices settled to correctly construct the device tree. We currently do the pseries once-per-reset initializations done from a reset handler. However we can't adequately control when this handler is called during the reset - in particular we can't guarantee it happens after all the qdev resets (since qdevs might be registered after the machine init function has executed). This patch uses the new QEMUMachine reset method to to fix this problem, ensuring the various order dependent reset steps happen in the correct order. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/spapr.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index be9092a..0fdfe41 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -559,13 +559,13 @@ static void emulate_spapr_hypercall(CPUPPCState *env) env->gpr[3] = spapr_hypercall(env, env->gpr[3], &env->gpr[4]); } -static void spapr_reset(void *opaque) +static void ppc_spapr_reset(void) { - sPAPREnvironment *spapr = (sPAPREnvironment *)opaque; - /* flush out the hash table */ memset(spapr->htab, 0, spapr->htab_size); + qemu_devices_reset(); + /* Load the fdt */ spapr_finalize_fdt(spapr, spapr->fdt_addr, spapr->rtas_addr, spapr->rtas_size); @@ -844,14 +844,13 @@ static void ppc_spapr_init(ram_addr_t ram_size, boot_device, kernel_cmdline, pteg_shift + 7); assert(spapr->fdt_skel != NULL); - - qemu_register_reset(spapr_reset, spapr); } static QEMUMachine spapr_machine = { .name = "pseries", .desc = "pSeries Logical Partition (PAPR compliant)", .init = ppc_spapr_init, + .reset = ppc_spapr_reset, .max_cpus = MAX_CPUS, .no_parallel = 1, .use_scsi = 1, -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 3/7] pseries: Use new method to correct reset sequence 2012-09-10 6:38 ` [Qemu-devel] [PATCH 3/7] pseries: Use new method to correct reset sequence David Gibson @ 2012-09-10 13:47 ` Andreas Färber 0 siblings, 0 replies; 12+ messages in thread From: Andreas Färber @ 2012-09-10 13:47 UTC (permalink / raw) To: David Gibson; +Cc: qemu-ppc, agraf, qemu-devel Am 10.09.2012 08:38, schrieb David Gibson: > A number of things need to occur during reset of the PAPR > paravirtualized platform in a specific order. For example, the hash > table needs to be cleared before the CPUs are reset, so that they > initialize their register state correctly, and the CPUs need to have > their main reset called before we set up the entry point state on the > boot cpu. We also need to have the main qdev reset happen before the > creation and installation of the device tree for the new boot, because > we need the state of the devices settled to correctly construct the > device tree. > > We currently do the pseries once-per-reset initializations done from a > reset handler. However we can't adequately control when this handler > is called during the reset - in particular we can't guarantee it > happens after all the qdev resets (since qdevs might be registered > after the machine init function has executed). > > This patch uses the new QEMUMachine reset method to to fix this > problem, ensuring the various order dependent reset steps happen in > the correct order. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andreas Färber <afaerber@suse.de> Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg ^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 4/7] pseries: Add support for new KVM hash table control call 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson ` (2 preceding siblings ...) 2012-09-10 6:38 ` [Qemu-devel] [PATCH 3/7] pseries: Use new method to correct reset sequence David Gibson @ 2012-09-10 6:38 ` David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 5/7] pseries: Clear TCE and signal state when resetting PAPR VIO devices David Gibson ` (2 subsequent siblings) 6 siblings, 0 replies; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel, David Gibson From: Ben Herrenschmidt <benh@kernel.crashing.org> This adds support for then new "reset htab" ioctl which allows qemu to properly cleanup the MMU hash table when the guest is reset. With the corresponding kernel support, reset of a guest now works properly. This also paves the way for indicating a different size hash table to the kernel and for the kernel to be able to impose limits on the requested size. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/spapr.c | 274 +++++++++++++++++++++++++++++--------------------- hw/spapr.h | 4 +- target-ppc/kvm.c | 29 ++++++ target-ppc/kvm_ppc.h | 19 ++++ 4 files changed, 213 insertions(+), 113 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 0fdfe41..4c9b4ad 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -85,6 +85,8 @@ #define PHANDLE_XICP 0x00001111 +#define HTAB_SIZE(spapr) (1ULL << ((spapr)->htab_shift)) + sPAPREnvironment *spapr; int spapr_allocate_irq(int hint, enum xics_irq_type type) @@ -134,12 +136,13 @@ int spapr_allocate_irq_block(int num, enum xics_irq_type type) return first; } -static int spapr_set_associativity(void *fdt, sPAPREnvironment *spapr) +static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) { int ret = 0, offset; CPUPPCState *env; char cpu_model[32]; int smt = kvmppc_smt_threads(); + uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)}; assert(spapr->cpu_model); @@ -163,8 +166,16 @@ static int spapr_set_associativity(void *fdt, sPAPREnvironment *spapr) return offset; } - ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity, - sizeof(associativity)); + if (nb_numa_nodes > 1) { + ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity, + sizeof(associativity)); + if (ret < 0) { + return ret; + } + } + + ret = fdt_setprop(fdt, offset, "ibm,pft-size", + pft_size_prop, sizeof(pft_size_prop)); if (ret < 0) { return ret; } @@ -206,45 +217,36 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop, return (p - prop) * sizeof(uint32_t); } +#define _FDT(exp) \ + do { \ + int ret = (exp); \ + if (ret < 0) { \ + fprintf(stderr, "qemu: error creating device tree: %s: %s\n", \ + #exp, fdt_strerror(ret)); \ + exit(1); \ + } \ + } while (0) + + static void *spapr_create_fdt_skel(const char *cpu_model, - target_phys_addr_t rma_size, target_phys_addr_t initrd_base, target_phys_addr_t initrd_size, target_phys_addr_t kernel_size, const char *boot_device, - const char *kernel_cmdline, - long hash_shift) + const char *kernel_cmdline) { void *fdt; CPUPPCState *env; - uint64_t mem_reg_property[2]; uint32_t start_prop = cpu_to_be32(initrd_base); uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size); - uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)}; char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt" "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk"; char qemu_hypertas_prop[] = "hcall-memop1"; + uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)}; uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)}; - int i; char *modelname; - int smt = kvmppc_smt_threads(); + int i, smt = kvmppc_smt_threads(); unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80}; - uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)}; - uint32_t associativity[] = {cpu_to_be32(0x4), cpu_to_be32(0x0), - cpu_to_be32(0x0), cpu_to_be32(0x0), - cpu_to_be32(0x0)}; - char mem_name[32]; - target_phys_addr_t node0_size, mem_start; - -#define _FDT(exp) \ - do { \ - int ret = (exp); \ - if (ret < 0) { \ - fprintf(stderr, "qemu: error creating device tree: %s: %s\n", \ - #exp, fdt_strerror(ret)); \ - exit(1); \ - } \ - } while (0) fdt = g_malloc0(FDT_MAX_SIZE); _FDT((fdt_create(fdt, FDT_MAX_SIZE))); @@ -289,55 +291,6 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_end_node(fdt))); - /* memory node(s) */ - node0_size = (nb_numa_nodes > 1) ? node_mem[0] : ram_size; - if (rma_size > node0_size) { - rma_size = node0_size; - } - - /* RMA */ - mem_reg_property[0] = 0; - mem_reg_property[1] = cpu_to_be64(rma_size); - _FDT((fdt_begin_node(fdt, "memory@0"))); - _FDT((fdt_property_string(fdt, "device_type", "memory"))); - _FDT((fdt_property(fdt, "reg", mem_reg_property, - sizeof(mem_reg_property)))); - _FDT((fdt_property(fdt, "ibm,associativity", associativity, - sizeof(associativity)))); - _FDT((fdt_end_node(fdt))); - - /* RAM: Node 0 */ - if (node0_size > rma_size) { - mem_reg_property[0] = cpu_to_be64(rma_size); - mem_reg_property[1] = cpu_to_be64(node0_size - rma_size); - - sprintf(mem_name, "memory@" TARGET_FMT_lx, rma_size); - _FDT((fdt_begin_node(fdt, mem_name))); - _FDT((fdt_property_string(fdt, "device_type", "memory"))); - _FDT((fdt_property(fdt, "reg", mem_reg_property, - sizeof(mem_reg_property)))); - _FDT((fdt_property(fdt, "ibm,associativity", associativity, - sizeof(associativity)))); - _FDT((fdt_end_node(fdt))); - } - - /* RAM: Node 1 and beyond */ - mem_start = node0_size; - for (i = 1; i < nb_numa_nodes; i++) { - mem_reg_property[0] = cpu_to_be64(mem_start); - mem_reg_property[1] = cpu_to_be64(node_mem[i]); - associativity[3] = associativity[4] = cpu_to_be32(i); - sprintf(mem_name, "memory@" TARGET_FMT_lx, mem_start); - _FDT((fdt_begin_node(fdt, mem_name))); - _FDT((fdt_property_string(fdt, "device_type", "memory"))); - _FDT((fdt_property(fdt, "reg", mem_reg_property, - sizeof(mem_reg_property)))); - _FDT((fdt_property(fdt, "ibm,associativity", associativity, - sizeof(associativity)))); - _FDT((fdt_end_node(fdt))); - mem_start += node_mem[i]; - } - /* cpus */ _FDT((fdt_begin_node(fdt, "cpus"))); @@ -389,8 +342,6 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_property_cell(fdt, "timebase-frequency", tbfreq))); _FDT((fdt_property_cell(fdt, "clock-frequency", cpufreq))); _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr))); - _FDT((fdt_property(fdt, "ibm,pft-size", - pft_size_prop, sizeof(pft_size_prop)))); _FDT((fdt_property_string(fdt, "status", "okay"))); _FDT((fdt_property(fdt, "64-bit", NULL, 0))); @@ -489,6 +440,68 @@ static void *spapr_create_fdt_skel(const char *cpu_model, return fdt; } +static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt) +{ + uint32_t associativity[] = {cpu_to_be32(0x4), cpu_to_be32(0x0), + cpu_to_be32(0x0), cpu_to_be32(0x0), + cpu_to_be32(0x0)}; + char mem_name[32]; + target_phys_addr_t node0_size, mem_start; + uint64_t mem_reg_property[2]; + int i, off; + + /* memory node(s) */ + node0_size = (nb_numa_nodes > 1) ? node_mem[0] : ram_size; + if (spapr->rma_size > node0_size) { + spapr->rma_size = node0_size; + } + + /* RMA */ + mem_reg_property[0] = 0; + mem_reg_property[1] = cpu_to_be64(spapr->rma_size); + off = fdt_add_subnode(fdt, 0, "memory@0"); + _FDT(off); + _FDT((fdt_setprop_string(fdt, off, "device_type", "memory"))); + _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property, + sizeof(mem_reg_property)))); + _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity, + sizeof(associativity)))); + + /* RAM: Node 0 */ + if (node0_size > spapr->rma_size) { + mem_reg_property[0] = cpu_to_be64(spapr->rma_size); + mem_reg_property[1] = cpu_to_be64(node0_size - spapr->rma_size); + + sprintf(mem_name, "memory@" TARGET_FMT_lx, spapr->rma_size); + off = fdt_add_subnode(fdt, 0, mem_name); + _FDT(off); + _FDT((fdt_setprop_string(fdt, off, "device_type", "memory"))); + _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property, + sizeof(mem_reg_property)))); + _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity, + sizeof(associativity)))); + } + + /* RAM: Node 1 and beyond */ + mem_start = node0_size; + for (i = 1; i < nb_numa_nodes; i++) { + mem_reg_property[0] = cpu_to_be64(mem_start); + mem_reg_property[1] = cpu_to_be64(node_mem[i]); + associativity[3] = associativity[4] = cpu_to_be32(i); + sprintf(mem_name, "memory@" TARGET_FMT_lx, mem_start); + off = fdt_add_subnode(fdt, 0, mem_name); + _FDT(off); + _FDT((fdt_setprop_string(fdt, off, "device_type", "memory"))); + _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property, + sizeof(mem_reg_property)))); + _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity, + sizeof(associativity)))); + mem_start += node_mem[i]; + } + + return 0; +} + static void spapr_finalize_fdt(sPAPREnvironment *spapr, target_phys_addr_t fdt_addr, target_phys_addr_t rtas_addr, @@ -503,6 +516,12 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr, /* open out the base tree into a temp buffer for the final tweaks */ _FDT((fdt_open_into(spapr->fdt_skel, fdt, FDT_MAX_SIZE))); + ret = spapr_populate_memory(spapr, fdt); + if (ret < 0) { + fprintf(stderr, "couldn't setup memory nodes in fdt\n"); + exit(1); + } + ret = spapr_populate_vdevice(spapr->vio_bus, fdt); if (ret < 0) { fprintf(stderr, "couldn't setup vio devices in fdt\n"); @@ -525,11 +544,9 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr, } /* Advertise NUMA via ibm,associativity */ - if (nb_numa_nodes > 1) { - ret = spapr_set_associativity(fdt, spapr); - if (ret < 0) { - fprintf(stderr, "Couldn't set up NUMA device tree properties\n"); - } + ret = spapr_fixup_cpu_dt(fdt, spapr); + if (ret < 0) { + fprintf(stderr, "Couldn't finalize CPU device tree properties\n"); } if (!spapr->has_graphics) { @@ -559,10 +576,39 @@ static void emulate_spapr_hypercall(CPUPPCState *env) env->gpr[3] = spapr_hypercall(env, env->gpr[3], &env->gpr[4]); } +static void spapr_reset_htab(sPAPREnvironment *spapr) +{ + long shift; + + /* allocate hash page table. For now we always make this 16mb, + * later we should probably make it scale to the size of guest + * RAM */ + + shift = kvmppc_reset_htab(spapr->htab_shift); + + if (shift > 0) { + /* Kernel handles htab, we don't need to allocate one */ + spapr->htab_shift = shift; + } else { + if (!spapr->htab) { + /* Allocate an htab if we don't yet have one */ + spapr->htab = qemu_memalign(HTAB_SIZE(spapr), HTAB_SIZE(spapr)); + } + + /* And clear it */ + memset(spapr->htab, 0, HTAB_SIZE(spapr)); + } + + /* Update the RMA size if necessary */ + if (spapr->vrma_adjust) { + spapr->rma_size = kvmppc_rma_size(ram_size, spapr->htab_shift); + } +} + static void ppc_spapr_reset(void) { - /* flush out the hash table */ - memset(spapr->htab, 0, spapr->htab_size); + /* Reset the hash table & recalc the RMA */ + spapr_reset_htab(spapr); qemu_devices_reset(); @@ -590,6 +636,12 @@ static void spapr_cpu_reset(void *opaque) env->halted = 1; env->spr[SPR_HIOR] = 0; + + env->external_htab = spapr->htab; + env->htab_base = -1; + env->htab_mask = HTAB_SIZE(spapr) - 1; + env->spr[SPR_SDR1] = (unsigned long)spapr->htab | + (spapr->htab_shift - 18); } /* Returns whether we want to use VGA or not */ @@ -623,11 +675,10 @@ static void ppc_spapr_init(ram_addr_t ram_size, int i; MemoryRegion *sysmem = get_system_memory(); MemoryRegion *ram = g_new(MemoryRegion, 1); - target_phys_addr_t rma_alloc_size, rma_size; + target_phys_addr_t rma_alloc_size; uint32_t initrd_base = 0; long kernel_size = 0, initrd_size = 0; long load_limit, rtas_limit, fw_size; - long pteg_shift = 17; char *filename; msi_supported = true; @@ -644,20 +695,39 @@ static void ppc_spapr_init(ram_addr_t ram_size, hw_error("qemu: Unable to create RMA\n"); exit(1); } + if (rma_alloc_size && (rma_alloc_size < ram_size)) { - rma_size = rma_alloc_size; + spapr->rma_size = rma_alloc_size; } else { - rma_size = ram_size; + spapr->rma_size = ram_size; + + /* With KVM, we don't actually know whether KVM supports an + * unbounded RMA (PR KVM) or is limited by the hash table size + * (HV KVM using VRMA), so we always assume the latter + * + * In that case, we also limit the initial allocations for RTAS + * etc... to 256M since we have no way to know what the VRMA size + * is going to be as it depends on the size of the hash table + * isn't determined yet. + */ + if (kvm_enabled()) { + spapr->vrma_adjust = 1; + spapr->rma_size = MIN(spapr->rma_size, 0x10000000); + } } /* We place the device tree and RTAS just below either the top of the RMA, * or just below 2GB, whichever is lowere, so that it can be * processed with 32-bit real mode code if necessary */ - rtas_limit = MIN(rma_size, 0x80000000); + rtas_limit = MIN(spapr->rma_size, 0x80000000); spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE; spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE; load_limit = spapr->fdt_addr - FW_OVERHEAD; + /* For now, always aim for a 16MB hash table */ + /* FIXME: we should change this default based on RAM size */ + spapr->htab_shift = 24; + /* init CPUs */ if (cpu_model == NULL) { cpu_model = kvm_enabled() ? "host" : "POWER7"; @@ -695,25 +765,6 @@ static void ppc_spapr_init(ram_addr_t ram_size, memory_region_add_subregion(sysmem, nonrma_base, ram); } - /* allocate hash page table. For now we always make this 16mb, - * later we should probably make it scale to the size of guest - * RAM. FIXME: setting the htab information in the CPU env really - * belongs at CPU reset time, but we can get away with it for now - * because the PAPR guest is not permitted to write SDR1 so in - * fact these settings will never change during the run */ - spapr->htab_size = 1ULL << (pteg_shift + 7); - spapr->htab = qemu_memalign(spapr->htab_size, spapr->htab_size); - - for (env = first_cpu; env != NULL; env = env->next_cpu) { - env->external_htab = spapr->htab; - env->htab_base = -1; - env->htab_mask = spapr->htab_size - 1; - - /* Tell KVM that we're in PAPR mode */ - env->spr[SPR_SDR1] = (unsigned long)spapr->htab | - ((pteg_shift + 7) - 18); - } - filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin"); spapr->rtas_size = load_image_targphys(filename, spapr->rtas_addr, rtas_limit - spapr->rtas_addr); @@ -786,7 +837,7 @@ static void ppc_spapr_init(ram_addr_t ram_size, } } - if (rma_size < (MIN_RMA_SLOF << 20)) { + if (spapr->rma_size < (MIN_RMA_SLOF << 20)) { fprintf(stderr, "qemu: pSeries SLOF firmware requires >= " "%ldM guest RMA (Real Mode Area memory)\n", MIN_RMA_SLOF); exit(1); @@ -838,11 +889,10 @@ static void ppc_spapr_init(ram_addr_t ram_size, spapr->entry_point = 0x100; /* Prepare the device tree */ - spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size, + spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, initrd_base, initrd_size, kernel_size, - boot_device, kernel_cmdline, - pteg_shift + 7); + boot_device, kernel_cmdline); assert(spapr->fdt_skel != NULL); } diff --git a/hw/spapr.h b/hw/spapr.h index ac34a17..f1fb646 100644 --- a/hw/spapr.h +++ b/hw/spapr.h @@ -15,7 +15,9 @@ typedef struct sPAPREnvironment { target_phys_addr_t ram_limit; void *htab; - long htab_size; + long htab_shift; + target_phys_addr_t rma_size; + int vrma_adjust; target_phys_addr_t fdt_addr, rtas_addr; long rtas_size; void *fdt_skel; diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 1a7489b..546c116 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1010,6 +1010,7 @@ int kvmppc_smt_threads(void) return cap_ppc_smt ? cap_ppc_smt : 1; } +#ifdef TARGET_PPC64 off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem) { void *rma; @@ -1053,6 +1054,16 @@ off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem) return size; } +uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift) +{ + if (cap_ppc_rma >= 2) { + return current_size; + } + return MIN(current_size, + getrampagesize() << (hash_shift - 7)); +} +#endif + void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd) { struct kvm_create_spapr_tce args = { @@ -1112,6 +1123,24 @@ int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size) return 0; } +int kvmppc_reset_htab(int shift_hint) +{ + uint32_t shift = shift_hint; + + if (kvm_enabled() && + kvm_check_extension(kvm_state, KVM_CAP_PPC_ALLOC_HTAB)) { + int ret; + ret = kvm_vm_ioctl(kvm_state, KVM_PPC_ALLOCATE_HTAB, &shift); + if (ret < 0) { + return ret; + } + return shift; + } + + /* For now.. */ + return 0; +} + static inline uint32_t mfpvr(void) { uint32_t pvr; diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index e2f8703..baad6eb 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -27,6 +27,8 @@ int kvmppc_smt_threads(void); off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem); void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd); int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size); +int kvmppc_reset_htab(int shift_hint); +uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift); #endif /* !CONFIG_USER_ONLY */ const ppc_def_t *kvmppc_host_cpu_def(void); int kvmppc_fixup_cpu(CPUPPCState *env); @@ -94,6 +96,23 @@ static inline int kvmppc_remove_spapr_tce(void *table, int pfd, { return -1; } + +static inline int kvmppc_reset_htab(int shift_hint) +{ + return -1; +} + +static inline uint64_t kvmppc_rma_size(uint64_t current_size, + unsigned int hash_shift) +{ + return ram_size; +} + +static inline int kvmppc_update_sdr1(CPUPPCState *env) +{ + return 0; +} + #endif /* !CONFIG_USER_ONLY */ static inline const ppc_def_t *kvmppc_host_cpu_def(void) -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 5/7] pseries: Clear TCE and signal state when resetting PAPR VIO devices 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson ` (3 preceding siblings ...) 2012-09-10 6:38 ` [Qemu-devel] [PATCH 4/7] pseries: Add support for new KVM hash table control call David Gibson @ 2012-09-10 6:38 ` David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 6/7] pseries: Reset emulated PCI TCE tables on system reset David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 7/7] pseries: Fix XICS reset David Gibson 6 siblings, 0 replies; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel, David Gibson When we reset the system, the reset method for VIO bus devices resets the state of their request queue (if present) as it should. However it was not resetting the state of their TCE table (DMA translation) if present. It was also not resetting the state of the per-device signal mask set with H_VIO_SIGNAL. This patch corrects both bugs, and also removes some small code duplication in the reset paths. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/spapr_vio.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c index 7ca4452..752836e 100644 --- a/hw/spapr_vio.c +++ b/hw/spapr_vio.c @@ -324,9 +324,7 @@ static void spapr_vio_quiesce_one(VIOsPAPRDevice *dev) } dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size); - dev->crq.qladdr = 0; - dev->crq.qsize = 0; - dev->crq.qnext = 0; + free_crq(dev); } static void rtas_set_tce_bypass(sPAPREnvironment *spapr, uint32_t token, @@ -409,9 +407,10 @@ static void spapr_vio_busdev_reset(DeviceState *qdev) VIOsPAPRDevice *dev = DO_UPCAST(VIOsPAPRDevice, qdev, qdev); VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev); - if (dev->crq.qsize) { - free_crq(dev); - } + /* Shut down the request queue and TCEs if necessary */ + spapr_vio_quiesce_one(dev); + + dev->signal_state = 0; if (pc->reset) { pc->reset(dev); -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 6/7] pseries: Reset emulated PCI TCE tables on system reset 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson ` (4 preceding siblings ...) 2012-09-10 6:38 ` [Qemu-devel] [PATCH 5/7] pseries: Clear TCE and signal state when resetting PAPR VIO devices David Gibson @ 2012-09-10 6:38 ` David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 7/7] pseries: Fix XICS reset David Gibson 6 siblings, 0 replies; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel, David Gibson The emulated PCI host bridge on the pseries machine incorporates an IOMMU (PAPR TCE table). Currently the mappings in this IOMMU are not cleared when we reset the system. This patch fixes this bug. To do this it adds a new reset function to the IOMMU emulation code. The VIO devices already reset their TCE tables, but they do so by destroying and re-creating their DMA context. This doesn't work for the PCI host bridge, because the infrastructure for PCI IOMMUs has already copied/cached the DMA pointer context into the subordinate PCI device structures. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/spapr.h | 1 + hw/spapr_iommu.c | 11 +++++++++++ hw/spapr_pci.c | 10 ++++++++++ 3 files changed, 22 insertions(+) diff --git a/hw/spapr.h b/hw/spapr.h index f1fb646..f9a7b0f 100644 --- a/hw/spapr.h +++ b/hw/spapr.h @@ -338,6 +338,7 @@ typedef struct sPAPRTCE { void spapr_iommu_init(void); DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size); void spapr_tce_free(DMAContext *dma); +void spapr_tce_reset(DMAContext *dma); int spapr_dma_dt(void *fdt, int node_off, const char *propname, uint32_t liobn, uint64_t window, uint32_t size); int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname, diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c index 53b7317..216aa06 100644 --- a/hw/spapr_iommu.c +++ b/hw/spapr_iommu.c @@ -162,6 +162,17 @@ void spapr_tce_free(DMAContext *dma) } } +void spapr_tce_reset(DMAContext *dma) +{ + if (dma) { + sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma); + size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT) + * sizeof(sPAPRTCE); + + memset(tcet->table, 0, table_size); + } +} + static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba, target_ulong tce) { diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c index 661c05b..203155e 100644 --- a/hw/spapr_pci.c +++ b/hw/spapr_pci.c @@ -595,6 +595,15 @@ static int spapr_phb_init(SysBusDevice *s) return 0; } +static void spapr_phb_reset(DeviceState *qdev) +{ + SysBusDevice *s = sysbus_from_qdev(qdev); + sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s); + + /* Reset the IOMMU state */ + spapr_tce_reset(sphb->dma); +} + static Property spapr_phb_properties[] = { DEFINE_PROP_HEX64("buid", sPAPRPHBState, buid, 0), DEFINE_PROP_STRING("busname", sPAPRPHBState, busname), @@ -613,6 +622,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data) sdc->init = spapr_phb_init; dc->props = spapr_phb_properties; + dc->reset = spapr_phb_reset; } static const TypeInfo spapr_phb_info = { -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 7/7] pseries: Fix XICS reset 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson ` (5 preceding siblings ...) 2012-09-10 6:38 ` [Qemu-devel] [PATCH 6/7] pseries: Reset emulated PCI TCE tables on system reset David Gibson @ 2012-09-10 6:38 ` David Gibson 2012-09-12 5:58 ` [Qemu-devel] [Qemu-ppc] " David Gibson 6 siblings, 1 reply; 12+ messages in thread From: David Gibson @ 2012-09-10 6:38 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel, David Gibson The XICS interrupt controller used on the pseries machine currently has no reset handler. We can get away with this under some circumstances, but it's not correct, and can cause failures if the XICS happens to be in the wrong state at the time of reset. This patch adds a hook to properly reset the XICS state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/xics.c | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/hw/xics.c b/hw/xics.c index b674771..8b1ee2d 100644 --- a/hw/xics.c +++ b/hw/xics.c @@ -489,11 +489,32 @@ static void rtas_int_on(sPAPREnvironment *spapr, uint32_t token, rtas_st(rets, 0, 0); /* Success */ } +static void xics_reset(void *opaque) +{ + struct icp_state *icp = (struct icp_state *)opaque; + struct ics_state *ics = icp->ics; + int i; + + for (i = 0; i < icp->nr_servers; i++) { + icp->ss[i].xirr = 0; + icp->ss[i].pending_priority = 0; + icp->ss[i].mfrr = 0xff; + /* Make all outputs are deasserted */ + qemu_set_irq(icp->ss[i].output, 0); + } + + for (i = 0; i < ics->nr_irqs; i++) { + memset(&ics->irqs[i], 0, sizeof(ics->irqs[i])); + + ics->irqs[i].priority = 0xff; + ics->irqs[i].saved_priority = 0xff; + } +} + struct icp_state *xics_system_init(int nr_irqs) { CPUPPCState *env; int max_server_num; - int i; struct icp_state *icp; struct ics_state *ics; @@ -508,10 +529,6 @@ struct icp_state *xics_system_init(int nr_irqs) icp->nr_servers = max_server_num + 1; icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state)); - for (i = 0; i < icp->nr_servers; i++) { - icp->ss[i].mfrr = 0xff; - } - for (env = first_cpu; env != NULL; env = env->next_cpu) { struct icp_server_state *ss = &icp->ss[env->cpu_index]; @@ -539,11 +556,6 @@ struct icp_state *xics_system_init(int nr_irqs) icp->ics = ics; ics->icp = icp; - for (i = 0; i < nr_irqs; i++) { - ics->irqs[i].priority = 0xff; - ics->irqs[i].saved_priority = 0xff; - } - ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs); spapr_register_hypercall(H_CPPR, h_cppr); @@ -556,5 +568,7 @@ struct icp_state *xics_system_init(int nr_irqs) spapr_rtas_register("ibm,int-off", rtas_int_off); spapr_rtas_register("ibm,int-on", rtas_int_on); + qemu_register_reset(xics_reset, icp); + return icp; } -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 7/7] pseries: Fix XICS reset 2012-09-10 6:38 ` [Qemu-devel] [PATCH 7/7] pseries: Fix XICS reset David Gibson @ 2012-09-12 5:58 ` David Gibson 0 siblings, 0 replies; 12+ messages in thread From: David Gibson @ 2012-09-12 5:58 UTC (permalink / raw) To: agraf; +Cc: qemu-ppc, qemu-devel On Mon, Sep 10, 2012 at 04:38:51PM +1000, David Gibson wrote: > The XICS interrupt controller used on the pseries machine currently has no > reset handler. We can get away with this under some circumstances, but > it's not correct, and can cause failures if the XICS happens to be in the > wrong state at the time of reset. > > This patch adds a hook to properly reset the XICS state. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Oops, don't apply that one. It has a bug which will break level sensitive interrupts. Here's a fixed version. >From 353b529cf2049347356e3479f5e48dc2597e4fbb Mon Sep 17 00:00:00 2001 From: David Gibson <david@gibson.dropbear.id.au> Date: Wed, 22 Aug 2012 15:20:41 +1000 Subject: [PATCH] pseries: Fix XICS reset The XICS interrupt controller used on the pseries machine currently has no reset handler. We can get away with this under some circumstances, but it's not correct, and can cause failures if the XICS happens to be in the wrong state at the time of reset. This patch adds a hook to properly reset the XICS state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/xics.c | 38 ++++++++++++++++++++++++++++---------- 1 file changed, 28 insertions(+), 10 deletions(-) diff --git a/hw/xics.c b/hw/xics.c index b674771..a8a08ce 100644 --- a/hw/xics.c +++ b/hw/xics.c @@ -489,11 +489,36 @@ static void rtas_int_on(sPAPREnvironment *spapr, uint32_t token, rtas_st(rets, 0, 0); /* Success */ } +static void xics_reset(void *opaque) +{ + struct icp_state *icp = (struct icp_state *)opaque; + struct ics_state *ics = icp->ics; + int i; + + for (i = 0; i < icp->nr_servers; i++) { + icp->ss[i].xirr = 0; + icp->ss[i].pending_priority = 0; + icp->ss[i].mfrr = 0xff; + /* Make all outputs are deasserted */ + qemu_set_irq(icp->ss[i].output, 0); + } + + for (i = 0; i < ics->nr_irqs; i++) { + /* Reset everything *except* the type */ + ics->irqs[i].server = 0; + ics->irqs[i].asserted = 0; + ics->irqs[i].sent = 0; + ics->irqs[i].rejected = 0; + ics->irqs[i].masked_pending = 0; + ics->irqs[i].priority = 0xff; + ics->irqs[i].saved_priority = 0xff; + } +} + struct icp_state *xics_system_init(int nr_irqs) { CPUPPCState *env; int max_server_num; - int i; struct icp_state *icp; struct ics_state *ics; @@ -508,10 +533,6 @@ struct icp_state *xics_system_init(int nr_irqs) icp->nr_servers = max_server_num + 1; icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state)); - for (i = 0; i < icp->nr_servers; i++) { - icp->ss[i].mfrr = 0xff; - } - for (env = first_cpu; env != NULL; env = env->next_cpu) { struct icp_server_state *ss = &icp->ss[env->cpu_index]; @@ -539,11 +560,6 @@ struct icp_state *xics_system_init(int nr_irqs) icp->ics = ics; ics->icp = icp; - for (i = 0; i < nr_irqs; i++) { - ics->irqs[i].priority = 0xff; - ics->irqs[i].saved_priority = 0xff; - } - ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs); spapr_register_hypercall(H_CPPR, h_cppr); @@ -556,5 +572,7 @@ struct icp_state *xics_system_init(int nr_irqs) spapr_rtas_register("ibm,int-off", rtas_int_off); spapr_rtas_register("ibm,int-on", rtas_int_on); + qemu_register_reset(xics_reset, icp); + return icp; } -- 1.7.10.4 -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-09-12 5:58 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-09-10 6:38 [Qemu-devel] [0/7] System reset fixes for pseries David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 1/7] ppc: Make kvm_arch_put_registers() put *all* the registers David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 2/7] pseries: Fix and cleanup CPU initialization and reset David Gibson 2012-09-10 13:45 ` Andreas Färber 2012-09-10 23:45 ` [Qemu-devel] [Qemu-ppc] " David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 3/7] pseries: Use new method to correct reset sequence David Gibson 2012-09-10 13:47 ` Andreas Färber 2012-09-10 6:38 ` [Qemu-devel] [PATCH 4/7] pseries: Add support for new KVM hash table control call David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 5/7] pseries: Clear TCE and signal state when resetting PAPR VIO devices David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 6/7] pseries: Reset emulated PCI TCE tables on system reset David Gibson 2012-09-10 6:38 ` [Qemu-devel] [PATCH 7/7] pseries: Fix XICS reset David Gibson 2012-09-12 5:58 ` [Qemu-devel] [Qemu-ppc] " David Gibson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).