* [PATCH 1/9] target/ppc: raise HV interrupts for partition table entry problems
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 8:29 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 2/9] spapr: prevent hdec timer being set up under virtual hypervisor Nicholas Piggin
` (8 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
Invalid or missing partition table entry exceptions should cause HV
interrupts. HDSISR is set to bad MMU config, which is consistent with
the ISA and experimentally matches what POWER9 generates.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
target/ppc/mmu-radix64.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index d4e16bd7db..df2fec80ce 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -556,13 +556,13 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
} else {
if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
if (guest_visible) {
- ppc_radix64_raise_si(cpu, access_type, eaddr, DSISR_NOPTE);
+ ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, DSISR_R_BADCONFIG);
}
return false;
}
if (!validate_pate(cpu, lpid, &pate)) {
if (guest_visible) {
- ppc_radix64_raise_si(cpu, access_type, eaddr, DSISR_R_BADCONFIG);
+ ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, DSISR_R_BADCONFIG);
}
return false;
}
--
2.23.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/9] target/ppc: raise HV interrupts for partition table entry problems
2022-02-15 3:16 ` [PATCH 1/9] target/ppc: raise HV interrupts for partition table entry problems Nicholas Piggin
@ 2022-02-15 8:29 ` Cédric Le Goater
0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 8:29 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> Invalid or missing partition table entry exceptions should cause HV
> interrupts. HDSISR is set to bad MMU config, which is consistent with
> the ISA and experimentally matches what POWER9 generates.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Adding the previous R-b for patchwork:
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
C.
> ---
> target/ppc/mmu-radix64.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> index d4e16bd7db..df2fec80ce 100644
> --- a/target/ppc/mmu-radix64.c
> +++ b/target/ppc/mmu-radix64.c
> @@ -556,13 +556,13 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
> } else {
> if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
> if (guest_visible) {
> - ppc_radix64_raise_si(cpu, access_type, eaddr, DSISR_NOPTE);
> + ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, DSISR_R_BADCONFIG);
> }
> return false;
> }
> if (!validate_pate(cpu, lpid, &pate)) {
> if (guest_visible) {
> - ppc_radix64_raise_si(cpu, access_type, eaddr, DSISR_R_BADCONFIG);
> + ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, DSISR_R_BADCONFIG);
> }
> return false;
> }
>
* [PATCH 2/9] spapr: prevent hdec timer being set up under virtual hypervisor
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
2022-02-15 3:16 ` [PATCH 1/9] target/ppc: raise HV interrupts for partition table entry problems Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 18:34 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 3/9] ppc: allow the hdecr timer to be created/destroyed Nicholas Piggin
` (7 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
The spapr virtual hypervisor does not require the hdecr timer.
Remove it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/ppc/ppc.c | 2 +-
hw/ppc/spapr_cpu_core.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index ba7fa0f3b5..c6dfc5975f 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1072,7 +1072,7 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
}
/* Create new timer */
tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);
- if (env->has_hv_mode) {
+ if (env->has_hv_mode && !cpu->vhyp) {
tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_hdecr_cb,
cpu);
} else {
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index a781e97f8d..ed84713960 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -261,12 +261,12 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
return false;
}
- /* Set time-base frequency to 512 MHz */
- cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
-
cpu_ppc_set_vhyp(cpu, PPC_VIRTUAL_HYPERVISOR(spapr));
kvmppc_set_papr(cpu);
+ /* Set time-base frequency to 512 MHz. vhyp must be set first. */
+ cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
+
if (spapr_irq_cpu_intc_create(spapr, cpu, errp) < 0) {
qdev_unrealize(DEVICE(cpu));
return false;
--
2.23.0
* Re: [PATCH 2/9] spapr: prevent hdec timer being set up under virtual hypervisor
2022-02-15 3:16 ` [PATCH 2/9] spapr: prevent hdec timer being set up under virtual hypervisor Nicholas Piggin
@ 2022-02-15 18:34 ` Cédric Le Goater
0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 18:34 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> The spapr virtual hypervisor does not require the hdecr timer.
> Remove it.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Thanks,
C.
> ---
> hw/ppc/ppc.c | 2 +-
> hw/ppc/spapr_cpu_core.c | 6 +++---
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index ba7fa0f3b5..c6dfc5975f 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -1072,7 +1072,7 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
> }
> /* Create new timer */
> tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);
> - if (env->has_hv_mode) {
> + if (env->has_hv_mode && !cpu->vhyp) {
> tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_hdecr_cb,
> cpu);
> } else {
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index a781e97f8d..ed84713960 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -261,12 +261,12 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
> return false;
> }
>
> - /* Set time-base frequency to 512 MHz */
> - cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
> -
> cpu_ppc_set_vhyp(cpu, PPC_VIRTUAL_HYPERVISOR(spapr));
> kvmppc_set_papr(cpu);
>
> + /* Set time-base frequency to 512 MHz. vhyp must be set first. */
> + cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
> +
> if (spapr_irq_cpu_intc_create(spapr, cpu, errp) < 0) {
> qdev_unrealize(DEVICE(cpu));
> return false;
* [PATCH 3/9] ppc: allow the hdecr timer to be created/destroyed
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
2022-02-15 3:16 ` [PATCH 1/9] target/ppc: raise HV interrupts for partition table entry problems Nicholas Piggin
2022-02-15 3:16 ` [PATCH 2/9] spapr: prevent hdec timer being set up under virtual hypervisor Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 18:36 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 4/9] target/ppc: add vhyp addressing mode helper for radix MMU Nicholas Piggin
` (6 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
Machines which don't emulate the HDEC facility are able to use the
timer for something else. Provide functions to start and stop the
hdecr timer.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/ppc/ppc.c | 20 ++++++++++++++++++++
include/hw/ppc/ppc.h | 3 +++
2 files changed, 23 insertions(+)
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index c6dfc5975f..4bfd413c7f 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1083,6 +1083,26 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
return &cpu_ppc_set_tb_clk;
}
+void cpu_ppc_hdecr_init (CPUPPCState *env)
+{
+ PowerPCCPU *cpu = env_archcpu(env);
+
+ assert(env->tb_env->hdecr_timer == NULL);
+
+ env->tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_hdecr_cb,
+ cpu);
+}
+
+void cpu_ppc_hdecr_exit (CPUPPCState *env)
+{
+ PowerPCCPU *cpu = env_archcpu(env);
+
+ timer_free(env->tb_env->hdecr_timer);
+ env->tb_env->hdecr_timer = NULL;
+
+ cpu_ppc_hdecr_lower(cpu);
+}
+
/*****************************************************************************/
/* PowerPC 40x timers */
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index 93e614cffd..fcf9e495a0 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -54,6 +54,9 @@ struct ppc_tb_t {
uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset);
clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq);
+void cpu_ppc_hdecr_init (CPUPPCState *env);
+void cpu_ppc_hdecr_exit (CPUPPCState *env);
+
/* Embedded PowerPC DCR management */
typedef uint32_t (*dcr_read_cb)(void *opaque, int dcrn);
typedef void (*dcr_write_cb)(void *opaque, int dcrn, uint32_t val);
--
2.23.0
* Re: [PATCH 3/9] ppc: allow the hdecr timer to be created/destroyed
2022-02-15 3:16 ` [PATCH 3/9] ppc: allow the hdecr timer to be created/destroyed Nicholas Piggin
@ 2022-02-15 18:36 ` Cédric Le Goater
2022-02-16 0:36 ` Nicholas Piggin
0 siblings, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 18:36 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> Machines which don't emulate the HDEC facility are able to use the
> timer for something else. Provide functions to start and stop the
> hdecr timer.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> hw/ppc/ppc.c | 20 ++++++++++++++++++++
> include/hw/ppc/ppc.h | 3 +++
> 2 files changed, 23 insertions(+)
>
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index c6dfc5975f..4bfd413c7f 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -1083,6 +1083,26 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
> return &cpu_ppc_set_tb_clk;
> }
>
> +void cpu_ppc_hdecr_init (CPUPPCState *env)
checkpatch will complain ^
> +{
> + PowerPCCPU *cpu = env_archcpu(env);
> +
> + assert(env->tb_env->hdecr_timer == NULL);
> +
> + env->tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_hdecr_cb,
> + cpu);
> +}
> +
I am not convinced. Can't we start and stop the hdecr on demand?
Thanks,
C.
> +void cpu_ppc_hdecr_exit (CPUPPCState *env)
> +{
> + PowerPCCPU *cpu = env_archcpu(env);
> +
> + timer_free(env->tb_env->hdecr_timer);
> + env->tb_env->hdecr_timer = NULL;
> +
> + cpu_ppc_hdecr_lower(cpu);
> +}
> +
> /*****************************************************************************/
> /* PowerPC 40x timers */
>
> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index 93e614cffd..fcf9e495a0 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -54,6 +54,9 @@ struct ppc_tb_t {
>
> uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset);
> clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq);
> +void cpu_ppc_hdecr_init (CPUPPCState *env);
> +void cpu_ppc_hdecr_exit (CPUPPCState *env);
> +
> /* Embedded PowerPC DCR management */
> typedef uint32_t (*dcr_read_cb)(void *opaque, int dcrn);
> typedef void (*dcr_write_cb)(void *opaque, int dcrn, uint32_t val);
* Re: [PATCH 3/9] ppc: allow the hdecr timer to be created/destroyed
2022-02-15 18:36 ` Cédric Le Goater
@ 2022-02-16 0:36 ` Nicholas Piggin
0 siblings, 0 replies; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-16 0:36 UTC (permalink / raw)
To: Cédric Le Goater, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
Excerpts from Cédric Le Goater's message of February 16, 2022 4:36 am:
> On 2/15/22 04:16, Nicholas Piggin wrote:
>> Machines which don't emulate the HDEC facility are able to use the
>> timer for something else. Provide functions to start and stop the
>> hdecr timer.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> hw/ppc/ppc.c | 20 ++++++++++++++++++++
>> include/hw/ppc/ppc.h | 3 +++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>> index c6dfc5975f..4bfd413c7f 100644
>> --- a/hw/ppc/ppc.c
>> +++ b/hw/ppc/ppc.c
>> @@ -1083,6 +1083,26 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
>> return &cpu_ppc_set_tb_clk;
>> }
>>
>> +void cpu_ppc_hdecr_init (CPUPPCState *env)
>
> checkpatch will complain ^
It did, but I thought I would keep to the existing style. I'll change it.
>
>> +{
>> + PowerPCCPU *cpu = env_archcpu(env);
>> +
>> + assert(env->tb_env->hdecr_timer == NULL);
>> +
>> + env->tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_hdecr_cb,
>> + cpu);
>> +}
>> +
>
> I am not convinced. Can't we start and stop the hdecr on demand?
We timer_mod() the existing hdecr_timer when we do ppc_store_hdecr, but
that shouldn't be used elsewhere in pseries except for nested HV.
Thanks,
Nick
* [PATCH 4/9] target/ppc: add vhyp addressing mode helper for radix MMU
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
` (2 preceding siblings ...)
2022-02-15 3:16 ` [PATCH 3/9] ppc: allow the hdecr timer to be created/destroyed Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 10:01 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 5/9] target/ppc: make vhyp get_pate method take lpid and return success Nicholas Piggin
` (5 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
The radix on vhyp MMU uses a single-level radix table walk, with the
partition scope mapping provided by the flat QEMU machine memory.
A subsequent change will use the two-level radix walk on vhyp in some
situations, so provide a helper which can abstract that logic.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
target/ppc/mmu-radix64.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index df2fec80ce..5535f0fe20 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -354,6 +354,17 @@ static int ppc_radix64_partition_scoped_xlate(PowerPCCPU *cpu,
return 0;
}
+/*
+ * The spapr vhc has a flat partition scope provided by qemu memory.
+ */
+static bool vhyp_flat_addressing(PowerPCCPU *cpu)
+{
+ if (cpu->vhyp) {
+ return true;
+ }
+ return false;
+}
+
static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
MMUAccessType access_type,
vaddr eaddr, uint64_t pid,
@@ -385,7 +396,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
}
prtbe_addr = (pate.dw1 & PATE1_R_PRTB) + offset;
- if (cpu->vhyp) {
+ if (vhyp_flat_addressing(cpu)) {
prtbe0 = ldq_phys(cs->as, prtbe_addr);
} else {
/*
@@ -411,7 +422,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
*g_page_size = PRTBE_R_GET_RTS(prtbe0);
base_addr = prtbe0 & PRTBE_R_RPDB;
nls = prtbe0 & PRTBE_R_RPDS;
- if (msr_hv || cpu->vhyp) {
+ if (msr_hv || vhyp_flat_addressing(cpu)) {
/*
* Can treat process table addresses as real addresses
*/
@@ -515,7 +526,7 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
relocation = !mmuidx_real(mmu_idx);
/* HV or virtual hypervisor Real Mode Access */
- if (!relocation && (mmuidx_hv(mmu_idx) || cpu->vhyp)) {
+ if (!relocation && (mmuidx_hv(mmu_idx) || vhyp_flat_addressing(cpu))) {
/* In real mode top 4 effective addr bits (mostly) ignored */
*raddr = eaddr & 0x0FFFFFFFFFFFFFFFULL;
@@ -592,7 +603,7 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
g_raddr = eaddr & R_EADDR_MASK;
}
- if (cpu->vhyp) {
+ if (vhyp_flat_addressing(cpu)) {
*raddr = g_raddr;
} else {
/*
--
2.23.0
* Re: [PATCH 4/9] target/ppc: add vhyp addressing mode helper for radix MMU
2022-02-15 3:16 ` [PATCH 4/9] target/ppc: add vhyp addressing mode helper for radix MMU Nicholas Piggin
@ 2022-02-15 10:01 ` Cédric Le Goater
0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 10:01 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> The radix on vhyp MMU uses a single-level radix table walk, with the
> partition scope mapping provided by the flat QEMU machine memory.
>
> A subsequent change will use the two-level radix walk on vhyp in some
> situations, so provide a helper which can abstract that logic.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Thanks,
C.
> ---
> target/ppc/mmu-radix64.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> index df2fec80ce..5535f0fe20 100644
> --- a/target/ppc/mmu-radix64.c
> +++ b/target/ppc/mmu-radix64.c
> @@ -354,6 +354,17 @@ static int ppc_radix64_partition_scoped_xlate(PowerPCCPU *cpu,
> return 0;
> }
>
> +/*
> + * The spapr vhc has a flat partition scope provided by qemu memory.
> + */
> +static bool vhyp_flat_addressing(PowerPCCPU *cpu)
> +{
> + if (cpu->vhyp) {
> + return true;
> + }
> + return false;
> +}
> +
> static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
> MMUAccessType access_type,
> vaddr eaddr, uint64_t pid,
> @@ -385,7 +396,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
> }
> prtbe_addr = (pate.dw1 & PATE1_R_PRTB) + offset;
>
> - if (cpu->vhyp) {
> + if (vhyp_flat_addressing(cpu)) {
> prtbe0 = ldq_phys(cs->as, prtbe_addr);
> } else {
> /*
> @@ -411,7 +422,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
> *g_page_size = PRTBE_R_GET_RTS(prtbe0);
> base_addr = prtbe0 & PRTBE_R_RPDB;
> nls = prtbe0 & PRTBE_R_RPDS;
> - if (msr_hv || cpu->vhyp) {
> + if (msr_hv || vhyp_flat_addressing(cpu)) {
> /*
> * Can treat process table addresses as real addresses
> */
> @@ -515,7 +526,7 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
> relocation = !mmuidx_real(mmu_idx);
>
> /* HV or virtual hypervisor Real Mode Access */
> - if (!relocation && (mmuidx_hv(mmu_idx) || cpu->vhyp)) {
> + if (!relocation && (mmuidx_hv(mmu_idx) || vhyp_flat_addressing(cpu))) {
> /* In real mode top 4 effective addr bits (mostly) ignored */
> *raddr = eaddr & 0x0FFFFFFFFFFFFFFFULL;
>
> @@ -592,7 +603,7 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
> g_raddr = eaddr & R_EADDR_MASK;
> }
>
> - if (cpu->vhyp) {
> + if (vhyp_flat_addressing(cpu)) {
> *raddr = g_raddr;
> } else {
> /*
>
* [PATCH 5/9] target/ppc: make vhyp get_pate method take lpid and return success
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
` (3 preceding siblings ...)
2022-02-15 3:16 ` [PATCH 4/9] target/ppc: add vhyp addressing mode helper for radix MMU Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 10:03 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 6/9] target/ppc: add helper for books vhyp hypercall handler Nicholas Piggin
` (4 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
In preparation for implementing a full partition table option for
vhyp, update the get_pate method to take an lpid and return a
success/fail indicator.
The spapr implementation currently just asserts lpid is always 0
and always returns success.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/ppc/spapr.c | 7 ++++++-
target/ppc/cpu.h | 3 ++-
target/ppc/mmu-radix64.c | 7 ++++++-
3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 15a02d3e78..1892a29e2d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1309,13 +1309,18 @@ void spapr_set_all_lpcrs(target_ulong value, target_ulong mask)
}
}
-static void spapr_get_pate(PPCVirtualHypervisor *vhyp, ppc_v3_pate_t *entry)
+static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
+ target_ulong lpid, ppc_v3_pate_t *entry)
{
SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
+ assert(lpid == 0);
+
/* Copy PATE1:GR into PATE0:HR */
entry->dw0 = spapr->patb_entry & PATE0_HR;
entry->dw1 = spapr->patb_entry;
+
+ return true;
}
#define HPTE(_table, _i) (void *)(((uint64_t *)(_table)) + ((_i) * 2))
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 555c6b9245..c79ae74f10 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1320,7 +1320,8 @@ struct PPCVirtualHypervisorClass {
hwaddr ptex, int n);
void (*hpte_set_c)(PPCVirtualHypervisor *vhyp, hwaddr ptex, uint64_t pte1);
void (*hpte_set_r)(PPCVirtualHypervisor *vhyp, hwaddr ptex, uint64_t pte1);
- void (*get_pate)(PPCVirtualHypervisor *vhyp, ppc_v3_pate_t *entry);
+ bool (*get_pate)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
+ target_ulong lpid, ppc_v3_pate_t *entry);
target_ulong (*encode_hpt_for_kvm_pr)(PPCVirtualHypervisor *vhyp);
void (*cpu_exec_enter)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
void (*cpu_exec_exit)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index 5535f0fe20..3b6d75a292 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -563,7 +563,12 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
if (cpu->vhyp) {
PPCVirtualHypervisorClass *vhc;
vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
- vhc->get_pate(cpu->vhyp, &pate);
+ if (!vhc->get_pate(cpu->vhyp, cpu, lpid, &pate)) {
+ if (guest_visible) {
+ ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, DSISR_R_BADCONFIG);
+ }
+ return false;
+ }
} else {
if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
if (guest_visible) {
--
2.23.0
* Re: [PATCH 5/9] target/ppc: make vhyp get_pate method take lpid and return success
2022-02-15 3:16 ` [PATCH 5/9] target/ppc: make vhyp get_pate method take lpid and return success Nicholas Piggin
@ 2022-02-15 10:03 ` Cédric Le Goater
0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 10:03 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> In preparation for implementing a full partition table option for
> vhyp, update the get_pate method to take an lpid and return a
> success/fail indicator.
>
> The spapr implementation currently just asserts lpid is always 0
> and always returns success.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Thanks,
C.
> ---
> hw/ppc/spapr.c | 7 ++++++-
> target/ppc/cpu.h | 3 ++-
> target/ppc/mmu-radix64.c | 7 ++++++-
> 3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 15a02d3e78..1892a29e2d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1309,13 +1309,18 @@ void spapr_set_all_lpcrs(target_ulong value, target_ulong mask)
> }
> }
>
> -static void spapr_get_pate(PPCVirtualHypervisor *vhyp, ppc_v3_pate_t *entry)
> +static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
> + target_ulong lpid, ppc_v3_pate_t *entry)
> {
> SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
>
> + assert(lpid == 0);
> +
> /* Copy PATE1:GR into PATE0:HR */
> entry->dw0 = spapr->patb_entry & PATE0_HR;
> entry->dw1 = spapr->patb_entry;
> +
> + return true;
> }
>
> #define HPTE(_table, _i) (void *)(((uint64_t *)(_table)) + ((_i) * 2))
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 555c6b9245..c79ae74f10 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1320,7 +1320,8 @@ struct PPCVirtualHypervisorClass {
> hwaddr ptex, int n);
> void (*hpte_set_c)(PPCVirtualHypervisor *vhyp, hwaddr ptex, uint64_t pte1);
> void (*hpte_set_r)(PPCVirtualHypervisor *vhyp, hwaddr ptex, uint64_t pte1);
> - void (*get_pate)(PPCVirtualHypervisor *vhyp, ppc_v3_pate_t *entry);
> + bool (*get_pate)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
> + target_ulong lpid, ppc_v3_pate_t *entry);
> target_ulong (*encode_hpt_for_kvm_pr)(PPCVirtualHypervisor *vhyp);
> void (*cpu_exec_enter)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
> void (*cpu_exec_exit)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> index 5535f0fe20..3b6d75a292 100644
> --- a/target/ppc/mmu-radix64.c
> +++ b/target/ppc/mmu-radix64.c
> @@ -563,7 +563,12 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr eaddr,
> if (cpu->vhyp) {
> PPCVirtualHypervisorClass *vhc;
> vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> - vhc->get_pate(cpu->vhyp, &pate);
> + if (!vhc->get_pate(cpu->vhyp, cpu, lpid, &pate)) {
> + if (guest_visible) {
> + ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, DSISR_R_BADCONFIG);
> + }
> + return false;
> + }
> } else {
> if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
> if (guest_visible) {
>
* [PATCH 6/9] target/ppc: add helper for books vhyp hypercall handler
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
` (4 preceding siblings ...)
2022-02-15 3:16 ` [PATCH 5/9] target/ppc: make vhyp get_pate method take lpid and return success Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 10:04 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 7/9] target/ppc: Add powerpc_reset_excp_state helper Nicholas Piggin
` (3 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
The virtual hypervisor currently always intercepts and handles
hypercalls but with a future change this will not always be the case.
Add a helper for the test so the logic is abstracted from the mechanism.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
target/ppc/excp_helper.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index fcc83a7701..6b6ec71bc2 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1278,6 +1278,18 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
}
#ifdef TARGET_PPC64
+/*
+ * When running under vhyp, hcalls are always intercepted and sent to the
+ * vhc->hypercall handler.
+ */
+static bool books_vhyp_handles_hcall(PowerPCCPU *cpu)
+{
+ if (cpu->vhyp) {
+ return true;
+ }
+ return false;
+}
+
static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
{
CPUState *cs = CPU(cpu);
@@ -1439,7 +1451,7 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
env->nip += 4;
/* "PAPR mode" built-in hypercall emulation */
- if ((lev == 1) && cpu->vhyp) {
+ if ((lev == 1) && books_vhyp_handles_hcall(cpu)) {
PPCVirtualHypervisorClass *vhc =
PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
vhc->hypercall(cpu->vhyp, cpu);
--
2.23.0
* Re: [PATCH 6/9] target/ppc: add helper for books vhyp hypercall handler
2022-02-15 3:16 ` [PATCH 6/9] target/ppc: add helper for books vhyp hypercall handler Nicholas Piggin
@ 2022-02-15 10:04 ` Cédric Le Goater
0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 10:04 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> The virtual hypervisor currently always intercepts and handles
> hypercalls but with a future change this will not always be the case.
>
> Add a helper for the test so the logic is abstracted from the mechanism.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Thanks,
C.
> ---
> target/ppc/excp_helper.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index fcc83a7701..6b6ec71bc2 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -1278,6 +1278,18 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
> }
>
> #ifdef TARGET_PPC64
> +/*
> + * When running under vhyp, hcalls are always intercepted and sent to the
> + * vhc->hypercall handler.
> + */
> +static bool books_vhyp_handles_hcall(PowerPCCPU *cpu)
> +{
> + if (cpu->vhyp) {
> + return true;
> + }
> + return false;
> +}
> +
> static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
> {
> CPUState *cs = CPU(cpu);
> @@ -1439,7 +1451,7 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
> env->nip += 4;
>
> /* "PAPR mode" built-in hypercall emulation */
> - if ((lev == 1) && cpu->vhyp) {
> + if ((lev == 1) && books_vhyp_handles_hcall(cpu)) {
> PPCVirtualHypervisorClass *vhc =
> PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> vhc->hypercall(cpu->vhyp, cpu);
>
* [PATCH 7/9] target/ppc: Add powerpc_reset_excp_state helper
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
` (5 preceding siblings ...)
2022-02-15 3:16 ` [PATCH 6/9] target/ppc: add helper for books vhyp hypercall handler Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 10:04 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support Nicholas Piggin
` (2 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
This moves the logic to reset the QEMU exception state into its own
function.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
target/ppc/excp_helper.c | 41 ++++++++++++++++++++--------------------
1 file changed, 21 insertions(+), 20 deletions(-)
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 6b6ec71bc2..778eb4f3b0 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -360,12 +360,20 @@ static void ppc_excp_apply_ail(PowerPCCPU *cpu, int excp, target_ulong msr,
}
#endif
-static void powerpc_set_excp_state(PowerPCCPU *cpu,
- target_ulong vector, target_ulong msr)
+static void powerpc_reset_excp_state(PowerPCCPU *cpu)
{
CPUState *cs = CPU(cpu);
CPUPPCState *env = &cpu->env;
+ /* Reset exception state */
+ cs->exception_index = POWERPC_EXCP_NONE;
+ env->error_code = 0;
+}
+
+static void powerpc_set_excp_state(PowerPCCPU *cpu, target_ulong vector, target_ulong msr)
+{
+ CPUPPCState *env = &cpu->env;
+
assert((msr & env->msr_mask) == msr);
/*
@@ -376,21 +384,20 @@ static void powerpc_set_excp_state(PowerPCCPU *cpu,
* will prevent setting of the HV bit which some exceptions might need
* to do.
*/
+ env->nip = vector;
env->msr = msr;
hreg_compute_hflags(env);
- env->nip = vector;
- /* Reset exception state */
- cs->exception_index = POWERPC_EXCP_NONE;
- env->error_code = 0;
- /* Reset the reservation */
- env->reserve_addr = -1;
+ powerpc_reset_excp_state(cpu);
/*
* Any interrupt is context synchronizing, check if TCG TLB needs
* a delayed flush on ppc64
*/
check_tlb_flush(env, false);
+
+ /* Reset the reservation */
+ env->reserve_addr = -1;
}
static void powerpc_excp_40x(PowerPCCPU *cpu, int excp)
@@ -471,8 +478,7 @@ static void powerpc_excp_40x(PowerPCCPU *cpu, int excp)
case POWERPC_EXCP_FP:
if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
trace_ppc_excp_fp_ignore();
- cs->exception_index = POWERPC_EXCP_NONE;
- env->error_code = 0;
+ powerpc_reset_excp_state(cpu);
return;
}
env->spr[SPR_40x_ESR] = ESR_FP;
@@ -609,8 +615,7 @@ static void powerpc_excp_6xx(PowerPCCPU *cpu, int excp)
case POWERPC_EXCP_FP:
if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
trace_ppc_excp_fp_ignore();
- cs->exception_index = POWERPC_EXCP_NONE;
- env->error_code = 0;
+ powerpc_reset_excp_state(cpu);
return;
}
@@ -783,8 +788,7 @@ static void powerpc_excp_7xx(PowerPCCPU *cpu, int excp)
case POWERPC_EXCP_FP:
if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
trace_ppc_excp_fp_ignore();
- cs->exception_index = POWERPC_EXCP_NONE;
- env->error_code = 0;
+ powerpc_reset_excp_state(cpu);
return;
}
@@ -969,8 +973,7 @@ static void powerpc_excp_74xx(PowerPCCPU *cpu, int excp)
case POWERPC_EXCP_FP:
if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
trace_ppc_excp_fp_ignore();
- cs->exception_index = POWERPC_EXCP_NONE;
- env->error_code = 0;
+ powerpc_reset_excp_state(cpu);
return;
}
@@ -1168,8 +1171,7 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
case POWERPC_EXCP_FP:
if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
trace_ppc_excp_fp_ignore();
- cs->exception_index = POWERPC_EXCP_NONE;
- env->error_code = 0;
+ powerpc_reset_excp_state(cpu);
return;
}
@@ -1406,8 +1408,7 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
case POWERPC_EXCP_FP:
if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
trace_ppc_excp_fp_ignore();
- cs->exception_index = POWERPC_EXCP_NONE;
- env->error_code = 0;
+ powerpc_reset_excp_state(cpu);
return;
}
--
2.23.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] target/ppc: Add powerpc_reset_excp_state helper
2022-02-15 3:16 ` [PATCH 7/9] target/ppc: Add powerpc_reset_excp_state helper Nicholas Piggin
@ 2022-02-15 10:04 ` Cédric Le Goater
0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 10:04 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> This moves the logic to reset the QEMU exception state into its own
> function.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Thanks,
C.
> ---
> target/ppc/excp_helper.c | 41 ++++++++++++++++++++--------------------
> 1 file changed, 21 insertions(+), 20 deletions(-)
>
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 6b6ec71bc2..778eb4f3b0 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -360,12 +360,20 @@ static void ppc_excp_apply_ail(PowerPCCPU *cpu, int excp, target_ulong msr,
> }
> #endif
>
> -static void powerpc_set_excp_state(PowerPCCPU *cpu,
> - target_ulong vector, target_ulong msr)
> +static void powerpc_reset_excp_state(PowerPCCPU *cpu)
> {
> CPUState *cs = CPU(cpu);
> CPUPPCState *env = &cpu->env;
>
> + /* Reset exception state */
> + cs->exception_index = POWERPC_EXCP_NONE;
> + env->error_code = 0;
> +}
> +
> +static void powerpc_set_excp_state(PowerPCCPU *cpu, target_ulong vector, target_ulong msr)
> +{
> + CPUPPCState *env = &cpu->env;
> +
> assert((msr & env->msr_mask) == msr);
>
> /*
> @@ -376,21 +384,20 @@ static void powerpc_set_excp_state(PowerPCCPU *cpu,
> * will prevent setting of the HV bit which some exceptions might need
> * to do.
> */
> + env->nip = vector;
> env->msr = msr;
> hreg_compute_hflags(env);
> - env->nip = vector;
> - /* Reset exception state */
> - cs->exception_index = POWERPC_EXCP_NONE;
> - env->error_code = 0;
>
> - /* Reset the reservation */
> - env->reserve_addr = -1;
> + powerpc_reset_excp_state(cpu);
>
> /*
> * Any interrupt is context synchronizing, check if TCG TLB needs
> * a delayed flush on ppc64
> */
> check_tlb_flush(env, false);
> +
> + /* Reset the reservation */
> + env->reserve_addr = -1;
> }
>
> static void powerpc_excp_40x(PowerPCCPU *cpu, int excp)
> @@ -471,8 +478,7 @@ static void powerpc_excp_40x(PowerPCCPU *cpu, int excp)
> case POWERPC_EXCP_FP:
> if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
> trace_ppc_excp_fp_ignore();
> - cs->exception_index = POWERPC_EXCP_NONE;
> - env->error_code = 0;
> + powerpc_reset_excp_state(cpu);
> return;
> }
> env->spr[SPR_40x_ESR] = ESR_FP;
> @@ -609,8 +615,7 @@ static void powerpc_excp_6xx(PowerPCCPU *cpu, int excp)
> case POWERPC_EXCP_FP:
> if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
> trace_ppc_excp_fp_ignore();
> - cs->exception_index = POWERPC_EXCP_NONE;
> - env->error_code = 0;
> + powerpc_reset_excp_state(cpu);
> return;
> }
>
> @@ -783,8 +788,7 @@ static void powerpc_excp_7xx(PowerPCCPU *cpu, int excp)
> case POWERPC_EXCP_FP:
> if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
> trace_ppc_excp_fp_ignore();
> - cs->exception_index = POWERPC_EXCP_NONE;
> - env->error_code = 0;
> + powerpc_reset_excp_state(cpu);
> return;
> }
>
> @@ -969,8 +973,7 @@ static void powerpc_excp_74xx(PowerPCCPU *cpu, int excp)
> case POWERPC_EXCP_FP:
> if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
> trace_ppc_excp_fp_ignore();
> - cs->exception_index = POWERPC_EXCP_NONE;
> - env->error_code = 0;
> + powerpc_reset_excp_state(cpu);
> return;
> }
>
> @@ -1168,8 +1171,7 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
> case POWERPC_EXCP_FP:
> if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
> trace_ppc_excp_fp_ignore();
> - cs->exception_index = POWERPC_EXCP_NONE;
> - env->error_code = 0;
> + powerpc_reset_excp_state(cpu);
> return;
> }
>
> @@ -1406,8 +1408,7 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
> case POWERPC_EXCP_FP:
> if ((msr_fe0 == 0 && msr_fe1 == 0) || msr_fp == 0) {
> trace_ppc_excp_fp_ignore();
> - cs->exception_index = POWERPC_EXCP_NONE;
> - env->error_code = 0;
> + powerpc_reset_excp_state(cpu);
> return;
> }
>
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
` (6 preceding siblings ...)
2022-02-15 3:16 ` [PATCH 7/9] target/ppc: Add powerpc_reset_excp_state helper Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 15:59 ` Fabiano Rosas
2022-02-15 17:28 ` Cédric Le Goater
2022-02-15 3:16 ` [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor Nicholas Piggin
2022-02-15 18:33 ` [PATCH 0/9] ppc: nested KVM HV for spapr " Cédric Le Goater
9 siblings, 2 replies; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
Introduce virtual hypervisor methods that can support a "Nested KVM HV"
implementation using the bare metal 2-level radix MMU, and using HV
exceptions to return from H_ENTER_NESTED (rather than cause interrupts).
HV exceptions can now be raised in the TCG spapr machine when running a
nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.
HV exceptions are intercepted in the exception handler code and instead
of causing interrupts in the guest and switching the machine to HV mode,
they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
interrupt vector number as return value as required by the hcall API.
Address translation is provided by the 2-level page table walker that is
implemented for the bare metal radix MMU. The partition scope page table
is pointed to the L1's partition scope by the get_pate vhc method.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/ppc/pegasos2.c | 6 ++++
hw/ppc/spapr.c | 6 ++++
target/ppc/cpu.h | 2 ++
target/ppc/excp_helper.c | 76 ++++++++++++++++++++++++++++++++++------
target/ppc/mmu-radix64.c | 15 ++++++--
5 files changed, 92 insertions(+), 13 deletions(-)
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 298e6b93e2..d45008ac71 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -449,6 +449,11 @@ static target_ulong pegasos2_rtas(PowerPCCPU *cpu, Pegasos2MachineState *pm,
}
}
+static bool pegasos2_cpu_in_nested(PowerPCCPU *cpu)
+{
+ return false;
+}
+
static void pegasos2_hypercall(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
{
Pegasos2MachineState *pm = PEGASOS2_MACHINE(vhyp);
@@ -504,6 +509,7 @@ static void pegasos2_machine_class_init(ObjectClass *oc, void *data)
mc->default_ram_id = "pegasos2.ram";
mc->default_ram_size = 512 * MiB;
+ vhc->cpu_in_nested = pegasos2_cpu_in_nested;
vhc->hypercall = pegasos2_hypercall;
vhc->cpu_exec_enter = vhyp_nop;
vhc->cpu_exec_exit = vhyp_nop;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1892a29e2d..3a5cf92c94 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4470,6 +4470,11 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
return NULL;
}
+static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
+{
+ return false;
+}
+
static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
{
SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
@@ -4578,6 +4583,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
fwc->get_dev_path = spapr_get_fw_dev_path;
nc->nmi_monitor_handler = spapr_nmi;
smc->phb_placement = spapr_phb_placement;
+ vhc->cpu_in_nested = spapr_cpu_in_nested;
vhc->hypercall = emulate_spapr_hypercall;
vhc->hpt_mask = spapr_hpt_mask;
vhc->map_hptes = spapr_map_hptes;
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c79ae74f10..d8cc956c97 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1311,6 +1311,8 @@ PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc);
#ifndef CONFIG_USER_ONLY
struct PPCVirtualHypervisorClass {
InterfaceClass parent;
+ bool (*cpu_in_nested)(PowerPCCPU *cpu);
+ void (*deliver_hv_excp)(PowerPCCPU *cpu, int excp);
void (*hypercall)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
hwaddr (*hpt_mask)(PPCVirtualHypervisor *vhyp);
const ppc_hash_pte64_t *(*map_hptes)(PPCVirtualHypervisor *vhyp,
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 778eb4f3b0..ecff7654cb 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1279,6 +1279,22 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
powerpc_set_excp_state(cpu, vector, new_msr);
}
+/*
+ * When running a nested HV guest under vhyp, external interrupts are
+ * delivered as HVIRT.
+ */
+static bool vhyp_promotes_external_to_hvirt(PowerPCCPU *cpu)
+{
+ if (cpu->vhyp) {
+ PPCVirtualHypervisorClass *vhc;
+ vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+ if (vhc->cpu_in_nested(cpu)) {
+ return true;
+ }
+ }
+ return false;
+}
+
#ifdef TARGET_PPC64
/*
* When running under vhyp, hcalls are always intercepted and sent to the
@@ -1287,7 +1303,29 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
static bool books_vhyp_handles_hcall(PowerPCCPU *cpu)
{
if (cpu->vhyp) {
- return true;
+ PPCVirtualHypervisorClass *vhc;
+ vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+ if (!vhc->cpu_in_nested(cpu)) {
+ return true;
+ }
+ }
+ return false;
+}
+
+/*
+ * When running a nested KVM HV guest under vhyp, HV exceptions are not
+ * delivered to the guest (because there is no concept of HV support), but
+ * rather they are sent to the vhyp to exit from the L2 back to the L1 and
+ * return from the H_ENTER_NESTED hypercall.
+ */
+static bool books_vhyp_handles_hv_excp(PowerPCCPU *cpu)
+{
+ if (cpu->vhyp) {
+ PPCVirtualHypervisorClass *vhc;
+ vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+ if (vhc->cpu_in_nested(cpu)) {
+ return true;
+ }
}
return false;
}
@@ -1540,12 +1578,6 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
break;
}
- /* Sanity check */
- if (!(env->msr_mask & MSR_HVB) && srr0 == SPR_HSRR0) {
- cpu_abort(cs, "Trying to deliver HV exception (HSRR) %d with "
- "no HV support\n", excp);
- }
-
/*
* Sort out endianness of interrupt, this differs depending on the
* CPU, the HV mode, etc...
@@ -1564,10 +1596,26 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
env->spr[srr1] = msr;
}
- /* This can update new_msr and vector if AIL applies */
- ppc_excp_apply_ail(cpu, excp, msr, &new_msr, &vector);
+ if ((new_msr & MSR_HVB) && books_vhyp_handles_hv_excp(cpu)) {
+ PPCVirtualHypervisorClass *vhc =
+ PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+ /* Deliver interrupt to L1 by returning from the H_ENTER_NESTED call */
+ vhc->deliver_hv_excp(cpu, excp);
- powerpc_set_excp_state(cpu, vector, new_msr);
+ powerpc_reset_excp_state(cpu);
+
+ } else {
+ /* Sanity check */
+ if (!(env->msr_mask & MSR_HVB) && srr0 == SPR_HSRR0) {
+ cpu_abort(cs, "Trying to deliver HV exception (HSRR) %d with "
+ "no HV support\n", excp);
+ }
+
+ /* This can update new_msr and vector if AIL applies */
+ ppc_excp_apply_ail(cpu, excp, msr, &new_msr, &vector);
+
+ powerpc_set_excp_state(cpu, vector, new_msr);
+ }
}
#else
static inline void powerpc_excp_books(PowerPCCPU *cpu, int excp)
@@ -1687,7 +1735,11 @@ static void ppc_hw_interrupt(CPUPPCState *env)
/* HEIC blocks delivery to the hypervisor */
if ((async_deliver && !(heic && msr_hv && !msr_pr)) ||
(env->has_hv_mode && msr_hv == 0 && !lpes0)) {
- powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
+ if (vhyp_promotes_external_to_hvirt(cpu)) {
+ powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
+ } else {
+ powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
+ }
return;
}
}
@@ -1797,6 +1849,8 @@ void ppc_cpu_do_fwnmi_machine_check(CPUState *cs, target_ulong vector)
msr |= (1ULL << MSR_LE);
}
+ /* Anything for nested required here? MSR[HV] bit? */
+
powerpc_set_excp_state(cpu, vector, msr);
}
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index 3b6d75a292..9b7a6a7f11 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -355,12 +355,23 @@ static int ppc_radix64_partition_scoped_xlate(PowerPCCPU *cpu,
}
/*
- * The spapr vhc has a flat partition scope provided by qemu memory.
+ * The spapr vhc has a flat partition scope provided by qemu memory when
+ * not nested.
+ *
+ * When running a nested guest, the addressing is 2-level radix on top of the
+ * vhc memory, so it works practically identically to the bare metal 2-level
+ * radix. So that code is selected directly. A cleaner and more flexible nested
+ * hypervisor implementation would allow the vhc to provide a ->nested_xlate()
+ * function but that is not required for the moment.
*/
static bool vhyp_flat_addressing(PowerPCCPU *cpu)
{
if (cpu->vhyp) {
- return true;
+ PPCVirtualHypervisorClass *vhc;
+ vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+ if (!vhc->cpu_in_nested(cpu)) {
+ return true;
+ }
}
return false;
}
--
2.23.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support
2022-02-15 3:16 ` [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support Nicholas Piggin
@ 2022-02-15 15:59 ` Fabiano Rosas
2022-02-15 17:28 ` Cédric Le Goater
1 sibling, 0 replies; 30+ messages in thread
From: Fabiano Rosas @ 2022-02-15 15:59 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc
Cc: qemu-devel, Nicholas Piggin, Cédric Le Goater
Nicholas Piggin <npiggin@gmail.com> writes:
> Introduce virtual hypervisor methods that can support a "Nested KVM HV"
> implementation using the bare metal 2-level radix MMU, and using HV
> exceptions to return from H_ENTER_NESTED (rather than cause interrupts).
>
> HV exceptions can now be raised in the TCG spapr machine when running a
> nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
> hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.
>
> HV exceptions are intercepted in the exception handler code and instead
> of causing interrupts in the guest and switching the machine to HV mode,
> they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
> interrupt vector number as return value as required by the hcall API.
>
> Address translation is provided by the 2-level page table walker that is
> implemented for the bare metal radix MMU. The partition scope page table
> is pointed to the L1's partition scope by the get_pate vhc method.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support
2022-02-15 3:16 ` [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support Nicholas Piggin
2022-02-15 15:59 ` Fabiano Rosas
@ 2022-02-15 17:28 ` Cédric Le Goater
2022-02-15 19:19 ` BALATON Zoltan
1 sibling, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 17:28 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> Introduce virtual hypervisor methods that can support a "Nested KVM HV"
> implementation using the bare metal 2-level radix MMU, and using HV
> exceptions to return from H_ENTER_NESTED (rather than cause interrupts).
>
> HV exceptions can now be raised in the TCG spapr machine when running a
> nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
> hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.
>
> HV exceptions are intercepted in the exception handler code and instead
> of causing interrupts in the guest and switching the machine to HV mode,
> they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
> interrupt vector number as return value as required by the hcall API.
>
> Address translation is provided by the 2-level page table walker that is
> implemented for the bare metal radix MMU. The partition scope page table
> is pointed to the L1's partition scope by the get_pate vhc method.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> hw/ppc/pegasos2.c | 6 ++++
> hw/ppc/spapr.c | 6 ++++
> target/ppc/cpu.h | 2 ++
> target/ppc/excp_helper.c | 76 ++++++++++++++++++++++++++++++++++------
> target/ppc/mmu-radix64.c | 15 ++++++--
> 5 files changed, 92 insertions(+), 13 deletions(-)
>
> diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
> index 298e6b93e2..d45008ac71 100644
> --- a/hw/ppc/pegasos2.c
> +++ b/hw/ppc/pegasos2.c
> @@ -449,6 +449,11 @@ static target_ulong pegasos2_rtas(PowerPCCPU *cpu, Pegasos2MachineState *pm,
> }
> }
>
> +static bool pegasos2_cpu_in_nested(PowerPCCPU *cpu)
> +{
> + return false;
> +}
> +
> static void pegasos2_hypercall(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
> {
> Pegasos2MachineState *pm = PEGASOS2_MACHINE(vhyp);
> @@ -504,6 +509,7 @@ static void pegasos2_machine_class_init(ObjectClass *oc, void *data)
> mc->default_ram_id = "pegasos2.ram";
> mc->default_ram_size = 512 * MiB;
>
> + vhc->cpu_in_nested = pegasos2_cpu_in_nested;
> vhc->hypercall = pegasos2_hypercall;
> vhc->cpu_exec_enter = vhyp_nop;
> vhc->cpu_exec_exit = vhyp_nop;
I don't think you need to worry about the pegasos2 machine as it only
implements a few of the PPCVirtualHypervisorClass handlers and it can
not run any of these virtualization features. I would drop this part.
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1892a29e2d..3a5cf92c94 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4470,6 +4470,11 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
> return NULL;
> }
>
> +static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
> +{
> + return false;
> +}
> +
> static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
> {
> SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
> @@ -4578,6 +4583,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> fwc->get_dev_path = spapr_get_fw_dev_path;
> nc->nmi_monitor_handler = spapr_nmi;
> smc->phb_placement = spapr_phb_placement;
> + vhc->cpu_in_nested = spapr_cpu_in_nested;
> vhc->hypercall = emulate_spapr_hypercall;
> vhc->hpt_mask = spapr_hpt_mask;
> vhc->map_hptes = spapr_map_hptes;
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c79ae74f10..d8cc956c97 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1311,6 +1311,8 @@ PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc);
> #ifndef CONFIG_USER_ONLY
> struct PPCVirtualHypervisorClass {
> InterfaceClass parent;
> + bool (*cpu_in_nested)(PowerPCCPU *cpu);
> + void (*deliver_hv_excp)(PowerPCCPU *cpu, int excp);
> void (*hypercall)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
> hwaddr (*hpt_mask)(PPCVirtualHypervisor *vhyp);
> const ppc_hash_pte64_t *(*map_hptes)(PPCVirtualHypervisor *vhyp,
>
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 778eb4f3b0..ecff7654cb 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -1279,6 +1279,22 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
> powerpc_set_excp_state(cpu, vector, new_msr);
> }
a helper such as:

    static inline bool books_vhyp_cpu_in_nested(PowerPCCPU *cpu)
    {
        return PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp)->cpu_in_nested(cpu);
    }

would help to reduce the routines below.
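To illustrate the suggestion, here is a minimal standalone sketch of how the patch's predicates collapse once such a helper exists. The types, field names, and stub callbacks below are simplified stand-ins (the real code resolves the class via QOM's PPC_VIRTUAL_HYPERVISOR_GET_CLASS), so treat this as an outline of the refactoring, not QEMU code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the QEMU types; names are illustrative only */
typedef struct PowerPCCPU PowerPCCPU;

typedef struct {
    bool (*cpu_in_nested)(PowerPCCPU *cpu);
} PPCVirtualHypervisorClass;

struct PowerPCCPU {
    void *vhyp;                             /* non-NULL when a vhyp is attached */
    PPCVirtualHypervisorClass *vhyp_class;  /* stands in for GET_CLASS(cpu->vhyp) */
};

/* The proposed helper: dereference the class method in one place */
static inline bool books_vhyp_cpu_in_nested(PowerPCCPU *cpu)
{
    return cpu->vhyp_class->cpu_in_nested(cpu);
}

/* The patch's predicates then become one-liners */
static bool books_vhyp_handles_hcall(PowerPCCPU *cpu)
{
    /* hcalls are intercepted only when the vcpu is not in a nested guest */
    return cpu->vhyp && !books_vhyp_cpu_in_nested(cpu);
}

static bool books_vhyp_handles_hv_excp(PowerPCCPU *cpu)
{
    /* HV exceptions go to the vhyp only when the vcpu is in a nested guest */
    return cpu->vhyp && books_vhyp_cpu_in_nested(cpu);
}

/* Stub cpu_in_nested implementations for demonstration */
static bool stub_not_nested(PowerPCCPU *cpu) { (void)cpu; return false; }
static bool stub_nested(PowerPCCPU *cpu)     { (void)cpu; return true; }
```

With this shape, vhyp_promotes_external_to_hvirt() reduces the same way, and the nested/non-nested split reads directly from the predicate bodies.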
> +/*
> + * When running a nested HV guest under vhyp, external interrupts are
> + * delivered as HVIRT.
> + */
> +static bool vhyp_promotes_external_to_hvirt(PowerPCCPU *cpu)
You seem to have chosen the 'books_vhyp_' prefix. I am not sure of the
naming yet but, for now, it's better to keep it consistent.
> +{
> + if (cpu->vhyp) {
> + PPCVirtualHypervisorClass *vhc;
> + vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> + if (vhc->cpu_in_nested(cpu)) {
> + return true;
> + }
> + }
> + return false;
> +}
> +
> #ifdef TARGET_PPC64
> /*
> * When running under vhyp, hcalls are always intercepted and sent to the
> @@ -1287,7 +1303,29 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
> static bool books_vhyp_handles_hcall(PowerPCCPU *cpu)
> {
> if (cpu->vhyp) {
> - return true;
> + PPCVirtualHypervisorClass *vhc;
> + vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> + if (!vhc->cpu_in_nested(cpu)) {
> + return true;
> + }
> + }
> + return false;
> +}
> +
> +/*
> + * When running a nested KVM HV guest under vhyp, HV exceptions are not
> + * delivered to the guest (because there is no concept of HV support), but
> + * rather they are sent to the vhyp to exit from the L2 back to the L1 and
> + * return from the H_ENTER_NESTED hypercall.
> + */
> +static bool books_vhyp_handles_hv_excp(PowerPCCPU *cpu)
> +{
> + if (cpu->vhyp) {
> + PPCVirtualHypervisorClass *vhc;
> + vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> + if (vhc->cpu_in_nested(cpu)) {
> + return true;
> + }
> }
> return false;
> }
> @@ -1540,12 +1578,6 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
> break;
> }
>
> - /* Sanity check */
> - if (!(env->msr_mask & MSR_HVB) && srr0 == SPR_HSRR0) {
> - cpu_abort(cs, "Trying to deliver HV exception (HSRR) %d with "
> - "no HV support\n", excp);
> - }
>
> /*
> * Sort out endianness of interrupt, this differs depending on the
> * CPU, the HV mode, etc...
> @@ -1564,10 +1596,26 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
> env->spr[srr1] = msr;
> }
>
> - /* This can update new_msr and vector if AIL applies */
> - ppc_excp_apply_ail(cpu, excp, msr, &new_msr, &vector);
> + if ((new_msr & MSR_HVB) && books_vhyp_handles_hv_excp(cpu)) {
> + PPCVirtualHypervisorClass *vhc =
> + PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> + /* Deliver interrupt to L1 by returning from the H_ENTER_NESTED call */
> + vhc->deliver_hv_excp(cpu, excp);
>
> - powerpc_set_excp_state(cpu, vector, new_msr);
> + powerpc_reset_excp_state(cpu);
> +
> + } else {
> + /* Sanity check */
> + if (!(env->msr_mask & MSR_HVB) && srr0 == SPR_HSRR0) {
> + cpu_abort(cs, "Trying to deliver HV exception (HSRR) %d with "
> + "no HV support\n", excp);
> + }
> +
> + /* This can update new_msr and vector if AIL applies */
> + ppc_excp_apply_ail(cpu, excp, msr, &new_msr, &vector);
> +
> + powerpc_set_excp_state(cpu, vector, new_msr);
> + }
> }
> #else
> static inline void powerpc_excp_books(PowerPCCPU *cpu, int excp)
> @@ -1687,7 +1735,11 @@ static void ppc_hw_interrupt(CPUPPCState *env)
> /* HEIC blocks delivery to the hypervisor */
> if ((async_deliver && !(heic && msr_hv && !msr_pr)) ||
> (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
> - powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
> + if (vhyp_promotes_external_to_hvirt(cpu)) {
> + powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
> + } else {
> + powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
> + }
> return;
> }
> }
> @@ -1797,6 +1849,8 @@ void ppc_cpu_do_fwnmi_machine_check(CPUState *cs, target_ulong vector)
> msr |= (1ULL << MSR_LE);
> }
>
> + /* Anything for nested required here? MSR[HV] bit? */
> +
> powerpc_set_excp_state(cpu, vector, msr);
> }
>
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> index 3b6d75a292..9b7a6a7f11 100644
> --- a/target/ppc/mmu-radix64.c
> +++ b/target/ppc/mmu-radix64.c
> @@ -355,12 +355,23 @@ static int ppc_radix64_partition_scoped_xlate(PowerPCCPU *cpu,
> }
>
> /*
> - * The spapr vhc has a flat partition scope provided by qemu memory.
> + * The spapr vhc has a flat partition scope provided by qemu memory when
> + * not nested.
> + *
> + * When running a nested guest, the addressing is 2-level radix on top of the
> + * vhc memory, so it works practically identically to the bare metal 2-level
> + * radix. So that code is selected directly. A cleaner and more flexible nested
> + * hypervisor implementation would allow the vhc to provide a ->nested_xlate()
> + * function but that is not required for the moment.
> */
> static bool vhyp_flat_addressing(PowerPCCPU *cpu)
> {
> if (cpu->vhyp) {
> - return true;
> + PPCVirtualHypervisorClass *vhc;
> + vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> + if (!vhc->cpu_in_nested(cpu)) {
> + return true;
> + }
> }
> return false;
> }
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support
2022-02-15 17:28 ` Cédric Le Goater
@ 2022-02-15 19:19 ` BALATON Zoltan
2022-02-16 0:49 ` Nicholas Piggin
0 siblings, 1 reply; 30+ messages in thread
From: BALATON Zoltan @ 2022-02-15 19:19 UTC (permalink / raw)
To: Cédric Le Goater
Cc: qemu-ppc, qemu-devel, Nicholas Piggin, Fabiano Rosas
On Tue, 15 Feb 2022, Cédric Le Goater wrote:
> On 2/15/22 04:16, Nicholas Piggin wrote:
>> Introduce virtual hypervisor methods that can support a "Nested KVM HV"
>> implementation using the bare metal 2-level radix MMU, and using HV
>> exceptions to return from H_ENTER_NESTED (rather than cause interrupts).
>>
>> HV exceptions can now be raised in the TCG spapr machine when running a
>> nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
>> hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.
>>
>> HV exceptions are intercepted in the exception handler code and instead
>> of causing interrupts in the guest and switching the machine to HV mode,
>> they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
>> interrupt vector number as return value as required by the hcall API.
>>
>> Address translation is provided by the 2-level page table walker that is
>> implemented for the bare metal radix MMU. The partition scope page table
>> is pointed to the L1's partition scope by the get_pate vhc method.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> hw/ppc/pegasos2.c | 6 ++++
>> hw/ppc/spapr.c | 6 ++++
>> target/ppc/cpu.h | 2 ++
>> target/ppc/excp_helper.c | 76 ++++++++++++++++++++++++++++++++++------
>> target/ppc/mmu-radix64.c | 15 ++++++--
>> 5 files changed, 92 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
>> index 298e6b93e2..d45008ac71 100644
>> --- a/hw/ppc/pegasos2.c
>> +++ b/hw/ppc/pegasos2.c
>> @@ -449,6 +449,11 @@ static target_ulong pegasos2_rtas(PowerPCCPU *cpu,
>> Pegasos2MachineState *pm,
>> }
>> }
>> +static bool pegasos2_cpu_in_nested(PowerPCCPU *cpu)
>> +{
>> + return false;
>> +}
>> +
>> static void pegasos2_hypercall(PPCVirtualHypervisor *vhyp, PowerPCCPU
>> *cpu)
>> {
>> Pegasos2MachineState *pm = PEGASOS2_MACHINE(vhyp);
>> @@ -504,6 +509,7 @@ static void pegasos2_machine_class_init(ObjectClass
>> *oc, void *data)
>> mc->default_ram_id = "pegasos2.ram";
>> mc->default_ram_size = 512 * MiB;
>> + vhc->cpu_in_nested = pegasos2_cpu_in_nested;
>> vhc->hypercall = pegasos2_hypercall;
>> vhc->cpu_exec_enter = vhyp_nop;
>> vhc->cpu_exec_exit = vhyp_nop;
>
>
> I don't think you need to worry about the pegasos2 machine as it only
> implements a few of the PPCVirtualHypervisorClass handlers and it can
> not run any of these virtualization features. I would drop this part.
I don't know anything about HV and running it nested but I'm sure pegasos2
does not run with it as the hardware does not have HV (or radix MMU which
is mentioned in the commit message above) and I've only used vhyp here to
avoid having to modify vof and be able to use the same vof.bin that we
have. This was the simplest way but it probably does not work with KVM
either so I agree that unless it's required to implement this method for
all machines using vhyp then this should not be needed for pegasos2. We
only really need hypercall to be able to use VOF which is needed for
booting OSes as the board firmware is not redistributable.
If this gets in the way we could replace it with some other hypercall
method (there was some discussion during the review of the series adding
VOF support to pegasos2, we could support MOL OSI or some own solution
instead) if VOF supported these, but I did not want to touch VOF so went
with the simplest working solution.
Regards,
BALATON Zoltan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support
2022-02-15 19:19 ` BALATON Zoltan
@ 2022-02-16 0:49 ` Nicholas Piggin
0 siblings, 0 replies; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-16 0:49 UTC (permalink / raw)
To: BALATON Zoltan, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Fabiano Rosas
Excerpts from BALATON Zoltan's message of February 16, 2022 5:19 am:
> On Tue, 15 Feb 2022, Cédric Le Goater wrote:
>> On 2/15/22 04:16, Nicholas Piggin wrote:
>>> Introduce virtual hypervisor methods that can support a "Nested KVM HV"
>>> implementation using the bare metal 2-level radix MMU, and using HV
>>> exceptions to return from H_ENTER_NESTED (rather than cause interrupts).
>>>
>>> HV exceptions can now be raised in the TCG spapr machine when running a
>>> nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
>>> hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.
>>>
>>> HV exceptions are intercepted in the exception handler code and instead
>>> of causing interrupts in the guest and switching the machine to HV mode,
>>> they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
>>> interrupt vector number as return value as required by the hcall API.
>>>
>>> Address translation is provided by the 2-level page table walker that is
>>> implemented for the bare metal radix MMU. The partition scope page table
>>> is pointed to the L1's partition scope by the get_pate vhc method.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>> hw/ppc/pegasos2.c | 6 ++++
>>> hw/ppc/spapr.c | 6 ++++
>>> target/ppc/cpu.h | 2 ++
>>> target/ppc/excp_helper.c | 76 ++++++++++++++++++++++++++++++++++------
>>> target/ppc/mmu-radix64.c | 15 ++++++--
>>> 5 files changed, 92 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
>>> index 298e6b93e2..d45008ac71 100644
>>> --- a/hw/ppc/pegasos2.c
>>> +++ b/hw/ppc/pegasos2.c
>>> @@ -449,6 +449,11 @@ static target_ulong pegasos2_rtas(PowerPCCPU *cpu,
>>> Pegasos2MachineState *pm,
>>> }
>>> }
>>> +static bool pegasos2_cpu_in_nested(PowerPCCPU *cpu)
>>> +{
>>> + return false;
>>> +}
>>> +
>>> static void pegasos2_hypercall(PPCVirtualHypervisor *vhyp, PowerPCCPU
>>> *cpu)
>>> {
>>> Pegasos2MachineState *pm = PEGASOS2_MACHINE(vhyp);
>>> @@ -504,6 +509,7 @@ static void pegasos2_machine_class_init(ObjectClass
>>> *oc, void *data)
>>> mc->default_ram_id = "pegasos2.ram";
>>> mc->default_ram_size = 512 * MiB;
>>> + vhc->cpu_in_nested = pegasos2_cpu_in_nested;
>>> vhc->hypercall = pegasos2_hypercall;
>>> vhc->cpu_exec_enter = vhyp_nop;
>>> vhc->cpu_exec_exit = vhyp_nop;
>>
>>
>> I don't think you need to worry about the pegasos2 machine as it only
>> implements a few of the PPCVirtualHypervisorClass handlers and it can
>> not run any of these virtualization features. I would drop this part.
>
> I don't know anything about HV and running it nested but I'm sure pegasos2
> does not run with it as the hardware does not have HV (or radix MMU which
> is mentioned in the commit message above) and I've only used vhyp here to
> avoid having to modify vof and be able to use the same vof.bin that we
> have. This was the simplest way but it probably does not work with KVM
> either so I agree that unless it's required to implement this method for
> all machines using vhyp then this should not be needed for pegasos2. We
> only really need hypercall to be able to use VOF which is needed for
> booting OSes as the board firmware is not redistributable.
>
> If this gets in the way we could replace it with some other hypercall
> method (there was some discussion during the review of the series adding
> VOF support to pegasos2, we could support MOL OSI or some own solution
> instead) if VOF supported these, but I did not want to touch VOF so went
> with the simplest working solution.
Thanks, if there is a problem we can solve it one way or another. Don't
worry about it for now, but once reviewers are happy I might just need your
help to test that it doesn't interfere with your machine.
Thanks,
Nick
* [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
` (7 preceding siblings ...)
2022-02-15 3:16 ` [PATCH 8/9] target/ppc: Introduce a vhyp framework for nested HV support Nicholas Piggin
@ 2022-02-15 3:16 ` Nicholas Piggin
2022-02-15 16:01 ` Fabiano Rosas
2022-02-15 18:21 ` Cédric Le Goater
2022-02-15 18:33 ` [PATCH 0/9] ppc: nested KVM HV for spapr " Cédric Le Goater
9 siblings, 2 replies; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-15 3:16 UTC (permalink / raw)
To: qemu-ppc
Cc: Fabiano Rosas, qemu-devel, Nicholas Piggin, Cédric Le Goater
This implements the Nested KVM HV hcall API for spapr under TCG.
The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in, returning from the hcall, when an HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.
The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.
The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.
The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/ppc/spapr.c | 32 +++-
hw/ppc/spapr_caps.c | 11 +-
hw/ppc/spapr_hcall.c | 321 +++++++++++++++++++++++++++++++++++++++++
include/hw/ppc/spapr.h | 74 +++++++++-
target/ppc/cpu.h | 3 +
5 files changed, 431 insertions(+), 10 deletions(-)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3a5cf92c94..6988e3ec76 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1314,11 +1314,32 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
{
SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
- assert(lpid == 0);
+ if (!cpu->in_spapr_nested) {
+ assert(lpid == 0);
- /* Copy PATE1:GR into PATE0:HR */
- entry->dw0 = spapr->patb_entry & PATE0_HR;
- entry->dw1 = spapr->patb_entry;
+ /* Copy PATE1:GR into PATE0:HR */
+ entry->dw0 = spapr->patb_entry & PATE0_HR;
+ entry->dw1 = spapr->patb_entry;
+
+ } else {
+ uint64_t patb, pats;
+
+ assert(lpid != 0);
+
+ patb = spapr->nested_ptcr & PTCR_PATB;
+ pats = spapr->nested_ptcr & PTCR_PATS;
+
+ /* Calculate number of entries */
+ pats = 1ull << (pats + 12 - 4);
+ if (pats <= lpid) {
+ return false;
+ }
+
+ /* Grab entry */
+ patb += 16 * lpid;
+ entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
+ entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+ }
return true;
}
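The nested PATE lookup added above can be exercised standalone. This is a hedged sketch only: the PTCR_PATB and PTCR_PATS mask values below are illustrative placeholders for the real target/ppc definitions (base-address bits and the low 5-bit size field).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative masks only; the real PTCR_PATB/PTCR_PATS live in target/ppc. */
#define PTCR_PATB 0x0ffffffffffff000ULL /* partition table base address */
#define PTCR_PATS 0x000000000000001fULL /* partition table size field */

/*
 * Guest-real address of the 16-byte partition table entry for an LPID,
 * mirroring the bounds check in spapr_get_pate(): the table holds
 * 2^(PATS + 12 - 4) entries, so any lpid at or beyond that fails.
 */
static bool nested_pate_addr(uint64_t ptcr, uint64_t lpid, uint64_t *addr)
{
    uint64_t pats = ptcr & PTCR_PATS;
    uint64_t entries = 1ULL << (pats + 12 - 4);

    if (lpid >= entries) {
        return false;
    }
    *addr = (ptcr & PTCR_PATB) + 16 * lpid;
    return true;
}
```

With PATS = 0 the table is 4KiB and holds 256 entries; dw0/dw1 are then filled with two 8-byte reads at the returned address, as the patch does with ldq_phys().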
@@ -4472,7 +4493,7 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
{
- return false;
+ return cpu->in_spapr_nested;
}
static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
@@ -4584,6 +4605,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
nc->nmi_monitor_handler = spapr_nmi;
smc->phb_placement = spapr_phb_placement;
vhc->cpu_in_nested = spapr_cpu_in_nested;
+ vhc->deliver_hv_excp = spapr_exit_nested;
vhc->hypercall = emulate_spapr_hypercall;
vhc->hpt_mask = spapr_hpt_mask;
vhc->map_hptes = spapr_map_hptes;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 5cc80776d0..4d8bb2ad2c 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -444,19 +444,22 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
{
ERRP_GUARD();
PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+ CPUPPCState *env = &cpu->env;
if (!val) {
/* capability disabled by default */
return;
}
- if (tcg_enabled()) {
- error_setg(errp, "No Nested KVM-HV support in TCG");
+ if (!(env->insns_flags2 & PPC2_ISA300)) {
+ error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
- } else if (kvm_enabled()) {
+ }
+
+ if (kvm_enabled()) {
if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
spapr->max_compat_pvr)) {
- error_setg(errp, "Nested KVM-HV only supported on POWER9");
+ error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
error_append_hint(errp,
"Try appending -machine max-cpu-compat=power9\n");
return;
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 5dec056796..3129fae90d 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -9,6 +9,7 @@
#include "qemu/error-report.h"
#include "exec/exec-all.h"
#include "helper_regs.h"
+#include "hw/ppc/ppc.h"
#include "hw/ppc/spapr.h"
#include "hw/ppc/spapr_cpu_core.h"
#include "mmu-hash64.h"
@@ -1500,6 +1501,321 @@ static void hypercall_register_softmmu(void)
}
#endif
+/* TCG only */
+#define PRTS_MASK 0x1f
+
+static target_ulong h_set_ptbl(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+ target_ulong ptcr = args[0];
+
+ if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV)) {
+ return H_FUNCTION;
+ }
+
+ if ((ptcr & PRTS_MASK) + 12 - 4 > 12) {
+ return H_PARAMETER;
+ }
+
+ spapr->nested_ptcr = ptcr; /* Save new partition table */
+
+ return H_SUCCESS;
+}
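The `(ptcr & PRTS_MASK) + 12 - 4 > 12` check above caps the L1's nested partition table at 2^12 = 4096 16-byte entries (a 64KiB table). A minimal standalone sketch of that arithmetic, reusing the PRTS_MASK value from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRTS_MASK 0x1fULL /* partition table size field, as in the patch */

/* H_SET_PARTITION_TABLE rejects a PRTS encoding more than 4096 entries. */
static bool prts_valid(uint64_t ptcr)
{
    return (ptcr & PRTS_MASK) + 12 - 4 <= 12;
}

/* Number of 16-byte PATEs: the table is 2^(PRTS + 12) bytes in total. */
static uint64_t pate_count(uint64_t ptcr)
{
    return 1ULL << ((ptcr & PRTS_MASK) + 12 - 4);
}
```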
+
+static target_ulong h_tlb_invalidate(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+ /*
+ * The spapr virtual hypervisor nested HV implementation retains no L2
+ * translation state except for TLB. And the TLB is always invalidated
+ * across L1<->L2 transitions, so nothing is required here.
+ */
+
+ return H_SUCCESS;
+}
+
+static target_ulong h_copy_tofrom_guest(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+ /*
+ * This HCALL is not required, L1 KVM will take a slow path and walk the
+ * page tables manually to do the data copy.
+ */
+ return H_FUNCTION;
+}
+
+/*
+ * When this handler returns, the environment is switched to the L2 guest
+ * and TCG begins running that. spapr_exit_nested() performs the switch from
+ * L2 back to L1 and returns from the H_ENTER_NESTED hcall.
+ */
+static target_ulong h_enter_nested(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+ PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+ CPUState *cs = CPU(cpu);
+ CPUPPCState *env = &cpu->env;
+ target_ulong hv_ptr = args[0];
+ target_ulong regs_ptr = args[1];
+ target_ulong hdec, now = cpu_ppc_load_tbl(env);
+ target_ulong lpcr, lpcr_mask;
+ struct kvmppc_hv_guest_state *hvstate;
+ struct kvmppc_hv_guest_state hv_state;
+ struct kvmppc_pt_regs *regs;
+ hwaddr len;
+ uint32_t cr;
+ int i;
+
+ if (cpu->in_spapr_nested) {
+ return H_FUNCTION;
+ }
+ if (spapr->nested_ptcr == 0) {
+ return H_NOT_AVAILABLE;
+ }
+
+ len = sizeof(*hvstate);
+ hvstate = cpu_physical_memory_map(hv_ptr, &len, true);
+ if (!hvstate || len != sizeof(*hvstate)) {
+ return H_PARAMETER;
+ }
+
+ memcpy(&hv_state, hvstate, len);
+
+ cpu_physical_memory_unmap(hvstate, len, 0 /* read */, len /* access len */);
+
+ /*
+ * We accept versions 1 and 2. Version 2 fields are unused because TCG
+ * does not implement DAWR*.
+ */
+ if (hv_state.version > HV_GUEST_STATE_VERSION) {
+ return H_PARAMETER;
+ }
+
+ cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState));
+ if (!cpu->nested_host_state) {
+ return H_NO_MEM;
+ }
+
+ memcpy(cpu->nested_host_state, env, sizeof(CPUPPCState));
+
+ len = sizeof(*regs);
+ regs = cpu_physical_memory_map(regs_ptr, &len, true);
+ if (!regs || len != sizeof(*regs)) {
+ g_free(cpu->nested_host_state);
+ return H_P2;
+ }
+
+ len = sizeof(env->gpr);
+ assert(len == sizeof(regs->gpr));
+ memcpy(env->gpr, regs->gpr, len);
+
+ env->lr = regs->link;
+ env->ctr = regs->ctr;
+ cpu_write_xer(env, regs->xer);
+
+ cr = regs->ccr;
+ for (i = 7; i >= 0; i--) {
+ env->crf[i] = cr & 15;
+ cr >>= 4;
+ }
+
+ env->msr = regs->msr;
+ env->nip = regs->nip;
+
+ cpu_physical_memory_unmap(regs, len, 0 /* read */, len /* access len */);
+
+ env->cfar = hv_state.cfar;
+
+ assert(env->spr[SPR_LPIDR] == 0);
+ env->spr[SPR_LPIDR] = hv_state.lpid;
+
+ lpcr_mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER;
+ lpcr = (env->spr[SPR_LPCR] & ~lpcr_mask) | (hv_state.lpcr & lpcr_mask);
+ lpcr |= LPCR_HR | LPCR_UPRT | LPCR_GTSE | LPCR_HVICE | LPCR_HDICE;
+ lpcr &= ~LPCR_LPES0;
+ env->spr[SPR_LPCR] = lpcr & pcc->lpcr_mask;
+
+ env->spr[SPR_PCR] = hv_state.pcr;
+ /* hv_state.amor is not used */
+ env->spr[SPR_DPDES] = hv_state.dpdes;
+ env->spr[SPR_HFSCR] = hv_state.hfscr;
+ hdec = hv_state.hdec_expiry - now;
+ env->tb_env->tb_offset += hv_state.tb_offset;
+ /* TCG does not implement DAWR*, CIABR, PURR, SPURR, IC, VTB, HEIR SPRs */
+ env->spr[SPR_SRR0] = hv_state.srr0;
+ env->spr[SPR_SRR1] = hv_state.srr1;
+ env->spr[SPR_SPRG0] = hv_state.sprg[0];
+ env->spr[SPR_SPRG1] = hv_state.sprg[1];
+ env->spr[SPR_SPRG2] = hv_state.sprg[2];
+ env->spr[SPR_SPRG3] = hv_state.sprg[3];
+ env->spr[SPR_BOOKS_PID] = hv_state.pidr;
+ env->spr[SPR_PPR] = hv_state.ppr;
+
+ cpu_ppc_hdecr_init(env);
+ cpu_ppc_store_hdecr(env, hdec);
+
+ /*
+ * The hv_state.vcpu_token is not needed. It is used by the KVM
+ * implementation to remember which L2 vCPU last ran on which physical
+ * CPU so as to invalidate process scope translations if it is moved
+ * between physical CPUs. For now TLBs are always flushed on L1<->L2
+ * transitions so this is not a problem.
+ *
+ * Could validate that the same vcpu_token does not attempt to run on
+ * different L1 vCPUs at the same time, but that would be a L1 KVM bug
+ * and it's not obviously worth a new data structure to do it.
+ */
+
+ cpu->in_spapr_nested = true;
+
+ hreg_compute_hflags(env);
+ tlb_flush(cs);
+ env->reserve_addr = -1; /* Reset the reservation */
+
+ /*
+ * The spapr hcall helper sets env->gpr[3] to the return value, but at
+ * this point the L1 is not returning from the hcall but rather we
+ * start running the L2, so r3 must not be clobbered; return env->gpr[3]
+ * to leave it unchanged.
+ */
+ return env->gpr[3];
+}
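The LPCR handling above merges the L1-supplied image into the host value: only a whitelisted set of bits comes from the guest, a few bits are forced on, and LPES0 is forced off. A generic sketch of that merge; the LPCR_* bit positions below are made up for illustration, not the real target/ppc values:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit assignments only. */
#define LPCR_MER   (1ULL << 11)
#define LPCR_GTSE  (1ULL << 4)
#define LPCR_HR    (1ULL << 20)
#define LPCR_LPES0 (1ULL << 3)

/*
 * L2 LPCR = host value, with only guest_mask bits taken from the
 * L1-supplied image, force_on bits set and force_off bits cleared --
 * the same shape as the merge in h_enter_nested().
 */
static uint64_t merge_lpcr(uint64_t host, uint64_t guest,
                           uint64_t guest_mask, uint64_t force_on,
                           uint64_t force_off)
{
    uint64_t lpcr = (host & ~guest_mask) | (guest & guest_mask);

    lpcr |= force_on;
    lpcr &= ~force_off;
    return lpcr;
}
```

h_enter_nested() additionally masks the result with pcc->lpcr_mask, so the L1 can never set bits the CPU model does not implement.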
+
+void spapr_exit_nested(PowerPCCPU *cpu, int excp)
+{
+ CPUState *cs = CPU(cpu);
+ CPUPPCState *env = &cpu->env;
+ target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
+ target_ulong hv_ptr = cpu->nested_host_state->gpr[4];
+ target_ulong regs_ptr = cpu->nested_host_state->gpr[5];
+ struct kvmppc_hv_guest_state *hvstate;
+ struct kvmppc_pt_regs *regs;
+ hwaddr len;
+ int i;
+
+ assert(cpu->in_spapr_nested);
+ cpu->in_spapr_nested = false;
+
+ cpu_ppc_hdecr_exit(env);
+
+ len = sizeof(*hvstate);
+ hvstate = cpu_physical_memory_map(hv_ptr, &len, true);
+ if (!hvstate || len != sizeof(*hvstate)) {
+ r3_return = H_PARAMETER;
+ goto out_restore_l1;
+ }
+
+ env->tb_env->tb_offset -= hvstate->tb_offset;
+
+ hvstate->cfar = env->cfar;
+ hvstate->lpcr = env->spr[SPR_LPCR];
+ hvstate->pcr = env->spr[SPR_PCR];
+ hvstate->dpdes = env->spr[SPR_DPDES];
+ hvstate->hfscr = env->spr[SPR_HFSCR];
+
+ if (excp == POWERPC_EXCP_HDSI) {
+ hvstate->hdar = env->spr[SPR_HDAR];
+ hvstate->hdsisr = env->spr[SPR_HDSISR];
+ hvstate->asdr = env->spr[SPR_ASDR];
+ } else if (excp == POWERPC_EXCP_HISI) {
+ hvstate->asdr = env->spr[SPR_ASDR];
+ }
+
+ /* HEIR should be implemented for HV mode and saved here. */
+ hvstate->srr0 = env->spr[SPR_SRR0];
+ hvstate->srr1 = env->spr[SPR_SRR1];
+ hvstate->sprg[0] = env->spr[SPR_SPRG0];
+ hvstate->sprg[1] = env->spr[SPR_SPRG1];
+ hvstate->sprg[2] = env->spr[SPR_SPRG2];
+ hvstate->sprg[3] = env->spr[SPR_SPRG3];
+ hvstate->pidr = env->spr[SPR_BOOKS_PID];
+ hvstate->ppr = env->spr[SPR_PPR];
+
+ cpu_physical_memory_unmap(hvstate, len, 0 /* read */, len /* access len */);
+
+ len = sizeof(*regs);
+ regs = cpu_physical_memory_map(regs_ptr, &len, true);
+ if (!regs || len != sizeof(*regs)) {
+ r3_return = H_P2;
+ goto out_restore_l1;
+ }
+
+ len = sizeof(env->gpr);
+ assert(len == sizeof(regs->gpr));
+ memcpy(regs->gpr, env->gpr, len);
+
+ regs->link = env->lr;
+ regs->ctr = env->ctr;
+ regs->xer = cpu_read_xer(env);
+
+ regs->ccr = 0;
+ for (i = 0; i < 8; i++) {
+ regs->ccr |= (env->crf[i] & 15) << (4 * (7 - i));
+ }
+
+ if (excp == POWERPC_EXCP_MCHECK ||
+ excp == POWERPC_EXCP_RESET ||
+ excp == POWERPC_EXCP_SYSCALL) {
+ regs->nip = env->spr[SPR_SRR0];
+ regs->msr = env->spr[SPR_SRR1] & env->msr_mask;
+ } else {
+ regs->nip = env->spr[SPR_HSRR0];
+ regs->msr = env->spr[SPR_HSRR1] & env->msr_mask;
+ }
+
+ cpu_physical_memory_unmap(regs, len, 0 /* read */, len /* access len */);
+
+out_restore_l1:
+ memcpy(env->gpr, cpu->nested_host_state->gpr, sizeof(env->gpr));
+ env->lr = cpu->nested_host_state->lr;
+ env->ctr = cpu->nested_host_state->ctr;
+ memcpy(env->crf, cpu->nested_host_state->crf, sizeof(env->crf));
+ env->cfar = cpu->nested_host_state->cfar;
+ env->xer = cpu->nested_host_state->xer;
+ env->so = cpu->nested_host_state->so;
+ env->ov = cpu->nested_host_state->ov;
+ env->ov32 = cpu->nested_host_state->ov32;
+ env->ca32 = cpu->nested_host_state->ca32;
+ env->msr = cpu->nested_host_state->msr;
+ env->nip = cpu->nested_host_state->nip;
+
+ assert(env->spr[SPR_LPIDR] != 0);
+ env->spr[SPR_LPCR] = cpu->nested_host_state->spr[SPR_LPCR];
+ env->spr[SPR_LPIDR] = cpu->nested_host_state->spr[SPR_LPIDR];
+ env->spr[SPR_PCR] = cpu->nested_host_state->spr[SPR_PCR];
+ env->spr[SPR_DPDES] = 0;
+ env->spr[SPR_HFSCR] = cpu->nested_host_state->spr[SPR_HFSCR];
+ env->spr[SPR_SRR0] = cpu->nested_host_state->spr[SPR_SRR0];
+ env->spr[SPR_SRR1] = cpu->nested_host_state->spr[SPR_SRR1];
+ env->spr[SPR_SPRG0] = cpu->nested_host_state->spr[SPR_SPRG0];
+ env->spr[SPR_SPRG1] = cpu->nested_host_state->spr[SPR_SPRG1];
+ env->spr[SPR_SPRG2] = cpu->nested_host_state->spr[SPR_SPRG2];
+ env->spr[SPR_SPRG3] = cpu->nested_host_state->spr[SPR_SPRG3];
+ env->spr[SPR_BOOKS_PID] = cpu->nested_host_state->spr[SPR_BOOKS_PID];
+ env->spr[SPR_PPR] = cpu->nested_host_state->spr[SPR_PPR];
+
+ g_free(cpu->nested_host_state);
+ cpu->nested_host_state = NULL;
+
+ /*
+ * Return the interrupt vector address from H_ENTER_NESTED to the L1
+ * (or error code).
+ */
+ env->gpr[3] = r3_return;
+
+ hreg_compute_hflags(env);
+ tlb_flush(cs);
+ env->reserve_addr = -1; /* Reset the reservation */
+}
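h_enter_nested() and spapr_exit_nested() above convert between the packed 32-bit CR image in the pt_regs buffer and the per-field representation (CR0 in the most significant nibble). A standalone round-trip sketch, using a uint8_t array in place of env->crf[]:

```c
#include <assert.h>
#include <stdint.h>

/* Unpack a 32-bit CR image into eight 4-bit fields, CR0 first. */
static void cr_unpack(uint32_t cr, uint8_t crf[8])
{
    for (int i = 7; i >= 0; i--) {
        crf[i] = cr & 15;
        cr >>= 4;
    }
}

/* Rebuild the 32-bit image, as spapr_exit_nested() does for regs->ccr. */
static uint32_t cr_pack(const uint8_t crf[8])
{
    uint32_t cr = 0;

    for (int i = 0; i < 8; i++) {
        cr |= (uint32_t)(crf[i] & 15) << (4 * (7 - i));
    }
    return cr;
}
```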
+
static void hypercall_register_types(void)
{
hypercall_register_softmmu();
@@ -1555,6 +1871,11 @@ static void hypercall_register_types(void)
spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
+
+ spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
+ spapr_register_hypercall(KVMPPC_H_ENTER_NESTED, h_enter_nested);
+ spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
+ spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
}
type_init(hypercall_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index edbf3eeed0..852fe61b36 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -199,6 +199,9 @@ struct SpaprMachineState {
bool has_graphics;
uint32_t vsmt; /* Virtual SMT mode (KVM's "core stride") */
+ /* Nested HV support (TCG only) */
+ uint64_t nested_ptcr;
+
Notifier epow_notifier;
QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;
bool use_hotplug_event_source;
@@ -579,7 +582,14 @@ struct SpaprMachineState {
#define KVMPPC_H_UPDATE_DT (KVMPPC_HCALL_BASE + 0x3)
/* 0x4 was used for KVMPPC_H_UPDATE_PHANDLE in SLOF */
#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
-#define KVMPPC_HCALL_MAX KVMPPC_H_VOF_CLIENT
+
+/* Platform-specific hcalls used for nested HV KVM */
+#define KVMPPC_H_SET_PARTITION_TABLE (KVMPPC_HCALL_BASE + 0x800)
+#define KVMPPC_H_ENTER_NESTED (KVMPPC_HCALL_BASE + 0x804)
+#define KVMPPC_H_TLB_INVALIDATE (KVMPPC_HCALL_BASE + 0x808)
+#define KVMPPC_H_COPY_TOFROM_GUEST (KVMPPC_HCALL_BASE + 0x80C)
+
+#define KVMPPC_HCALL_MAX KVMPPC_H_COPY_TOFROM_GUEST
/*
* The hcall range 0xEF00 to 0xEF80 is reserved for use in facilitating
@@ -589,6 +599,65 @@ struct SpaprMachineState {
#define SVM_H_TPM_COMM 0xEF10
#define SVM_HCALL_MAX SVM_H_TPM_COMM
+/*
+ * Register state for entering a nested guest with H_ENTER_NESTED.
+ * New members must be added at the end.
+ */
+struct kvmppc_hv_guest_state {
+ uint64_t version; /* version of this structure layout, must be first */
+ uint32_t lpid;
+ uint32_t vcpu_token;
+ /* These registers are hypervisor privileged (at least for writing) */
+ uint64_t lpcr;
+ uint64_t pcr;
+ uint64_t amor;
+ uint64_t dpdes;
+ uint64_t hfscr;
+ int64_t tb_offset;
+ uint64_t dawr0;
+ uint64_t dawrx0;
+ uint64_t ciabr;
+ uint64_t hdec_expiry;
+ uint64_t purr;
+ uint64_t spurr;
+ uint64_t ic;
+ uint64_t vtb;
+ uint64_t hdar;
+ uint64_t hdsisr;
+ uint64_t heir;
+ uint64_t asdr;
+ /* These are OS privileged but need to be set late in guest entry */
+ uint64_t srr0;
+ uint64_t srr1;
+ uint64_t sprg[4];
+ uint64_t pidr;
+ uint64_t cfar;
+ uint64_t ppr;
+ /* Version 1 ends here */
+ uint64_t dawr1;
+ uint64_t dawrx1;
+ /* Version 2 ends here */
+};
+
+/* Latest version of hv_guest_state structure */
+#define HV_GUEST_STATE_VERSION 2
+
+/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
+struct kvmppc_pt_regs {
+ uint64_t gpr[32];
+ uint64_t nip;
+ uint64_t msr;
+ uint64_t orig_gpr3; /* Used for restarting system calls */
+ uint64_t ctr;
+ uint64_t link;
+ uint64_t xer;
+ uint64_t ccr;
+ uint64_t softe; /* Soft enabled/disabled */
+ uint64_t trap; /* Reason for being here */
+ uint64_t dar; /* Fault registers */
+ uint64_t dsisr; /* on 4xx/Book-E used for ESR */
+ uint64_t result; /* Result of a system call */
+};
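Both ends of H_ENTER_NESTED map this structure directly onto guest memory, so its size and field order must match the L1 kernel's 64-bit pt_regs exactly. A standalone sanity check of the layout as declared above (44 eight-byte fields, GPRs first):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the patch's kvmppc_pt_regs, repeated here standalone. */
struct kvmppc_pt_regs {
    uint64_t gpr[32];
    uint64_t nip;
    uint64_t msr;
    uint64_t orig_gpr3;  /* Used for restarting system calls */
    uint64_t ctr;
    uint64_t link;
    uint64_t xer;
    uint64_t ccr;
    uint64_t softe;      /* Soft enabled/disabled */
    uint64_t trap;       /* Reason for being here */
    uint64_t dar;        /* Fault registers */
    uint64_t dsisr;
    uint64_t result;     /* Result of a system call */
};
```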
typedef struct SpaprDeviceTreeUpdateHeader {
uint32_t version_id;
@@ -606,6 +675,9 @@ typedef target_ulong (*spapr_hcall_fn)(PowerPCCPU *cpu, SpaprMachineState *sm,
void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
target_ulong *args);
+
+void spapr_exit_nested(PowerPCCPU *cpu, int excp);
+
target_ulong softmmu_resize_hpt_prepare(PowerPCCPU *cpu, SpaprMachineState *spapr,
target_ulong shift);
target_ulong softmmu_resize_hpt_commit(PowerPCCPU *cpu, SpaprMachineState *spapr,
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index d8cc956c97..65c4401130 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1301,6 +1301,9 @@ struct PowerPCCPU {
bool pre_2_10_migration;
bool pre_3_0_migration;
int32_t mig_slb_nr;
+
+ bool in_spapr_nested;
+ CPUPPCState *nested_host_state;
};
--
2.23.0
* Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor
2022-02-15 3:16 ` [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor Nicholas Piggin
@ 2022-02-15 16:01 ` Fabiano Rosas
2022-02-15 18:21 ` Cédric Le Goater
1 sibling, 0 replies; 30+ messages in thread
From: Fabiano Rosas @ 2022-02-15 16:01 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc
Cc: qemu-devel, Nicholas Piggin, Cédric Le Goater
Nicholas Piggin <npiggin@gmail.com> writes:
> This implements the Nested KVM HV hcall API for spapr under TCG.
>
> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
> L1 is switched back in returned from the hcall when a HV exception
> is sent to the vhyp. Register state is copied in and out according to
> the nested KVM HV hcall API specification.
>
> The hdecr timer is started when the L2 is switched in, and it provides
> the HDEC / 0x980 return to L1.
>
> The MMU re-uses the bare metal radix 2-level page table walker by
> using the get_pate method to point the MMU to the nested partition
> table entry. MMU faults due to partition scope errors raise HV
> exceptions and accordingly are routed back to the L1.
>
> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
> and exit).
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
* Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor
2022-02-15 3:16 ` [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor Nicholas Piggin
2022-02-15 16:01 ` Fabiano Rosas
@ 2022-02-15 18:21 ` Cédric Le Goater
2022-02-16 1:16 ` Nicholas Piggin
1 sibling, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 18:21 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> This implements the Nested KVM HV hcall API for spapr under TCG.
>
> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
> L1 is switched back in returned from the hcall when a HV exception
> is sent to the vhyp. Register state is copied in and out according to
> the nested KVM HV hcall API specification.
>
> The hdecr timer is started when the L2 is switched in, and it provides
> the HDEC / 0x980 return to L1.
>
> The MMU re-uses the bare metal radix 2-level page table walker by
> using the get_pate method to point the MMU to the nested partition
> table entry. MMU faults due to partition scope errors raise HV
> exceptions and accordingly are routed back to the L1.
>
> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
> and exit).
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> hw/ppc/spapr.c | 32 +++-
> hw/ppc/spapr_caps.c | 11 +-
> hw/ppc/spapr_hcall.c | 321 +++++++++++++++++++++++++++++++++++++++++
> include/hw/ppc/spapr.h | 74 +++++++++-
> target/ppc/cpu.h | 3 +
> 5 files changed, 431 insertions(+), 10 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3a5cf92c94..6988e3ec76 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1314,11 +1314,32 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
> {
> SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
>
> - assert(lpid == 0);
> + if (!cpu->in_spapr_nested) {
Since 'in_spapr_nested' is a spapr CPU characteristic, I don't think
it belongs to PowerPCCPU. See the end of the patch, for a proposal.
btw, this helps the ordering of files :
[diff]
orderFile = /path/to/qemu/scripts/git.orderfile
> + assert(lpid == 0);
>
> - /* Copy PATE1:GR into PATE0:HR */
> - entry->dw0 = spapr->patb_entry & PATE0_HR;
> - entry->dw1 = spapr->patb_entry;
> + /* Copy PATE1:GR into PATE0:HR */
> + entry->dw0 = spapr->patb_entry & PATE0_HR;
> + entry->dw1 = spapr->patb_entry;
> +
> + } else {
> + uint64_t patb, pats;
> +
> + assert(lpid != 0);
> +
> + patb = spapr->nested_ptcr & PTCR_PATB;
> + pats = spapr->nested_ptcr & PTCR_PATS;
> +
> + /* Calculate number of entries */
> + pats = 1ull << (pats + 12 - 4);
> + if (pats <= lpid) {
> + return false;
> + }
> +
> + /* Grab entry */
> + patb += 16 * lpid;
> + entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
> + entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
> + }
>
> return true;
> }
> @@ -4472,7 +4493,7 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
>
> static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
> {
> - return false;
> + return cpu->in_spapr_nested;
> }
>
> static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
> @@ -4584,6 +4605,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> nc->nmi_monitor_handler = spapr_nmi;
> smc->phb_placement = spapr_phb_placement;
> vhc->cpu_in_nested = spapr_cpu_in_nested;
> + vhc->deliver_hv_excp = spapr_exit_nested;
> vhc->hypercall = emulate_spapr_hypercall;
> vhc->hpt_mask = spapr_hpt_mask;
> vhc->map_hptes = spapr_map_hptes;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 5cc80776d0..4d8bb2ad2c 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -444,19 +444,22 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
> {
> ERRP_GUARD();
> PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> + CPUPPCState *env = &cpu->env;
>
> if (!val) {
> /* capability disabled by default */
> return;
> }
>
> - if (tcg_enabled()) {
> - error_setg(errp, "No Nested KVM-HV support in TCG");
I don't like using KVM-HV (which is KVM-over-PowerNV) when talking about
KVM-over-pseries. I think the platform name is important. Anyhow, this is
a more global discussion but we should talk about it someday because these
HV modes are becoming confusing! We have PR as well :)
> + if (!(env->insns_flags2 & PPC2_ISA300)) {
> + error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
> error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
return ?
> - } else if (kvm_enabled()) {
> + }
> +
> + if (kvm_enabled()) {
> if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
> spapr->max_compat_pvr)) {
> - error_setg(errp, "Nested KVM-HV only supported on POWER9");
> + error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
> error_append_hint(errp,
> "Try appending -machine max-cpu-compat=power9\n");
> return;
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 5dec056796..3129fae90d 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -9,6 +9,7 @@
> #include "qemu/error-report.h"
> #include "exec/exec-all.h"
> #include "helper_regs.h"
> +#include "hw/ppc/ppc.h"
> #include "hw/ppc/spapr.h"
> #include "hw/ppc/spapr_cpu_core.h"
> #include "mmu-hash64.h"
> @@ -1500,6 +1501,321 @@ static void hypercall_register_softmmu(void)
> }
> #endif
>
> +/* TCG only */
> +#define PRTS_MASK 0x1f
> +
> +static target_ulong h_set_ptbl(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + target_ulong ptcr = args[0];
> +
> + if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV)) {
> + return H_FUNCTION;
> + }
> +
> + if ((ptcr & PRTS_MASK) + 12 - 4 > 12) {
> + return H_PARAMETER;
> + }
> +
> + spapr->nested_ptcr = ptcr; /* Save new partition table */
> +
> + return H_SUCCESS;
> +}
> +
> +static target_ulong h_tlb_invalidate(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + /*
> + * The spapr virtual hypervisor nested HV implementation retains no L2
> + * translation state except for TLB. And the TLB is always invalidated
> + * across L1<->L2 transitions, so nothing is required here.
> + */
> +
> + return H_SUCCESS;
> +}
> +
> +static target_ulong h_copy_tofrom_guest(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + /*
> + * This HCALL is not required, L1 KVM will take a slow path and walk the
> + * page tables manually to do the data copy.
> + */
> + return H_FUNCTION;
> +}
> +
> +/*
> + * When this handler returns, the environment is switched to the L2 guest
> + * and TCG begins running that. spapr_exit_nested() performs the switch from
> + * L2 back to L1 and returns from the H_ENTER_NESTED hcall.
> + */
> +static target_ulong h_enter_nested(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> + CPUState *cs = CPU(cpu);
> + CPUPPCState *env = &cpu->env;
> + target_ulong hv_ptr = args[0];
> + target_ulong regs_ptr = args[1];
> + target_ulong hdec, now = cpu_ppc_load_tbl(env);
> + target_ulong lpcr, lpcr_mask;
> + struct kvmppc_hv_guest_state *hvstate;
> + struct kvmppc_hv_guest_state hv_state;
> + struct kvmppc_pt_regs *regs;
> + hwaddr len;
> + uint32_t cr;
> + int i;
> +
> + if (cpu->in_spapr_nested) {
> + return H_FUNCTION;
That would be an L3 :)
> + }
> + if (spapr->nested_ptcr == 0) {
> + return H_NOT_AVAILABLE;
> + }
> +
> + len = sizeof(*hvstate);
> + hvstate = cpu_physical_memory_map(hv_ptr, &len, true);
When a CPU is available, I would prefer :
hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
MEMTXATTRS_UNSPECIFIED);
like ppc_hash64_map_hptes() does. This is minor.
> + if (!hvstate || len != sizeof(*hvstate)) {
> + return H_PARAMETER;
> + }
> +
> + memcpy(&hv_state, hvstate, len);
> +
> + cpu_physical_memory_unmap(hvstate, len, 0 /* read */, len /* access len */);
checkpatch will complain about the comments above.
> +
> + /*
> + * We accept versions 1 and 2. Version 2 fields are unused because TCG
> + * does not implement DAWR*.
> + */
> + if (hv_state.version > HV_GUEST_STATE_VERSION) {
> + return H_PARAMETER;
> + }
> +
> + cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState));
I think we could preallocate this buffer once we know nested guests are
supported, or, if we keep it, it could serve as our 'in_spapr_nested'
indicator.
> + if (!cpu->nested_host_state) {
> + return H_NO_MEM;
> + }
> +
> + memcpy(cpu->nested_host_state, env, sizeof(CPUPPCState));
> +
> + len = sizeof(*regs);
> + regs = cpu_physical_memory_map(regs_ptr, &len, true);
> + if (!regs || len != sizeof(*regs)) {
> + g_free(cpu->nested_host_state);
> + return H_P2;
> + }
> +
> + len = sizeof(env->gpr);
> + assert(len == sizeof(regs->gpr));
> + memcpy(env->gpr, regs->gpr, len);
> +
> + env->lr = regs->link;
> + env->ctr = regs->ctr;
> + cpu_write_xer(env, regs->xer);
> +
> + cr = regs->ccr;
> + for (i = 7; i >= 0; i--) {
> + env->crf[i] = cr & 15;
> + cr >>= 4;
> + }
> +
> + env->msr = regs->msr;
> + env->nip = regs->nip;
> +
> + cpu_physical_memory_unmap(regs, len, 0 /* read */, len /* access len */);
> +
> + env->cfar = hv_state.cfar;
> +
> + assert(env->spr[SPR_LPIDR] == 0);
> + env->spr[SPR_LPIDR] = hv_state.lpid;
> +
> + lpcr_mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER;
> + lpcr = (env->spr[SPR_LPCR] & ~lpcr_mask) | (hv_state.lpcr & lpcr_mask);
> + lpcr |= LPCR_HR | LPCR_UPRT | LPCR_GTSE | LPCR_HVICE | LPCR_HDICE;
> + lpcr &= ~LPCR_LPES0;
> + env->spr[SPR_LPCR] = lpcr & pcc->lpcr_mask;
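The mask merge above is the usual pattern for letting the L1 choose only
selected LPCR bits while forcing others; in isolation (the bit values below
are placeholders for illustration, not the real LPCR layout):

```c
#include <stdint.h>

/* Placeholder bit definitions for illustration only. */
#define GUEST_LPCR_MASK 0x00ff0000ULL  /* bits the L1 may choose */
#define FORCED_LPCR     0x00000001ULL  /* bits the vhyp always sets */

/*
 * Merge guest-requested LPCR bits into the host value: bits outside
 * the mask are preserved from the host value, masked bits come from
 * the guest, and forced bits are always set.
 */
static uint64_t merge_lpcr(uint64_t host_lpcr, uint64_t guest_lpcr)
{
    uint64_t lpcr;

    lpcr = (host_lpcr & ~GUEST_LPCR_MASK) | (guest_lpcr & GUEST_LPCR_MASK);
    lpcr |= FORCED_LPCR;
    return lpcr;
}
```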
> +
> + env->spr[SPR_PCR] = hv_state.pcr;
> + /* hv_state.amor is not used */
> + env->spr[SPR_DPDES] = hv_state.dpdes;
> + env->spr[SPR_HFSCR] = hv_state.hfscr;
> + hdec = hv_state.hdec_expiry - now;
> + env->tb_env->tb_offset += hv_state.tb_offset;
> + /* TCG does not implement DAWR*, CIABR, PURR, SPURR, IC, VTB, HEIR SPRs*/
> + env->spr[SPR_SRR0] = hv_state.srr0;
> + env->spr[SPR_SRR1] = hv_state.srr1;
> + env->spr[SPR_SPRG0] = hv_state.sprg[0];
> + env->spr[SPR_SPRG1] = hv_state.sprg[1];
> + env->spr[SPR_SPRG2] = hv_state.sprg[2];
> + env->spr[SPR_SPRG3] = hv_state.sprg[3];
> + env->spr[SPR_BOOKS_PID] = hv_state.pidr;
> + env->spr[SPR_PPR] = hv_state.ppr;
> +
> + cpu_ppc_hdecr_init(env);
> + cpu_ppc_store_hdecr(env, hdec);
> +
> + /*
> + * The hv_state.vcpu_token is not needed. It is used by the KVM
> + * implementation to remember which L2 vCPU last ran on which physical
> + * CPU so as to invalidate process scope translations if it is moved
> + * between physical CPUs. For now TLBs are always flushed on L1<->L2
> + * transitions so this is not a problem.
> + *
> + * Could validate that the same vcpu_token does not attempt to run on
> + * different L1 vCPUs at the same time, but that would be a L1 KVM bug
> + * and it's not obviously worth a new data structure to do it.
> + */
> +
> + cpu->in_spapr_nested = true;
> +
> + hreg_compute_hflags(env);
> + tlb_flush(cs);
> + env->reserve_addr = -1; /* Reset the reservation */
> +
> + /*
> + * The spapr hcall helper sets env->gpr[3] to the return value, but at
> + * this point the L1 is not returning from the hcall but rather we
> + * start running the L2, so r3 must not be clobbered, so return env->gpr[3]
> + * to leave it unchanged.
> + */
> + return env->gpr[3];
> +}
> +
> +void spapr_exit_nested(PowerPCCPU *cpu, int excp)
> +{
> + CPUState *cs = CPU(cpu);
> + CPUPPCState *env = &cpu->env;
> + target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
> + target_ulong hv_ptr = cpu->nested_host_state->gpr[4];
> + target_ulong regs_ptr = cpu->nested_host_state->gpr[5];
> + struct kvmppc_hv_guest_state *hvstate;
> + struct kvmppc_pt_regs *regs;
> + hwaddr len;
> + int i;
> +
> + assert(cpu->in_spapr_nested);
> + cpu->in_spapr_nested = false;
> +
> + cpu_ppc_hdecr_exit(env);
> +
> + len = sizeof(*hvstate);
> + hvstate = cpu_physical_memory_map(hv_ptr, &len, true);
> + if (!hvstate || len != sizeof(*hvstate)) {
> + r3_return = H_PARAMETER;
> + goto out_restore_l1;
> + }
> +
> + env->tb_env->tb_offset -= hvstate->tb_offset;
> +
> + hvstate->cfar = env->cfar;
> + hvstate->lpcr = env->spr[SPR_LPCR];
> + hvstate->pcr = env->spr[SPR_PCR];
> + hvstate->dpdes = env->spr[SPR_DPDES];
> + hvstate->hfscr = env->spr[SPR_HFSCR];
> +
> + if (excp == POWERPC_EXCP_HDSI) {
> + hvstate->hdar = env->spr[SPR_HDAR];
> + hvstate->hdsisr = env->spr[SPR_HDSISR];
> + hvstate->asdr = env->spr[SPR_ASDR];
> + } else if (excp == POWERPC_EXCP_HISI) {
> + hvstate->asdr = env->spr[SPR_ASDR];
> + }
> +
> + /* HEIR should be implemented for HV mode and saved here. */
> + hvstate->srr0 = env->spr[SPR_SRR0];
> + hvstate->srr1 = env->spr[SPR_SRR1];
> + hvstate->sprg[0] = env->spr[SPR_SPRG0];
> + hvstate->sprg[1] = env->spr[SPR_SPRG1];
> + hvstate->sprg[2] = env->spr[SPR_SPRG2];
> + hvstate->sprg[3] = env->spr[SPR_SPRG3];
> + hvstate->pidr = env->spr[SPR_BOOKS_PID];
> + hvstate->ppr = env->spr[SPR_PPR];
> +
> + cpu_physical_memory_unmap(hvstate, len, 0 /* read */, len /* access len */);
> +
> + len = sizeof(*regs);
> + regs = cpu_physical_memory_map(regs_ptr, &len, true);
> + if (!regs || len != sizeof(*regs)) {
> + r3_return = H_P2;
> + goto out_restore_l1;
> + }
> +
> + len = sizeof(env->gpr);
> + assert(len == sizeof(regs->gpr));
> + memcpy(regs->gpr, env->gpr, len);
> +
> + regs->link = env->lr;
> + regs->ctr = env->ctr;
> + regs->xer = cpu_read_xer(env);
> +
> + regs->ccr = 0;
> + for (i = 0; i < 8; i++) {
> + regs->ccr |= (env->crf[i] & 15) << (4 * (7 - i));
> + }
> +
> + if (excp == POWERPC_EXCP_MCHECK ||
> + excp == POWERPC_EXCP_RESET ||
> + excp == POWERPC_EXCP_SYSCALL) {
> + regs->nip = env->spr[SPR_SRR0];
> + regs->msr = env->spr[SPR_SRR1] & env->msr_mask;
> + } else {
> + regs->nip = env->spr[SPR_HSRR0];
> + regs->msr = env->spr[SPR_HSRR1] & env->msr_mask;
> + }
> +
> + cpu_physical_memory_unmap(regs, len, 0 /* read */, len /* access len */);
> +
> +out_restore_l1:
> + memcpy(env->gpr, cpu->nested_host_state->gpr, sizeof(env->gpr));
> + env->lr = cpu->nested_host_state->lr;
> + env->ctr = cpu->nested_host_state->ctr;
> + memcpy(env->crf, cpu->nested_host_state->crf, sizeof(env->crf));
> + env->cfar = cpu->nested_host_state->cfar;
> + env->xer = cpu->nested_host_state->xer;
> + env->so = cpu->nested_host_state->so;
> + env->ov = cpu->nested_host_state->ov;
> + env->ov32 = cpu->nested_host_state->ov32;
> + env->ca32 = cpu->nested_host_state->ca32;
> + env->msr = cpu->nested_host_state->msr;
> + env->nip = cpu->nested_host_state->nip;
> +
> + assert(env->spr[SPR_LPIDR] != 0);
> + env->spr[SPR_LPCR] = cpu->nested_host_state->spr[SPR_LPCR];
> + env->spr[SPR_LPIDR] = cpu->nested_host_state->spr[SPR_LPIDR];
> + env->spr[SPR_PCR] = cpu->nested_host_state->spr[SPR_PCR];
> + env->spr[SPR_DPDES] = 0;
> + env->spr[SPR_HFSCR] = cpu->nested_host_state->spr[SPR_HFSCR];
> + env->spr[SPR_SRR0] = cpu->nested_host_state->spr[SPR_SRR0];
> + env->spr[SPR_SRR1] = cpu->nested_host_state->spr[SPR_SRR1];
> + env->spr[SPR_SPRG0] = cpu->nested_host_state->spr[SPR_SPRG0];
> + env->spr[SPR_SPRG1] = cpu->nested_host_state->spr[SPR_SPRG1];
> + env->spr[SPR_SPRG2] = cpu->nested_host_state->spr[SPR_SPRG2];
> + env->spr[SPR_SPRG3] = cpu->nested_host_state->spr[SPR_SPRG3];
> + env->spr[SPR_BOOKS_PID] = cpu->nested_host_state->spr[SPR_BOOKS_PID];
> + env->spr[SPR_PPR] = cpu->nested_host_state->spr[SPR_PPR];
> +
> + g_free(cpu->nested_host_state);
> + cpu->nested_host_state = NULL;
> +
> + /*
> + * Return the interrupt vector address from H_ENTER_NESTED to the L1
> + * (or error code).
> + */
> + env->gpr[3] = r3_return;
> +
> + hreg_compute_hflags(env);
> + tlb_flush(cs);
> + env->reserve_addr = -1; /* Reset the reservation */
> +}
> +
> static void hypercall_register_types(void)
> {
> hypercall_register_softmmu();
> @@ -1555,6 +1871,11 @@ static void hypercall_register_types(void)
> spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>
> spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
> +
> + spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> + spapr_register_hypercall(KVMPPC_H_ENTER_NESTED, h_enter_nested);
> + spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
> + spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
> }
>
> type_init(hypercall_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index edbf3eeed0..852fe61b36 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -199,6 +199,9 @@ struct SpaprMachineState {
> bool has_graphics;
> uint32_t vsmt; /* Virtual SMT mode (KVM's "core stride") */
>
> + /* Nested HV support (TCG only) */
> + uint64_t nested_ptcr;
> +
> Notifier epow_notifier;
> QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;
> bool use_hotplug_event_source;
> @@ -579,7 +582,14 @@ struct SpaprMachineState {
> #define KVMPPC_H_UPDATE_DT (KVMPPC_HCALL_BASE + 0x3)
> /* 0x4 was used for KVMPPC_H_UPDATE_PHANDLE in SLOF */
> #define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
> -#define KVMPPC_HCALL_MAX KVMPPC_H_VOF_CLIENT
> +
> +/* Platform-specific hcalls used for nested HV KVM */
> +#define KVMPPC_H_SET_PARTITION_TABLE (KVMPPC_HCALL_BASE + 0x800)
> +#define KVMPPC_H_ENTER_NESTED (KVMPPC_HCALL_BASE + 0x804)
> +#define KVMPPC_H_TLB_INVALIDATE (KVMPPC_HCALL_BASE + 0x808)
> +#define KVMPPC_H_COPY_TOFROM_GUEST (KVMPPC_HCALL_BASE + 0x80C)
> +
> +#define KVMPPC_HCALL_MAX KVMPPC_H_COPY_TOFROM_GUEST
>
> /*
> * The hcall range 0xEF00 to 0xEF80 is reserved for use in facilitating
> @@ -589,6 +599,65 @@ struct SpaprMachineState {
> #define SVM_H_TPM_COMM 0xEF10
> #define SVM_HCALL_MAX SVM_H_TPM_COMM
>
> +/*
> + * Register state for entering a nested guest with H_ENTER_NESTED.
> + * New member must be added at the end.
> + */
> +struct kvmppc_hv_guest_state {
> + uint64_t version; /* version of this structure layout, must be first */
> + uint32_t lpid;
> + uint32_t vcpu_token;
> + /* These registers are hypervisor privileged (at least for writing) */
> + uint64_t lpcr;
> + uint64_t pcr;
> + uint64_t amor;
> + uint64_t dpdes;
> + uint64_t hfscr;
> + int64_t tb_offset;
> + uint64_t dawr0;
> + uint64_t dawrx0;
> + uint64_t ciabr;
> + uint64_t hdec_expiry;
> + uint64_t purr;
> + uint64_t spurr;
> + uint64_t ic;
> + uint64_t vtb;
> + uint64_t hdar;
> + uint64_t hdsisr;
> + uint64_t heir;
> + uint64_t asdr;
> + /* These are OS privileged but need to be set late in guest entry */
> + uint64_t srr0;
> + uint64_t srr1;
> + uint64_t sprg[4];
> + uint64_t pidr;
> + uint64_t cfar;
> + uint64_t ppr;
> + /* Version 1 ends here */
> + uint64_t dawr1;
> + uint64_t dawrx1;
> + /* Version 2 ends here */
> +};
> +
> +/* Latest version of hv_guest_state structure */
> +#define HV_GUEST_STATE_VERSION 2
> +
> +/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
> +struct kvmppc_pt_regs {
> + uint64_t gpr[32];
> + uint64_t nip;
> + uint64_t msr;
> + uint64_t orig_gpr3; /* Used for restarting system calls */
> + uint64_t ctr;
> + uint64_t link;
> + uint64_t xer;
> + uint64_t ccr;
> + uint64_t softe; /* Soft enabled/disabled */
> + uint64_t trap; /* Reason for being here */
> + uint64_t dar; /* Fault registers */
> + uint64_t dsisr; /* on 4xx/Book-E used for ESR */
> + uint64_t result; /* Result of a system call */
> +};
The above structs are shared with KVM for this QEMU implementation.
I don't think they belong in asm-powerpc/kvm.h, but how could we keep them
in sync? The version should be protecting us from unexpected changes.
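One way the version field can protect both sides is by bounding how much of
the structure each end trusts; a minimal model of that idea (a toy local
struct, not the real layout):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a versioned, append-only guest-state layout. */
struct toy_guest_state {
    uint64_t version;   /* must stay first */
    uint64_t a;         /* version 1 ends here */
    uint64_t b;         /* version 2 ends here */
};

#define TOY_GUEST_STATE_LATEST 2

/*
 * Number of valid bytes for a given advertised version;
 * 0 means the version is unknown and must be rejected.
 */
static size_t toy_guest_state_size(uint64_t version)
{
    switch (version) {
    case 1:
        return offsetof(struct toy_guest_state, b);
    case 2:
        return sizeof(struct toy_guest_state);
    default:
        return 0;
    }
}
```

An older peer then simply never reads past the size its version defines,
which is why new members may only be appended.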
> typedef struct SpaprDeviceTreeUpdateHeader {
> uint32_t version_id;
> @@ -606,6 +675,9 @@ typedef target_ulong (*spapr_hcall_fn)(PowerPCCPU *cpu, SpaprMachineState *sm,
> void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
> target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> target_ulong *args);
> +
> +void spapr_exit_nested(PowerPCCPU *cpu, int excp);
> +
> target_ulong softmmu_resize_hpt_prepare(PowerPCCPU *cpu, SpaprMachineState *spapr,
> target_ulong shift);
> target_ulong softmmu_resize_hpt_commit(PowerPCCPU *cpu, SpaprMachineState *spapr,
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index d8cc956c97..65c4401130 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1301,6 +1301,9 @@ struct PowerPCCPU {
> bool pre_2_10_migration;
> bool pre_3_0_migration;
> int32_t mig_slb_nr;
> +
> + bool in_spapr_nested;
> + CPUPPCState *nested_host_state;
> };
These new fields belong in SpaprCpuState. It shouldn't be too hard to adapt.
Thanks,
C.
* Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor
2022-02-15 18:21 ` Cédric Le Goater
@ 2022-02-16 1:16 ` Nicholas Piggin
2022-02-16 10:23 ` Cédric Le Goater
0 siblings, 1 reply; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-16 1:16 UTC (permalink / raw)
To: Cédric Le Goater, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
Excerpts from Cédric Le Goater's message of February 16, 2022 4:21 am:
> On 2/15/22 04:16, Nicholas Piggin wrote:
>> This implements the Nested KVM HV hcall API for spapr under TCG.
>>
>> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
>> L1 is switched back in returned from the hcall when a HV exception
>> is sent to the vhyp. Register state is copied in and out according to
>> the nested KVM HV hcall API specification.
>>
>> The hdecr timer is started when the L2 is switched in, and it provides
>> the HDEC / 0x980 return to L1.
>>
>> The MMU re-uses the bare metal radix 2-level page table walker by
>> using the get_pate method to point the MMU to the nested partition
>> table entry. MMU faults due to partition scope errors raise HV
>> exceptions and accordingly are routed back to the L1.
>>
>> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
>> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
>> and exit).
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> hw/ppc/spapr.c | 32 +++-
>> hw/ppc/spapr_caps.c | 11 +-
>> hw/ppc/spapr_hcall.c | 321 +++++++++++++++++++++++++++++++++++++++++
>> include/hw/ppc/spapr.h | 74 +++++++++-
>> target/ppc/cpu.h | 3 +
>> 5 files changed, 431 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3a5cf92c94..6988e3ec76 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1314,11 +1314,32 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>> {
>> SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
>>
>> - assert(lpid == 0);
>> + if (!cpu->in_spapr_nested) {
>
> Since 'in_spapr_nested' is a spapr CPU characteristic, I don't think
> it belongs to PowerPCCPU. See the end of the patch, for a proposal.
SpaprCpuState. Certainly that's a better place, I must have missed it.
>
> btw, this helps the ordering of files :
>
> [diff]
> orderFile = /path/to/qemu/scripts/git.orderfile
>
>> + assert(lpid == 0);
>>
>> - /* Copy PATE1:GR into PATE0:HR */
>> - entry->dw0 = spapr->patb_entry & PATE0_HR;
>> - entry->dw1 = spapr->patb_entry;
>> + /* Copy PATE1:GR into PATE0:HR */
>> + entry->dw0 = spapr->patb_entry & PATE0_HR;
>> + entry->dw1 = spapr->patb_entry;
>> +
>> + } else {
>> + uint64_t patb, pats;
>> +
>> + assert(lpid != 0);
>> +
>> + patb = spapr->nested_ptcr & PTCR_PATB;
>> + pats = spapr->nested_ptcr & PTCR_PATS;
>> +
>> + /* Calculate number of entries */
>> + pats = 1ull << (pats + 12 - 4);
>> + if (pats <= lpid) {
>> + return false;
>> + }
>> +
>> + /* Grab entry */
>> + patb += 16 * lpid;
>> + entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
>> + entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
>> + }
>>
>> return true;
>> }
>> @@ -4472,7 +4493,7 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
>>
>> static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
>> {
>> - return false;
>> + return cpu->in_spapr_nested;
>> }
>>
>> static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
>> @@ -4584,6 +4605,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>> nc->nmi_monitor_handler = spapr_nmi;
>> smc->phb_placement = spapr_phb_placement;
>> vhc->cpu_in_nested = spapr_cpu_in_nested;
>> + vhc->deliver_hv_excp = spapr_exit_nested;
>> vhc->hypercall = emulate_spapr_hypercall;
>> vhc->hpt_mask = spapr_hpt_mask;
>> vhc->map_hptes = spapr_map_hptes;
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index 5cc80776d0..4d8bb2ad2c 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -444,19 +444,22 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>> {
>> ERRP_GUARD();
>> PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>> + CPUPPCState *env = &cpu->env;
>>
>> if (!val) {
>> /* capability disabled by default */
>> return;
>> }
>>
>> - if (tcg_enabled()) {
>> - error_setg(errp, "No Nested KVM-HV support in TCG");
>
> I don't like using KVM-HV (which is KVM-over-PowerNV) when talking about
> KVM-over-pseries. I think the platform name is important. Anyhow, this is
> a more global discussion, but we should talk about it someday because these
> HV modes are becoming confusing! We have PR as well :)
The cap is nested-hv and QEMU describes it as nested KVM HV. Are we stuck
with that? That could make a name change even more confusing.
It's really a new backend for the KVM HV front end. Like how POWER8 /
POWER9 bare metal backends are completely different now.
But I guess that does not help the end user to understand. On the other
hand, the user might not think "HV" means the HV mode of the CPU and may
just think of it as "hypervisor".
I like paravirt-hv but nested-hv is not too bad. Anyway I'm happy to
change it.
>
>
>> + if (!(env->insns_flags2 & PPC2_ISA300)) {
>> + error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
>> error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
>
> return ?
Yep.
>> +static target_ulong h_enter_nested(PowerPCCPU *cpu,
>> + SpaprMachineState *spapr,
>> + target_ulong opcode,
>> + target_ulong *args)
>> +{
>> + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> + CPUState *cs = CPU(cpu);
>> + CPUPPCState *env = &cpu->env;
>> + target_ulong hv_ptr = args[0];
>> + target_ulong regs_ptr = args[1];
>> + target_ulong hdec, now = cpu_ppc_load_tbl(env);
>> + target_ulong lpcr, lpcr_mask;
>> + struct kvmppc_hv_guest_state *hvstate;
>> + struct kvmppc_hv_guest_state hv_state;
>> + struct kvmppc_pt_regs *regs;
>> + hwaddr len;
>> + uint32_t cr;
>> + int i;
>> +
>> + if (cpu->in_spapr_nested) {
>> + return H_FUNCTION;
>
> That would be an L3 :)
Well, if the L2 makes the hcall, vhyp won't handle it; rather, it will
cause an L2 exit to the L1, and the L1 will handle the H_ENTER_NESTED
hcall. So we can (and have) run an L3 guest under the L2 of this
machine :)
This is probably more of an assert(!cpu->in_spapr_nested). Actually
that assert could go in the general spapr hypercall handler.
>
>> + }
>> + if (spapr->nested_ptcr == 0) {
>> + return H_NOT_AVAILABLE;
>> + }
>> +
>> + len = sizeof(*hvstate);
>> + hvstate = cpu_physical_memory_map(hv_ptr, &len, true);
>
> When a CPU is available, I would prefer:
>
> hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
> MEMTXATTRS_UNSPECIFIED);
>
> like ppc_hash64_map_hptes() does. This is minor.
I'll check it out. Still not entirely sure about read+write access
though.
>
>> + if (!hvstate || len != sizeof(*hvstate)) {
>> + return H_PARAMETER;
>> + }
>> +
>> + memcpy(&hv_state, hvstate, len);
>> +
>> + cpu_physical_memory_unmap(hvstate, len, 0 /* read */, len /* access len */);
>
> checkpatch will complain about the above comments.
Yeah it did. Turns out I also had a bug where I missed setting write
access further down.
>
>> +
>> + /*
>> + * We accept versions 1 and 2. Version 2 fields are unused because TCG
>> + * does not implement DAWR*.
>> + */
>> + if (hv_state.version > HV_GUEST_STATE_VERSION) {
>> + return H_PARAMETER;
>> + }
>> +
>> + cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState));
>
> I think we could preallocate this buffer once we know nested guests are
> supported, or, if we keep it, it could serve as our 'in_spapr_nested' indicator.
That's true. I kind of liked allocating on demand, but for performance
and robustness it might be better to keep it around (we could allocate it
when we see an H_SET_PARTITION_TABLE).
I'll just keep it as is for the first iteration. In fact we would
probably rather make a specific structure for it that only holds what we
require, rather than the entire CPUPPCState, so all this can be optimised
a bit in a later round.
>> +struct kvmppc_hv_guest_state {
>> + uint64_t version; /* version of this structure layout, must be first */
>> + uint32_t lpid;
>> + uint32_t vcpu_token;
>> + /* These registers are hypervisor privileged (at least for writing) */
>> + uint64_t lpcr;
>> + uint64_t pcr;
>> + uint64_t amor;
>> + uint64_t dpdes;
>> + uint64_t hfscr;
>> + int64_t tb_offset;
>> + uint64_t dawr0;
>> + uint64_t dawrx0;
>> + uint64_t ciabr;
>> + uint64_t hdec_expiry;
>> + uint64_t purr;
>> + uint64_t spurr;
>> + uint64_t ic;
>> + uint64_t vtb;
>> + uint64_t hdar;
>> + uint64_t hdsisr;
>> + uint64_t heir;
>> + uint64_t asdr;
>> + /* These are OS privileged but need to be set late in guest entry */
>> + uint64_t srr0;
>> + uint64_t srr1;
>> + uint64_t sprg[4];
>> + uint64_t pidr;
>> + uint64_t cfar;
>> + uint64_t ppr;
>> + /* Version 1 ends here */
>> + uint64_t dawr1;
>> + uint64_t dawrx1;
>> + /* Version 2 ends here */
>> +};
>> +
>> +/* Latest version of hv_guest_state structure */
>> +#define HV_GUEST_STATE_VERSION 2
>> +
>> +/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
>> +struct kvmppc_pt_regs {
>> + uint64_t gpr[32];
>> + uint64_t nip;
>> + uint64_t msr;
>> + uint64_t orig_gpr3; /* Used for restarting system calls */
>> + uint64_t ctr;
>> + uint64_t link;
>> + uint64_t xer;
>> + uint64_t ccr;
>> + uint64_t softe; /* Soft enabled/disabled */
>> + uint64_t trap; /* Reason for being here */
>> + uint64_t dar; /* Fault registers */
>> + uint64_t dsisr; /* on 4xx/Book-E used for ESR */
>> + uint64_t result; /* Result of a system call */
>> +};
>
> The above structs are shared with KVM for this QEMU implementation.
>
> I don't think they belong in asm-powerpc/kvm.h, but how could we keep them
> in sync? The version should be protecting us from unexpected changes.
Not sure how we should do that. How are other PAPR API definitions kept
in sync? I guess they just have a spec document for the upstream. Paul
made a spec document for the nested HV stuff; I'm not sure if he's put it
up in public anywhere. Maybe we could maintain it in linux/Documentation/
or similar?
Anyway for now I guess we keep this?
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index d8cc956c97..65c4401130 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1301,6 +1301,9 @@ struct PowerPCCPU {
>> bool pre_2_10_migration;
>> bool pre_3_0_migration;
>> int32_t mig_slb_nr;
>> +
>> + bool in_spapr_nested;
>> + CPUPPCState *nested_host_state;
>> };
>
> These new fields belong in SpaprCpuState. It shouldn't be too hard to adapt.
Thanks for the pointer, that's what I was looking for. Must not have
looked very hard :)
Thanks,
Nick
* Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor
2022-02-16 1:16 ` Nicholas Piggin
@ 2022-02-16 10:23 ` Cédric Le Goater
0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-16 10:23 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/16/22 02:16, Nicholas Piggin wrote:
> Excerpts from Cédric Le Goater's message of February 16, 2022 4:21 am:
>> On 2/15/22 04:16, Nicholas Piggin wrote:
[...]
>>> + cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState));
>>
>> I think we could preallocate this buffer once we know nested guests are
>> supported, or, if we keep it, it could serve as our 'in_spapr_nested' indicator.
>
> That's true. I kind of liked to allocate on demand, but for performance
> and robustness might be better to keep it around (could allocate when we
> see a H_SET_PARTITION_TABLE.
>
> I'll just keep it as is for the first iteration. Probably in fact we
> would rather make a specific structure for it that only has what we
> require rather than the entire CPUPPCState so all this can be optimised
> a bit in a later round.
Sure. Keep in mind that the pseries machine migrates, and this is extra state
to carry to the other side; vmstate_spapr_cpu_state should be modified.
[...]
>>
>> The above structs are shared with KVM for this QEMU implementation.
>>
>> I don't think they belong in asm-powerpc/kvm.h, but how could we keep them
>> in sync? The version should be protecting us from unexpected changes.
>
> Not sure how we should do that. How are other PAPR API definitions kept
> in sync? I guess they just have a document spec for the upstream. Paul
> made a spec document for the nested HV stuff, not sure if he's put it up
> in public anywhere. Maybe we could maintain it in linux/Documentation/
> or similar?
Yes, under linux/Documentation/virt/kvm/.
> Anyway for now I guess we keep this?
Yes. Maybe in its own private header. Something like hw/ppc/spapr_nested.h
Thanks,
C.
>>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>>> index d8cc956c97..65c4401130 100644
>>> --- a/target/ppc/cpu.h
>>> +++ b/target/ppc/cpu.h
>>> @@ -1301,6 +1301,9 @@ struct PowerPCCPU {
>>> bool pre_2_10_migration;
>>> bool pre_3_0_migration;
>>> int32_t mig_slb_nr;
>>> +
>>> + bool in_spapr_nested;
>>> + CPUPPCState *nested_host_state;
>>> };
>>
>> These new fields belong to SpaprCpuState. It shouldn't be too hard to adapt.
>
> Thanks for the pointer, that's what I was looking for. Must not have
> looked very hard :)
>
> Thanks,
> Nick
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor
2022-02-15 3:16 [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor Nicholas Piggin
` (8 preceding siblings ...)
2022-02-15 3:16 ` [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor Nicholas Piggin
@ 2022-02-15 18:33 ` Cédric Le Goater
2022-02-15 18:45 ` Daniel Henrique Barboza
9 siblings, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2022-02-15 18:33 UTC (permalink / raw)
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Fabiano Rosas
On 2/15/22 04:16, Nicholas Piggin wrote:
> Here is the rollup of patches in much better shape since the RFC.
> I include the 2 first ones unchanged from independent submission
> just to be clear that this series requires them.
>
> Thanks Cedric and Fabiano for wading through my poor quality RFC
> code, very good changes suggested and I hope I got most of them
> and this one is easier to follow.
This is in good shape and functional. I will try to propose a small
buildroot environment for it, so that we don't have to start a full
distro to test.
I would like to talk about the naming. KVM-HV is I think "reserved"
to the PowerNV platform (baremetal). We also have KVM-PR which runs
KVM guests on various platforms, including pseries.
How can we call this yet another KVM PPC implementation?
Thanks,
C.
>
> Thanks,
> Nick
>
> Nicholas Piggin (9):
> target/ppc: raise HV interrupts for partition table entry problems
> spapr: prevent hdec timer being set up under virtual hypervisor
> ppc: allow the hdecr timer to be created/destroyed
> target/ppc: add vhyp addressing mode helper for radix MMU
> target/ppc: make vhyp get_pate method take lpid and return success
> target/ppc: add helper for books vhyp hypercall handler
> target/ppc: Add powerpc_reset_excp_state helper
> target/ppc: Introduce a vhyp framework for nested HV support
> spapr: implement nested-hv capability for the virtual hypervisor
>
> hw/ppc/pegasos2.c | 6 +
> hw/ppc/ppc.c | 22 ++-
> hw/ppc/spapr.c | 41 ++++-
> hw/ppc/spapr_caps.c | 11 +-
> hw/ppc/spapr_cpu_core.c | 6 +-
> hw/ppc/spapr_hcall.c | 321 +++++++++++++++++++++++++++++++++++++++
> include/hw/ppc/ppc.h | 3 +
> include/hw/ppc/spapr.h | 74 ++++++++-
> target/ppc/cpu.h | 8 +-
> target/ppc/excp_helper.c | 129 ++++++++++++----
> target/ppc/mmu-radix64.c | 41 ++++-
> 11 files changed, 610 insertions(+), 52 deletions(-)
>
* Re: [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor
2022-02-15 18:33 ` [PATCH 0/9] ppc: nested KVM HV for spapr " Cédric Le Goater
@ 2022-02-15 18:45 ` Daniel Henrique Barboza
2022-02-15 19:20 ` Fabiano Rosas
0 siblings, 1 reply; 30+ messages in thread
From: Daniel Henrique Barboza @ 2022-02-15 18:45 UTC (permalink / raw)
To: Cédric Le Goater, Nicholas Piggin, qemu-ppc
Cc: qemu-devel, Fabiano Rosas
On 2/15/22 15:33, Cédric Le Goater wrote:
> On 2/15/22 04:16, Nicholas Piggin wrote:
>> Here is the rollup of patches in much better shape since the RFC.
>> I include the 2 first ones unchanged from independent submission
>> just to be clear that this series requires them.
>>
>> Thanks Cedric and Fabiano for wading through my poor quality RFC
>> code, very good changes suggested and I hope I got most of them
>> and this one is easier to follow.
>
> This is in good shape and functional. I will try to propose a small
> buildroot environment for it, so that we don't have to start a full
> distro to test.
>
> I would like to talk about the naming. KVM-HV is I think "reserved"
> to the PowerNV platform (baremetal). We also have KVM-PR which runs
> KVM guests on various platforms, including pseries.
>
> How can we call this yet another KVM PPC implementation?
Do we need a new name? I believe Nick uses the stock kvm_hv kernel module in this
implementation.
If we want a name to differ between the different KVM-HV usages, well, I'd suggest
KVM-EHV (Emulated HV) or KVM-NHV (Nested HV) or KVM-VHV (Virtual HV) or anything
that suggests that this is a different flavor of using KVM-HV.
Thanks,
Daniel
>
> Thanks,
>
> C.
>
>>
>> Thanks,
>> Nick
>>
>> Nicholas Piggin (9):
>> target/ppc: raise HV interrupts for partition table entry problems
>> spapr: prevent hdec timer being set up under virtual hypervisor
>> ppc: allow the hdecr timer to be created/destroyed
>> target/ppc: add vhyp addressing mode helper for radix MMU
>> target/ppc: make vhyp get_pate method take lpid and return success
>> target/ppc: add helper for books vhyp hypercall handler
>> target/ppc: Add powerpc_reset_excp_state helper
>> target/ppc: Introduce a vhyp framework for nested HV support
>> spapr: implement nested-hv capability for the virtual hypervisor
>>
>> hw/ppc/pegasos2.c | 6 +
>> hw/ppc/ppc.c | 22 ++-
>> hw/ppc/spapr.c | 41 ++++-
>> hw/ppc/spapr_caps.c | 11 +-
>> hw/ppc/spapr_cpu_core.c | 6 +-
>> hw/ppc/spapr_hcall.c | 321 +++++++++++++++++++++++++++++++++++++++
>> include/hw/ppc/ppc.h | 3 +
>> include/hw/ppc/spapr.h | 74 ++++++++-
>> target/ppc/cpu.h | 8 +-
>> target/ppc/excp_helper.c | 129 ++++++++++++----
>> target/ppc/mmu-radix64.c | 41 ++++-
>> 11 files changed, 610 insertions(+), 52 deletions(-)
>>
>
>
* Re: [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor
2022-02-15 18:45 ` Daniel Henrique Barboza
@ 2022-02-15 19:20 ` Fabiano Rosas
2022-02-16 9:09 ` Nicholas Piggin
0 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2022-02-15 19:20 UTC (permalink / raw)
To: Daniel Henrique Barboza, Cédric Le Goater, Nicholas Piggin,
qemu-ppc
Cc: qemu-devel
Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> On 2/15/22 15:33, Cédric Le Goater wrote:
>> On 2/15/22 04:16, Nicholas Piggin wrote:
>>> Here is the rollup of patches in much better shape since the RFC.
>>> I include the 2 first ones unchanged from independent submission
>>> just to be clear that this series requires them.
>>>
>>> Thanks Cedric and Fabiano for wading through my poor quality RFC
>>> code, very good changes suggested and I hope I got most of them
>>> and this one is easier to follow.
>>
>> This is in good shape and functional. I will try to propose a small
>> buildroot environment for it, so that we don't have to start a full
>> distro to test.
>>
>> I would like to talk about the naming. KVM-HV is I think "reserved"
>> to the PowerNV platform (baremetal). We also have KVM-PR which runs
>> KVM guests on various platforms, including pseries.
>>
>> How can we call this yet another KVM PPC implementation?
>
> Do we need a new name? I believe Nick uses the stock kvm_hv kernel module in this
> implementation.
>
> If we want a name to differ between the different KVM-HV usages, well, I'd suggest
> KVM-EHV (Emulated HV) or KVM-NHV (Nested HV) or KVM-VHV (Virtual HV) or anything
> that suggests that this is a different flavor of using KVM-HV.
I'd say it's imperative to have a clear indication that this is
TCG. Otherwise we'll have people trying to do weird stuff with it and
complaining that Nested KVM is bugged.
Some ideas:
Emulated Nested KVM
Emulated Nested HV
Nested TCG
The first one is perhaps more accurate, but we'd end up having "kvm"
mentioned in TCG code and that is super confusing.
* Re: [PATCH 0/9] ppc: nested KVM HV for spapr virtual hypervisor
2022-02-15 19:20 ` Fabiano Rosas
@ 2022-02-16 9:09 ` Nicholas Piggin
0 siblings, 0 replies; 30+ messages in thread
From: Nicholas Piggin @ 2022-02-16 9:09 UTC (permalink / raw)
To: Cédric, Le Goater, Daniel Henrique Barboza, Fabiano Rosas,
qemu-ppc
Cc: qemu-devel
Excerpts from Fabiano Rosas's message of February 16, 2022 5:20 am:
> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
>
>> On 2/15/22 15:33, Cédric Le Goater wrote:
>>> On 2/15/22 04:16, Nicholas Piggin wrote:
>>>> Here is the rollup of patches in much better shape since the RFC.
>>>> I include the 2 first ones unchanged from independent submission
>>>> just to be clear that this series requires them.
>>>>
>>>> Thanks Cedric and Fabiano for wading through my poor quality RFC
>>>> code, very good changes suggested and I hope I got most of them
>>>> and this one is easier to follow.
>>>
>>> This is in good shape and functional. I will try to propose a small
>>> buildroot environment for it, so that we don't have to start a full
>>> distro to test.
>>>
>>> I would like to talk about the naming. KVM-HV is I think "reserved"
>>> to the PowerNV platform (baremetal). We also have KVM-PR which runs
>>> KVM guests on various platforms, including pseries.
>>>
>>> How can we call this yet another KVM PPC implementation?
>>
>> Do we need a new name? I believe Nick uses the stock kvm_hv kernel module in this
>> implementation.
>>
>> If we want a name to differ between the different KVM-HV usages, well, I'd suggest
>> KVM-EHV (Emulated HV) or KVM-NHV (Nested HV) or KVM-VHV (Virtual HV) or anything
>> that suggests that this is a different flavor of using KVM-HV.
>
> I'd say it's imperative to have a clear indication that this is
> TCG. Otherwise we'll have people trying to do weird stuff with it and
> complaining that Nested KVM is bugged.
It's difficult to convey that in the L2 I think, but that is the case
no matter what we call it in the L0 AFAIKS.
> Some ideas:
>
> Emulated Nested KVM
> Emulated Nested HV
> Nested TCG
>
> The first one is perhaps more accurate, but we'd end up having "kvm"
> mentioned in TCG code and that is super confusing.
It provides the "nested HV" hypervisor API so it can support guests
that use the nested HV KVM backend. The matter between TCG and real
metal is on top of that -- the user knows TCG is emulated.
So, not sure how to go. We could remove the KVM name. The cap itself
is just called nested-hv and that's what KVM uses. I think KVM here
was added in the description just so you would know that KVM can be
run on it.
Thanks,
Nick