Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor

From: "Cédric Le Goater" <clg@kaod.org>
To: Nicholas Piggin <npiggin@gmail.com>, <qemu-ppc@nongnu.org>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@linux.ibm.com>
Subject: Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor
Date: Wed, 16 Feb 2022 11:23:46 +0100
Message-ID: <42fcc87d-dd7b-c42b-a36d-5ccd5a314348@kaod.org>
In-Reply-To: <1644972569.qjmfk874wg.astroid@bobo.none>

On 2/16/22 02:16, Nicholas Piggin wrote:
> Excerpts from Cédric Le Goater's message of February 16, 2022 4:21 am:
>> On 2/15/22 04:16, Nicholas Piggin wrote:
>>> This implements the Nested KVM HV hcall API for spapr under TCG.
>>>
>>> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
>>> L1 is switched back in, returning from the hcall, when an HV exception
>>> is sent to the vhyp. Register state is copied in and out according to
>>> the nested KVM HV hcall API specification.
>>>
>>> The hdecr timer is started when the L2 is switched in, and it provides
>>> the HDEC / 0x980 return to L1.
>>>
>>> The MMU re-uses the bare metal radix 2-level page table walker by
>>> using the get_pate method to point the MMU to the nested partition
>>> table entry. MMU faults due to partition scope errors raise HV
>>> exceptions and accordingly are routed back to the L1.
>>>
>>> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
>>> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
>>> and exit).
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>    hw/ppc/spapr.c         |  32 +++-
>>>    hw/ppc/spapr_caps.c    |  11 +-
>>>    hw/ppc/spapr_hcall.c   | 321 +++++++++++++++++++++++++++++++++++++++++
>>>    include/hw/ppc/spapr.h |  74 +++++++++-
>>>    target/ppc/cpu.h       |   3 +
>>>    5 files changed, 431 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 3a5cf92c94..6988e3ec76 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1314,11 +1314,32 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>>>    {
>>>        SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
>>>    
>>> -    assert(lpid == 0);
>>> +    if (!cpu->in_spapr_nested) {
>>
>> Since 'in_spapr_nested' is a spapr CPU characteristic, I don't think
>> it belongs to PowerPCCPU. See the end of the patch, for a proposal.
> 
> SpaprCpuState. Certainly that's a better place, I must have missed it.
> 
>>
>> btw, this helps the ordering of files :
>>
>> [diff]
>> 	orderFile = /path/to/qemu/scripts/git.orderfile
>>
>>> +        assert(lpid == 0);
>>>    
>>> -    /* Copy PATE1:GR into PATE0:HR */
>>> -    entry->dw0 = spapr->patb_entry & PATE0_HR;
>>> -    entry->dw1 = spapr->patb_entry;
>>> +        /* Copy PATE1:GR into PATE0:HR */
>>> +        entry->dw0 = spapr->patb_entry & PATE0_HR;
>>> +        entry->dw1 = spapr->patb_entry;
>>> +
>>> +    } else {
>>> +        uint64_t patb, pats;
>>> +
>>> +        assert(lpid != 0);
>>> +
>>> +        patb = spapr->nested_ptcr & PTCR_PATB;
>>> +        pats = spapr->nested_ptcr & PTCR_PATS;
>>> +
>>> +        /* Calculate number of entries */
>>> +        pats = 1ull << (pats + 12 - 4);
>>> +        if (pats <= lpid) {
>>> +            return false;
>>> +        }
>>> +
>>> +        /* Grab entry */
>>> +        patb += 16 * lpid;
>>> +        entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
>>> +        entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
>>> +    }
>>>    
>>>        return true;
>>>    }
>>> @@ -4472,7 +4493,7 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
>>>    
>>>    static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
>>>    {
>>> -    return false;
>>> +    return cpu->in_spapr_nested;
>>>    }
>>>    
>>>    static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
>>> @@ -4584,6 +4605,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>        nc->nmi_monitor_handler = spapr_nmi;
>>>        smc->phb_placement = spapr_phb_placement;
>>>        vhc->cpu_in_nested = spapr_cpu_in_nested;
>>> +    vhc->deliver_hv_excp = spapr_exit_nested;
>>>        vhc->hypercall = emulate_spapr_hypercall;
>>>        vhc->hpt_mask = spapr_hpt_mask;
>>>        vhc->map_hptes = spapr_map_hptes;
>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>> index 5cc80776d0..4d8bb2ad2c 100644
>>> --- a/hw/ppc/spapr_caps.c
>>> +++ b/hw/ppc/spapr_caps.c
>>> @@ -444,19 +444,22 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>>>    {
>>>        ERRP_GUARD();
>>>        PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>>> +    CPUPPCState *env = &cpu->env;
>>>    
>>>        if (!val) {
>>>            /* capability disabled by default */
>>>            return;
>>>        }
>>>    
>>> -    if (tcg_enabled()) {
>>> -        error_setg(errp, "No Nested KVM-HV support in TCG");
>>
>> I don't like using KVM-HV (which is KVM-over-PowerNV) when talking about
>> KVM-over-pseries. I think the platform name is important. Anyhow, this is
>> a more global discussion but we should talk about it someday because these
>> HV modes are becoming confusing! We have PR also :)
> 
> The cap is nested-hv and QEMU describes it as nested KVM HV. Are we stuck
> with that? That could make a name change even more confusing.
> 
> It's really a new backend for the KVM HV front end. Like how POWER8 /
> POWER9 bare metal backends are completely different now.
> 
> But I guess that does not help the end user to understand. On the other
> hand, the user might not think "HV" is the HV mode of the CPU and just
> thinks of it as "hypervisor".
> 
> I like paravirt-hv but nested-hv is not too bad. Anyway I'm happy to
> change it.
> 
>>
>>
>>> +    if (!(env->insns_flags2 & PPC2_ISA300)) {
>>> +        error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
>>>            error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
>>
>> return ?
> 
> Yep.
> 
>>> +static target_ulong h_enter_nested(PowerPCCPU *cpu,
>>> +                                   SpaprMachineState *spapr,
>>> +                                   target_ulong opcode,
>>> +                                   target_ulong *args)
>>> +{
>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>> +    CPUState *cs = CPU(cpu);
>>> +    CPUPPCState *env = &cpu->env;
>>> +    target_ulong hv_ptr = args[0];
>>> +    target_ulong regs_ptr = args[1];
>>> +    target_ulong hdec, now = cpu_ppc_load_tbl(env);
>>> +    target_ulong lpcr, lpcr_mask;
>>> +    struct kvmppc_hv_guest_state *hvstate;
>>> +    struct kvmppc_hv_guest_state hv_state;
>>> +    struct kvmppc_pt_regs *regs;
>>> +    hwaddr len;
>>> +    uint32_t cr;
>>> +    int i;
>>> +
>>> +    if (cpu->in_spapr_nested) {
>>> +        return H_FUNCTION;
>>
>> That would be an L3 :)
> 
> Well if the L2 makes the hcall, vhyp won't handle it but rather it
> will cause L2 exit to L1 and the L1 will handle the H_ENTER_NESTED
> hcall. So we can (and have) run an L3 guest under the L2 of this
> machine :)
>
> This is probably more of an assert(!cpu->in_spapr_nested). Actually
> that assert could go in the general spapr hypercall handler.
> 
>>
>>> +    }
>>> +    if (spapr->nested_ptcr == 0) {
>>> +        return H_NOT_AVAILABLE;
>>> +    }
>>> +
>>> +    len = sizeof(*hvstate);
>>> +    hvstate = cpu_physical_memory_map(hv_ptr, &len, true);
>>
>> When a CPU is available, I would prefer :
>>
>>       hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
>>                                 MEMTXATTRS_UNSPECIFIED);
>>       
>> like ppc_hash64_map_hptes() does. This is minor.
> 
> I'll check it out. Still not entirely sure about read+write access
> though.
> 
>>
>>> +    if (!hvstate || len != sizeof(*hvstate)) {
>>> +        return H_PARAMETER;
>>> +    }
>>> +
>>> +    memcpy(&hv_state, hvstate, len);
>>> +
>>> +    cpu_physical_memory_unmap(hvstate, len, 0 /* read */, len /* access len */);
>>
>> checkpatch will complain with the above comments.
> 
> Yeah it did. Turns out I also had a bug where I missed setting write
> access further down.
> 
>>
>>> +
>>> +    /*
>>> +     * We accept versions 1 and 2. Version 2 fields are unused because TCG
>>> +     * does not implement DAWR*.
>>> +     */
>>> +    if (hv_state.version > HV_GUEST_STATE_VERSION) {
>>> +        return H_PARAMETER;
>>> +    }
>>> +
>>> +    cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState));
>>
>> I think we could preallocate this buffer once we know nested is supported,
>> or if we keep it, it could be our 'in_spapr_nested' indicator.
> 
> That's true. I kind of liked to allocate on demand, but for performance
> and robustness might be better to keep it around (could allocate when we
> see an H_SET_PARTITION_TABLE).
> 
> I'll just keep it as is for the first iteration. Probably in fact we
> would rather make a specific structure for it that only has what we
> require rather than the entire CPUPPCState so all this can be optimised
> a bit in a later round.

Sure. Keep in mind that the pseries machine migrates and this is extra state
to carry over to the other side. vmstate_spapr_cpu_state should be modified.

>>> +struct kvmppc_hv_guest_state {
>>> +    uint64_t version;        /* version of this structure layout, must be first */
>>> +    uint32_t lpid;
>>> +    uint32_t vcpu_token;
>>> +    /* These registers are hypervisor privileged (at least for writing) */
>>> +    uint64_t lpcr;
>>> +    uint64_t pcr;
>>> +    uint64_t amor;
>>> +    uint64_t dpdes;
>>> +    uint64_t hfscr;
>>> +    int64_t tb_offset;
>>> +    uint64_t dawr0;
>>> +    uint64_t dawrx0;
>>> +    uint64_t ciabr;
>>> +    uint64_t hdec_expiry;
>>> +    uint64_t purr;
>>> +    uint64_t spurr;
>>> +    uint64_t ic;
>>> +    uint64_t vtb;
>>> +    uint64_t hdar;
>>> +    uint64_t hdsisr;
>>> +    uint64_t heir;
>>> +    uint64_t asdr;
>>> +    /* These are OS privileged but need to be set late in guest entry */
>>> +    uint64_t srr0;
>>> +    uint64_t srr1;
>>> +    uint64_t sprg[4];
>>> +    uint64_t pidr;
>>> +    uint64_t cfar;
>>> +    uint64_t ppr;
>>> +    /* Version 1 ends here */
>>> +    uint64_t dawr1;
>>> +    uint64_t dawrx1;
>>> +    /* Version 2 ends here */
>>> +};
>>> +
>>> +/* Latest version of hv_guest_state structure */
>>> +#define HV_GUEST_STATE_VERSION  2
>>> +
>>> +/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
>>> +struct kvmppc_pt_regs {
>>> +    uint64_t gpr[32];
>>> +    uint64_t nip;
>>> +    uint64_t msr;
>>> +    uint64_t orig_gpr3;    /* Used for restarting system calls */
>>> +    uint64_t ctr;
>>> +    uint64_t link;
>>> +    uint64_t xer;
>>> +    uint64_t ccr;
>>> +    uint64_t softe;        /* Soft enabled/disabled */
>>> +    uint64_t trap;         /* Reason for being here */
>>> +    uint64_t dar;          /* Fault registers */
>>> +    uint64_t dsisr;        /* on 4xx/Book-E used for ESR */
>>> +    uint64_t result;       /* Result of a system call */
>>> +};
>>
>> The above structs are shared with KVM for this QEMU implementation.
>>
>> I don't think they belong to asm-powerpc/kvm.h but how could we keep them
>> in sync ? The version should be protecting us from unexpected changes.
> 
> Not sure how we should do that. How are other PAPR API definitions kept
> in synch? I guess they just have a document spec for the upstream. Paul
> made a spec document for the nested HV stuff, not sure if he's put it up
> in public anywhere. Maybe we could maintain it in linux/Documentation/
> or similar?

Yes, under linux/Documentation/virt/kvm/

> Anyway for now I guess we keep this?

Yes. Maybe in its own private header, something like hw/ppc/spapr_nested.h
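
To make the "version protects us" point a bit more concrete, here is a
rough sketch of what such a private header could check. The struct is
copied from the patch; the build-time layout checks and the helper
hv_guest_state_size() are hypothetical additions of mine, not part of
the patch, and the sizes assume the usual 8-byte alignment of uint64_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct kvmppc_hv_guest_state {
    uint64_t version;   /* version of this structure layout, must be first */
    uint32_t lpid;
    uint32_t vcpu_token;
    /* These registers are hypervisor privileged (at least for writing) */
    uint64_t lpcr;
    uint64_t pcr;
    uint64_t amor;
    uint64_t dpdes;
    uint64_t hfscr;
    int64_t tb_offset;
    uint64_t dawr0;
    uint64_t dawrx0;
    uint64_t ciabr;
    uint64_t hdec_expiry;
    uint64_t purr;
    uint64_t spurr;
    uint64_t ic;
    uint64_t vtb;
    uint64_t hdar;
    uint64_t hdsisr;
    uint64_t heir;
    uint64_t asdr;
    /* These are OS privileged but need to be set late in guest entry */
    uint64_t srr0;
    uint64_t srr1;
    uint64_t sprg[4];
    uint64_t pidr;
    uint64_t cfar;
    uint64_t ppr;
    /* Version 1 ends here */
    uint64_t dawr1;
    uint64_t dawrx1;
    /* Version 2 ends here */
};

/* Latest version of hv_guest_state structure */
#define HV_GUEST_STATE_VERSION  2

/* Hypothetical build-time checks: fail the build if the layout drifts. */
static_assert(offsetof(struct kvmppc_hv_guest_state, dawr1) == 232,
              "version 1 layout changed");
static_assert(sizeof(struct kvmppc_hv_guest_state) == 248,
              "version 2 layout changed");

/*
 * Hypothetical helper: number of bytes an L1 using 'version' is expected
 * to provide, or 0 when the version is not supported.
 */
static size_t hv_guest_state_size(uint64_t version)
{
    switch (version) {
    case 1:
        return offsetof(struct kvmppc_hv_guest_state, dawr1);
    case 2:
        return sizeof(struct kvmppc_hv_guest_state);
    default:
        return 0;
    }
}
```

Something along these lines would at least make an unexpected change
to the copied struct show up at build time rather than at run time.
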


Thanks,

C.


>>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>>> index d8cc956c97..65c4401130 100644
>>> --- a/target/ppc/cpu.h
>>> +++ b/target/ppc/cpu.h
>>> @@ -1301,6 +1301,9 @@ struct PowerPCCPU {
>>>        bool pre_2_10_migration;
>>>        bool pre_3_0_migration;
>>>        int32_t mig_slb_nr;
>>> +
>>> +    bool in_spapr_nested;
>>> +    CPUPPCState *nested_host_state;
>>>    };
>>
>> These new fields belong to SpaprCpuState. It shouldn't be too hard to adapt.
> 
> Thanks for the pointer, that's what I was looking for. Must not have
> looked very hard :)
> 
> Thanks,
> Nick
