From: "Cédric Le Goater" <clg@kaod.org>
To: Fabiano Rosas <farosas@linux.ibm.com>,
Nicholas Piggin <npiggin@gmail.com>, <kvm-ppc@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v3 19/41] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path
Date: Mon, 22 Mar 2021 17:12:39 +0100
Message-ID: <a335d4a5-9d98-9f27-cfd3-b45dd1c07c9e@kaod.org>
In-Reply-To: <87o8fh21iq.fsf@linux.ibm.com>
On 3/17/21 5:22 PM, Fabiano Rosas wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>>
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>>
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>>
>> So it's much less important to handle hcalls in real mode in P9.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>
> <snip>
>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 497f216ad724..1f2ba8955c6a 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1147,7 +1147,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>> * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>> * that the cede logic in kvmppc_run_single_vcpu() works properly.
>> */
>> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
>> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>
> The comment above needs to be updated I think.
>
>> {
>> vcpu->arch.shregs.msr |= MSR_EE;
>> vcpu->arch.ceded = 1;
>> @@ -1403,9 +1403,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>> /* hcall - punt to userspace */
>> int i;
>>
>> - /* hypercall with MSR_PR has already been handled in rmode,
>> - * and never reaches here.
>> - */
>> + if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> + /*
>> + * Guest userspace executed sc 1, reflect it back as a
>> + * privileged program check interrupt.
>> + */
>> + kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
>> + r = RESUME_GUEST;
>> + break;
>> + }
>>
>> run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
>> for (i = 0; i < 9; ++i)
>> @@ -3740,15 +3746,36 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>> /* H_CEDE has to be handled now, not later */
>> if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
>> kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
>> - kvmppc_nested_cede(vcpu);
>> + kvmppc_cede(vcpu);
>> kvmppc_set_gpr(vcpu, 3, 0);
>> trap = 0;
>> }
>> } else {
>> kvmppc_xive_push_vcpu(vcpu);
>> trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
>> - kvmppc_xive_pull_vcpu(vcpu);
>> + /* H_CEDE has to be handled now, not later */
>> + /* XICS hcalls must be handled before xive is pulled */
>> + if (trap == BOOK3S_INTERRUPT_SYSCALL &&
>> + !(vcpu->arch.shregs.msr & MSR_PR)) {
>> + unsigned long req = kvmppc_get_gpr(vcpu, 3);
>>
>> + if (req == H_CEDE) {
>> + kvmppc_cede(vcpu);
>> + kvmppc_xive_cede_vcpu(vcpu); /* may un-cede */
>> + kvmppc_set_gpr(vcpu, 3, 0);
>> + trap = 0;
>> + }
>> + if (req == H_EOI || req == H_CPPR ||
>> + req == H_IPI || req == H_IPOLL ||
>> + req == H_XIRR || req == H_XIRR_X) {
>> + unsigned long ret;
>> +
>> + ret = kvmppc_xive_xics_hcall(vcpu, req);
>> + kvmppc_set_gpr(vcpu, 3, ret);
>> + trap = 0;
>> + }
>> + }
>
> I tried running L2 with xive=off and this code slows down the boot
> considerably. I think we're missing a !vcpu->arch.nested in the
> conditional.

By default, an L2 guest uses the XIVE emulation in QEMU. If you deactivate
XIVE support in the L2, with "xive=off" in the OS or "ic-mode=xics" in the
L1 QEMU, it will use the legacy XICS mode, emulated in the L1 KVM-on-pseries.

And yes, the QEMU XIVE emulation tends to be faster. I don't know exactly
why. Probably because of fewer exits/entries?

C.

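
For reference, the condition Fabiano suggests above can be modelled outside
the kernel. This is only a minimal sketch: `struct vcpu_model`,
`handle_hcall_in_entry()` and the `MODEL_*` constants are hypothetical
stand-ins, not kernel definitions, and only the shape of the test follows the
patch plus the proposed `!vcpu->arch.nested` check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real constants live in the kernel headers. */
#define MODEL_INTERRUPT_SYSCALL 0xc00u
#define MODEL_MSR_PR            (1ul << 14)

/* Hypothetical, trimmed-down stand-in for struct kvm_vcpu. */
struct vcpu_model {
	unsigned long msr;   /* models vcpu->arch.shregs.msr */
	const void *nested;  /* models vcpu->arch.nested, non-NULL for an L2 */
};

/*
 * Returns true when the hcall would be handled right after guest exit:
 * the exit is a syscall, the guest was not in problem state, and the
 * vcpu is not running a nested (L2) guest.
 */
static bool handle_hcall_in_entry(const struct vcpu_model *vcpu,
				  unsigned int trap)
{
	return trap == MODEL_INTERRUPT_SYSCALL &&
	       !(vcpu->msr & MODEL_MSR_PR) &&
	       vcpu->nested == NULL;
}
```

With the extra check, an hcall issued by an L2 falls through to the normal
exit handling instead of being consumed in the entry path.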
> This may also be missing these checks from kvmppc_pseries_do_hcall:
>
> if (kvmppc_xics_enabled(vcpu)) {
> if (xics_on_xive()) {
> ret = H_NOT_AVAILABLE;
> return RESUME_GUEST;
> }
> ret = kvmppc_xics_hcall(vcpu, req);
> (...)
>
> For H_CEDE there might be a similar situation since we're shadowing the
> code above that runs after H_ENTER_NESTED by setting trap to 0 here.
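
One possible reading of the missing checks, sketched as a standalone routing
decision. The enum, the function name and the two boolean parameters are
hypothetical; they stand in for the results of kvmppc_xics_enabled() and
xics_on_xive(), and the sketch assumes that anything not handled early should
be left for the later hcall handler.

```c
#include <stdbool.h>

/* Hypothetical outcome of the early XICS hcall check in the entry path. */
enum xics_hcall_route {
	ROUTE_LATER_HANDLER,  /* leave trap set; the normal exit path decides */
	ROUTE_XIVE_EMULATION, /* handle now, before the XIVE context is pulled */
};

/*
 * xics_enabled models kvmppc_xics_enabled(vcpu), on_xive models
 * xics_on_xive(). Only the in-kernel XICS-on-XIVE case is handled early.
 */
static enum xics_hcall_route route_xics_hcall(bool xics_enabled, bool on_xive)
{
	if (xics_enabled && on_xive)
		return ROUTE_XIVE_EMULATION;
	return ROUTE_LATER_HANDLER;
}
```

Under this assumption, a guest without in-kernel XICS would still reach the
existing path that punts the hcall to userspace.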
>
>> + kvmppc_xive_pull_vcpu(vcpu);
>> }
>>
>> vcpu->arch.slb_max = 0;
>> @@ -4408,8 +4435,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>> else
>> r = kvmppc_run_vcpu(vcpu);
>>
>> - if (run->exit_reason == KVM_EXIT_PAPR_HCALL &&
>> - !(vcpu->arch.shregs.msr & MSR_PR)) {
>> + if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
>> + if (WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_PR)) {
>> + r = RESUME_GUEST;
>> + continue;
>> + }
>> trace_kvm_hcall_enter(vcpu);
>> r = kvmppc_pseries_do_hcall(vcpu);
>> trace_kvm_hcall_exit(vcpu, r);
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index c11597f815e4..2d0d14ed1d92 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -1397,9 +1397,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>> mr r4,r9
>> bge fast_guest_return
>> 2:
>> + /* If we came in through the P9 short path, no real mode hcalls */
>> + lwz r0, STACK_SLOT_SHORT_PATH(r1)
>> + cmpwi r0, 0
>> + bne no_try_real
>> /* See if this is an hcall we can handle in real mode */
>> cmpwi r12,BOOK3S_INTERRUPT_SYSCALL
>> beq hcall_try_real_mode
>> +no_try_real:
>>
>> /* Hypervisor doorbell - exit only if host IPI flag set */
>> cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL
>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>> index 52cdb9e2660a..1e4871bbcad4 100644
>> --- a/arch/powerpc/kvm/book3s_xive.c
>> +++ b/arch/powerpc/kvm/book3s_xive.c
>> @@ -158,6 +158,40 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
>> }
>> EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
>>
>> +void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu)
>> +{
>> + void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
>> +
>> + if (!esc_vaddr)
>> + return;
>> +
>> + /* we are using XIVE with single escalation */
>> +
>> + if (vcpu->arch.xive_esc_on) {
>> + /*
>> + * If we still have a pending escalation, abort the cede,
>> + * and we must set PQ to 10 rather than 00 so that we don't
>> + * potentially end up with two entries for the escalation
>> + * interrupt in the XIVE interrupt queue. In that case
>> + * we also don't want to set xive_esc_on to 1 here in
>> + * case we race with xive_esc_irq().
>> + */
>> + vcpu->arch.ceded = 0;
>> + /*
>> + * The escalation interrupts are special as we don't EOI them.
>> + * There is no need to use the load-after-store ordering offset
>> + * to set PQ to 10 as we won't use StoreEOI.
>> + */
>> + __raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_10);
>> + } else {
>> + vcpu->arch.xive_esc_on = true;
>> + mb();
>> + __raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
>> + }
>> + mb();
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_xive_cede_vcpu);
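
The decision made in kvmppc_xive_cede_vcpu() above can be followed more
easily in a small model. This is a sketch only: `struct cede_model`,
`enum esb_op` and `model_xive_cede()` are hypothetical names, the MMIO loads
are replaced by an enum value, and the memory barriers are left out.

```c
#include <stdbool.h>

/* Which ESB special load the real code would issue on the escalation page. */
enum esb_op { ESB_SET_PQ_00, ESB_SET_PQ_10 };

/* Hypothetical stand-in for the vcpu fields used by the cede path. */
struct cede_model {
	bool has_esc_page; /* models vcpu->arch.xive_esc_vaddr != 0 */
	bool esc_on;       /* models vcpu->arch.xive_esc_on */
	bool ceded;        /* models vcpu->arch.ceded */
};

/*
 * Returns false when there is no escalation page and nothing is done.
 * Otherwise stores the chosen ESB operation in *op and returns true.
 */
static bool model_xive_cede(struct cede_model *v, enum esb_op *op)
{
	if (!v->has_esc_page)
		return false;

	if (v->esc_on) {
		/* Escalation still pending: abort the cede, set PQ to 10. */
		v->ceded = false;
		*op = ESB_SET_PQ_10;
	} else {
		/* No pending escalation: arm it and set PQ to 00. */
		v->esc_on = true;
		*op = ESB_SET_PQ_00;
	}
	return true;
}
```

The "may un-cede" comment at the call site corresponds to the first branch,
where `ceded` is cleared again after kvmppc_cede() has set it.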
>> +
>> /*
>> * This is a simple trigger for a generic XIVE IRQ. This must
>> * only be called for interrupts that support a trigger page
>> @@ -2106,6 +2140,32 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>> return 0;
>> }
>>
>> +int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
>> +{
>> + struct kvmppc_vcore *vc = vcpu->arch.vcore;
>> +
>> + switch (req) {
>> + case H_XIRR:
>> + return xive_vm_h_xirr(vcpu);
>> + case H_CPPR:
>> + return xive_vm_h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
>> + case H_EOI:
>> + return xive_vm_h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
>> + case H_IPI:
>> + return xive_vm_h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
>> + kvmppc_get_gpr(vcpu, 5));
>> + case H_IPOLL:
>> + return xive_vm_h_ipoll(vcpu, kvmppc_get_gpr(vcpu, 4));
>> + case H_XIRR_X:
>> + xive_vm_h_xirr(vcpu);
>> + kvmppc_set_gpr(vcpu, 5, get_tb() + vc->tb_offset);
>> + return H_SUCCESS;
>> + }
>> +
>> + return H_UNSUPPORTED;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_xive_xics_hcall);
>> +
>> int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
>> {
>> struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;