From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org,
Alex Williamson <alex.williamson@redhat.com>,
Paul Mackerras <paulus@samba.org>,
kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH kernel v2 05/11] KVM: PPC: Use preregistered memory API to access TCE list
Date: Wed, 21 Dec 2016 15:08:43 +1100
Message-ID: <20161221040843.GD13024@umbus.fritz.box>
In-Reply-To: <20161218012900.18142-6-aik@ozlabs.ru>
On Sun, Dec 18, 2016 at 12:28:54PM +1100, Alexey Kardashevskiy wrote:
> VFIO on sPAPR already implements guest memory pre-registration
> when the entire guest RAM gets pinned. This can be used to translate
> the physical address of a guest page containing the TCE list
> from H_PUT_TCE_INDIRECT.
>
> This makes use of the pre-registered memory API to access TCE list
> pages in order to avoid unnecessary locking on the KVM memory
> reverse map as we know that all of guest memory is pinned and
> we have a flat array mapping GPA to HPA which makes it simpler and
> quicker to index into that array (even with looking up the
> kernel page tables in vmalloc_to_phys) than it is to find the memslot,
> lock the rmap entry, look up the user page tables, and unlock the rmap
> entry. Note that the rmap pointer is initialized to NULL
> where declared (not in this patch).
>
> If a requested chunk of memory has not been preregistered,
> this will fail with H_TOO_HARD so the virtual mode handler can
> handle the request.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * updated the commit log with David's comment
> ---
> arch/powerpc/kvm/book3s_64_vio_hv.c | 65 ++++++++++++++++++++++++++++---------
> 1 file changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index d461c440889a..a3be4bd6188f 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -180,6 +180,17 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
> +{
> + return mm_iommu_preregistered(vcpu->kvm->mm);
> +}
> +
> +static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
> + struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
> +{
> + return mm_iommu_lookup_rm(vcpu->kvm->mm, ua, size);
> +}
I don't see that there's much point to these wrappers; each one just
forwards to the mm_iommu_*() call with vcpu->kvm->mm.
> long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba, unsigned long tce)
> {
> @@ -260,23 +271,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> if (ret != H_SUCCESS)
> return ret;
>
> - if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> - return H_TOO_HARD;
> + if (kvmppc_preregistered(vcpu)) {
> + /*
> + * We get here if guest memory was pre-registered which
> + * is normally VFIO case and gpa->hpa translation does not
> + * depend on hpt.
> + */
> + struct mm_iommu_table_group_mem_t *mem;
>
> - rmap = (void *) vmalloc_to_phys(rmap);
> + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
> + return H_TOO_HARD;
>
> - /*
> - * Synchronize with the MMU notifier callbacks in
> - * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> - * While we have the rmap lock, code running on other CPUs
> - * cannot finish unmapping the host real page that backs
> - * this guest real page, so we are OK to access the host
> - * real page.
> - */
> - lock_rmap(rmap);
> - if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> - ret = H_TOO_HARD;
> - goto unlock_exit;
> + mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
> + if (!mem || mm_iommu_ua_to_hpa_rm(mem, ua, &tces))
> + return H_TOO_HARD;
> + } else {
> + /*
> + * This is emulated devices case.
> + * We do not require memory to be preregistered in this case
> + * so lock rmap and do __find_linux_pte_or_hugepte().
> + */
Hmm. So this isn't wrong as such, but the logic and comments are
both misleading. The 'if' here isn't really about VFIO vs. emulated -
it's about whether the mm has *any* preregistered chunks, without any
regard to which particular device you're talking about. For example
if your guest has two PHBs, one with VFIO devices and the other with
emulated devices, then the emulated devices will still go through the
"VFIO" case here.
Really what you have here is a fast case when the tce_list is in
preregistered memory, and a fallback case when it isn't. But that's
obscured by the fact that if for whatever reason you have some
preregistered memory but it doesn't cover the tce_list, then you don't
go to the fallback case here, but instead fall right back to the
virtual mode handler.
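
To make the three cases concrete, here is a small user-space model of
the path selection as posted. It is only a sketch: the structs, the
enum and every function name below are made up for illustration and
are not the kernel's mm_iommu_* API. The 4096 mirrors the
IOMMU_PAGE_SIZE_4K argument used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only: none of these names exist in the kernel. */
struct prereg_region {
	unsigned long ua;	/* start of a preregistered chunk */
	unsigned long size;	/* length in bytes */
};

struct mm_model {
	const struct prereg_region *regions;
	size_t nregions;
};

enum tce_path {
	PATH_PREREG_FAST,	/* translate via preregistered memory */
	PATH_RMAP_FALLBACK,	/* lock rmap, walk the page tables */
	PATH_VIRTUAL_MODE,	/* H_TOO_HARD, retry in virtual mode */
};

/* True if the mm has any preregistered chunk at all. */
static bool model_has_any_prereg(const struct mm_model *mm)
{
	return mm->nregions > 0;
}

/* True if [ua, ua + size) lies entirely inside one chunk. */
static bool model_prereg_covers(const struct mm_model *mm,
				unsigned long ua, unsigned long size)
{
	assert(mm != NULL);
	for (size_t i = 0; i < mm->nregions; i++) {
		const struct prereg_region *r = &mm->regions[i];

		if (ua >= r->ua && size <= r->size &&
		    ua - r->ua <= r->size - size)
			return true;
	}
	return false;
}

/* Path selection as the posted patch does it. */
enum tce_path model_path_as_posted(const struct mm_model *mm,
				   unsigned long ua)
{
	if (model_has_any_prereg(mm))
		return model_prereg_covers(mm, ua, 4096) ?
			PATH_PREREG_FAST : PATH_VIRTUAL_MODE;
	return PATH_RMAP_FALLBACK;
}
```

With one chunk registered, a tce_list inside it takes the fast path
no matter which PHB issued the hcall, and a tce_list outside it goes
to virtual mode rather than to the rmap code. The rmap path is only
reached when nregions is 0.
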
So, I think you should either:
  1) Fall back to the code below whenever you can't access the
     tce_list via prereg memory, regardless of whether there's any
     _other_ prereg memory
or
  2) Drop the code below entirely and always return H_TOO_HARD if
     you can't get the tce_list from prereg.
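
Roughly, the two alternatives differ only in what happens on a prereg
miss. The following is a user-space sketch under stated assumptions,
not kernel code: the struct and function names are hypothetical, the
two booleans stand in for the real lookups (mm_iommu_lookup_rm() plus
mm_iommu_ua_to_hpa_rm() on one side, the rmap-locked walk on the
other), and the return code values are defined locally for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes, defined locally for this model. */
enum { H_SUCCESS = 0, H_TOO_HARD = 9999 };

/* Hypothetical stand-ins for the outcome of each translation path. */
struct xlate_model {
	bool prereg_hit;	/* tce_list found in preregistered memory */
	bool rmap_hit;		/* rmap-locked page table walk succeeds */
};

struct xlate_result {
	long ret;		/* H_SUCCESS or H_TOO_HARD */
	bool rmap_locked;	/* caller must unlock the rmap on exit */
};

/* Option 1: try prereg first, fall back to the rmap path on any miss. */
struct xlate_result xlate_option1(const struct xlate_model *m)
{
	struct xlate_result r = { H_SUCCESS, false };

	assert(m != NULL);
	if (m->prereg_hit)
		return r;		/* fast path, no rmap lock taken */

	r.rmap_locked = true;		/* fallback path takes the lock */
	if (!m->rmap_hit)
		r.ret = H_TOO_HARD;	/* still has to unlock on exit */
	return r;
}

/* Option 2: prereg or nothing; the rmap code is gone entirely. */
struct xlate_result xlate_option2(const struct xlate_model *m)
{
	struct xlate_result r = { H_SUCCESS, false };

	assert(m != NULL);
	if (!m->prereg_hit)
		r.ret = H_TOO_HARD;	/* virtual mode handler takes over */
	return r;
}
```

In this model option 2 never sets rmap_locked, which suggests the
conditional unlock at the exit label would no longer be needed in
that variant. Option 1 keeps the lock handling but makes the choice
depend only on whether the tce_list itself is covered.
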
> + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> + return H_TOO_HARD;
> +
> + rmap = (void *) vmalloc_to_phys(rmap);
> +
> + /*
> + * Synchronize with the MMU notifier callbacks in
> + * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> + * While we have the rmap lock, code running on other CPUs
> + * cannot finish unmapping the host real page that backs
> + * this guest real page, so we are OK to access the host
> + * real page.
> + */
> + lock_rmap(rmap);
> + if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> + ret = H_TOO_HARD;
> + goto unlock_exit;
> + }
> }
>
> for (i = 0; i < npages; ++i) {
> @@ -290,7 +322,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> }
>
> unlock_exit:
> - unlock_rmap(rmap);
> + if (rmap)
> + unlock_rmap(rmap);
>
> return ret;
> }
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson