From mboxrd@z Thu Jan  1 00:00:00 1970
From: Laurent Dufour
Date: Thu, 23 Jul 2020 14:06:43 +0000
Subject: Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping
Message-Id: <0631397b-44af-ea3b-b70b-e4a0dc2c0366@linux.ibm.com>
List-Id:
References: <20200721104202.15727-1-ldufour@linux.ibm.com>
 <20200721104202.15727-3-ldufour@linux.ibm.com>
 <20200723033600.GS7902@in.ibm.com>
 <4a3caeaf-cd0c-fcd7-0a97-f367a5f78dac@linux.ibm.com>
In-Reply-To: <4a3caeaf-cd0c-fcd7-0a97-f367a5f78dac@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
To: bharata@linux.ibm.com, linuxram@us.ibm.com
Cc: linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org, paulus@samba.org,
 sukadev@linux.ibm.com, linuxppc-dev@lists.ozlabs.org, bauerman@linux.ibm.com

On 23/07/2020 at 14:32, Laurent Dufour wrote:
> On 23/07/2020 at 05:36, Bharata B Rao wrote:
>> On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote:
>>> When a secure memslot is dropped, all the pages backed in the secure
>>> device (aka really backed by secure memory by the Ultravisor) should
>>> be paged out to a normal page. Previously, this was achieved by
>>> triggering the page fault mechanism, which calls kvmppc_svm_page_out()
>>> on each page.
>>>
>>> This can't work when hot unplugging a memory slot because the memory
>>> slot is flagged as invalid and gfn_to_pfn() then does not try to
>>> access the page, so the page fault mechanism is not triggered.
>>>
>>> Since the final goal is to make a call to kvmppc_svm_page_out(), it
>>> seems simpler to call it directly instead of triggering such a
>>> mechanism. This way kvmppc_uvmem_drop_pages() can be called even when
>>> hot unplugging a memslot.
>>>
>>> Since kvmppc_uvmem_drop_pages() is already holding
>>> kvm->arch.uvmem_lock, the call to __kvmppc_svm_page_out() is made.
>>> As __kvmppc_svm_page_out() needs the vma pointer to migrate the pages,
>>> the VMA is fetched in a lazy way, to avoid calling find_vma() all the
>>> time. In addition, the mmap_sem is held in read mode during that time,
>>> not in write mode, since the virtual memory layout is not impacted,
>>> and kvm->arch.uvmem_lock prevents concurrent operations on the secure
>>> device.
>>>
>>> Cc: Ram Pai
>>> Cc: Bharata B Rao
>>> Cc: Paul Mackerras
>>> Signed-off-by: Laurent Dufour
>>> ---
>>>   arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++----------
>>>   1 file changed, 37 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> index 5a4b02d3f651..ba5c7c77cc3a 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>>>    * fault on them, do fault time migration to replace the device PTEs in
>>>    * QEMU page table with normal PTEs from newly allocated pages.
>>>    */
>>> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>>> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
>>>                    struct kvm *kvm, bool skip_page_out)
>>>   {
>>>       int i;
>>>       struct kvmppc_uvmem_page_pvt *pvt;
>>> -    unsigned long pfn, uvmem_pfn;
>>> -    unsigned long gfn = free->base_gfn;
>>> +    struct page *uvmem_page;
>>> +    struct vm_area_struct *vma = NULL;
>>> +    unsigned long uvmem_pfn, gfn;
>>> +    unsigned long addr, end;
>>> +
>>> +    mmap_read_lock(kvm->mm);
>>> +
>>> +    addr = slot->userspace_addr;
>>
>> We typically use gfn_to_hva() for that, but that won't work for a
>> memslot that is already marked INVALID, which is the case here.
>> I think it is ok to access slot->userspace_addr of an INVALID memslot
>> here, but just thought of explicitly bringing this up.
>
> That is explicitly mentioned above in the patch's description:
>
> This can't work when hot unplugging a memory slot because the memory
> slot is flagged as invalid and gfn_to_pfn() then does not try to
> access the page, so the page fault mechanism is not triggered.
>
>>
>>> +    end = addr + (slot->npages * PAGE_SIZE);
>>> -    for (i = free->npages; i; --i, ++gfn) {
>>> -        struct page *uvmem_page;
>>> +    gfn = slot->base_gfn;
>>> +    for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
>>> +
>>> +        /* Fetch the VMA if addr is not in the latest fetched one */
>>> +        if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
>>> +            vma = find_vma_intersection(kvm->mm, addr, end);
>>> +            if (!vma ||
>>> +                vma->vm_start > addr || vma->vm_end < end) {
>>> +                pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
>>> +                break;
>>> +            }
>>> +        }
>>
>> In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning
>> the memslot, but it uses a different logic for the same. Why can't these
>> two cases use the same method to walk the VMAs? Is there anything subtly
>> different between the two cases?
>
> This is probably doable. At the time I wrote that patch,
> kvmppc_memslot_page_merge() had not been introduced yet, AFAIR.
>
> That being said, it'd help a lot to factorize that code... I'll let Ram
> deal with that ;)

Indeed, I don't think this is relevant: the loop in
kvmppc_memslot_page_merge() makes one call (to ksm_madvise()) per VMA,
while this code makes one call per page of the VMA, which is completely
different. I don't think merging the two would be a good idea.

Cheers,
Laurent.
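
P.S. For readers following along: the lazy VMA-lookup pattern the patch
implements can be sketched in isolation as below. This is a minimal
illustration, not the patch itself; drop_pages_sketch() and
process_one_page() are hypothetical names, with process_one_page()
standing in for __kvmppc_svm_page_out().

    #include <linux/mm.h>

    static int drop_pages_sketch(struct mm_struct *mm, unsigned long start,
                                 unsigned long npages,
                                 int (*process_one_page)(struct vm_area_struct *,
                                                         unsigned long))
    {
            struct vm_area_struct *vma = NULL;
            unsigned long addr = start;
            unsigned long end = start + npages * PAGE_SIZE;
            int ret = 0;

            /* Read mode is enough: the VM layout is not modified here */
            mmap_read_lock(mm);

            for (; addr < end; addr += PAGE_SIZE) {
                    /* Re-fetch the VMA only when addr leaves the cached one */
                    if (!vma || addr < vma->vm_start || addr >= vma->vm_end) {
                            vma = find_vma_intersection(mm, addr, end);
                            /* As in the patch, the VMA must cover the range */
                            if (!vma || vma->vm_start > addr ||
                                vma->vm_end < end) {
                                    ret = -EFAULT;
                                    break;
                            }
                    }
                    ret = process_one_page(vma, addr);
                    if (ret)
                            break;
            }

            mmap_read_unlock(mm);
            return ret;
    }

The point of caching the VMA is that find_vma_intersection() walks the mm's
VMA tree, so looking it up once per VMA rather than once per page saves a
tree walk on every iteration when the memslot spans only a few (often just
one) VMAs.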