linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: cdall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
Date: Thu, 31 Aug 2017 11:23:05 +0200	[thread overview]
Message-ID: <20170831092305.GA13572@cbox> (raw)
In-Reply-To: <1503649901-5834-5-git-send-email-florent.revest@arm.com>

Hi Florent,

I'd like for the UEFI folks and arm64 kernel maintainers to express
their views on this overall approach before I do an in-depth review, but
I have some random comments based on reading this patch:

On Fri, Aug 25, 2017 at 09:31:34AM +0100, Florent Revest wrote:
> Usual KVM virtual machines map guest's physical addresses from a process
> userspace memory. However, with the new concept of internal VMs, a virtual
> machine can be created from the kernel, without any link to a userspace
> context. Hence, some of the KVM's architecture-specific code needs to be
> modified to take this kind of VMs into account.
> 
> The approach chosen with this patch is to let internal VMs idmap physical
> addresses into intermediary physical addresses by calling
> kvm_set_memory_region with a kvm_userspace_memory_region where the
> guest_phys_addr field points both to the original PAs and to the IPAs. The
> userspace_addr field of this struct is therefore ignored with internal VMs.
> 
> This patch extends the capabilities of the arm and arm64 stage2 MMU code
> to handle internal VMs. Three things are changed:
> 
> - Various parts of the MMU code which are related to a userspace context
> are now only executed if kvm->mm is present.
> 
> - When this pointer is NULL, struct kvm_userspace_memory_regions are
> treated by internal_vm_prep_mem as idmaps of physical memory.
> 
> - A set of 256 additional private memslots is now reserved on arm64 for the
> usage of internal VMs memory idmapping.
> 
> Note: this patch should have pretty much no performance impact on the
> critical path of traditional VMs since only one unlikely branch had to be
> added to the page fault handler.
> 
> Signed-off-by: Florent Revest <florent.revest@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  virt/kvm/arm/mmu.c                | 76 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 74 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index d686300..65aab35 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -32,6 +32,7 @@
>  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
> 
>  #define KVM_USER_MEM_SLOTS 512
> +#define KVM_PRIVATE_MEM_SLOTS 256
>  #define KVM_HALT_POLL_NS_DEFAULT 500000
> 
>  #include <kvm/arm_vgic.h>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 2ea21da..1d2d3df 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>         phys_addr_t size = PAGE_SIZE * memslot->npages;
>         hva_t reg_end = hva + size;
> 
> +       if (unlikely(!kvm->mm)) {

I think you should consider using a predicate so that it's clear that
this is for in-kernel VMs and not just some random situation where mm
can be NULL.

> +               unmap_stage2_range(kvm, addr, size);
> +               return;
> +       }
> +
>         /*
>          * A memory region could potentially cover multiple VMAs, and any holes
>          * between them, so iterate over all of them to find out if we should
> @@ -819,7 +824,8 @@ void stage2_unmap_vm(struct kvm *kvm)
>         int idx;
> 
>         idx = srcu_read_lock(&kvm->srcu);
> -       down_read(&current->mm->mmap_sem);
> +       if (likely(kvm->mm))
> +               down_read(&current->mm->mmap_sem);
>         spin_lock(&kvm->mmu_lock);
> 
>         slots = kvm_memslots(kvm);
> @@ -827,7 +833,8 @@ void stage2_unmap_vm(struct kvm *kvm)
>                 stage2_unmap_memslot(kvm, memslot);
> 
>         spin_unlock(&kvm->mmu_lock);
> -       up_read(&current->mm->mmap_sem);
> +       if (likely(kvm->mm))
> +               up_read(&current->mm->mmap_sem);
>         srcu_read_unlock(&kvm->srcu, idx);
>  }
> 
> @@ -1303,6 +1310,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 return -EFAULT;
>         }
> 
> +       if (unlikely(!kvm->mm)) {
> +               kvm_err("Unexpected internal VM page fault\n");
> +               kvm_inject_vabt(vcpu);
> +               return 0;
> +       }
> +

So it's unclear to me why we don't need any special casing in
kvm_handle_guest_abort, related to MMIO exits etc.  You probably assume
that we will never do emulation, but that should be described and
addressed somewhere before I can critically review this patch.

>         /* Let's check if we will get back a huge page backed by hugetlbfs */
>         down_read(&current->mm->mmap_sem);
>         vma = find_vma_intersection(current->mm, hva, hva + 1);
> @@ -1850,6 +1863,54 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>                 kvm_mmu_wp_memory_region(kvm, mem->slot);
>  }
> 
> +/*
> + * internal_vm_prep_mem - maps a range of hpa to gpa at stage2
> + *
> + * While userspace VMs manage gpas using hvas, internal virtual machines need a
> + * way to map physical addresses to a guest. In order to avoid code duplication,
> + * the kvm_set_memory_region call is kept for internal VMs, however it usually
> + * expects a struct kvm_userspace_memory_region with a userspace_addr field.
> + * With internal VMs, this field is ignored and physical memory memory pointed
> + * by guest_phys_addr can only be idmapped.
> + */
> +static int internal_vm_prep_mem(struct kvm *kvm,
> +                               const struct kvm_userspace_memory_region *mem)
> +{
> +       phys_addr_t addr, end;
> +       unsigned long pfn;
> +       int ret;
> +       struct kvm_mmu_memory_cache cache = { 0 };
> +
> +       end = mem->guest_phys_addr + mem->memory_size;
> +       pfn = __phys_to_pfn(mem->guest_phys_addr);
> +       addr = mem->guest_phys_addr;

My main concern here is that we don't do any checks on this region and
we could be mapping device memory here as well.  Are we intending that
to be ok, and are we then relying on the guest to use proper memory
attributes ?

> +
> +       for (; addr < end; addr += PAGE_SIZE) {
> +               pte_t pte = pfn_pte(pfn, PAGE_S2);
> +
> +               pte = kvm_s2pte_mkwrite(pte);
> +
> +               ret = mmu_topup_memory_cache(&cache,
> +                                            KVM_MMU_CACHE_MIN_PAGES,
> +                                            KVM_NR_MEM_OBJS);

You should be able to allocate all you need up front instead of doing it
in sequences.

> +               if (ret) {
> +                       mmu_free_memory_cache(&cache);
> +                       return ret;
> +               }
> +               spin_lock(&kvm->mmu_lock);
> +               ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> +               spin_unlock(&kvm->mmu_lock);

Since you're likely to allocate some large contiguous chunks here, can
you have a look at using section mappings?

> +               if (ret) {
> +                       mmu_free_memory_cache(&cache);
> +                       return ret;
> +               }
> +
> +               pfn++;
> +       }
> +
> +       return ret;
> +}
> +
>  int kvm_arch_prepare_memory_region(struct kvm *kvm,
>                                    struct kvm_memory_slot *memslot,
>                                    const struct kvm_userspace_memory_region *mem,
> @@ -1872,6 +1933,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>             (KVM_PHYS_SIZE >> PAGE_SHIFT))
>                 return -EFAULT;
> 
> +       if (unlikely(!kvm->mm)) {
> +               ret = internal_vm_prep_mem(kvm, mem);
> +               if (ret)
> +                       goto out;
> +               goto out_internal_vm;
> +       }
> +
>         down_read(&current->mm->mmap_sem);
>         /*
>          * A memory region could potentially cover multiple VMAs, and any holes
> @@ -1930,6 +1998,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>                 hva = vm_end;
>         } while (hva < reg_end);
> 
> +out_internal_vm:
>         if (change == KVM_MR_FLAGS_ONLY)
>                 goto out;
> 
> @@ -1940,7 +2009,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>                 stage2_flush_memslot(kvm, memslot);
>         spin_unlock(&kvm->mmu_lock);
>  out:
> -       up_read(&current->mm->mmap_sem);
> +       if (kvm->mm)
> +               up_read(&current->mm->mmap_sem);
>         return ret;
>  }
> 
> --
> 1.9.1
> 

Thanks,
-Christoffer

  reply	other threads:[~2017-08-31  9:23 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-25  8:31 [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing Florent Revest
2017-08-25  8:31 ` [RFC 01/11] arm64: Add an SMCCC function IDs header Florent Revest
2017-08-25  8:31 ` [RFC 02/11] KVM: arm64: Return an Unknown ID on unhandled HVC Florent Revest
2017-08-25  8:31 ` [RFC 03/11] KVM: Allow VM lifecycle management without userspace Florent Revest
2017-08-25  8:31 ` [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs Florent Revest
2017-08-31  9:23   ` Christoffer Dall [this message]
2017-09-26 21:14     ` Florent Revest
2017-10-16 20:45       ` Christoffer Dall
2017-08-25  8:31 ` [RFC 05/11] KVM: Expose VM/VCPU creation functions Florent Revest
2017-08-25  8:31 ` [RFC 06/11] KVM, arm64: Expose a VCPU initialization function Florent Revest
2017-08-25  8:31 ` [RFC 07/11] KVM: Allow initialization before the module target Florent Revest
2017-08-25  8:31 ` [RFC 08/11] KVM, arm, arm64: Initialize KVM's core earlier Florent Revest
2017-08-25  8:31 ` [RFC 09/11] EFI, arm, arm64: Enable EFI Runtime Services later Florent Revest
2017-08-25  8:31 ` [RFC 10/11] efi, arm64: Sandbox Runtime Services in a VM Florent Revest
2017-08-25  8:31 ` [RFC 11/11] KVM, arm64: Don't trap internal VMs SMC calls Florent Revest
2017-08-25  9:40 ` [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing Florent Revest
2017-08-31  9:26 ` Christoffer Dall
2017-09-26 21:14   ` Florent Revest
2017-10-16 20:47     ` Christoffer Dall
2017-09-22 21:44 ` Ard Biesheuvel
2017-09-26 21:14   ` Florent Revest

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170831092305.GA13572@cbox \
    --to=cdall@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).