Re: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Christoffer Dall <cdall@linaro.org>
To: Florent Revest <revestflo@gmail.com>
Cc: linux-efi@vger.kernel.org, kvm@vger.kernel.org,
	matt@codeblueprint.co.uk, catalin.marinas@arm.com,
	ard.biesheuvel@linaro.org, will.deacon@arm.com,
	linux-kernel@vger.kernel.org, leif.lindholm@arm.com,
	marc.zyngier@arm.com, pbonzini@redhat.com,
	Florent Revest <florent.revest@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
Date: Mon, 16 Oct 2017 22:45:05 +0200	[thread overview]
Message-ID: <20171016204505.GN1845@lvm> (raw)
In-Reply-To: <1506460485.5507.57.camel@gmail.com>

On Tue, Sep 26, 2017 at 11:14:45PM +0200, Florent Revest wrote:
> On Thu, 2017-08-31 at 11:23 +0200, Christoffer Dall wrote:
> > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > > index 2ea21da..1d2d3df 100644
> > > --- a/virt/kvm/arm/mmu.c
> > > +++ b/virt/kvm/arm/mmu.c
> > > @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm
> > > *kvm,
> > >         phys_addr_t size = PAGE_SIZE * memslot->npages;
> > >         hva_t reg_end = hva + size;
> > > 
> > > +       if (unlikely(!kvm->mm)) {
> > I think you should consider using a predicate so that it's clear that
> > this is for in-kernel VMs and not just some random situation where mm
> > can be NULL.
> 
> Internal VMs should be the only usage when kvm->mm would be NULL.
> However if you'd prefer it otherwise, I'll make sure this condition
> will be made clearer.
> 

My point was then when I see (!kvm->mm) it looks like a bug, but if I
saw is_in_kernel_vm(kvm) then it looks like a feature.

> > So it's unclear to me why we don't need any special casing in
> > kvm_handle_guest_abort, related to MMIO exits etc.  You probably
> > assume that we will never do emulation, but that should be described
> > and addressed somewhere before I can critically review this patch.
> 
> This is indeed what I was assuming. This RFC does not allow MMIO with
> internal VMs. I can not think of a usage when this would be useful. I'd
> make sure this would be documented in an eventual later RFC.
> 

OK, sounds good.  It's important for me as a reviewer to be able to tell
the differenc between 'assumed valid guest behavior' and 'limitations of
in-kernel VM support' which are handled in such and such way.


> > > +static int internal_vm_prep_mem(struct kvm *kvm,
> > > +                               const struct
> > > kvm_userspace_memory_region *mem)
> > > +{
> > > +       phys_addr_t addr, end;
> > > +       unsigned long pfn;
> > > +       int ret;
> > > +       struct kvm_mmu_memory_cache cache = { 0 };
> > > +
> > > +       end = mem->guest_phys_addr + mem->memory_size;
> > > +       pfn = __phys_to_pfn(mem->guest_phys_addr);
> > > +       addr = mem->guest_phys_addr;
> > My main concern here is that we don't do any checks on this region
> > and we could be mapping device memory here as well.  Are we intending
> > that to be ok, and are we then relying on the guest to use proper
> > memory attributes ?
> 
> Indeed, being able to map device memory is intended. It is needed for
> Runtime Services sandboxing. It also relies on the guest being
> correctly configured.
> 

So the reason why we wanted to enforce device attribute mappings in
stage 2 was to avoid a guest having the potential to do cached writes to
a device, which would hit at a later time while no longer running the
VM, potentially breaking isolation through manipulation of a device.

This seems to break with that level of isolation, and that property of
in-kernel VMs should be clearly pointed out somewhere.

> > > +
> > > +       for (; addr < end; addr += PAGE_SIZE) {
> > > +               pte_t pte = pfn_pte(pfn, PAGE_S2);
> > > +
> > > +               pte = kvm_s2pte_mkwrite(pte);
> > > +
> > > +               ret = mmu_topup_memory_cache(&cache,
> > > +                                            KVM_MMU_CACHE_MIN_PAGE
> > > S,
> > > +                                            KVM_NR_MEM_OBJS);
> > You should be able to allocate all you need up front instead of doing
> > it in sequences.
> 
> Ok.
> 
> > > 
> > > +               if (ret) {
> > > +                       mmu_free_memory_cache(&cache);
> > > +                       return ret;
> > > +               }
> > > +               spin_lock(&kvm->mmu_lock);
> > > +               ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> > > +               spin_unlock(&kvm->mmu_lock);
> > Since you're likely to allocate some large contiguous chunks here,
> > can you have a look at using section mappings?
> 
> Will do.
> 

Thanks!
-Christoffer

WARNING: multiple messages have this Message-ID (diff)

From: cdall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
Date: Mon, 16 Oct 2017 22:45:05 +0200	[thread overview]
Message-ID: <20171016204505.GN1845@lvm> (raw)
In-Reply-To: <1506460485.5507.57.camel@gmail.com>

On Tue, Sep 26, 2017 at 11:14:45PM +0200, Florent Revest wrote:
> On Thu, 2017-08-31 at 11:23 +0200, Christoffer Dall wrote:
> > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > > index 2ea21da..1d2d3df 100644
> > > --- a/virt/kvm/arm/mmu.c
> > > +++ b/virt/kvm/arm/mmu.c
> > > @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm
> > > *kvm,
> > > ????????phys_addr_t size = PAGE_SIZE * memslot->npages;
> > > ????????hva_t reg_end = hva + size;
> > > 
> > > +???????if (unlikely(!kvm->mm)) {
> > I think you should consider using a predicate so that it's clear that
> > this is for in-kernel VMs and not just some random situation where mm
> > can be NULL.
> 
> Internal VMs should be the only usage when kvm->mm would be NULL.
> However if you'd prefer it otherwise, I'll make sure this condition
> will be made clearer.
> 

My point was then when I see (!kvm->mm) it looks like a bug, but if I
saw is_in_kernel_vm(kvm) then it looks like a feature.

> > So it's unclear to me why we don't need any special casing in
> > kvm_handle_guest_abort, related to MMIO exits etc.??You probably
> > assume that we will never do emulation, but that should be described
> > and addressed somewhere before I can critically review this patch.
> 
> This is indeed what I was assuming. This RFC does not allow MMIO with
> internal VMs. I can not think of a usage when this would be useful. I'd
> make sure this would be documented in an eventual later RFC.
> 

OK, sounds good.  It's important for me as a reviewer to be able to tell
the differenc between 'assumed valid guest behavior' and 'limitations of
in-kernel VM support' which are handled in such and such way.


> > > +static int internal_vm_prep_mem(struct kvm *kvm,
> > > +???????????????????????????????const struct
> > > kvm_userspace_memory_region *mem)
> > > +{
> > > +???????phys_addr_t addr, end;
> > > +???????unsigned long pfn;
> > > +???????int ret;
> > > +???????struct kvm_mmu_memory_cache cache = { 0 };
> > > +
> > > +???????end = mem->guest_phys_addr + mem->memory_size;
> > > +???????pfn = __phys_to_pfn(mem->guest_phys_addr);
> > > +???????addr = mem->guest_phys_addr;
> > My main concern here is that we don't do any checks on this region
> > and we could be mapping device memory here as well.??Are we intending
> > that to be ok, and are we then relying on the guest to use proper
> > memory attributes ?
> 
> Indeed, being able to map device memory is intended. It is needed for
> Runtime Services sandboxing. It also relies on the guest being
> correctly configured.
> 

So the reason why we wanted to enforce device attribute mappings in
stage 2 was to avoid a guest having the potential to do cached writes to
a device, which would hit at a later time while no longer running the
VM, potentially breaking isolation through manipulation of a device.

This seems to break with that level of isolation, and that property of
in-kernel VMs should be clearly pointed out somewhere.

> > > +
> > > +???????for (; addr < end; addr += PAGE_SIZE) {
> > > +???????????????pte_t pte = pfn_pte(pfn, PAGE_S2);
> > > +
> > > +???????????????pte = kvm_s2pte_mkwrite(pte);
> > > +
> > > +???????????????ret = mmu_topup_memory_cache(&cache,
> > > +????????????????????????????????????????????KVM_MMU_CACHE_MIN_PAGE
> > > S,
> > > +????????????????????????????????????????????KVM_NR_MEM_OBJS);
> > You should be able to allocate all you need up front instead of doing
> > it in sequences.
> 
> Ok.
> 
> > > 
> > > +???????????????if (ret) {
> > > +???????????????????????mmu_free_memory_cache(&cache);
> > > +???????????????????????return ret;
> > > +???????????????}
> > > +???????????????spin_lock(&kvm->mmu_lock);
> > > +???????????????ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> > > +???????????????spin_unlock(&kvm->mmu_lock);
> > Since you're likely to allocate some large contiguous chunks here,
> > can you have a look at using section mappings?
> 
> Will do.
> 

Thanks!
-Christoffer

WARNING: multiple messages have this Message-ID (diff)

From: Christoffer Dall <cdall@linaro.org>
To: Florent Revest <revestflo@gmail.com>
Cc: Florent Revest <florent.revest@arm.com>,
	linux-arm-kernel@lists.infradead.org, matt@codeblueprint.co.uk,
	ard.biesheuvel@linaro.org, pbonzini@redhat.com,
	rkrcmar@redhat.com, christoffer.dall@linaro.org,
	catalin.marinas@arm.com, will.deacon@arm.com,
	mark.rutland@arm.com, marc.zyngier@arm.com,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	leif.lindholm@arm.com
Subject: Re: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
Date: Mon, 16 Oct 2017 22:45:05 +0200	[thread overview]
Message-ID: <20171016204505.GN1845@lvm> (raw)
In-Reply-To: <1506460485.5507.57.camel@gmail.com>

On Tue, Sep 26, 2017 at 11:14:45PM +0200, Florent Revest wrote:
> On Thu, 2017-08-31 at 11:23 +0200, Christoffer Dall wrote:
> > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > > index 2ea21da..1d2d3df 100644
> > > --- a/virt/kvm/arm/mmu.c
> > > +++ b/virt/kvm/arm/mmu.c
> > > @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm
> > > *kvm,
> > >         phys_addr_t size = PAGE_SIZE * memslot->npages;
> > >         hva_t reg_end = hva + size;
> > > 
> > > +       if (unlikely(!kvm->mm)) {
> > I think you should consider using a predicate so that it's clear that
> > this is for in-kernel VMs and not just some random situation where mm
> > can be NULL.
> 
> Internal VMs should be the only usage when kvm->mm would be NULL.
> However if you'd prefer it otherwise, I'll make sure this condition
> will be made clearer.
> 

My point was then when I see (!kvm->mm) it looks like a bug, but if I
saw is_in_kernel_vm(kvm) then it looks like a feature.

> > So it's unclear to me why we don't need any special casing in
> > kvm_handle_guest_abort, related to MMIO exits etc.  You probably
> > assume that we will never do emulation, but that should be described
> > and addressed somewhere before I can critically review this patch.
> 
> This is indeed what I was assuming. This RFC does not allow MMIO with
> internal VMs. I can not think of a usage when this would be useful. I'd
> make sure this would be documented in an eventual later RFC.
> 

OK, sounds good.  It's important for me as a reviewer to be able to tell
the differenc between 'assumed valid guest behavior' and 'limitations of
in-kernel VM support' which are handled in such and such way.


> > > +static int internal_vm_prep_mem(struct kvm *kvm,
> > > +                               const struct
> > > kvm_userspace_memory_region *mem)
> > > +{
> > > +       phys_addr_t addr, end;
> > > +       unsigned long pfn;
> > > +       int ret;
> > > +       struct kvm_mmu_memory_cache cache = { 0 };
> > > +
> > > +       end = mem->guest_phys_addr + mem->memory_size;
> > > +       pfn = __phys_to_pfn(mem->guest_phys_addr);
> > > +       addr = mem->guest_phys_addr;
> > My main concern here is that we don't do any checks on this region
> > and we could be mapping device memory here as well.  Are we intending
> > that to be ok, and are we then relying on the guest to use proper
> > memory attributes ?
> 
> Indeed, being able to map device memory is intended. It is needed for
> Runtime Services sandboxing. It also relies on the guest being
> correctly configured.
> 

So the reason why we wanted to enforce device attribute mappings in
stage 2 was to avoid a guest having the potential to do cached writes to
a device, which would hit at a later time while no longer running the
VM, potentially breaking isolation through manipulation of a device.

This seems to break with that level of isolation, and that property of
in-kernel VMs should be clearly pointed out somewhere.

> > > +
> > > +       for (; addr < end; addr += PAGE_SIZE) {
> > > +               pte_t pte = pfn_pte(pfn, PAGE_S2);
> > > +
> > > +               pte = kvm_s2pte_mkwrite(pte);
> > > +
> > > +               ret = mmu_topup_memory_cache(&cache,
> > > +                                            KVM_MMU_CACHE_MIN_PAGE
> > > S,
> > > +                                            KVM_NR_MEM_OBJS);
> > You should be able to allocate all you need up front instead of doing
> > it in sequences.
> 
> Ok.
> 
> > > 
> > > +               if (ret) {
> > > +                       mmu_free_memory_cache(&cache);
> > > +                       return ret;
> > > +               }
> > > +               spin_lock(&kvm->mmu_lock);
> > > +               ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> > > +               spin_unlock(&kvm->mmu_lock);
> > Since you're likely to allocate some large contiguous chunks here,
> > can you have a look at using section mappings?
> 
> Will do.
> 

Thanks!
-Christoffer

next prev parent reply	other threads:[~2017-10-16 20:44 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-25  8:31 [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing Florent Revest
2017-08-25  8:31 ` Florent Revest
2017-08-25  8:31 ` Florent Revest
2017-08-25  8:31 ` [RFC 01/11] arm64: Add an SMCCC function IDs header Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 02/11] KVM: arm64: Return an Unknown ID on unhandled HVC Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 03/11] KVM: Allow VM lifecycle management without userspace Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-31  9:23   ` Christoffer Dall
2017-08-31  9:23     ` Christoffer Dall
2017-09-26 21:14     ` Florent Revest
2017-09-26 21:14       ` Florent Revest
2017-10-16 20:45       ` Christoffer Dall [this message]
2017-10-16 20:45         ` Christoffer Dall
2017-10-16 20:45         ` Christoffer Dall
2017-08-25  8:31 ` [RFC 05/11] KVM: Expose VM/VCPU creation functions Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 06/11] KVM, arm64: Expose a VCPU initialization function Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 07/11] KVM: Allow initialization before the module target Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 08/11] KVM, arm, arm64: Initialize KVM's core earlier Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 09/11] EFI, arm, arm64: Enable EFI Runtime Services later Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 10/11] efi, arm64: Sandbox Runtime Services in a VM Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  8:31 ` [RFC 11/11] KVM, arm64: Don't trap internal VMs SMC calls Florent Revest
2017-08-25  8:31   ` Florent Revest
2017-08-25  9:40 ` [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing Florent Revest
2017-08-25  9:40   ` Florent Revest
2017-08-25  9:40   ` Florent Revest
2017-08-31  9:26 ` Christoffer Dall
2017-08-31  9:26   ` Christoffer Dall
2017-09-26 21:14   ` Florent Revest
2017-09-26 21:14     ` Florent Revest
2017-10-16 20:47     ` Christoffer Dall
2017-10-16 20:47       ` Christoffer Dall
2017-09-22 21:44 ` Ard Biesheuvel
2017-09-22 21:44   ` Ard Biesheuvel
2017-09-26 21:14   ` Florent Revest
2017-09-26 21:14     ` Florent Revest

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171016204505.GN1845@lvm \
    --to=cdall@linaro.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=catalin.marinas@arm.com \
    --cc=florent.revest@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=leif.lindholm@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=matt@codeblueprint.co.uk \
    --cc=pbonzini@redhat.com \
    --cc=revestflo@gmail.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.