Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: Christoffer Dall <christoffer.dall@linaro.org>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: KVM devel mailing list <kvm@vger.kernel.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	Marc Zyngier <marc.zyngier@arm.com>,
	p.fedin@samsung.com
Subject: Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness
Date: Tue, 10 Nov 2015 14:40:48 +0100	[thread overview]
Message-ID: <20151110134048.GE12968@cbox> (raw)
In-Reply-To: <CAKv+Gu_q0inrFoMNeQdF-f_msLEsyb_Fux0d4tQYmMb6Fgttrw@mail.gmail.com>

On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
> On 10 November 2015 at 13:22, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
> >> Hi all,
> >>
> >> I wonder if this is a better way to address the problem. It looks at
> >> the nature of the memory rather than the nature of the mapping, which
> >> is probably a more reliable indicator of whether cache maintenance is
> >> required when performing the unmap.
> >>
> >>
> >> -----------8<----------------
> >> The open coded tests for checking whether a PTE maps a page as
> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
> >> which is not guaranteed to work since the type of a mapping is
> >> not a set of mutually exclusive bits
> >>
> >> For HYP mappings, the type is an index into the MAIR table (i.e, the
> >> index itself does not contain any information whatsoever about the
> >> type of the mapping), and for stage-2 mappings it is a bit field where
> >> normal memory and device types are defined as follows:
> >>
> >>     #define MT_S2_NORMAL            0xf
> >>     #define MT_S2_DEVICE_nGnRE      0x1
> >>
> >> I.e., masking *and* comparing with the latter matches on the former,
> >> and we have been getting lucky merely because the S2 device mappings
> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
> >> as device mappings.
> >>
> >> Since the unmap_range() code path (which contains one instance of the
> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
> >> considering the difference between the two, it is non-trivial to fix
> >> this by rewriting the tests in place, as it would involve passing
> >> down the type of mapping through all the functions.
> >>
> >> However, since HYP mappings and stage-2 mappings both deal with host
> >> physical addresses, we can simply check whether the mapping is backed
> >> by memory that is managed by the host kernel, and only perform the
> >> D-cache maintenance if this is the case.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm/kvm/mmu.c | 15 +++++++--------
> >>  1 file changed, 7 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 6984342da13d..7dace909d5cf 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
> >>       __kvm_flush_dcache_pud(pud);
> >>  }
> >>
> >> +static bool kvm_is_device_pfn(unsigned long pfn)
> >> +{
> >> +     return !pfn_valid(pfn);
> >> +}
> >> +
> >>  /**
> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
> >>   * @kvm:     pointer to kvm structure.
> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> >>                       kvm_tlb_flush_vmid_ipa(kvm, addr);
> >>
> >>                       /* No need to invalidate the cache for device mappings */
> >> -                     if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> >> +                     if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> >>                               kvm_flush_dcache_pte(old_pte);
> >>
> >>                       put_page(virt_to_page(pte));
> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> >>
> >>       pte = pte_offset_kernel(pmd, addr);
> >>       do {
> >> -             if (!pte_none(*pte) &&
> >> -                 (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> >> +             if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> >>                       kvm_flush_dcache_pte(*pte);
> >>       } while (pte++, addr += PAGE_SIZE, addr != end);
> >>  }
> >> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >>       return kvm_vcpu_dabt_iswrite(vcpu);
> >>  }
> >>
> >> -static bool kvm_is_device_pfn(unsigned long pfn)
> >> -{
> >> -     return !pfn_valid(pfn);
> >> -}
> >> -
> >>  /**
> >>   * stage2_wp_ptes - write protect PMD range
> >>   * @pmd:     pointer to pmd entry
> >> --
> >> 1.9.1
> >>
> >
> > So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
> > PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
> > regions where the vma has (vm_flags & VM_PFNMAP).
> >
> > Will these, and only these, cases be covered by the pfn_valid check?
> >
> 
> The pfn_valid() check will ensure that cache maintenance is only
> performed on regions that are known to the host as memory, are managed
> by the host (i.e., there is a struct page associated with them) and
> will be accessed by the host via cacheable mappings (they are covered
> by the linear mapping, or [on ARM] will be kmap'ed cacheable if they
> are highmem). If you look at the commit that introduced these tests
> (363ef89f8e9b arm/arm64: KVM: Invalidate data cache on unmap), the
> concern it addresses is that the guest may perform uncached accesses
> to regions that the host has mapped cacheable, meaning guest writes
> may be shadowed by clean cachelines, making them invisble to cache
> coherent I/O. So afaict, pfn_valid() is an appropriate check here.

right, this I agree with.

> 
> pfn_valid() will not misidentify device regions as memory (unless the
> host is really broken) so this should fix Pavel's case. The converse
> case (a memory region misidentified as a device) could only happen for
> a carve-out (i.e., via the /reserved-memory node) that is mapped
> inside the guest via a pass-through (VM_PFNMAP) mapping. That case is
> already dodgy, since the guest accesses would be forced
> uncached/ordered due to the fact that those mappings are forced
> PAGE_S2_DEVICE at stage 2 (as you mention), and would also be
> misidentified by the current code (due to the PAGE_S2_DEVICE
> attributes)
> 

ok, but such pages should never be swapped out by the host, so I think
we're still ok here.

Will you send an updated proper patch to the list?  You can add my
RB if you want.

-Christoffer

  reply	other threads:[~2015-11-10 13:40 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAKv+Gu_agkv7aDqj7fgdoWU+9k=eJ-0YU7WrDKm1_wtr_xYQ6g@mail.gmail.com>
     [not found] ` <1447148737-15363-1-git-send-email-ard.biesheuvel@linaro.org>
2015-11-10  9:47   ` [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness Ard Biesheuvel
2015-11-10 10:27     ` Pavel Fedin
     [not found]   ` <20151110122203.GD12968@cbox>
2015-11-10 13:15     ` Ard Biesheuvel
2015-11-10 13:40       ` Christoffer Dall [this message]
2015-11-10 13:48         ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151110134048.GE12968@cbox \
    --to=christoffer.dall@linaro.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=marc.zyngier@arm.com \
    --cc=p.fedin@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox