From: christoffer.dall@arm.com (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
Date: Tue, 4 Sep 2018 15:37:27 +0200 [thread overview]
Message-ID: <20180904133727.GR4029@e113682-lin.lund.arm.com> (raw)
In-Reply-To: <87sh2p5xmu.fsf@e105922-lin.cambridge.arm.com>
On Tue, Sep 04, 2018 at 12:07:37PM +0100, Punit Agrawal wrote:
> Christoffer Dall <christoffer.dall@arm.com> writes:
>
> > On Mon, Sep 03, 2018 at 06:29:30PM +0100, Punit Agrawal wrote:
> >> Christoffer Dall <christoffer.dall@arm.com> writes:
> >>
> >> > [Adding Andrea and Steve in CC]
> >> >
> >> > On Thu, Aug 23, 2018 at 04:33:42PM +0100, Marc Zyngier wrote:
> >> >> When triggering a CoW, we unmap the RO page via an MMU notifier
> >> >> (invalidate_range_start), and then populate the new PTE using another
> >> >> one (change_pte). In the meantime, we'll have copied the old page
> >> >> into the new one.
> >> >>
> >> >> The problem is that the data for the new page is sitting in the
> >> >> cache, and should the guest have an uncached mapping to that page
> >> >> (or its MMU off), following accesses will bypass the cache.
> >> >>
> >> >> In a way, this is similar to what happens on a translation fault:
> >> >> We need to clean the page to the PoC before mapping it. So let's just
> >> >> do that.
> >> >>
> >> >> This fixes a KVM unit test regression observed on a HiSilicon platform,
> >> >> and subsequently reproduced on Seattle.
> >> >>
> >> >> Fixes: a9c0e12ebee5 ("KVM: arm/arm64: Only clean the dcache on translation fault")
> >> >> Reported-by: Mike Galbraith <efault@gmx.de>
> >> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> >> ---
> >> >> virt/kvm/arm/mmu.c | 9 ++++++++-
> >> >> 1 file changed, 8 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >> >> index 1d90d79706bd..287c8e274655 100644
> >> >> --- a/virt/kvm/arm/mmu.c
> >> >> +++ b/virt/kvm/arm/mmu.c
> >> >> @@ -1811,13 +1811,20 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
> >> >> void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
> >> >> {
> >> >> unsigned long end = hva + PAGE_SIZE;
> >> >> + kvm_pfn_t pfn = pte_pfn(pte);
> >> >> pte_t stage2_pte;
> >> >>
> >> >> if (!kvm->arch.pgd)
> >> >> return;
> >> >>
> >> >> trace_kvm_set_spte_hva(hva);
> >> >> - stage2_pte = pfn_pte(pte_pfn(pte), PAGE_S2);
> >> >> +
> >> >> + /*
> >> >> + * We've moved a page around, probably through CoW, so let's treat
> >> >> + * just like a translation fault and clean the cache to the PoC.
> >> >> + */
> >> >> + clean_dcache_guest_page(pfn, PAGE_SIZE);
> >> >> + stage2_pte = pfn_pte(pfn, PAGE_S2);
> >> >> handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
> >> >> }
> >> >
> >> > How does this work for pmd mappings?
> >>
> >> kvm_set_spte_hva() isn't called for PMD mappings. But...
> >>
> >> >
> >> > Are we guaranteed that a pmd mapping (hugetlbfs or THP) is split before
> >> > a CoW happens?
> >> >
> >> > Steve tells me that we share THP mappings on fork and that we back THPs
> >> > by a zero page, so CoW with THP should be possible.
> >> >
> >>
> >> ...the problem seems to affect handling write permission faults for CoW
> >> or zero pages.
> >>
> >> The memory gets unmapped at stage 2 due to the invalidate notifier (in
> >> hugetlb_cow() for hugetlbfs and do_huge_pmd_wp_page() for THP)
> >
> > So just to make sure I get this right. For a pte CoW, Linux calls the
> > set_spte function to simply change the pte mapping, without doing any
> > unmapping at stage 2,
>
> No.
>
> I hadn't checked into the PTE CoW for zero pages when replying but was
> relying on Marc's commit log. I've outlined the flow below.
>
> > but with pmd, we unmap and wait to take another fault as an
> > alternative method?
>
> Having looked at handling of CoW handling for the different page sizes,
> here's my understanding of the steps for CoW faults - note the slight
> variance when dealing with PTE entries.
>
> * Guest takes a stage 2 permission fault (user_mem_abort())
>
> * The host mapping is updated to point to another page (either zeroed or
> contents copied). This happens via the get_user_pages_unlocked()
> invoked via gfn_to_pfn_prot().
>
> * For all page sizes, mmu_invalidate_range_start() notifiers are called
> which will unmap the memory at stage 2.
>
> * For PTE (wp_page_copy), set_pte_at_notify() is called which eventually
> calls kvm_set_pte_hva() modified in $SUBJECT.
>
> For hugepages (hugetlb_cow) and annonymous THP (do_huge_pmd_wp_page)
> there are no notifying versions of page table entry updaters so stage
> 2 entries remain unmapped.
>
> * mmu_notifier_invalidate_range_end() is called. This updates
> mmu_notifier_seq which will abort the stage 2 fault handling in
> user_mem_abort().
>
> * As stage 2 was left unmapped for hugepages (no change_pte notifier),
> there'll be a subsequent translation fault where the mapped page will
> be cleaned/invalidated.
>
> > Why the two different approaches depending on the mapping size?
>
> I'm not sure why hugepage change notifier doesn't exist.
>
Probably it's simply not implemented yet, or not considered important
because the number of faults are reduced with larger mappings anyhow.
> >
> >> while the
> >> cache maintenance for the newly allocated page will be skipped due to
> >> the !FSC_PERM.
> >
> > If the above holds, then the subsequent fault would be similar to any
> > other fault which should work via the normal mechanism (dcache clean +
> > XN on fault, icache invalidation on permission fault).
> >>
> >> Hmm... smells like there might be a problem here. I'll try and put
> >> together a fix.
> >>
> >
> > It's not clear to me if we have a problem, or just different ways of
> > handling the PMD vs. PTE CoW case?
>
> With the above context, I don't think there is a problem for hugepage
> handling. The missing dcache maintenance addressed by $SUBJECT should be
> sufficient for PTEs.
>
Agreed.
Thanks for the explanation!
Christoffer
prev parent reply other threads:[~2018-09-04 13:37 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-23 15:33 [PATCH] KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW Marc Zyngier
2018-08-30 12:20 ` Christoffer Dall
2018-09-03 17:29 ` Punit Agrawal
2018-09-04 9:03 ` Christoffer Dall
2018-09-04 11:07 ` Punit Agrawal
2018-09-04 13:37 ` Christoffer Dall [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180904133727.GR4029@e113682-lin.lund.arm.com \
--to=christoffer.dall@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).