From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 13/14] KVM: ARM: Handle guest faults in KVM
Date: Mon, 19 Nov 2012 15:07:37 +0000
Message-ID: <20121119150737.GE3205@mudshark.cambridge.arm.com>
In-Reply-To: <20121110154342.2836.9669.stgit@chazy-air>

On Sat, Nov 10, 2012 at 03:43:42PM +0000, Christoffer Dall wrote:
> Handles the guest faults in KVM by mapping in corresponding user pages
> in the 2nd stage page tables.
>
> We invalidate the instruction cache by MVA whenever we map a page to the
> guest (no, we cannot only do it when we have an iabt because the guest
> may happily read/write a page before hitting the icache) if the hardware
> uses VIPT or PIPT. In the latter case, we can invalidate only that
> physical page. In the first case, all bets are off and we simply must
> invalidate the whole affair. Note that VIVT icaches are tagged with
> vmids, and we are out of the woods on that one. Alexander Graf was nice
> enough to remind us of this massive pain.
>
> There is also a subtle bug hidden somewhere, which we currently hide by
> marking all pages dirty even when the pages are only mapped read-only. The
> current hypothesis is that marking pages dirty may exercise the IO system and
> data cache more and therefore we don't see stale data in the guest, but it's
> purely guesswork. The bug is manifested by seemingly random kernel crashes in
> guests when the host is under extreme memory pressure and swapping is enabled.
>
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
[...]
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index f45be86..6c9ee3a 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -21,9 +21,11 @@
> #include <linux/io.h>
> #include <asm/idmap.h>
> #include <asm/pgalloc.h>
> +#include <asm/cacheflush.h>
> #include <asm/kvm_arm.h>
> #include <asm/kvm_mmu.h>
> #include <asm/kvm_asm.h>
> +#include <asm/kvm_emulate.h>
> #include <asm/mach/map.h>
> #include <trace/events/kvm.h>
>
> @@ -503,9 +505,150 @@ out:
> return ret;
> }
>
> +static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> +{
> + /*
> + * If we are going to insert an instruction page and the icache is
> + * either VIPT or PIPT, there is a potential problem where the host

Why are PIPT caches affected by this? The virtual address is irrelevant.

> + * (or another VM) may have used this page at the same virtual address
> + * as this guest, and we read incorrect data from the icache. If
> + * we're using a PIPT cache, we can invalidate just that page, but if
> + * we are using a VIPT cache we need to invalidate the entire icache -
> + * damn shame - as written in the ARM ARM (DDI 0406C - Page B3-1384)
> + */
> + if (icache_is_pipt()) {
> + unsigned long hva = gfn_to_hva(kvm, gfn);
> + __cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
> + } else if (!icache_is_vivt_asid_tagged()) {
> + /* any kind of VIPT cache */
> + __flush_icache_all();
> + }

So what if it *is* vivt_asid_tagged? Surely that necessitates nuking the
whole thing, unless it's VMID-tagged as well (does that even exist?).

> +}
> +
> +static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> + gfn_t gfn, struct kvm_memory_slot *memslot,
> + bool is_iabt, unsigned long fault_status)
> +{
> + pte_t new_pte;
> + pfn_t pfn;
> + int ret;
> + bool write_fault, writable;
> + unsigned long mmu_seq;
> + struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> +
> + if (is_iabt)
> + write_fault = false;
> + else if ((vcpu->arch.hsr & HSR_ISV) && !(vcpu->arch.hsr & HSR_WNR))

Put this hsr parsing in a macro/function? Then you can just assign
write_fault directly.

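A minimal sketch of what such a helper might look like, written as a
standalone snippet. The name is_write_fault() is hypothetical and not
part of the patch, and the HSR_ISV/HSR_WNR bit positions are assumptions
made only so the snippet compiles on its own; the definitions in
asm/kvm_arm.h are authoritative.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit positions, defined here only so the sketch is
 * self-contained. The kernel's own HSR_* definitions take precedence.
 */
#define HSR_ISV	(UINT32_C(1) << 24)	/* instruction syndrome valid */
#define HSR_WNR	(UINT32_C(1) << 6)	/* write-not-read */

/*
 * Hypothetical helper: mirrors the three-way if/else in the patch.
 * An instruction abort is never a write; a data abort with a valid
 * syndrome and WnR clear is a read; everything else is treated as a
 * write.
 */
static inline bool is_write_fault(uint32_t hsr, bool is_iabt)
{
	if (is_iabt)
		return false;
	if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
		return false;
	return true;
}
```

With something along these lines, the caller could reduce to a single
assignment such as write_fault = is_write_fault(vcpu->arch.hsr, is_iabt).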
> + write_fault = false;
> + else
> + write_fault = true;
> +
> + if (fault_status == FSC_PERM && !write_fault) {
> + kvm_err("Unexpected L2 read permission error\n");
> + return -EFAULT;
> + }
> +
> + /* We need minimum second+third level pages */
> + ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
> + if (ret)
> + return ret;
> +
> + mmu_seq = vcpu->kvm->mmu_notifier_seq;
> + smp_rmb();

What's this barrier for and why isn't there a write barrier paired with
it?

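For reference, the pairing one would normally expect in a
sequence-counter retry scheme can be modelled in userspace as below.
This is only a sketch under stated assumptions: the struct and function
names are hypothetical, C11 fences stand in for smp_wmb()/smp_rmb(), and
it does not claim to reproduce what the KVM notifier code actually does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a notifier sequence counter. */
struct notifier_state {
	atomic_ulong seq;	/* bumped when an invalidation completes */
	atomic_long count;	/* invalidations currently in flight */
};

static void invalidate_start(struct notifier_state *s)
{
	atomic_fetch_add_explicit(&s->count, 1, memory_order_relaxed);
}

static void invalidate_end(struct notifier_state *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	/* Write side of the pairing: publish seq before count drops. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_sub_explicit(&s->count, 1, memory_order_relaxed);
}

static unsigned long fault_snapshot(struct notifier_state *s)
{
	unsigned long seq = atomic_load_explicit(&s->seq,
						 memory_order_relaxed);
	/* Read side: order the snapshot before the later page lookup. */
	atomic_thread_fence(memory_order_acquire);
	return seq;
}

static bool fault_must_retry(struct notifier_state *s, unsigned long snap)
{
	/* An invalidation is still running: the lookup may be stale. */
	if (atomic_load_explicit(&s->count, memory_order_relaxed) > 0)
		return true;
	atomic_thread_fence(memory_order_acquire);
	/* An invalidation completed since the snapshot was taken. */
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != snap;
}
```

In this model the fault path takes a snapshot, looks up the page, and
then calls fault_must_retry() under the lock before installing the
mapping. If the write side lives in generic code rather than in this
file, a comment next to the smp_rmb() saying so would answer the
question.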
> +
> + pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
> + if (is_error_pfn(pfn))
> + return -EFAULT;
> +
> + new_pte = pfn_pte(pfn, PAGE_S2);
> + coherent_icache_guest_page(vcpu->kvm, gfn);
> +
> + spin_lock(&vcpu->kvm->mmu_lock);
> + if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
> + goto out_unlock;
> + if (writable) {
> + pte_val(new_pte) |= L_PTE_S2_RDWR;
> + kvm_set_pfn_dirty(pfn);
> + }
> + stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte, false);
> +
> +out_unlock:
> + spin_unlock(&vcpu->kvm->mmu_lock);
> + /*
> + * XXX TODO FIXME:
> + * This is _really_ *weird* !!!
> + * We should be calling the _clean version, because we set the pfn dirty
> + * if we map the page writable, but this causes memory failures in
> + * guests under heavy memory pressure on the host and heavy swapping.
> + */

We need to get to the bottom of this, or at least expand this comment and
make it more widely known that something in the KVM VM code for ARM is
not understood. Otherwise we'll be shipping code that we know contains a
serious flaw, and I worry that, this being the first release, it will end
up getting deployed fairly widely (although the bug reports might be
useful...).

Will