From: Laszlo Ersek <lersek@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
kvmarm@lists.cs.columbia.edu, christoffer.dall@linaro.org,
marc.zyngier@arm.com, drjones@redhat.com, wei@redhat.com
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
Date: Mon, 17 Nov 2014 16:49:25 +0100 [thread overview]
Message-ID: <546A1905.6080607@redhat.com> (raw)
In-Reply-To: <546A146E.1020804@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]
On 11/17/14 16:29, Paolo Bonzini wrote:
>
>
> On 17/11/2014 15:58, Ard Biesheuvel wrote:
>> Readonly memslots are often used to implement emulation of ROMs and
>> NOR flashes, in which case the guest may legally map these regions as
>> uncached.
>> To deal with the incoherency associated with uncached guest mappings,
>> treat all readonly memslots as incoherent, and ensure that pages that
>> belong to regions tagged as such are flushed to DRAM before being passed
>> to the guest.
>
> On x86, the processor combines the cacheability values from the two
> levels of page tables. Is there no way to do the same on ARM?
Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict
(Device non-Gathering, non-Reordering, no Early Write Acknowledgement --
for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax
host) memory attributes.
When qemu writes, as part of emulating the flash programming commands,
to the RAMBlock that *otherwise* backs the flash range (as a r/o
memslot), those writes (from host userspace) tend to end up in dcache.
But, when the guest flips back the flash to romd mode, and tries to read
back the values from the flash as plain ROM, the dcache is completely
bypassed due to the strict stage1 mapping, and the guest goes directly
to DRAM.
Where qemu's earlier writes are not yet / necessarily visible.
Please see my original patch (which was incomplete) in the attachment,
it has a very verbose commit message.
Anyway, I'll let others explain; they can word it better than I can :)
FWIW,
Series
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
I ported this series to a 3.17.0+ based kernel, and tested it. It works
fine. The ROM-like view of the NOR flash now reflects the previously
programmed contents.
Series
Tested-by: Laszlo Ersek <lersek@redhat.com>
Thanks!
Laszlo
[-- Attachment #2: 0001-arm-arm64-KVM-clean-cache-on-page-fault-also-when-IP.upstream.patch --]
[-- Type: text/x-patch, Size: 6063 bytes --]
>From a2b4da9b03f03ccdb8b0988a5cc64d1967f00398 Mon Sep 17 00:00:00 2001
From: Laszlo Ersek <lersek@redhat.com>
Date: Sun, 16 Nov 2014 01:43:11 +0100
Subject: [PATCH] arm, arm64: KVM: clean cache on page fault also when IPA is
uncached (WIP)
This patch builds on Marc Zyngier's commit
2d58b733c87689d3d5144e4ac94ea861cc729145.
(1) The guest bypasses the cache *not only* when the VCPU's dcache is
disabled (see bit 0 and bit 2 in SCTLR_EL1, "MMU enable" and "Cache
enable", respectively -- vcpu_has_cache_enabled()).
The guest bypasses the cache *also* when the Stage 1 memory attributes
say "device memory" about the Intermediate Page Address in question,
independently of the Stage 2 memory attributes. Refer to:
Table D5-38
Combining the stage 1 and stage 2 memory type assignments
in the ARM ARM. (This is likely similar to MTRRs on x86.)
(2) In edk2 (EFI Development Kit II), the ARM NOR flash driver,
ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashFvbDxe.c
uses the AddMemorySpace() and SetMemorySpaceAttributes() Global
Coherency Domain Services of DXE (Driver eXecution Environment) to
*justifiedly* set the attributes of the guest memory covering the
flash chip to EFI_MEMORY_UC ("uncached").
According to the AArch64 bindings for UEFI (see "2.3.6.1 Memory types"
in the UEFI-2.4A specification), EFI_MEMORY_UC is mapped to:
ARM Memory Type:
MAIR attribute encoding ARM Memory Type:
EFI Memory Type Attr<n> [7:4] [3:0] Meaning
--------------- ----------------------- ------------------------
EFI_MEMORY_UC 0000 0000 Device-nGnRnE (Device
(Not cacheable) non-Gathering,
non-Reordering, no Early
Write Acknowledgement)
This is correctly implemented in edk2, in the ArmConfigureMmu()
function, via the ArmSetMAIR() call and the MAIR_ATTR() macro:
The TT_ATTR_INDX_DEVICE_MEMORY (== 0) memory attribute index, which is
used for EFI_MEMORY_UC memory, is associated with the
MAIR_ATTR_DEVICE_MEMORY (== 0x00, see above) memory attribute value,
in the MAIR_ELx register.
As a consequence of (1) and (2), when edk2 code running in the guest
accesses an IPA falling in the flash range, it will completely bypass
the cache.
Therefore, when such a page is faulted in in user_mem_abort(), we must
flush the data cache; otherwise the guest will see stale data in the flash
chip.
This patch is not complete because I have no clue how to calculate the
memory attribute for "fault_ipa" in user_mem_abort(). Right now I set
"fault_ipa_uncached" to constant true, which might incur some performance
penalty for data faults, but it certainly improves correctness -- the
ArmVirtualizationPkg platform build of edk2 actually boots as a KVM guest
on APM Mustang.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
arch/arm/include/asm/kvm_mmu.h | 5 +++--
arch/arm64/include/asm/kvm_mmu.h | 5 +++--
arch/arm/kvm/mmu.c | 10 ++++++++--
3 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index acb0d57..f867060 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -161,9 +161,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
}
static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
- unsigned long size)
+ unsigned long size,
+ bool ipa_uncached)
{
- if (!vcpu_has_cache_enabled(vcpu))
+ if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
kvm_flush_dcache_to_poc((void *)hva, size);
/*
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0caf7a5..123b521 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -243,9 +243,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
}
static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
- unsigned long size)
+ unsigned long size,
+ bool ipa_uncached)
{
- if (!vcpu_has_cache_enabled(vcpu))
+ if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
kvm_flush_dcache_to_poc((void *)hva, size);
if (!icache_is_aliasing()) { /* PIPT */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 57a403a..e2323b7 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -847,6 +847,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct vm_area_struct *vma;
pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
+ bool fault_ipa_uncached;
write_fault = kvm_is_write_fault(vcpu);
if (fault_status == FSC_PERM && !write_fault) {
@@ -913,6 +914,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (!hugetlb && !force_pte)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
+ /* TBD */
+ fault_ipa_uncached = true;
+
if (hugetlb) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
new_pmd = pmd_mkhuge(new_pmd);
@@ -920,7 +924,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pmd_writable(&new_pmd);
kvm_set_pfn_dirty(pfn);
}
- coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
+ coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE,
+ fault_ipa_uncached);
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -928,7 +933,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pte_writable(&new_pte);
kvm_set_pfn_dirty(pfn);
}
- coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
+ coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
+ fault_ipa_uncached);
ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
}
--
1.8.3.1
next prev parent reply other threads:[~2014-11-17 15:49 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-17 14:58 [PATCH 1/3] kvm: add a memslot flag for incoherent memory regions Ard Biesheuvel
2014-11-17 14:58 ` [PATCH 2/3] arm, arm64: KVM: allow forced dcache flush on page faults Ard Biesheuvel
2014-11-17 14:58 ` [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots Ard Biesheuvel
2014-11-17 15:29 ` Paolo Bonzini
2014-11-17 15:39 ` Marc Zyngier
2014-11-17 16:03 ` Paolo Bonzini
2014-11-17 15:49 ` Laszlo Ersek [this message]
2014-11-19 23:32 ` Mario Smarduch
2014-11-20 8:08 ` Laszlo Ersek
2014-11-20 18:35 ` Mario Smarduch
2014-11-20 18:40 ` Peter Maydell
2014-11-20 19:15 ` Mario Smarduch
2014-11-20 19:49 ` Jon Masters
2014-11-20 20:10 ` Peter Maydell
2014-11-20 21:13 ` Laszlo Ersek
2014-11-20 21:59 ` Peter Maydell
2014-11-21 11:19 ` Christoffer Dall
2014-11-22 1:50 ` Mario Smarduch
2014-11-22 10:18 ` Christoffer Dall
2014-11-22 10:26 ` Laszlo Ersek
2014-11-22 12:27 ` Peter Maydell
2014-11-19 8:51 ` Ard Biesheuvel
2014-11-19 11:02 ` Paolo Bonzini
2014-11-19 11:03 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=546A1905.6080607@redhat.com \
--to=lersek@redhat.com \
--cc=ard.biesheuvel@linaro.org \
--cc=christoffer.dall@linaro.org \
--cc=drjones@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=marc.zyngier@arm.com \
--cc=pbonzini@redhat.com \
--cc=wei@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox