linux-mm.kvack.org archive mirror
* [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable
@ 2025-06-18  6:55 ankita
  2025-06-18  6:55 ` [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used ankita
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: ankita @ 2025-06-18  6:55 UTC (permalink / raw)
  To: ankita, jgg, maz, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, ryan.roberts, shahuang,
	lpieralisi, david, ddutile, seanjc
  Cc: aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

From: Ankit Agrawal <ankita@nvidia.com>

Grace-based platforms such as the Grace Hopper/Blackwell Superchips have
CPU-accessible, cache-coherent GPU memory. The GPU device memory is
essentially DDR memory and retains properties such as cacheability,
unaligned accesses, atomics and handling of executable faults. This
requires the device memory to be mapped as NORMAL in stage-2.

Today KVM forces the memory to either NORMAL or DEVICE_nGnRE depending
on whether the memory region is added to the kernel. The KVM code is
thus restrictive and prevents device memory that is not added to the
kernel from being marked as cacheable. This series aims to solve that.

A cacheability check is made by consulting the VMA pgprot value. If
the pgprot mapping type is cacheable, it is considered safe to map the
region cacheable, as the KVM S2 will then have the same Normal memory
type as the VMA has in the S1 and KVM has no additional responsibility
for safety.

Note that when FWB (Forced Write-Back) is not enabled, the kernel
expects to do cache management trivially by flushing the memory,
linearly converting a kvm_pte to a phys_addr and then to a KVA. Cache
management thus relies on the memory being kernel mapped. Since the GPU
device memory is not kernel mapped, bail out when FWB is not supported.
Similarly, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache
and turns icache_inval_pou() into a NOP. The cacheable PFNMAP support is
therefore made contingent on these two hardware features.

The ability to safely do the cacheable mapping of PFNMAP is exposed
through a KVM capability for userspace consumption.
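As an illustration, the admission logic described above can be modeled
in plain C. The function and parameter names below are invented for
this sketch and are not kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative model (not kernel code) of the series' admission logic:
 * a stage-2 cacheable mapping of VM_PFNMAP memory is only allowed when
 * the userspace VMA is itself cacheable and the CPU has both FEAT_S2FWB
 * and DIC, so that KVM never needs a kernel virtual address to perform
 * cache maintenance.
 */
static bool s2_cacheable_pfnmap_allowed(bool vma_cacheable,
					bool has_s2fwb, bool has_dic)
{
	return vma_cacheable && has_s2fwb && has_dic;
}
```

On hardware lacking either feature, a cacheable PFNMAP request is
rejected rather than silently downgraded.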

The changes are heavily influenced by discussions with maintainers
Marc Zyngier and Oliver Upton, as well as Jason Gunthorpe,
Catalin Marinas, David Hildenbrand and Sean Christopherson [1]. Many
thanks for their valuable suggestions.

Applied over next-20250610 and tested on the Grace Blackwell
platform by booting up a VM, loading the NVIDIA module [2] and running
nvidia-smi in the VM.

To run CUDA workloads, there is a dependency on the IOMMUFD and
Nested Page Table patches being worked on separately by Nicolin Chen
(nicolinc@nvidia.com). NVIDIA has provided git repositories that
include all the requisite kernel [3] and QEMU [4] patches in case
one wants to try.

v6 -> v7
1. New patch to rename symbols to more accurately reflect the
CMO usage functionality (Jason Gunthorpe).
2. Updated the block cacheable PFNMAP patch to invert the cacheability
check function (Sean Christopherson).
3. Removed the memslot flag KVM_MEM_ENABLE_CACHEABLE_PFNMAP.
(Jason Gunthorpe, Sean Christopherson, Oliver Upton).
4. Commit message changes in 2/5. (Jason Gunthorpe)

v5 -> v6
1. 2/5 updated to add kvm_arch_supports_cacheable_pfnmap weak
definition to avoid build warnings. (Donald Dutile).

v4 -> v5
1. Invert the check to allow MT_DEVICE_* or NORMAL_NC instead of
disallowing MT_NORMAL in 1/5. (Catalin Marinas)
2. Removed usage of stage2_has_fwb and directly using the FWB
cap check. (Oliver Upton)
3. Introduced kvm_arch_supports_cacheable_pfnmap to check if
the prereq features are present. (David Hildenbrand)

v3 -> v4
1. Fixed a security bug due to mismatched attributes between the S1 and
S2 mappings, and moved the fix to a separate patch. Suggestion by
Jason Gunthorpe (jgg@nvidia.com).
2. New minor patch to change the scope of the FWB support indicator
function.
3. Patch to introduce a new memslot flag. Suggestion by Oliver Upton
(oliver.upton@linux.dev) and Marc Zyngier (maz@kernel.org)
4. Patch to introduce a new KVM cap to expose cacheable PFNMAP support.
Suggestion by Marc Zyngier (maz@kernel.org).
5. Added checks for ARM64_HAS_CACHE_DIC. Suggestion by Catalin Marinas
(catalin.marinas@arm.com)

v2 -> v3
1. Restricted the new changes to check for cacheability to VM_PFNMAP
   based on David Hildenbrand's (david@redhat.com) suggestion.
2. Removed the MTE checks based on Jason Gunthorpe's (jgg@nvidia.com)
   observation that they are already done earlier in
   kvm_arch_prepare_memory_region.
3. Dropped the pfn_valid() checks based on suggestions by
   Catalin Marinas (catalin.marinas@arm.com).
4. Removed the code for exec fault handling as it is not needed
   anymore.

v1 -> v2
1. Removed kvm_is_device_pfn() as the determiner of device-type memory;
   using pfn_valid() instead.
2. Added handling for MTE.
3. Minor cleanup.

Link: https://lore.kernel.org/all/20250310103008.3471-1-ankita@nvidia.com [1]
Link: https://github.com/NVIDIA/open-gpu-kernel-modules [2]
Link: https://github.com/NVIDIA/NV-Kernels/tree/6.8_ghvirt [3]
Link: https://github.com/NVIDIA/QEMU/tree/6.8_ghvirt_iommufd_vcmdq [4]

v6 Link: https://lore.kernel.org/all/20250524013943.2832-1-ankita@nvidia.com/

Ankit Agrawal (5):
  KVM: arm64: Rename symbols to reflect whether CMO may be used
  KVM: arm64: Block cacheable PFNMAP mapping
  KVM: arm64: New function to determine hardware cache management
    support
  KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  KVM: arm64: Expose new KVM cap for cacheable PFNMAP

 Documentation/virt/kvm/api.rst | 13 ++++-
 arch/arm64/kvm/arm.c           |  7 +++
 arch/arm64/kvm/mmu.c           | 98 ++++++++++++++++++++++++++++++----
 include/linux/kvm_host.h       |  2 +
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/kvm_main.c            |  5 ++
 6 files changed, 115 insertions(+), 11 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used
  2025-06-18  6:55 [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable ankita
@ 2025-06-18  6:55 ` ankita
  2025-06-18 14:28   ` Catalin Marinas
  2025-06-18 14:35   ` Catalin Marinas
  2025-06-18  6:55 ` [PATCH v7 2/5] KVM: arm64: Block cacheable PFNMAP mapping ankita
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: ankita @ 2025-06-18  6:55 UTC (permalink / raw)
  To: ankita, jgg, maz, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, ryan.roberts, shahuang,
	lpieralisi, david, ddutile, seanjc
  Cc: aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

From: Ankit Agrawal <ankita@nvidia.com>

Currently, kvm_is_device_pfn() detects whether the memory is kernel
mapped, and thus implies whether KVM can use Cache Maintenance
Operations (CMOs) on that PFN. Rename the function to reflect this.

Additionally, the "device" variable is effectively trying to set up the
S2 to prevent CMOs. Calling it 'disable_cmo' makes this code clearer.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 arch/arm64/kvm/mmu.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2942ec92c5a4..3d77a278fc4f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -193,9 +193,9 @@ int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm,
 	return 0;
 }
 
-static bool kvm_is_device_pfn(unsigned long pfn)
+static bool kvm_can_use_cmo_pfn(unsigned long pfn)
 {
-	return !pfn_is_map_memory(pfn);
+	return pfn_is_map_memory(pfn);
 }
 
 static void *stage2_memcache_zalloc_page(void *arg)
@@ -1478,7 +1478,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
 	bool exec_fault, mte_allowed;
-	bool device = false, vfio_allow_any_uc = false;
+	bool disable_cmo = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
@@ -1642,7 +1642,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
 
-	if (kvm_is_device_pfn(pfn)) {
+	if (!kvm_can_use_cmo_pfn(pfn)) {
 		/*
 		 * If the page was identified as device early by looking at
 		 * the VMA flags, vma_pagesize is already representing the
@@ -1653,7 +1653,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 * In both cases, we don't let transparent_hugepage_adjust()
 		 * change things at the last minute.
 		 */
-		device = true;
+		disable_cmo = true;
 	} else if (logging_active && !write_fault) {
 		/*
 		 * Only actually map the page as writable if this was a write
@@ -1662,7 +1662,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		writable = false;
 	}
 
-	if (exec_fault && device)
+	if (exec_fault && disable_cmo)
 		return -ENOEXEC;
 
 	/*
@@ -1695,7 +1695,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
+	if (vma_pagesize == PAGE_SIZE && !(force_pte || disable_cmo)) {
 		if (fault_is_perm && fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
@@ -1709,7 +1709,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 	}
 
-	if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
+	if (!fault_is_perm && !disable_cmo && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (mte_allowed) {
 			sanitise_mte_tags(kvm, pfn, vma_pagesize);
@@ -1725,7 +1725,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault)
 		prot |= KVM_PGTABLE_PROT_X;
 
-	if (device) {
+	if (disable_cmo) {
 		if (vfio_allow_any_uc)
 			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
 		else
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v7 2/5] KVM: arm64: Block cacheable PFNMAP mapping
  2025-06-18  6:55 [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable ankita
  2025-06-18  6:55 ` [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used ankita
@ 2025-06-18  6:55 ` ankita
  2025-06-18 15:46   ` Catalin Marinas
  2025-06-18  6:55 ` [PATCH v7 3/5] KVM: arm64: New function to determine hardware cache management support ankita
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: ankita @ 2025-06-18  6:55 UTC (permalink / raw)
  To: ankita, jgg, maz, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, ryan.roberts, shahuang,
	lpieralisi, david, ddutile, seanjc
  Cc: aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

From: Ankit Agrawal <ankita@nvidia.com>

Fix a security bug caused by mismatched attributes between the S1 and
S2 mappings.

Currently, it is possible for a region to be cacheable in the userspace
VMA but mapped non-cacheable in S2. This creates a potential issue where
the VMM may sanitize cacheable memory across VMs using cacheable stores,
ensuring it is zeroed. However, if KVM subsequently assigns this memory
to a VM as uncached, the VM could end up accessing stale, non-zeroed data
from a previous VM, leading to unintended data exposure. This is a
security risk.

Block such mismatched-attribute cases by returning -EINVAL when
userspace tries to map a PFNMAP region cacheable. Only allow NORMAL_NC
and DEVICE_*.
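The check inspects the MAIR attribute index encoded in the VMA's
pgprot. A standalone userspace model of that decode follows; the MT_*
values and the ATTRINDX field position mirror the arm64 headers at the
time of writing, but treat them as illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the cacheability check. On arm64 the memory type
 * of a PTE is an index into MAIR_EL1, held in the ATTRINDX field of
 * the PTE (bits [4:2]). Values below follow
 * arch/arm64/include/asm/memory.h but are reproduced here only for
 * illustration.
 */
#define PTE_ATTRINDX_SHIFT	2
#define PTE_ATTRINDX_MASK	(7UL << PTE_ATTRINDX_SHIFT)

enum { MT_NORMAL = 0, MT_NORMAL_TAGGED = 1, MT_NORMAL_NC = 2,
       MT_DEVICE_nGnRnE = 3, MT_DEVICE_nGnRE = 4 };

static bool pgprot_is_cacheable(unsigned long pgprot)
{
	switch ((pgprot & PTE_ATTRINDX_MASK) >> PTE_ATTRINDX_SHIFT) {
	case MT_NORMAL_NC:
	case MT_DEVICE_nGnRnE:
	case MT_DEVICE_nGnRE:
		return false;	/* uncached or device memory */
	default:
		return true;	/* MT_NORMAL / MT_NORMAL_TAGGED */
	}
}
```

Note the check is inverted: it lists the known non-cacheable types and
treats everything else as cacheable, matching the v5 review feedback.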

CC: Oliver Upton <oliver.upton@linux.dev>
CC: Sean Christopherson <seanjc@google.com>
CC: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 arch/arm64/kvm/mmu.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3d77a278fc4f..d6e0d5f46b45 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1470,6 +1470,22 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+/*
+ * Determine the memory region cacheability from VMA's pgprot. This
+ * is used to set the stage 2 PTEs.
+ */
+static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
+{
+	switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
+	case MT_NORMAL_NC:
+	case MT_DEVICE_nGnRnE:
+	case MT_DEVICE_nGnRE:
+		return false;
+	default:
+		return true;
+	}
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1477,7 +1493,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
-	bool exec_fault, mte_allowed;
+	bool exec_fault, mte_allowed, is_vma_cacheable = false;
 	bool disable_cmo = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
@@ -1619,6 +1635,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
 
+	is_vma_cacheable = kvm_vma_is_cacheable(vma);
+
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
 
@@ -1643,6 +1661,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 
 	if (!kvm_can_use_cmo_pfn(pfn)) {
+		if (is_vma_cacheable)
+			return -EINVAL;
+
 		/*
 		 * If the page was identified as device early by looking at
 		 * the VMA flags, vma_pagesize is already representing the
@@ -1726,6 +1747,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		prot |= KVM_PGTABLE_PROT_X;
 
 	if (disable_cmo) {
+		if (is_vma_cacheable) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
 		if (vfio_allow_any_uc)
 			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
 		else
@@ -2221,6 +2247,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				ret = -EINVAL;
 				break;
 			}
+
+			/* Cacheable PFNMAP is not allowed */
+			if (kvm_vma_is_cacheable(vma)) {
+				ret = -EINVAL;
+				break;
+			}
 		}
 		hva = min(reg_end, vma->vm_end);
 	} while (hva < reg_end);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v7 3/5] KVM: arm64: New function to determine hardware cache management support
  2025-06-18  6:55 [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable ankita
  2025-06-18  6:55 ` [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used ankita
  2025-06-18  6:55 ` [PATCH v7 2/5] KVM: arm64: Block cacheable PFNMAP mapping ankita
@ 2025-06-18  6:55 ` ankita
  2025-06-18 16:12   ` Catalin Marinas
  2025-06-18  6:55 ` [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags ankita
  2025-06-18  6:55 ` [PATCH v7 5/5] KVM: arm64: Expose new KVM cap for cacheable PFNMAP ankita
  4 siblings, 1 reply; 18+ messages in thread
From: ankita @ 2025-06-18  6:55 UTC (permalink / raw)
  To: ankita, jgg, maz, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, ryan.roberts, shahuang,
	lpieralisi, david, ddutile, seanjc
  Cc: aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

From: Ankit Agrawal <ankita@nvidia.com>

VM_PFNMAP VMAs are allowed to contain PTEs that point to physical
addresses without a struct page, which may not be in the kernel
direct map.

However, arm64 KVM relies on a simple conversion from a physical
address to a kernel virtual address when it does cache maintenance, as
the CMO instructions work on virtual addresses. This simple approach
does not work for physical addresses from VM_PFNMAP, since those
addresses may not have a kernel virtual address, or it may be difficult
to find one.

Fortunately, if the arm64 CPU has two features, S2FWB and CACHE DIC,
then KVM no longer needs to do cache flushing and NOPs all the
CMOs. This has the effect of no longer requiring a KVA for addresses
mapped into the S2.

Add a new function, kvm_arch_supports_cacheable_pfnmap(), to report
this capability. From a core perspective it means the arch can accept
a cacheable VM_PFNMAP as a memslot. From an arm64 perspective it means
that no KVA is required.
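A toy model of the constraint: without S2FWB, flushing needs a
phys-to-KVA conversion through the linear map, and no such KVA exists
for memory outside it. All addresses and offsets below are made up for
illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model (not kernel code) of why CMOs need a kernel mapping.
 * The linear map covers only system DRAM; a PFNMAP physical address
 * such as GPU device memory falls outside it and therefore has no
 * kernel virtual address to flush. Addresses are illustrative.
 */
#define LINEAR_MAP_BASE	0xffff000000000000ULL	/* illustrative */
#define DRAM_START	0x80000000ULL		/* illustrative */
#define DRAM_END	0x100000000ULL		/* illustrative */

static bool phys_in_linear_map(uint64_t pa)
{
	return pa >= DRAM_START && pa < DRAM_END;
}

/* Returns 0 when no KVA exists, i.e. cache maintenance is impossible. */
static uint64_t phys_to_kva(uint64_t pa)
{
	if (!phys_in_linear_map(pa))
		return 0;
	return LINEAR_MAP_BASE + (pa - DRAM_START);
}
```

With S2FWB and DIC present the CMOs become NOPs, so the missing KVA in
the second case no longer matters.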

CC: Jason Gunthorpe <jgg@nvidia.com>
CC: David Hildenbrand <david@redhat.com>
CC: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 arch/arm64/kvm/mmu.c     | 23 +++++++++++++++++++++++
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c      |  5 +++++
 3 files changed, 30 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d6e0d5f46b45..a71b77df7c96 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1287,6 +1287,29 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_nested_s2_wp(kvm);
 }
 
+/**
+ * kvm_arch_supports_cacheable_pfnmap() - Determine whether hardware
+ *      supports cache management.
+ *
+ * ARM64 KVM relies on a simple conversion from physaddr to a kernel
+ * virtual address (KVA) when it does cache maintenance as the CMO
+ * instructions work on virtual addresses. This is incompatible with
+ * VM_PFNMAP VMAs which may not have a kernel direct mapping to a
+ * virtual address.
+ *
+ * With S2FWB and CACHE DIC features, KVM need not do cache flushing
+ * and CMOs are NOP'd. This has the effect of no longer requiring a
+ * KVA for addresses mapped into the S2. The presence of these features
+ * are thus necessary to support cacheable S2 mapping of VM_PFNMAP.
+ *
+ * Return: True if FWB and DIC is supported.
+ */
+bool kvm_arch_supports_cacheable_pfnmap(void)
+{
+	return cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) &&
+	       cpus_have_final_cap(ARM64_HAS_CACHE_DIC);
+}
+
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3bde4fb5c6aa..c91d5b5f8c39 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1235,6 +1235,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm);
 /* flush memory translations pointing to 'slot' */
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot);
+/* hardware supports cache management */
+bool kvm_arch_supports_cacheable_pfnmap(void);
 
 int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
 		       struct page **pages, int nr_pages);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index eec82775c5bf..feacfb203a70 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1583,6 +1583,11 @@ static void kvm_replace_memslot(struct kvm *kvm,
 #define KVM_SET_USER_MEMORY_REGION_V1_FLAGS \
 	(KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)
 
+bool __weak kvm_arch_supports_cacheable_pfnmap(void)
+{
+	return false;
+}
+
 static int check_memory_region_flags(struct kvm *kvm,
 				     const struct kvm_userspace_memory_region2 *mem)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  2025-06-18  6:55 [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable ankita
                   ` (2 preceding siblings ...)
  2025-06-18  6:55 ` [PATCH v7 3/5] KVM: arm64: New function to determine hardware cache management support ankita
@ 2025-06-18  6:55 ` ankita
  2025-06-18 16:34   ` Catalin Marinas
  2025-06-18  6:55 ` [PATCH v7 5/5] KVM: arm64: Expose new KVM cap for cacheable PFNMAP ankita
  4 siblings, 1 reply; 18+ messages in thread
From: ankita @ 2025-06-18  6:55 UTC (permalink / raw)
  To: ankita, jgg, maz, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, ryan.roberts, shahuang,
	lpieralisi, david, ddutile, seanjc
  Cc: aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

From: Ankit Agrawal <ankita@nvidia.com>

Today KVM forces the memory to either NORMAL or DEVICE_nGnRE
based on pfn_is_map_memory() (which tracks whether the memory
is in the kernel map) and ignores the per-VMA flags that indicate the
memory attributes. The KVM code is thus restrictive and allows only
memory that is added to the kernel to be marked as cacheable.

The device memory such as on the Grace Hopper/Blackwell systems
is interchangeable with DDR memory and retains properties such as
cacheability, unaligned accesses, atomics and handling of executable
faults. This requires the device memory to be mapped as NORMAL in
stage-2.

Given that the GPU device memory is not added to the kernel (but is
rather VMA-mapped through remap_pfn_range() in the nvgrace-gpu module,
which sets VM_PFNMAP), pfn_is_map_memory() is false and thus KVM
prevents such memory from being mapped Normal cacheable. This patch
addresses that use case.

Note that when FWB is not enabled, the kernel expects to do cache
management trivially by flushing the memory, linearly converting a
kvm_pte to a phys_addr and then to a KVA; see
kvm_flush_dcache_to_poc(). Cache management thus relies on the memory
being kernel mapped. Moreover, the ARM64_HAS_CACHE_DIC CPU cap allows
KVM to avoid flushing the icache and turns icache_inval_pou() into a
NOP. These two capabilities are thus a requirement for the cacheable
PFNMAP feature. Use kvm_arch_supports_cacheable_pfnmap() to check them.

A cacheability check is made by consulting the VMA pgprot value.
If the pgprot mapping type is cacheable, it is safe to map it S2
cacheable, as the KVM S2 will have the same Normal memory type as the
VMA has in the S1 and KVM has no additional responsibility for safety.
Checking pgprot as NORMAL is thus a KVM sanity check.

Add a check for COW VM_PFNMAP and refuse such mappings.

No additional checks for MTE are needed as kvm_arch_prepare_memory_region()
already tests it at an early stage during memslot creation. There would
not even be a fault if the memslot is not created.
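The COW check relies on the kernel's is_cow_mapping() helper: a
mapping that may be written but is not shared. A userspace model
follows; the flag values are illustrative stand-ins for the kernel's
VM_* bits:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the kernel's is_cow_mapping() logic: a private (non-shared)
 * mapping that may be written is copy-on-write. Writes to such a
 * PFNMAP would be redirected to anonymous pages, so KVM must refuse
 * it. Flag values are illustrative, not the kernel's actual VM_* bits.
 */
#define VM_SHARED	0x08UL	/* illustrative */
#define VM_MAYWRITE	0x20UL	/* illustrative */

static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
```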

CC: Oliver Upton <oliver.upton@linux.dev>
CC: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 arch/arm64/kvm/mmu.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a71b77df7c96..6a3955e07b5e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1660,6 +1660,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	is_vma_cacheable = kvm_vma_is_cacheable(vma);
 
+	/* Reject COW VM_PFNMAP */
+	if ((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags))
+		return -EINVAL;
+
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
 
@@ -1684,9 +1688,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 
 	if (!kvm_can_use_cmo_pfn(pfn)) {
-		if (is_vma_cacheable)
-			return -EINVAL;
-
 		/*
 		 * If the page was identified as device early by looking at
 		 * the VMA flags, vma_pagesize is already representing the
@@ -1696,8 +1697,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 *
 		 * In both cases, we don't let transparent_hugepage_adjust()
 		 * change things at the last minute.
+		 *
+		 * Do not set device as the device memory is cacheable. Note
+		 * that such mapping is safe as the KVM S2 will have the same
+		 * Normal memory type as the VMA has in the S1.
 		 */
-		disable_cmo = true;
+		if (!is_vma_cacheable)
+			disable_cmo = true;
 	} else if (logging_active && !write_fault) {
 		/*
 		 * Only actually map the page as writable if this was a write
@@ -1784,6 +1790,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		prot |= KVM_PGTABLE_PROT_X;
 	}
 
+	/*
+	 *  When FWB is unsupported KVM needs to do cache flushes
+	 *  (via dcache_clean_inval_poc()) of the underlying memory. This is
+	 *  only possible if the memory is already mapped into the kernel map.
+	 *
+	 *  Outright reject as the cacheable device memory is not present in
+	 *  the kernel map and not suitable for cache management.
+	 */
+	if (is_vma_cacheable && !kvm_arch_supports_cacheable_pfnmap()) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
@@ -2271,8 +2290,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				break;
 			}
 
-			/* Cacheable PFNMAP is not allowed */
-			if (kvm_vma_is_cacheable(vma)) {
+			/*
+			 * Cacheable PFNMAP is allowed only if the hardware
+			 * supports it.
+			 */
+			if (kvm_vma_is_cacheable(vma) &&
+			    !kvm_arch_supports_cacheable_pfnmap()) {
 				ret = -EINVAL;
 				break;
 			}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v7 5/5] KVM: arm64: Expose new KVM cap for cacheable PFNMAP
  2025-06-18  6:55 [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable ankita
                   ` (3 preceding siblings ...)
  2025-06-18  6:55 ` [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags ankita
@ 2025-06-18  6:55 ` ankita
  4 siblings, 0 replies; 18+ messages in thread
From: ankita @ 2025-06-18  6:55 UTC (permalink / raw)
  To: ankita, jgg, maz, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, ryan.roberts, shahuang,
	lpieralisi, david, ddutile, seanjc
  Cc: aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

From: Ankit Agrawal <ankita@nvidia.com>

Introduce a new KVM capability to expose to userspace whether
cacheable mapping of PFNMAP is supported.

The ability to safely do the cacheable mapping of PFNMAP is contingent
on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing
the dcache, and ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the
icache and turns icache_inval_pou() into a NOP. The cap is false if
either requirement is missing, as checked via
kvm_arch_supports_cacheable_pfnmap().

This capability allows userspace to discover the support before
attempting to map a cacheable PFNMAP region.

This cap could also be used by userspace to prevent live-migration
across FWB and non-FWB hosts.
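A sketch of how a VMM might probe the capability. Since KVM returns
-EINVAL when the cap is queried without a VM, it must be checked on the
VM fd rather than on /dev/kvm. The ioctl numbers reproduce the KVM
uapi; the helper names are invented for this sketch:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* KVM uapi ioctl numbers, reproduced here to keep the sketch
 * self-contained; normally these come from <linux/kvm.h>. */
#define KVMIO			0xAE
#define KVM_CREATE_VM		_IO(KVMIO, 0x01)
#define KVM_CHECK_EXTENSION	_IO(KVMIO, 0x03)
/* Cap number from the uapi hunk in this patch. */
#define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243

static const char *cacheable_pfnmap_status(int r)
{
	if (r < 0)
		return "query failed (use the VM fd, not /dev/kvm)";
	return r ? "supported" : "unsupported";
}

/* Returns the cap value, or -1 if KVM is unavailable on this host. */
static int probe_cacheable_pfnmap(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	if (kvm < 0)
		return -1;
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	if (vm < 0) {
		close(kvm);
		return -1;
	}
	int r = ioctl(vm, KVM_CHECK_EXTENSION,
		      KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED);
	printf("cacheable PFNMAP: %s\n", cacheable_pfnmap_status(r));
	close(vm);
	close(kvm);
	return r;
}
```

A VMM would call probe_cacheable_pfnmap() once at VM creation and only
then register cacheable PFNMAP memslots.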

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
CC: Oliver Upton <oliver.upton@linux.dev>
CC: David Hildenbrand <david@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 Documentation/virt/kvm/api.rst | 13 ++++++++++++-
 arch/arm64/kvm/arm.c           |  7 +++++++
 include/uapi/linux/kvm.h       |  1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1bd2d42e6424..615cdbdd505f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8528,7 +8528,7 @@ ENOSYS for the others.
 When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
 type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
 
-7.37 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
+7.42 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
 -------------------------------------
 
 :Architectures: arm64
@@ -8557,6 +8557,17 @@ given VM.
 When this capability is enabled, KVM resets the VCPU when setting
 MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.
 
+7.43 KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED
+-------------------------------------------
+
+:Architectures: arm64
+:Target: VM
+:Parameters: None
+
+This capability indicates to userspace whether a PFNMAP memory region
+can be safely mapped as cacheable. This relies on the presence of the
+forced write-back (FWB) and CACHE DIC features on the hardware.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index de2b4e9c9f9f..9fb8901dcd86 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -408,6 +408,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES:
 		r = BIT(0);
 		break;
+	case KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED:
+		if (!kvm)
+			r = -EINVAL;
+		else
+			r = kvm_arch_supports_cacheable_pfnmap();
+		break;
+
 	default:
 		r = 0;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d00b85cb168c..ed9a46875a49 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -934,6 +934,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_EL2 240
 #define KVM_CAP_ARM_EL2_E2H0 241
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
+#define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used
  2025-06-18  6:55 ` [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used ankita
@ 2025-06-18 14:28   ` Catalin Marinas
  2025-06-18 14:35   ` Catalin Marinas
  1 sibling, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2025-06-18 14:28 UTC (permalink / raw)
  To: ankita
  Cc: jgg, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	will, ryan.roberts, shahuang, lpieralisi, david, ddutile, seanjc,
	aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

On Wed, Jun 18, 2025 at 06:55:37AM +0000, ankita@nvidia.com wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
> 
> Currently, the kvm_is_device_pfn() detects if the memory is kernel
> mapped. It thus implies whether KVM can use Cache Maintenance
> Operations (CMOs) on that PFN. Rename the function to reflect this.
> 
> Additionally, the "device" variable is effectively trying to setup the S2
> to prevent CMOs. Calling it 'disable_cmo' would make this code clearer.

I'm not sure CMOs is the only reason. Another is to prevent the guest
from mapping device memory as something other than Device with
possible implications for external aborts.

-- 
Catalin



* Re: [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used
  2025-06-18  6:55 ` [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used ankita
  2025-06-18 14:28   ` Catalin Marinas
@ 2025-06-18 14:35   ` Catalin Marinas
  2025-06-19  2:22     ` Ankit Agrawal
  1 sibling, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2025-06-18 14:35 UTC (permalink / raw)
  To: ankita
  Cc: jgg, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	will, ryan.roberts, shahuang, lpieralisi, david, ddutile, seanjc,
	aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

On Wed, Jun 18, 2025 at 06:55:37AM +0000, ankita@nvidia.com wrote:
> -static bool kvm_is_device_pfn(unsigned long pfn)
> +static bool kvm_can_use_cmo_pfn(unsigned long pfn)
>  {
> -	return !pfn_is_map_memory(pfn);
> +	return pfn_is_map_memory(pfn);
>  }

I wonder, why not just use pfn_is_map_memory() directly? At a quick
grep, it's only used in one place and your patches don't seem to modify
this function further.

-- 
Catalin



* Re: [PATCH v7 2/5] KVM: arm64: Block cacheable PFNMAP mapping
  2025-06-18  6:55 ` [PATCH v7 2/5] KVM: arm64: Block cacheable PFNMAP mapping ankita
@ 2025-06-18 15:46   ` Catalin Marinas
  2025-06-19  2:21     ` Ankit Agrawal
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2025-06-18 15:46 UTC (permalink / raw)
  To: ankita
  Cc: jgg, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	will, ryan.roberts, shahuang, lpieralisi, david, ddutile, seanjc,
	aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

On Wed, Jun 18, 2025 at 06:55:38AM +0000, ankita@nvidia.com wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 3d77a278fc4f..d6e0d5f46b45 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1470,6 +1470,22 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>  	return vma->vm_flags & VM_MTE_ALLOWED;
>  }
>  
> +/*
> + * Determine the memory region cacheability from VMA's pgprot. This
> + * is used to set the stage 2 PTEs.
> + */
> +static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
> +{
> +	switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
> +	case MT_NORMAL_NC:
> +	case MT_DEVICE_nGnRnE:
> +	case MT_DEVICE_nGnRE:
> +		return false;
> +	default:
> +		return true;
> +	}
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_s2_trans *nested,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1477,7 +1493,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  {
>  	int ret = 0;
>  	bool write_fault, writable, force_pte = false;
> -	bool exec_fault, mte_allowed;
> +	bool exec_fault, mte_allowed, is_vma_cacheable = false;

Nit: do we need to initialise is_vma_cacheable here? It did not seem
used until the kvm_vma_is_cacheable() call. Anyway, it's harmless.

>  	bool disable_cmo = false, vfio_allow_any_uc = false;
>  	unsigned long mmu_seq;
>  	phys_addr_t ipa = fault_ipa;
> @@ -1619,6 +1635,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
>  
> +	is_vma_cacheable = kvm_vma_is_cacheable(vma);
> +
>  	/* Don't use the VMA after the unlock -- it may have vanished */
>  	vma = NULL;
>  
> @@ -1643,6 +1661,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return -EFAULT;
>  
>  	if (!kvm_can_use_cmo_pfn(pfn)) {
> +		if (is_vma_cacheable)
> +			return -EINVAL;
> +
>  		/*
>  		 * If the page was identified as device early by looking at
>  		 * the VMA flags, vma_pagesize is already representing the

This block also sets 'disable_cmo' (originally 'device') to true.

> @@ -1726,6 +1747,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		prot |= KVM_PGTABLE_PROT_X;
>  
>  	if (disable_cmo) {
> +		if (is_vma_cacheable) {
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}

so, is there anything else changing 'disable_cmo' up to this point? If
not, I'd drop the second is_vma_cacheable check.

-- 
Catalin

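As an aside, the attribute-index check in the quoted kvm_vma_is_cacheable() hunk can be exercised outside the kernel. The sketch below is a minimal user-space model; the MT_* indices and the AttrIndx field layout mirror arm64's <asm/memory.h> and <asm/pgtable-hwdef.h> as of this thread and are illustrative, not a stable ABI:

```c
#include <stdbool.h>
#include <stdint.h>

/* AttrIndx occupies bits [4:2] of an arm64 PTE. */
#define PTE_ATTRINDX_SHIFT	2
#define PTE_ATTRINDX_MASK	(UINT64_C(7) << PTE_ATTRINDX_SHIFT)
#define PTE_ATTRINDX(t)		((uint64_t)(t) << PTE_ATTRINDX_SHIFT)

/* Memory-type indices as assigned in arm64's MAIR setup. */
enum mem_type {
	MT_NORMAL		= 0,
	MT_NORMAL_TAGGED	= 1,
	MT_NORMAL_NC		= 2,
	MT_DEVICE_nGnRnE	= 3,
	MT_DEVICE_nGnRE		= 4,
};

/* Model of kvm_vma_is_cacheable(): cacheable unless Normal-NC or Device. */
bool vma_prot_is_cacheable(uint64_t pgprot)
{
	switch ((pgprot & PTE_ATTRINDX_MASK) >> PTE_ATTRINDX_SHIFT) {
	case MT_NORMAL_NC:
	case MT_DEVICE_nGnRnE:
	case MT_DEVICE_nGnRE:
		return false;	/* non-cacheable or Device memory */
	default:
		return true;	/* Normal (cacheable) memory */
	}
}
```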


* Re: [PATCH v7 3/5] KVM: arm64: New function to determine hardware cache management support
  2025-06-18  6:55 ` [PATCH v7 3/5] KVM: arm64: New function to determine hardware cache management support ankita
@ 2025-06-18 16:12   ` Catalin Marinas
  0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2025-06-18 16:12 UTC (permalink / raw)
  To: ankita
  Cc: jgg, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	will, ryan.roberts, shahuang, lpieralisi, david, ddutile, seanjc,
	aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

On Wed, Jun 18, 2025 at 06:55:39AM +0000, ankita@nvidia.com wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
> 
> VM_PFNMAP VMAs are allowed to contain PTEs which point to physical
> addresses that do not have a struct page and may not be in the kernel
> direct map.
> 
> However ARM64 KVM relies on a simple conversion from physaddr to a
> kernel virtual address when it does cache maintenance as the CMO
> instructions work on virtual addresses. This simple approach does not
> work for physical addresses from VM_PFNMAP since those addresses may
> not have a kernel virtual address, or it may be difficult to find it.
> 
> Fortunately, if the ARM64 CPU has two features, S2FWB and CACHE DIC,
> then KVM no longer needs to do cache flushing and NOPs all the
> CMOs. This has the effect of no longer requiring a KVA for addresses
> mapped into the S2.
> 
> Add a new function, kvm_arch_supports_cacheable_pfnmap(), to report
> this capability. From a core perspective it means the arch can accept
> a cacheable VM_PFNMAP as a memslot. From an ARM64 perspective it means
> that no KVA is required.
> 
> CC: Jason Gunthorpe <jgg@nvidia.com>
> CC: David Hildenbrand <david@redhat.com>
> CC: Donald Dutile <ddutile@redhat.com>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

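The capability described in the commit message reduces to two CPU features: stage-2 FWB removes the need for D-cache clean/invalidate when mapping, and CTR_EL0.DIC removes the need for I-cache invalidation on executable faults. A trivial model of the check follows; the two boolean parameters stand in for cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) and cpus_have_final_cap(ARM64_HAS_CACHE_DIC), so treat it as a sketch of the intent rather than the in-kernel signature:

```c
#include <stdbool.h>

/*
 * Cacheable PFNMAP can only be accepted when KVM never needs a KVA
 * for cache maintenance: FWB forces the final stage-2 memory type
 * (no D-cache CMOs) and DIC makes I-cache invalidation unnecessary
 * for instruction-to-data coherence (no I-cache CMOs).
 */
bool supports_cacheable_pfnmap(bool has_s2fwb, bool has_cache_dic)
{
	return has_s2fwb && has_cache_dic;
}
```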


* Re: [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  2025-06-18  6:55 ` [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags ankita
@ 2025-06-18 16:34   ` Catalin Marinas
  2025-06-18 16:38     ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2025-06-18 16:34 UTC (permalink / raw)
  To: ankita
  Cc: jgg, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	will, ryan.roberts, shahuang, lpieralisi, david, ddutile, seanjc,
	aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

On Wed, Jun 18, 2025 at 06:55:40AM +0000, ankita@nvidia.com wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index a71b77df7c96..6a3955e07b5e 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1660,6 +1660,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	is_vma_cacheable = kvm_vma_is_cacheable(vma);
>  
> +	/* Reject COW VM_PFNMAP */
> +	if ((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags))
> +		return -EINVAL;

It may help to add a comment here why this needs to be rejected. I
forgot the details but tracked it down to an email from David a few
months ago:

https://lore.kernel.org/all/a2d95399-62ad-46d3-9e48-6fa90fd2c2f3@redhat.com/

> +
>  	/* Don't use the VMA after the unlock -- it may have vanished */
>  	vma = NULL;
>  
> @@ -1684,9 +1688,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return -EFAULT;
>  
>  	if (!kvm_can_use_cmo_pfn(pfn)) {
> -		if (is_vma_cacheable)
> -			return -EINVAL;
> -
>  		/*
>  		 * If the page was identified as device early by looking at
>  		 * the VMA flags, vma_pagesize is already representing the
> @@ -1696,8 +1697,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		 *
>  		 * In both cases, we don't let transparent_hugepage_adjust()
>  		 * change things at the last minute.
> +		 *
> +		 * Do not set device as the device memory is cacheable. Note
> +		 * that such mapping is safe as the KVM S2 will have the same
> +		 * Normal memory type as the VMA has in the S1.
>  		 */
> -		disable_cmo = true;
> +		if (!is_vma_cacheable)
> +			disable_cmo = true;

I'm tempted to stick to the 'device' variable name. Or something like
s2_noncacheable. As I commented, it's not just about disabling CMOs.

>  	} else if (logging_active && !write_fault) {
>  		/*
>  		 * Only actually map the page as writable if this was a write
> @@ -1784,6 +1790,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		prot |= KVM_PGTABLE_PROT_X;
>  	}
>  
> +	/*
> +	 *  When FWB is unsupported KVM needs to do cache flushes
> +	 *  (via dcache_clean_inval_poc()) of the underlying memory. This is
> +	 *  only possible if the memory is already mapped into the kernel map.
> +	 *
> +	 *  Outright reject as the cacheable device memory is not present in
> +	 *  the kernel map and not suitable for cache management.
> +	 */
> +	if (is_vma_cacheable && !kvm_arch_supports_cacheable_pfnmap()) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}

I'm missing the full context around this hunk but, judging by
indentation, does it also reject any cacheable vma even if it is not
PFNMAP on pre-FWB hardware?

-- 
Catalin



* Re: [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  2025-06-18 16:34   ` Catalin Marinas
@ 2025-06-18 16:38     ` Jason Gunthorpe
  2025-06-19 12:14       ` Ankit Agrawal
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-18 16:38 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: ankita, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	will, ryan.roberts, shahuang, lpieralisi, david, ddutile, seanjc,
	aniketa, cjia, kwankhede, kjaju, targupta, vsethi, acurrid,
	apopple, jhubbard, danw, zhiw, mochs, udhoke, dnigam,
	alex.williamson, sebastianene, coltonlewis, kevin.tian, yi.l.liu,
	ardb, akpm, gshan, linux-mm, tabba, qperret, kvmarm, linux-kernel,
	linux-arm-kernel, maobibo

On Wed, Jun 18, 2025 at 05:34:16PM +0100, Catalin Marinas wrote:
> > +		 *
> > +		 * Do not set device as the device memory is cacheable. Note
> > +		 * that such mapping is safe as the KVM S2 will have the same
> > +		 * Normal memory type as the VMA has in the S1.
> >  		 */
> > -		disable_cmo = true;
> > +		if (!is_vma_cacheable)
> > +			disable_cmo = true;
> 
> I'm tempted to stick to the 'device' variable name. Or something like
> s2_noncacheable. As I commented, it's not just about disabling CMOs.

I think it would be clearer to have two concepts/variable then because
the cases where it is really about preventing cachable access to
prevent aborts are not linked to the logic that checks pfn valid. We
have to detect those cases separately (through the VMA flags was it?).

Having these two things together is IMHO confusing..

Jason



* Re: [PATCH v7 2/5] KVM: arm64: Block cacheable PFNMAP mapping
  2025-06-18 15:46   ` Catalin Marinas
@ 2025-06-19  2:21     ` Ankit Agrawal
  0 siblings, 0 replies; 18+ messages in thread
From: Ankit Agrawal @ 2025-06-19  2:21 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Jason Gunthorpe, maz@kernel.org, oliver.upton@linux.dev,
	joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
	will@kernel.org, ryan.roberts@arm.com, shahuang@redhat.com,
	lpieralisi@kernel.org, david@redhat.com, ddutile@redhat.com,
	seanjc@google.com, Aniket Agashe, Neo Jia, Kirti Wankhede,
	Krishnakant Jaju, Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid,
	Alistair Popple, John Hubbard, Dan Williams, Zhi Wang, Matt Ochs,
	Uday Dhoke, Dheeraj Nigam, alex.williamson@redhat.com,
	sebastianene@google.com, coltonlewis@google.com,
	kevin.tian@intel.com, yi.l.liu@intel.com, ardb@kernel.org,
	akpm@linux-foundation.org, gshan@redhat.com, linux-mm@kvack.org,
	tabba@google.com, qperret@google.com, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn

Thanks Catalin for reviewing the patch.

>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>                         struct kvm_s2_trans *nested,
>>                         struct kvm_memory_slot *memslot, unsigned long hva,
>> @@ -1477,7 +1493,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  {
>>       int ret = 0;
>>       bool write_fault, writable, force_pte = false;
>> -     bool exec_fault, mte_allowed;
>> +     bool exec_fault, mte_allowed, is_vma_cacheable = false;
>
> Nit: do we need to initialise is_vma_cacheable here? It did not seem
> used until the kvm_vma_is_cacheable() call. Anyway, it's harmless.

Shouldn't be necessary. I'll remove it.

>>       if (disable_cmo) {
>> +             if (is_vma_cacheable) {
>> +                     ret = -EINVAL;
>> +                     goto out_unlock;
>> +             }
>
> so, is there anything else changing 'disable_cmo' up to this point? If
> not, I'd drop the second is_vma_cacheable check.
>
> --
> Catalin

Ack, we don't need the second check.


* Re: [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used
  2025-06-18 14:35   ` Catalin Marinas
@ 2025-06-19  2:22     ` Ankit Agrawal
  0 siblings, 0 replies; 18+ messages in thread
From: Ankit Agrawal @ 2025-06-19  2:22 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Jason Gunthorpe, maz@kernel.org, oliver.upton@linux.dev,
	joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
	will@kernel.org, ryan.roberts@arm.com, shahuang@redhat.com,
	lpieralisi@kernel.org, david@redhat.com, ddutile@redhat.com,
	seanjc@google.com, Aniket Agashe, Neo Jia, Kirti Wankhede,
	Krishnakant Jaju, Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid,
	Alistair Popple, John Hubbard, Dan Williams, Zhi Wang, Matt Ochs,
	Uday Dhoke, Dheeraj Nigam, alex.williamson@redhat.com,
	sebastianene@google.com, coltonlewis@google.com,
	kevin.tian@intel.com, yi.l.liu@intel.com, ardb@kernel.org,
	akpm@linux-foundation.org, gshan@redhat.com, linux-mm@kvack.org,
	tabba@google.com, qperret@google.com, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn

Thanks Catalin for reviewing the patch.

>> -static bool kvm_is_device_pfn(unsigned long pfn)
>> +static bool kvm_can_use_cmo_pfn(unsigned long pfn)
>>  {
>> -     return !pfn_is_map_memory(pfn);
>> +     return pfn_is_map_memory(pfn);
>>  }
>
> I wonder, why not just use pfn_is_map_memory() directly? At a quick
> grep, it's only used in one place and your patches don't seem to modify
> this function further.

Ok sure, will make the change.

>> Currently, the kvm_is_device_pfn() detects if the memory is kernel
>> mapped. It thus implies whether KVM can use Cache Maintenance
>> Operations (CMOs) on that PFN. Rename the function to reflect this.
>>
>> Additionally, the "device" variable is effectively trying to setup the S2
>> to prevent CMOs. Calling it 'disable_cmo' would make this code clearer.
>
> I'm not sure CMOs is the only reason. Another is to prevent the guest
> from mapping device memory as something other than Device with
> possible implications for external aborts.

Ack.


* Re: [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  2025-06-18 16:38     ` Jason Gunthorpe
@ 2025-06-19 12:14       ` Ankit Agrawal
  2025-06-19 14:16         ` Jason Gunthorpe
  2025-06-19 16:03         ` Donald Dutile
  0 siblings, 2 replies; 18+ messages in thread
From: Ankit Agrawal @ 2025-06-19 12:14 UTC (permalink / raw)
  To: Jason Gunthorpe, Catalin Marinas
  Cc: maz@kernel.org, oliver.upton@linux.dev, joey.gouly@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
	ryan.roberts@arm.com, shahuang@redhat.com, lpieralisi@kernel.org,
	david@redhat.com, ddutile@redhat.com, seanjc@google.com,
	Aniket Agashe, Neo Jia, Kirti Wankhede, Krishnakant Jaju,
	Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid, Alistair Popple,
	John Hubbard, Dan Williams, Zhi Wang, Matt Ochs, Uday Dhoke,
	Dheeraj Nigam, alex.williamson@redhat.com,
	sebastianene@google.com, coltonlewis@google.com,
	kevin.tian@intel.com, yi.l.liu@intel.com, ardb@kernel.org,
	akpm@linux-foundation.org, gshan@redhat.com, linux-mm@kvack.org,
	tabba@google.com, qperret@google.com, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn

>> > -           disable_cmo = true;
>> > +           if (!is_vma_cacheable)
>> > +                   disable_cmo = true;
>>
>> I'm tempted to stick to the 'device' variable name. Or something like
>> s2_noncacheable. As I commented, it's not just about disabling CMOs.
>
> I think it would be clearer to have two concepts/variable then because
> the cases where it is really about preventing cachable access to
> prevent aborts are not linked to the logic that checks pfn valid. We
> have to detect those cases separately (through the VMA flags was it?).
>
> Having these two things together is IMHO confusing..
>
> Jason

Thanks Catalin and Jason for the comments.

Considering the feedback, I think we may do the following here:
1. Rename the device variable to S2_noncacheable to represent if the S2
    is going to be marked non cacheable. Otherwise S2 will be mapped NORMAL.
2. Detect what PFN has to be marked S2_noncacheable. If a PFN is not in the
    kernel map, mark as S2 except for PFNMAP + VMA cacheable.
3. Prohibit cacheable PFNMAP if hardware doesn't support FWB and CACHE DIC.
4. Prohibit S2 non cached mapping for cacheable VMA for all cases, whether
    pre-FWB hardware or not.
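The four steps can also be condensed into a stand-alone decision model (all names below are hypothetical; in the real patch these checks sit on the user_mem_abort() fault path and operate on the VMA and PFN state there):

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 and sets *s2_noncacheable on success, or a negative errno
 * on rejection, following steps 1-4 above.
 */
int map_decision(bool pfn_in_kernel_map, bool vma_pfnmap,
		 bool vma_cacheable, bool cow,
		 bool hw_cacheable_pfnmap, bool *s2_noncacheable)
{
	*s2_noncacheable = false;

	/* COW VM_PFNMAP is rejected outright. */
	if (vma_pfnmap && cow)
		return -EINVAL;

	if (!pfn_in_kernel_map) {
		bool cacheable_pfnmap = vma_pfnmap && vma_cacheable;

		/* Step 3: need FWB + CACHE DIC to skip CMOs entirely. */
		if (cacheable_pfnmap && !hw_cacheable_pfnmap)
			return -EFAULT;

		/* Step 2: everything else off the kernel map is non-cacheable. */
		if (!cacheable_pfnmap)
			*s2_noncacheable = true;
	}

	/* Step 4: never degrade a cacheable VMA to a non-cacheable S2. */
	if (vma_cacheable && *s2_noncacheable)
		return -EINVAL;

	return 0;
}
```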

This would be how the patch would look.

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 339194441a25..979668d475bd 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1516,8 +1516,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
        int ret = 0;
        bool write_fault, writable, force_pte = false;
-       bool exec_fault, mte_allowed, is_vma_cacheable;
-       bool device = false, vfio_allow_any_uc = false;
+       bool exec_fault, mte_allowed, is_vma_cacheable, cacheable_pfnmap = false;
+       bool s2_noncacheable = false, vfio_allow_any_uc = false;
        unsigned long mmu_seq;
        phys_addr_t ipa = fault_ipa;
        struct kvm *kvm = vcpu->kvm;
@@ -1660,6 +1660,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,

        is_vma_cacheable = kvm_vma_is_cacheable(vma);

+       if (vma->vm_flags & VM_PFNMAP) {
+               /* Reject COW VM_PFNMAP */
+               if (is_cow_mapping(vma->vm_flags))
+                       return -EINVAL;
+
+               if (is_vma_cacheable)
+                       cacheable_pfnmap = true;
+       }
+
        /* Don't use the VMA after the unlock -- it may have vanished */
        vma = NULL;

@@ -1684,8 +1693,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                return -EFAULT;

        if (kvm_is_device_pfn(pfn)) {
-               if (is_vma_cacheable)
-                       return -EINVAL;
+               /*
+                * When FWB is unsupported KVM needs to do cache flushes
+                * (via dcache_clean_inval_poc()) of the underlying memory. This is
+                * only possible if the memory is already mapped into the kernel map.
+                *
+                * Outright reject as the cacheable device memory is not present in
+                * the kernel map and not suitable for cache management.
+                */
+               if (cacheable_pfnmap && !kvm_arch_supports_cacheable_pfnmap())
+                       return -EFAULT;

                /*
                 * If the page was identified as device early by looking at
@@ -1696,8 +1713,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                 *
                 * In both cases, we don't let transparent_hugepage_adjust()
                 * change things at the last minute.
+                *
+                * Allow S2 to be mapped cacheable for PFNMAP device memory
+                * marked as cacheable in VMA. Note that such mapping is safe
+                * as the KVM S2 will have the same Normal memory type as the
+                * VMA has in the S1.
                 */
-               device = true;
+               if (!cacheable_pfnmap)
+                       s2_noncacheable = true;
        } else if (logging_active && !write_fault) {
                /*
                 * Only actually map the page as writable if this was a write
@@ -1706,7 +1729,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                writable = false;
        }

-       if (exec_fault && device)
+       /*
+        * Prohibit a region from being mapped non-cacheable in S2 while
+        * marked as cacheable in the userspace VMA. Such a mismatched
+        * mapping is a security risk.
+        */
+       if (is_vma_cacheable && s2_noncacheable)
+               return -EINVAL;
+
+       if (exec_fault && s2_noncacheable)
                return -ENOEXEC;

        /*
@@ -1739,7 +1770,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
         * If we are not forced to use page mapping, check if we are
         * backed by a THP and thus use block mapping if possible.
         */
-       if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
+       if (vma_pagesize == PAGE_SIZE && !(force_pte || s2_noncacheable)) {
                if (fault_is_perm && fault_granule > PAGE_SIZE)
                        vma_pagesize = fault_granule;
                else
@@ -1753,7 +1784,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                }
        }

-       if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
+       if (!fault_is_perm && !s2_noncacheable && kvm_has_mte(kvm)) {
                /* Check the VMM hasn't introduced a new disallowed VMA */
                if (mte_allowed) {
                        sanitise_mte_tags(kvm, pfn, vma_pagesize);
@@ -1769,7 +1800,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        if (exec_fault)
                prot |= KVM_PGTABLE_PROT_X;

-       if (device) {
+       if (s2_noncacheable) {
                if (vfio_allow_any_uc)
                        prot |= KVM_PGTABLE_PROT_NORMAL_NC;
                else
@@ -2266,8 +2297,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
                                break;
                        }

-                       /* Cacheable PFNMAP is not allowed */
-                       if (kvm_vma_is_cacheable(vma)) {
+                       /*
+                        * Cacheable PFNMAP is allowed only if the hardware
+                        * supports it.
+                        */
+                       if (kvm_vma_is_cacheable(vma) &&
+                           !kvm_arch_supports_cacheable_pfnmap()) {
                                ret = -EINVAL;
                                break;
                        }




* Re: [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  2025-06-19 12:14       ` Ankit Agrawal
@ 2025-06-19 14:16         ` Jason Gunthorpe
  2025-06-19 16:03         ` Donald Dutile
  1 sibling, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-19 14:16 UTC (permalink / raw)
  To: Ankit Agrawal
  Cc: Catalin Marinas, maz@kernel.org, oliver.upton@linux.dev,
	joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
	will@kernel.org, ryan.roberts@arm.com, shahuang@redhat.com,
	lpieralisi@kernel.org, david@redhat.com, ddutile@redhat.com,
	seanjc@google.com, Aniket Agashe, Neo Jia, Kirti Wankhede,
	Krishnakant Jaju, Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid,
	Alistair Popple, John Hubbard, Dan Williams, Zhi Wang, Matt Ochs,
	Uday Dhoke, Dheeraj Nigam, alex.williamson@redhat.com,
	sebastianene@google.com, coltonlewis@google.com,
	kevin.tian@intel.com, yi.l.liu@intel.com, ardb@kernel.org,
	akpm@linux-foundation.org, gshan@redhat.com, linux-mm@kvack.org,
	tabba@google.com, qperret@google.com, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn

On Thu, Jun 19, 2025 at 12:14:38PM +0000, Ankit Agrawal wrote:
> >> > -           disable_cmo = true;
> >> > +           if (!is_vma_cacheable)
> >> > +                   disable_cmo = true;
> >>
> >> I'm tempted to stick to the 'device' variable name. Or something like
> >> s2_noncacheable. As I commented, it's not just about disabling CMOs.
> >
> > I think it would be clearer to have two concepts/variable then because
> > the cases where it is really about preventing cachable access to
> > prevent aborts are not linked to the logic that checks pfn valid. We
> > have to detect those cases separately (through the VMA flags was it?).
> >
> > Having these two things together is IMHO confusing..
> >
> > Jason
> 
> Thanks Catalin and Jason for the comments.
> 
> Considering the feedback, I think we may do the following here:
> 1. Rename the device variable to S2_noncacheable to represent if the S2
>     is going to be marked non cacheable. Otherwise S2 will be mapped
>     NORMAL.

How about "s2_force_noncachable" for extra clarity what is going on.

> 2. Detect what PFN has to be marked S2_noncacheable. If a PFN is not in the
>     kernel map, mark as S2 except for PFNMAP + VMA cacheable.
> 3. Prohibit cacheable PFNMAP if hardware doesn't support FWB and CACHE DIC.
> 4. Prohibit S2 non cached mapping for cacheable VMA for all cases, whether
>     pre-FWB hardware or not.

Logic sounds right
 
> This would be how the patch would look.
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 339194441a25..979668d475bd 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1516,8 +1516,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  {
>         int ret = 0;
>         bool write_fault, writable, force_pte = false;
> -       bool exec_fault, mte_allowed, is_vma_cacheable;
> -       bool device = false, vfio_allow_any_uc = false;
> +       bool exec_fault, mte_allowed, is_vma_cacheable, cacheable_pfnmap = false;
> +       bool s2_noncacheable = false, vfio_allow_any_uc = false;
>         unsigned long mmu_seq;
>         phys_addr_t ipa = fault_ipa;
>         struct kvm *kvm = vcpu->kvm;
> @@ -1660,6 +1660,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> 
>         is_vma_cacheable = kvm_vma_is_cacheable(vma);
> 
> +       if (vma->vm_flags & VM_PFNMAP) {
> +               /* Reject COW VM_PFNMAP */
> +               if (is_cow_mapping(vma->vm_flags))
> +                       return -EINVAL;

The comment should explain why we have to reject COW PFNMAP; it is
obvious that that is what the code does.

> +
> +               if (is_vma_cacheable)
> +                       cacheable_pfnmap = true;
> +       }
> +
>         /* Don't use the VMA after the unlock -- it may have vanished */
>         vma = NULL;
> 
> @@ -1684,8 +1693,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 return -EFAULT;
> 
>         if (kvm_is_device_pfn(pfn)) {

We are changing this to !pfn_is_map_memory() ?

We should really only call pfn_is_map_memory() if VM_PFNMAP or
VM_MIXEDMAP, otherwise the VMA has only struct pages in it.

Can it look more like this?

if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory()) {
   /* the memory is non-struct page memory, it cannot be cache flushed
       and may be unsafe to be accessed as cachable */

       if (cachable_pfnmap) {
           /* the VMA owner has said the physical address is safe for cachable
              access. When FWB ..... */
	   if (!kvm_arch_supports_cacheable_pfnmap())
	       return -EFAULT;
	   /* Cannot degrade cachable to non cachable */
	   if (s2_force_noncachable)
	   	   return -EINVAL;
       } else {
           /* Assume the address is unsafe for cachable access */
	   s2_force_noncachable = true;
      }
}
/* nothing beyond here writes to s2_force_noncachable? */

Jason



* Re: [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  2025-06-19 12:14       ` Ankit Agrawal
  2025-06-19 14:16         ` Jason Gunthorpe
@ 2025-06-19 16:03         ` Donald Dutile
  2025-06-19 16:46           ` Ankit Agrawal
  1 sibling, 1 reply; 18+ messages in thread
From: Donald Dutile @ 2025-06-19 16:03 UTC (permalink / raw)
  To: Ankit Agrawal, Jason Gunthorpe, Catalin Marinas
  Cc: maz@kernel.org, oliver.upton@linux.dev, joey.gouly@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
	ryan.roberts@arm.com, shahuang@redhat.com, lpieralisi@kernel.org,
	david@redhat.com, seanjc@google.com, Aniket Agashe, Neo Jia,
	Kirti Wankhede, Krishnakant Jaju, Tarun Gupta (SW-GPU),
	Vikram Sethi, Andy Currid, Alistair Popple, John Hubbard,
	Dan Williams, Zhi Wang, Matt Ochs, Uday Dhoke, Dheeraj Nigam,
	alex.williamson@redhat.com, sebastianene@google.com,
	coltonlewis@google.com, kevin.tian@intel.com, yi.l.liu@intel.com,
	ardb@kernel.org, akpm@linux-foundation.org, gshan@redhat.com,
	linux-mm@kvack.org, tabba@google.com, qperret@google.com,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn



On 6/19/25 8:14 AM, Ankit Agrawal wrote:
> Considering the feedback, I think we may do the following here: 
> 1. Rename the device variable to S2_noncacheable to represent if the S2 is going to be marked non cacheable. Otherwise S2 will be mapped NORMAL. 
> 2. Detect what PFN has to be marked S2_noncacheable. If a PFN is not in the kernel map, mark as S2 except for PFNMAP + VMA cacheable.
   Q: 'mark as S2 except'... should be 'mark as S2_noncacheable' ?
  
> 3. Prohibit cacheable PFNMAP if hardware doesn't support FWB and CACHE DIC. 
> 4. Prohibit S2 non cached mapping for cacheable VMA for all cases, whether pre-FWB hardware or not.




* Re: [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  2025-06-19 16:03         ` Donald Dutile
@ 2025-06-19 16:46           ` Ankit Agrawal
  0 siblings, 0 replies; 18+ messages in thread
From: Ankit Agrawal @ 2025-06-19 16:46 UTC (permalink / raw)
  To: Donald Dutile, Jason Gunthorpe, Catalin Marinas
  Cc: maz@kernel.org, oliver.upton@linux.dev, joey.gouly@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
	ryan.roberts@arm.com, shahuang@redhat.com, lpieralisi@kernel.org,
	david@redhat.com, seanjc@google.com, Aniket Agashe, Neo Jia,
	Kirti Wankhede, Krishnakant Jaju, Tarun Gupta (SW-GPU),
	Vikram Sethi, Andy Currid, Alistair Popple, John Hubbard,
	Dan Williams, Zhi Wang, Matt Ochs, Uday Dhoke, Dheeraj Nigam,
	alex.williamson@redhat.com, sebastianene@google.com,
	coltonlewis@google.com, kevin.tian@intel.com, yi.l.liu@intel.com,
	ardb@kernel.org, akpm@linux-foundation.org, gshan@redhat.com,
	linux-mm@kvack.org, tabba@google.com, qperret@google.com,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn

>> Considering the feedback, I think we may do the following here:
>> 1. Rename the device variable to S2_noncacheable to represent if the S2 is
>> going to be marked non cacheable. Otherwise S2 will be mapped NORMAL.
>> 2. Detect what PFN has to be marked S2_noncacheable. If a PFN is not in the
>> kernel map, mark as S2 except for PFNMAP + VMA cacheable.
>   Q: 'mark as S2 except'... should be 'mark as S2_noncacheable' ?

Yeah, that's correct. Thanks for catching that.

"Detect what PFN has to be marked S2_noncacheable. If a PFN is not in the
kernel map, mark as S2_noncacheable except for PFNMAP + VMA cacheable."

>> 3. Prohibit cacheable PFNMAP if hardware doesn't support FWB and CACHE DIC.
>> 4. Prohibit S2 non cached mapping for cacheable VMA for all cases, whether
>> pre-FWB hardware or not.



end of thread, other threads:[~2025-06-19 16:46 UTC | newest]

Thread overview: 18+ messages
-- links below jump to the message on this page --
2025-06-18  6:55 [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable ankita
2025-06-18  6:55 ` [PATCH v7 1/5] KVM: arm64: Rename symbols to reflect whether CMO may be used ankita
2025-06-18 14:28   ` Catalin Marinas
2025-06-18 14:35   ` Catalin Marinas
2025-06-19  2:22     ` Ankit Agrawal
2025-06-18  6:55 ` [PATCH v7 2/5] KVM: arm64: Block cacheable PFNMAP mapping ankita
2025-06-18 15:46   ` Catalin Marinas
2025-06-19  2:21     ` Ankit Agrawal
2025-06-18  6:55 ` [PATCH v7 3/5] KVM: arm64: New function to determine hardware cache management support ankita
2025-06-18 16:12   ` Catalin Marinas
2025-06-18  6:55 ` [PATCH v7 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags ankita
2025-06-18 16:34   ` Catalin Marinas
2025-06-18 16:38     ` Jason Gunthorpe
2025-06-19 12:14       ` Ankit Agrawal
2025-06-19 14:16         ` Jason Gunthorpe
2025-06-19 16:03         ` Donald Dutile
2025-06-19 16:46           ` Ankit Agrawal
2025-06-18  6:55 ` [PATCH v7 5/5] KVM: arm64: Expose new KVM cap for cacheable PFNMAP ankita
