linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 0/7] Add support for NoTagAccess memory attribute
@ 2025-01-10 11:00 Aneesh Kumar K.V (Arm)
  2025-01-10 11:00 ` [PATCH v2 1/7] arm64: Update the values to binary from hex Aneesh Kumar K.V (Arm)
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

A VMM can assign different types of memory regions to a guest, and not all
memory regions support storing allocation tags. Currently, the kernel doesn't
allow enabling the MTE feature in the guest if any of the assigned memory
regions don't allow MTE. This prevents the use of MTE in the guest even
though the guest will never use those memory regions as allocation-tagged
memory.

One example of such a configuration is a VFIO passthrough-enabled guest.
Enabling MTE with such a config fails as shown below (kvmtool VMM):

[  617.921030] vfio-pci 0000:01:00.0: resetting
[  618.024719] vfio-pci 0000:01:00.0: reset done
  Error: 0000:01:00.0: failed to register region with KVM
  Warning: [0abc:aced] Error activating emulation for BAR 0
  Error: 0000:01:00.0: failed to configure regions
  Warning: Failed init: vfio__init

  Fatal: Initialisation failed

This patch series provides a way to enable MTE in such configs. Even though
NoTagAccess can only be used with cacheable mappings, the ability to map both
MTE-capable and non-capable VMAs allows us to support VFIO with the MTE
capability enabled.

Also, there is a possibility of cacheable device memory. That memory, if
exposed to the guest as WB while the guest enables MTE, may trigger SErrors
(the Arm ARM describes this as "[RBCFRK] The result of an access to an
Allocation Tag where Allocation Tag storage is not provided is IMPLEMENTATION
DEFINED"). With FEAT_MTE_PERM, KVM can trap such accesses and take the
necessary corrective action.

Another use case is virtiofs DAX support, which can use a page cache region
as a virtio-shm region. We can use MTE_PERM to enable MTE in this config.

In summary, whether the memory is WB MMIO or RAM presented as virtio-(pmem,
shm, etc.) backed by files in the VMM, the VM should be aware that it is not
standard RAM and should not attempt to enable MTE on it. If it does (either
by mistake or malice), FEAT_MTE_PERM will trap the tag access and cause a VM
exit with KVM_EXIT_MEMORY_FAULT as the reason.

A VM exit with exit reason KVM_EXIT_MEMORY_FAULT gives sufficient
flexibility for any future fault-handling scheme we come up with (one
possible use case is to allocate additional allocation tag storage, for
schemes that want to support a smaller allocation tag pool, and retry the
guest instruction). For the current implementation, where we expect the
guest to be terminated, this could also have been achieved from within the
hypervisor.
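
For illustration, a VMM-side handler for this exit could look roughly like
the sketch below. The exit reason, flag and field names come from this
series; vcpu_fd and run (the mmap()ed vCPU run structure) are assumed VMM
context, and terminating the guest is just one possible policy:

	/* Sketch only; not taken from any VMM in this thread */
	ret = ioctl(vcpu_fd, KVM_RUN, 0);
	if (ret == -1 && errno == EFAULT &&
	    run->exit_reason == KVM_EXIT_MEMORY_FAULT &&
	    (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_NOTAGACCESS)) {
		/* guest accessed allocation tags in a region without tag storage */
		fprintf(stderr, "tag access fault: gpa=0x%llx size=0x%llx\n",
			(unsigned long long)run->memory_fault.gpa,
			(unsigned long long)run->memory_fault.size);
		exit(1);	/* or remap/retry under a future scheme */
	}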

Implementation notes:
For non-MTE-allowed memory, the hypervisor will install the stage-2
translation with the NoTagAccess memory attribute. A guest access to the
allocation tags of such a memory region will result in a VM exit. One detail
to note here is that the NoTagAccess memory attribute can only be applied to
Normal cacheable memory, i.e. the attribute value MemAttr[3:0] = 0b0100
implies a Normal, NoTagAccess, write-back cacheable memory region. No other
memory attribute value will trap allocation tag accesses.
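
To make the trap condition concrete: only an allocation tag access (e.g. an
STG/LDG) faults with this attribute; ordinary data accesses to the same page
remain normal write-back cacheable accesses. A hypothetical guest-side
snippet (assumes a toolchain with MTE support, e.g. -march=armv8.5-a+memtag):

	/*
	 * Sketch: 'p' is assumed to point into a region that the host
	 * mapped with the NoTagAccess stage-2 attribute.
	 */
	void touch(char *p)
	{
		p[0] = 42;	/* data access: proceeds normally */
		/* tag store: traps and surfaces as KVM_EXIT_MEMORY_FAULT */
		asm volatile("stg %0, [%0]" : : "r"(p) : "memory");
	}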

Migration notes:
The feature is exposed to an EL2 guest only if it is capable of nested
virtualization. Otherwise, a read of ID_AA64PFR2_EL1 will not expose the
MTE_PERM feature. We also want to make sure that an EL2 guest using this
feature as part of its stage-2 translation can only migrate to a target that
supports the feature in hardware. This is achieved by using
KVM_CAP_ARM_MTE_PERM.
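
The expected VMM flow (a sketch, with error handling elided; vm_fd is the
assumed VM file descriptor) is to probe and enable the capability before
creating any vCPUs, since the enable path rejects it once vCPUs exist:

	struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_MTE_PERM };

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_MTE_PERM) > 0)
		ioctl(vm_fd, KVM_ENABLE_CAP, &cap);	/* before KVM_CREATE_VCPU */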

Nested virtualization notes:
This being a stage-2 translation attribute, it is exposed to an EL2 guest
only if it is capable of a virtual EL2 state. When an EL1 guest is started
with the MTE_PERM capability enabled, the EL2 hypervisor will look at the EL1
stage-2 tables to determine whether a NoTagAccess attribute needs to be
inserted in the shadow stage-2 table at EL2. (Limitation: upstream nested
virt support disables MTE in the EL1 guest and, in a similar fashion, we
don't allow MTE_PERM with an EL1 guest for now. This also means we can drop
the patch "KVM: arm64: MTE: Nested guest support" because the feature is
only used by the EL2 guest for now.)
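
In pseudocode, the effective decision for a nested fault combines the
host-side VMA information with the guest hypervisor's stage-2 attribute
(names follow the patches; heavily simplified):

	/* Sketch: tag access is permitted only if both levels allow it */
	bool tag_ok = kvm_vma_mte_allowed(vma) &&
		      (!nested || kvm_s2_trans_tagaccess(nested));

	if (!tag_ok && kvm_has_mte_perm(kvm))
		prot |= KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS; /* trap tag accesses */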

Changes from v1:
* Add KVM_CAP_ARM_MTE_PERM to handle migration.
* Add handling of NoTagAccess with Nested guest stage2.
* Add changes to split some of the kvm_pgtable_prot bits.


Aneesh Kumar K.V (Arm) (7):
  arm64: Update the values to binary from hex
  KVM: arm64: MTE: Update code comments
  arm64: cpufeature: add Allocation Tag Access Permission (MTE_PERM)
    feature
  KVM: arm64: MTE: Add KVM_CAP_ARM_MTE_PERM
  KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  KVM: arm64: MTE: Nested guest support
  KVM: arm64: Split some of the kvm_pgtable_prot bits into separate
    defines

 Documentation/virt/kvm/api.rst        | 17 ++++++++++
 arch/arm64/include/asm/cpufeature.h   |  5 +++
 arch/arm64/include/asm/kvm_emulate.h  |  5 +++
 arch/arm64/include/asm/kvm_host.h     |  7 +++++
 arch/arm64/include/asm/kvm_nested.h   | 10 ++++++
 arch/arm64/include/asm/kvm_pgtable.h  |  9 ++++--
 arch/arm64/include/asm/memory.h       | 14 +++++----
 arch/arm64/kernel/cpufeature.c        |  9 ++++++
 arch/arm64/kvm/arm.c                  | 11 +++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  2 +-
 arch/arm64/kvm/hyp/pgtable.c          | 43 +++++++++++++++----------
 arch/arm64/kvm/mmu.c                  | 45 ++++++++++++++++++++-------
 arch/arm64/kvm/nested.c               | 28 +++++++++++++++++
 arch/arm64/kvm/sys_regs.c             | 15 ++++++---
 arch/arm64/tools/cpucaps              |  1 +
 include/linux/kvm_host.h              | 10 ++++++
 include/uapi/linux/kvm.h              |  2 ++
 17 files changed, 191 insertions(+), 42 deletions(-)


base-commit: 56e6a3499e14716b9a28a307bb6d18c10e95301e
-- 
2.43.0




* [PATCH v2 1/7] arm64: Update the values to binary from hex
  2025-01-10 11:00 [PATCH v2 0/7] Add support for NoTagAccess memory attribute Aneesh Kumar K.V (Arm)
@ 2025-01-10 11:00 ` Aneesh Kumar K.V (Arm)
  2025-01-10 13:11   ` Catalin Marinas
  2025-01-10 11:00 ` [PATCH v2 2/7] KVM: arm64: MTE: Update code comments Aneesh Kumar K.V (Arm)
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

This matches the ARM ARM representation. No functional change in this
patch.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/include/asm/memory.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 8b9f33cf561b..cb244668954c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -178,17 +178,17 @@
 /*
  * Memory types for Stage-2 translation
  */
-#define MT_S2_NORMAL		0xf
-#define MT_S2_NORMAL_NC		0x5
-#define MT_S2_DEVICE_nGnRE	0x1
+#define MT_S2_NORMAL			0b1111
+#define MT_S2_NORMAL_NC			0b0101
+#define MT_S2_DEVICE_nGnRE		0b0001
 
 /*
  * Memory types for Stage-2 translation when ID_AA64MMFR2_EL1.FWB is 0001
  * Stage-2 enforces Normal-WB and Device-nGnRE
  */
-#define MT_S2_FWB_NORMAL	6
-#define MT_S2_FWB_NORMAL_NC	5
-#define MT_S2_FWB_DEVICE_nGnRE	1
+#define MT_S2_FWB_NORMAL		0b0110
+#define MT_S2_FWB_NORMAL_NC		0b0101
+#define MT_S2_FWB_DEVICE_nGnRE		0b0001
 
 #ifdef CONFIG_ARM64_4K_PAGES
 #define IOREMAP_MAX_ORDER	(PUD_SHIFT)
-- 
2.43.0




* [PATCH v2 2/7] KVM: arm64: MTE: Update code comments
  2025-01-10 11:00 [PATCH v2 0/7] Add support for NoTagAccess memory attribute Aneesh Kumar K.V (Arm)
  2025-01-10 11:00 ` [PATCH v2 1/7] arm64: Update the values to binary from hex Aneesh Kumar K.V (Arm)
@ 2025-01-10 11:00 ` Aneesh Kumar K.V (Arm)
  2025-01-10 13:11   ` Catalin Marinas
  2025-01-10 11:00 ` [PATCH v2 3/7] arm64: cpufeature: add Allocation Tag Access Permission (MTE_PERM) feature Aneesh Kumar K.V (Arm)
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
initialisation") updated the locking such that the kernel now allows
VM_SHARED mappings with MTE. Update the code comment to reflect this.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/kvm/mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c9d46ad57e52..eb8220a409e1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1391,11 +1391,11 @@ static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
  * able to see the page's tags and therefore they must be initialised first. If
  * PG_mte_tagged is set, tags have already been initialised.
  *
- * The race in the test/set of the PG_mte_tagged flag is handled by:
- * - preventing VM_SHARED mappings in a memslot with MTE preventing two VMs
- *   racing to santise the same page
- * - mmap_lock protects between a VM faulting a page in and the VMM performing
- *   an mprotect() to add VM_MTE
+ * The race in the test/set of the PG_mte_tagged flag is handled by using
+ * PG_mte_lock and PG_mte_tagged together. if PG_mte_lock is found unset, we can
+ * go ahead and clear the page tags. if PG_mte_lock is found set, then the page
+ * tags are already cleared or there is a parallel tag clearing is going on. We
+ * wait for the parallel tag clear to finish by waiting on PG_mte_tagged bit.
  */
 static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 			      unsigned long size)
-- 
2.43.0




* [PATCH v2 3/7] arm64: cpufeature: add Allocation Tag Access Permission (MTE_PERM) feature
  2025-01-10 11:00 [PATCH v2 0/7] Add support for NoTagAccess memory attribute Aneesh Kumar K.V (Arm)
  2025-01-10 11:00 ` [PATCH v2 1/7] arm64: Update the values to binary from hex Aneesh Kumar K.V (Arm)
  2025-01-10 11:00 ` [PATCH v2 2/7] KVM: arm64: MTE: Update code comments Aneesh Kumar K.V (Arm)
@ 2025-01-10 11:00 ` Aneesh Kumar K.V (Arm)
  2025-01-10 13:15   ` Catalin Marinas
  2025-01-10 11:00 ` [PATCH v2 4/7] KVM: arm64: MTE: Add KVM_CAP_ARM_MTE_PERM Aneesh Kumar K.V (Arm)
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

This indicates whether the system supports MTE_PERM, which will be used
by KVM for stage-2 mappings. This is an ARM64_CPUCAP_SYSTEM_FEATURE
because, if we enable the feature, all CPUs must have it.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h | 5 +++++
 arch/arm64/include/asm/memory.h     | 2 ++
 arch/arm64/kernel/cpufeature.c      | 9 +++++++++
 arch/arm64/tools/cpucaps            | 1 +
 4 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8b4e5a3cd24c..d70d60ca1cf7 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -813,6 +813,11 @@ static inline bool system_supports_mte(void)
 	return alternative_has_cap_unlikely(ARM64_MTE);
 }
 
+static inline bool system_supports_notagaccess(void)
+{
+	return alternative_has_cap_unlikely(ARM64_MTE_PERM);
+}
+
 static inline bool system_has_prio_mask_debugging(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING) &&
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index cb244668954c..6939e4700a5e 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -179,6 +179,7 @@
  * Memory types for Stage-2 translation
  */
 #define MT_S2_NORMAL			0b1111
+#define MT_S2_NORMAL_NOTAGACCESS	0b0100
 #define MT_S2_NORMAL_NC			0b0101
 #define MT_S2_DEVICE_nGnRE		0b0001
 
@@ -187,6 +188,7 @@
  * Stage-2 enforces Normal-WB and Device-nGnRE
  */
 #define MT_S2_FWB_NORMAL		0b0110
+#define MT_S2_FWB_NORMAL_NOTAGACCESS	0b1110
 #define MT_S2_FWB_NORMAL_NC		0b0101
 #define MT_S2_FWB_DEVICE_nGnRE		0b0001
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed8..c9cd0735aaf5 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -309,6 +309,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr2[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR2_EL1_FPMR_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR2_EL1_MTEPERM_SHIFT, 4, 0),
 	ARM64_FTR_END,
 };
 
@@ -2818,6 +2819,14 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		ARM64_CPUID_FIELDS(ID_AA64PFR1_EL1, MTE, MTE3)
 	},
+	{
+		.desc = "MTE Allocation Tag Access Permission",
+		.capability = ARM64_MTE_PERM,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, MTEPERM, IMP)
+	},
+
 #endif /* CONFIG_ARM64_MTE */
 	{
 		.desc = "RCpc load-acquire (LDAPR)",
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index eb17f59e543c..10e01cf8ad96 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -66,6 +66,7 @@ MPAM
 MPAM_HCR
 MTE
 MTE_ASYMM
+MTE_PERM
 SME
 SME_FA64
 SME2
-- 
2.43.0




* [PATCH v2 4/7] KVM: arm64: MTE: Add KVM_CAP_ARM_MTE_PERM
  2025-01-10 11:00 [PATCH v2 0/7] Add support for NoTagAccess memory attribute Aneesh Kumar K.V (Arm)
                   ` (2 preceding siblings ...)
  2025-01-10 11:00 ` [PATCH v2 3/7] arm64: cpufeature: add Allocation Tag Access Permission (MTE_PERM) feature Aneesh Kumar K.V (Arm)
@ 2025-01-10 11:00 ` Aneesh Kumar K.V (Arm)
  2025-01-10 11:00 ` [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported Aneesh Kumar K.V (Arm)
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

This will be used by the VMM to enable the use of the NoTagAccess memory
attribute when mapping pages that don't support allocation tags into the
guest IPA space.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 Documentation/virt/kvm/api.rst    | 14 ++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  7 +++++++
 arch/arm64/kvm/arm.c              | 11 +++++++++++
 include/uapi/linux/kvm.h          |  1 +
 4 files changed, 33 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 454c2aaa155e..e954fca76c27 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -9017,6 +9017,20 @@ Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
 production.  The behavior and effective ABI for software-protected VMs is
 unstable.
 
+8.42 KVM_CAP_ARM_MTE_PERM
+-------------------------
+
+:Capability: KVM_CAP_ARM_MTE_PERM
+:Architectures: arm64
+:Type: vm
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
+that the kernel supports mapping memory regions that don't support
+allocation tags into a guest that enables the KVM_CAP_ARM_MTE capability.
+
+In order to use this, it has to be activated by setting this capability via
+the KVM_ENABLE_CAP ioctl on the VM fd.
+
 9. Known KVM API problems
 =========================
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e18e9244d17a..ad2b488b99d5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -331,6 +331,9 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_ID_REGS_INITIALIZED		7
 	/* Fine-Grained UNDEF initialised */
 #define KVM_ARCH_FLAG_FGU_INITIALIZED			8
+	/* Memory Tagging Extension NoTagAccess check enabled for the guest */
+#define KVM_ARCH_FLAG_MTE_PERM_ENABLED			9
+
 	unsigned long flags;
 
 	/* VM-wide vCPU feature set */
@@ -1417,6 +1420,10 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_vm_has_ran_once(kvm)					\
 	(test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &(kvm)->arch.flags))
 
+#define kvm_has_mte_perm(kvm)					\
+	(system_supports_notagaccess() &&				\
+	 test_bit(KVM_ARCH_FLAG_MTE_PERM_ENABLED, &(kvm)->arch.flags))
+
 static inline bool __vcpu_has_feature(const struct kvm_arch *ka, int feature)
 {
 	return test_bit(feature, ka->vcpu_features);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a102c3aebdbc..fdcd2c1605d5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -150,6 +150,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->slots_lock);
 		break;
+	case KVM_CAP_ARM_MTE_PERM:
+		mutex_lock(&kvm->lock);
+		if (system_supports_notagaccess() && !kvm->created_vcpus) {
+			r = 0;
+			set_bit(KVM_ARCH_FLAG_MTE_PERM_ENABLED, &kvm->arch.flags);
+		}
+		mutex_unlock(&kvm->lock);
+		break;
 	default:
 		break;
 	}
@@ -418,6 +426,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES:
 		r = BIT(0);
 		break;
+	case KVM_CAP_ARM_MTE_PERM:
+		r = system_supports_notagaccess();
+		break;
 	default:
 		r = 0;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 502ea63b5d2e..4900ff577819 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -933,6 +933,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_PRE_FAULT_MEMORY 236
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_ARM_MTE_PERM 239
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.43.0




* [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-10 11:00 [PATCH v2 0/7] Add support for NoTagAccess memory attribute Aneesh Kumar K.V (Arm)
                   ` (3 preceding siblings ...)
  2025-01-10 11:00 ` [PATCH v2 4/7] KVM: arm64: MTE: Add KVM_CAP_ARM_MTE_PERM Aneesh Kumar K.V (Arm)
@ 2025-01-10 11:00 ` Aneesh Kumar K.V (Arm)
  2025-01-10 18:20   ` Catalin Marinas
  2025-01-10 11:00 ` [PATCH v2 6/7] KVM: arm64: MTE: Nested guest support Aneesh Kumar K.V (Arm)
  2025-01-10 11:00 ` [PATCH v2 7/7] KVM: arm64: Split some of the kvm_pgtable_prot bits into separate defines Aneesh Kumar K.V (Arm)
  6 siblings, 1 reply; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

Currently, the kernel won't start a guest if the MTE feature is enabled
and the guest RAM is backed by memory which doesn't support allocation
tags. Update this such that the kernel uses the NoTagAccess memory
attribute while mapping pages from VMAs for which MTE is not allowed.
The fault from accessing the allocation tags in such pages is forwarded
to the VMM so that the VMM can decide to kill the guest or take
corrective action.
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 Documentation/virt/kvm/api.rst       |  3 +++
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/include/asm/kvm_pgtable.h |  1 +
 arch/arm64/kvm/hyp/pgtable.c         | 16 +++++++++++++---
 arch/arm64/kvm/mmu.c                 | 17 ++++++++++++++---
 include/linux/kvm_host.h             | 10 ++++++++++
 include/uapi/linux/kvm.h             |  1 +
 7 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e954fca76c27..3b357f9b76d6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7115,6 +7115,9 @@ describes properties of the faulting access that are likely pertinent:
  - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
    on a private memory access.  When clear, indicates the fault occurred on a
    shared access.
+ - KVM_MEMORY_EXIT_FLAG_NOTAGACCESS - When set, indicates the memory fault
+   occurred due to allocation tag access on a memory region that doesn't support
+   allocation tags.
 
 Note!  KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
 accompanies a return code of '-1', not '0'!  errno will always be set to EFAULT
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index cf811009a33c..609ed6a5ffce 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -378,6 +378,11 @@ static inline bool kvm_vcpu_trap_is_exec_fault(const struct kvm_vcpu *vcpu)
 	return kvm_vcpu_trap_is_iabt(vcpu) && !kvm_vcpu_abt_iss1tw(vcpu);
 }
 
+static inline bool kvm_vcpu_trap_is_tagaccess(const struct kvm_vcpu *vcpu)
+{
+	return !!(ESR_ELx_ISS2(kvm_vcpu_get_esr(vcpu)) & ESR_ELx_TagAccess);
+}
+
 static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC;
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index aab04097b505..0daf4ffedc99 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -252,6 +252,7 @@ enum kvm_pgtable_prot {
 
 	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
 	KVM_PGTABLE_PROT_NORMAL_NC		= BIT(4),
+	KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS	= BIT(5),
 
 	KVM_PGTABLE_PROT_SW0			= BIT(55),
 	KVM_PGTABLE_PROT_SW1			= BIT(56),
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 40bd55966540..4eb6e9345c12 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -677,9 +677,11 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
 {
 	kvm_pte_t attr;
 	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
+	unsigned long prot_mask = KVM_PGTABLE_PROT_DEVICE |
+				  KVM_PGTABLE_PROT_NORMAL_NC |
+				  KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS;
 
-	switch (prot & (KVM_PGTABLE_PROT_DEVICE |
-			KVM_PGTABLE_PROT_NORMAL_NC)) {
+	switch (prot & prot_mask) {
 	case KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC:
 		return -EINVAL;
 	case KVM_PGTABLE_PROT_DEVICE:
@@ -692,6 +694,12 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
 			return -EINVAL;
 		attr = KVM_S2_MEMATTR(pgt, NORMAL_NC);
 		break;
+	case KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS:
+		if (system_supports_notagaccess())
+			attr = KVM_S2_MEMATTR(pgt, NORMAL_NOTAGACCESS);
+		else
+			return -EINVAL;
+		break;
 	default:
 		attr = KVM_S2_MEMATTR(pgt, NORMAL);
 	}
@@ -872,7 +880,9 @@ static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
 static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
 	u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
-	return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
+	return kvm_pte_valid(pte) &&
+	       ((memattr == KVM_S2_MEMATTR(pgt, NORMAL)) ||
+		(memattr == KVM_S2_MEMATTR(pgt, NORMAL_NOTAGACCESS)));
 }
 
 static bool stage2_pte_executable(kvm_pte_t pte)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index eb8220a409e1..3610bea7607d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1660,9 +1660,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
-		if (mte_allowed) {
+		if (mte_allowed)
 			sanitise_mte_tags(kvm, pfn, vma_pagesize);
-		} else {
+		else if (kvm_has_mte_perm(kvm))
+			prot |= KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS;
+		else {
 			ret = -EFAULT;
 			goto out_unlock;
 		}
@@ -1840,6 +1842,14 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+
+	if (kvm_vcpu_trap_is_tagaccess(vcpu)) {
+		/* exit to host and handle the error */
+		kvm_prepare_notagaccess_exit(vcpu, gfn << PAGE_SHIFT, PAGE_SIZE);
+		ret = 0;
+		goto out;
+	}
+
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
 	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
@@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		if (!vma)
 			break;
 
-		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
+		if (kvm_has_mte(kvm) &&
+		    !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
 			ret = -EINVAL;
 			break;
 		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 401439bb21e3..8a270f658f36 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2471,6 +2471,16 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
 
+static inline void kvm_prepare_notagaccess_exit(struct kvm_vcpu *vcpu,
+						 gpa_t gpa, gpa_t size)
+{
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+	vcpu->run->memory_fault.flags = KVM_MEMORY_EXIT_FLAG_NOTAGACCESS;
+	vcpu->run->memory_fault.gpa = gpa;
+	vcpu->run->memory_fault.size = size;
+}
+
+
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4900ff577819..7136d28eb307 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -442,6 +442,7 @@ struct kvm_run {
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
 #define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
+#define KVM_MEMORY_EXIT_FLAG_NOTAGACCESS (1ULL << 4)
 			__u64 flags;
 			__u64 gpa;
 			__u64 size;
-- 
2.43.0




* [PATCH v2 6/7] KVM: arm64: MTE: Nested guest support
  2025-01-10 11:00 [PATCH v2 0/7] Add support for NoTagAccess memory attribute Aneesh Kumar K.V (Arm)
                   ` (4 preceding siblings ...)
  2025-01-10 11:00 ` [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported Aneesh Kumar K.V (Arm)
@ 2025-01-10 11:00 ` Aneesh Kumar K.V (Arm)
  2025-01-10 11:00 ` [PATCH v2 7/7] KVM: arm64: Split some of the kvm_pgtable_prot bits into separate defines Aneesh Kumar K.V (Arm)
  6 siblings, 0 replies; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

Currently the MTE feature is not enabled for an EL1 guest and, similarly,
we disable MTE_PERM. However, this patch does add the code needed to allow
using KVM_CAP_ARM_MTE_PERM with an EL1 guest. This will allow the use of
MTE in a nested guest even if some of the memory backing the nested guest
RAM is not MTE capable (e.g. page cache pages).

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h | 10 ++++++++++
 arch/arm64/kvm/mmu.c                | 10 ++++++++++
 arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           | 15 +++++++++++----
 4 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 233e65522716..4d6c0df3ef48 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -86,6 +86,8 @@ struct kvm_s2_trans {
 	bool writable;
 	bool readable;
 	int level;
+	int s2_fwb;
+	int mem_attr;
 	u32 esr;
 	u64 desc;
 };
@@ -120,10 +122,18 @@ static inline bool kvm_s2_trans_executable(struct kvm_s2_trans *trans)
 	return !(trans->desc & BIT(54));
 }
 
+static inline bool kvm_s2_trans_tagaccess(struct kvm_s2_trans *trans)
+{
+	if (trans->s2_fwb)
+		return (trans->mem_attr & MT_S2_FWB_NORMAL_NOTAGACCESS) != MT_S2_FWB_NORMAL_NOTAGACCESS;
+	return (trans->mem_attr & MT_S2_NORMAL_NOTAGACCESS) != MT_S2_NORMAL_NOTAGACCESS;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
 				    struct kvm_s2_trans *trans);
+int kvm_s2_handle_notagaccess_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans);
 extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
 extern void kvm_nested_s2_wp(struct kvm *kvm);
 extern void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3610bea7607d..54e5bfe4f126 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1640,6 +1640,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		goto out_unlock;
 	}
 
+	if (nested && !kvm_s2_trans_tagaccess(nested))
+		mte_allowed = false;
+
 	/*
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
@@ -1836,6 +1839,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 			goto out_unlock;
 		}
 
+		ret = kvm_s2_handle_notagaccess_fault(vcpu, &nested_trans);
+		if (ret) {
+			esr = kvm_s2_trans_esr(&nested_trans);
+			kvm_inject_s2_fault(vcpu, esr);
+			goto out_unlock;
+		}
+
 		ipa = kvm_s2_trans_output(&nested_trans);
 		nested = &nested_trans;
 	}
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 9b36218b48de..5867e0376444 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -290,6 +290,7 @@ static int walk_nested_s2_pgd(phys_addr_t ipa,
 	out->writable = desc & (0b10 << 6);
 	out->level = level;
 	out->desc = desc;
+	out->mem_attr = desc & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
 	return 0;
 }
 
@@ -340,6 +341,7 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 
 	wi.be = vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_ELx_EE;
 
+	result->s2_fwb = !!(*vcpu_hcr(vcpu) & HCR_FWB);
 	ret = walk_nested_s2_pgd(gipa, &wi, result);
 	if (ret)
 		result->esr |= (kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC);
@@ -733,6 +735,27 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
 	return forward_fault;
 }
 
+int kvm_s2_handle_notagaccess_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (!kvm_vcpu_trap_is_tagaccess(vcpu))
+		return 0;
+
+	if (!kvm_s2_trans_tagaccess(trans))
+		forward_fault = true;
+	else
+		forward_fault = false;
+
+	/* forward it as a permission fault with tag access set in ISS2 */
+	if (forward_fault)
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+
+	return forward_fault;
+}
+
 int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
 {
 	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
@@ -844,6 +867,11 @@ static void limit_nv_id_regs(struct kvm *kvm)
 		NV_FTR(PFR1, CSV2_frac));
 	kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR1_EL1, val);
 
+	/* For now no MTE_PERM support because MTE is disabled above */
+	val = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR2_EL1);
+	val &= ~NV_FTR(PFR2, MTEPERM);
+	kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR2_EL1, val);
+
 	/* Hide ECV, ExS, Secure Memory */
 	val = kvm_read_vm_id_reg(kvm, SYS_ID_AA64MMFR0_EL1);
 	val &= ~(NV_FTR(MMFR0, ECV)		|
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e2a5c2918d9e..cb7d4d32179c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1557,7 +1557,7 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
 				       const struct sys_reg_desc *r)
 {
 	u32 id = reg_to_encoding(r);
-	u64 val;
+	u64 val, mask;
 
 	if (sysreg_visible_as_raz(vcpu, r))
 		return 0;
@@ -1587,8 +1587,14 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MPAM_frac);
 		break;
 	case SYS_ID_AA64PFR2_EL1:
-		/* We only expose FPMR */
-		val &= ID_AA64PFR2_EL1_FPMR;
+		mask = ID_AA64PFR2_EL1_FPMR;
+		/*
+		 * Since this is a stage-2 specific feature, only expose
+		 * it if the vcpu can run in vEL2
+		 */
+		if (vcpu_has_nv(vcpu))
+			mask |= ID_AA64PFR2_EL1_MTEPERM;
+		val &= mask;
 		break;
 	case SYS_ID_AA64ISAR1_EL1:
 		if (!vcpu_has_ptrauth(vcpu))
@@ -2566,7 +2572,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 				       ID_AA64PFR1_EL1_MPAM_frac |
 				       ID_AA64PFR1_EL1_RAS_frac |
 				       ID_AA64PFR1_EL1_MTE)),
-	ID_WRITABLE(ID_AA64PFR2_EL1, ID_AA64PFR2_EL1_FPMR),
+	ID_WRITABLE(ID_AA64PFR2_EL1, ID_AA64PFR2_EL1_FPMR |
+					ID_AA64PFR2_EL1_MTEPERM),
 	ID_UNALLOCATED(4,3),
 	ID_WRITABLE(ID_AA64ZFR0_EL1, ~ID_AA64ZFR0_EL1_RES0),
 	ID_HIDDEN(ID_AA64SMFR0_EL1),
-- 
2.43.0




* [PATCH v2 7/7] KVM: arm64: Split some of the kvm_pgtable_prot bits into separate defines
  2025-01-10 11:00 [PATCH v2 0/7] Add support for NoTagAccess memory attribute Aneesh Kumar K.V (Arm)
                   ` (5 preceding siblings ...)
  2025-01-10 11:00 ` [PATCH v2 6/7] KVM: arm64: MTE: Nested guest support Aneesh Kumar K.V (Arm)
@ 2025-01-10 11:00 ` Aneesh Kumar K.V (Arm)
  6 siblings, 0 replies; 19+ messages in thread
From: Aneesh Kumar K.V (Arm) @ 2025-01-10 11:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm
  Cc: Suzuki K Poulose, Steven Price, Will Deacon, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu,
	Aneesh Kumar K.V (Arm)

Some of the kvm_pgtable_prot values are mutually exclusive, like
KVM_PGTABLE_PROT_NORMAL_NC and KVM_PGTABLE_PROT_DEVICE. This patch
splits the Normal memory non-cacheable and NoTagAccess attributes into
separate #defines. With this change, the kvm_pgtable_prot bits only
indicate whether it is a device or normal memory mapping.

There are no functional changes in this patch.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h  | 10 +++---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  2 +-
 arch/arm64/kvm/hyp/pgtable.c          | 47 +++++++++++++--------------
 arch/arm64/kvm/mmu.c                  | 10 +++---
 4 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 0daf4ffedc99..9443a8ad9343 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -239,7 +239,6 @@ enum kvm_pgtable_stage2_flags {
  * @KVM_PGTABLE_PROT_W:		Write permission.
  * @KVM_PGTABLE_PROT_R:		Read permission.
  * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
- * @KVM_PGTABLE_PROT_NORMAL_NC:	Normal noncacheable attributes.
  * @KVM_PGTABLE_PROT_SW0:	Software bit 0.
  * @KVM_PGTABLE_PROT_SW1:	Software bit 1.
  * @KVM_PGTABLE_PROT_SW2:	Software bit 2.
@@ -251,8 +250,6 @@ enum kvm_pgtable_prot {
 	KVM_PGTABLE_PROT_R			= BIT(2),
 
 	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
-	KVM_PGTABLE_PROT_NORMAL_NC		= BIT(4),
-	KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS	= BIT(5),
 
 	KVM_PGTABLE_PROT_SW0			= BIT(55),
 	KVM_PGTABLE_PROT_SW1			= BIT(56),
@@ -263,6 +260,11 @@ enum kvm_pgtable_prot {
 #define KVM_PGTABLE_PROT_RW	(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
 #define KVM_PGTABLE_PROT_RWX	(KVM_PGTABLE_PROT_RW | KVM_PGTABLE_PROT_X)
 
+/* different memory attribute requested */
+#define	KVM_PGTABLE_ATTR_NORMAL_NC		0x1
+#define	KVM_PGTABLE_ATTR_NORMAL_NOTAGACCESS	0x2
+
+
 #define PKVM_HOST_MEM_PROT	KVM_PGTABLE_PROT_RWX
 #define PKVM_HOST_MMIO_PROT	KVM_PGTABLE_PROT_RW
 
@@ -606,7 +608,7 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
  * Return: 0 on success, negative error code on failure.
  */
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
-			   u64 phys, enum kvm_pgtable_prot prot,
+			   u64 phys, enum kvm_pgtable_prot prot, int mem_attr,
 			   void *mc, enum kvm_pgtable_walk_flags flags);
 
 /**
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index caba3e4bd09e..25c8b2fbce15 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -411,7 +411,7 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
 				      enum kvm_pgtable_prot prot)
 {
 	return kvm_pgtable_stage2_map(&host_mmu.pgt, start, end - start, start,
-				      prot, &host_s2_pool, 0);
+				      prot, 0, &host_s2_pool, 0);
 }
 
 /*
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4eb6e9345c12..9dd93ae8bb97 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -673,35 +673,34 @@ void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
 #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
 
 static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
-				kvm_pte_t *ptep)
+				int mem_attr, kvm_pte_t *ptep)
 {
 	kvm_pte_t attr;
 	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
-	unsigned long prot_mask = KVM_PGTABLE_PROT_DEVICE |
-				  KVM_PGTABLE_PROT_NORMAL_NC |
-				  KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS;
+	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
 
-	switch (prot & prot_mask) {
-	case KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC:
-		return -EINVAL;
-	case KVM_PGTABLE_PROT_DEVICE:
+	if (device) {
 		if (prot & KVM_PGTABLE_PROT_X)
 			return -EINVAL;
 		attr = KVM_S2_MEMATTR(pgt, DEVICE_nGnRE);
-		break;
-	case KVM_PGTABLE_PROT_NORMAL_NC:
-		if (prot & KVM_PGTABLE_PROT_X)
+		if (!mem_attr)
 			return -EINVAL;
-		attr = KVM_S2_MEMATTR(pgt, NORMAL_NC);
-		break;
-	case KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS:
-		if (system_supports_notagaccess())
-			attr = KVM_S2_MEMATTR(pgt, NORMAL_NOTAGACCESS);
-		else
-			return -EINVAL;
-		break;
-	default:
-		attr = KVM_S2_MEMATTR(pgt, NORMAL);
+	} else {
+		switch (mem_attr) {
+		case KVM_PGTABLE_ATTR_NORMAL_NC:
+			if (prot & KVM_PGTABLE_PROT_X)
+				return -EINVAL;
+			attr = KVM_S2_MEMATTR(pgt, NORMAL_NC);
+			break;
+		case KVM_PGTABLE_ATTR_NORMAL_NOTAGACCESS:
+			if (system_supports_notagaccess())
+				attr = KVM_S2_MEMATTR(pgt, NORMAL_NOTAGACCESS);
+			else
+				return -EINVAL;
+			break;
+		default:
+			attr = KVM_S2_MEMATTR(pgt, NORMAL);
+		}
 	}
 
 	if (!(prot & KVM_PGTABLE_PROT_X))
@@ -1060,7 +1059,7 @@ static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 }
 
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
-			   u64 phys, enum kvm_pgtable_prot prot,
+			   u64 phys, enum kvm_pgtable_prot prot, int mem_attr,
 			   void *mc, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
@@ -1081,7 +1080,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	if (WARN_ON((pgt->flags & KVM_PGTABLE_S2_IDMAP) && (addr != phys)))
 		return -EINVAL;
 
-	ret = stage2_set_prot_attr(pgt, prot, &map_data.attr);
+	ret = stage2_set_prot_attr(pgt, prot, mem_attr, &map_data.attr);
 	if (ret)
 		return ret;
 
@@ -1408,7 +1407,7 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
 	if (!IS_ALIGNED(phys, kvm_granule_size(level)))
 		return ERR_PTR(-EINVAL);
 
-	ret = stage2_set_prot_attr(pgt, prot, &map_data.attr);
+	ret = stage2_set_prot_attr(pgt, prot, 0, &map_data.attr);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 54e5bfe4f126..87afc8862459 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1130,8 +1130,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			break;
 
 		write_lock(&kvm->mmu_lock);
-		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
-					     &cache, 0);
+		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE,
+					     pa, prot, 0, &cache, 0);
 		write_unlock(&kvm->mmu_lock);
 		if (ret)
 			break;
@@ -1452,6 +1452,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
+	int normal_memattr = 0;
 
 	if (fault_is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1666,7 +1667,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (mte_allowed)
 			sanitise_mte_tags(kvm, pfn, vma_pagesize);
 		else if (kvm_has_mte_perm(kvm))
-			prot |= KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS;
+			normal_memattr = KVM_PGTABLE_ATTR_NORMAL_NOTAGACCESS;
 		else {
 			ret = -EFAULT;
 			goto out_unlock;
@@ -1681,7 +1682,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	if (device) {
 		if (vfio_allow_any_uc)
-			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
+			normal_memattr = KVM_PGTABLE_ATTR_NORMAL_NC;
 		else
 			prot |= KVM_PGTABLE_PROT_DEVICE;
 	} else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
@@ -1704,6 +1705,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
+					     normal_memattr,
 					     memcache,
 					     KVM_PGTABLE_WALK_HANDLE_FAULT |
 					     KVM_PGTABLE_WALK_SHARED);
-- 
2.43.0




* Re: [PATCH v2 2/7] KVM: arm64: MTE: Update code comments
  2025-01-10 11:00 ` [PATCH v2 2/7] KVM: arm64: MTE: Update code comments Aneesh Kumar K.V (Arm)
@ 2025-01-10 13:11   ` Catalin Marinas
  0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2025-01-10 13:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: linux-kernel, linux-arm-kernel, kvmarm, Suzuki K Poulose,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu

On Fri, Jan 10, 2025 at 04:30:18PM +0530, Aneesh Kumar K.V (Arm) wrote:
> commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
> initialisation") updated the locking such that the kernel now allows
> VM_SHARED mappings with MTE. Update the code comment to reflect this.
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c9d46ad57e52..eb8220a409e1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1391,11 +1391,11 @@ static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
>   * able to see the page's tags and therefore they must be initialised first. If
>   * PG_mte_tagged is set, tags have already been initialised.
>   *
> - * The race in the test/set of the PG_mte_tagged flag is handled by:
> - * - preventing VM_SHARED mappings in a memslot with MTE preventing two VMs
> - *   racing to santise the same page
> - * - mmap_lock protects between a VM faulting a page in and the VMM performing
> - *   an mprotect() to add VM_MTE
> + * The race in the test/set of the PG_mte_tagged flag is handled by using
> + * PG_mte_lock and PG_mte_tagged together. if PG_mte_lock is found unset, we can
> + * go ahead and clear the page tags. if PG_mte_lock is found set, then the page
> + * tags are already cleared or there is a parallel tag clearing is going on. We
				  ^^^^^^^^
				  remove this (or the other 'is')


> + * wait for the parallel tag clear to finish by waiting on PG_mte_tagged bit.
>   */

I don't think we need to describe the behaviour of set_page_mte_tagged()
and try_page_mte_tagging() in here. How the locking works for tagged
pages was hidden in those functions with their own documentation. I
would just remove this whole paragraph here, just leave the first one
stating that the tags must be initialised if not already done so.

-- 
Catalin



* Re: [PATCH v2 1/7] arm64: Update the values to binary from hex
  2025-01-10 11:00 ` [PATCH v2 1/7] arm64: Update the values to binary from hex Aneesh Kumar K.V (Arm)
@ 2025-01-10 13:11   ` Catalin Marinas
  0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2025-01-10 13:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: linux-kernel, linux-arm-kernel, kvmarm, Suzuki K Poulose,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu

On Fri, Jan 10, 2025 at 04:30:17PM +0530, Aneesh Kumar K.V (Arm) wrote:
> This matches the ARM ARM representation. No functional change in this
> patch.
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>

Fine by me if it's easier to compare with the Arm ARM.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH v2 3/7] arm64: cpufeature: add Allocation Tag Access Permission (MTE_PERM) feature
  2025-01-10 11:00 ` [PATCH v2 3/7] arm64: cpufeature: add Allocation Tag Access Permission (MTE_PERM) feature Aneesh Kumar K.V (Arm)
@ 2025-01-10 13:15   ` Catalin Marinas
  0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2025-01-10 13:15 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: linux-kernel, linux-arm-kernel, kvmarm, Suzuki K Poulose,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu

On Fri, Jan 10, 2025 at 04:30:19PM +0530, Aneesh Kumar K.V (Arm) wrote:
> This indicates whether the system supports MTE_PERM, which will be used
> by KVM for stage-2 mappings. This is an ARM64_CPUCAP_SYSTEM_FEATURE
> because, if we enable the feature, all CPUs must have it.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-10 11:00 ` [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported Aneesh Kumar K.V (Arm)
@ 2025-01-10 18:20   ` Catalin Marinas
  2025-01-11 13:19     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-01-10 18:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: linux-kernel, linux-arm-kernel, kvmarm, Suzuki K Poulose,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu

On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Currently, the kernel won't start a guest if the MTE feature is enabled
> and the guest RAM is backed by memory which doesn't support allocation
> tags. Update this such that the kernel uses the NoTagAccess memory
> attribute while mapping pages from VMAs for which MTE is not allowed.
> The fault from accessing the allocation tags in such pages is forwarded
> to the VMM so that the VMM can decide to kill the guest or take
> corrective action.
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>

Mostly nitpicks below (apart from the slot registration). The decision
on whether that's the best approach lies with Oliver/Marc.

> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index cf811009a33c..609ed6a5ffce 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -378,6 +378,11 @@ static inline bool kvm_vcpu_trap_is_exec_fault(const struct kvm_vcpu *vcpu)
>  	return kvm_vcpu_trap_is_iabt(vcpu) && !kvm_vcpu_abt_iss1tw(vcpu);
>  }
>  
> +static inline bool kvm_vcpu_trap_is_tagaccess(const struct kvm_vcpu *vcpu)
> +{
> +	return !!(ESR_ELx_ISS2(kvm_vcpu_get_esr(vcpu)) & ESR_ELx_TagAccess);
> +}

The function type is already bool, no need for the "!!".

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index eb8220a409e1..3610bea7607d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1660,9 +1660,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
>  		/* Check the VMM hasn't introduced a new disallowed VMA */
> -		if (mte_allowed) {
> +		if (mte_allowed)
>  			sanitise_mte_tags(kvm, pfn, vma_pagesize);
> -		} else {
> +		else if (kvm_has_mte_perm(kvm))
> +			prot |= KVM_PGTABLE_PROT_NORMAL_NOTAGACCESS;
> +		else {
>  			ret = -EFAULT;
>  			goto out_unlock;
>  		}

Don't remove the brackets at the end of "if (mte_allowed) {" etc. The
coding style does require them if at least one of the branches is
multi-line (the last "else").

> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  		if (!vma)
>  			break;
>  
> -		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
> +		if (kvm_has_mte(kvm) &&
> +		    !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
>  			ret = -EINVAL;
>  			break;
>  		}

I don't think we should change this, or at least not how it's done above
(Suzuki raised a related issue internally relaxing this for VM_PFNMAP).

For standard memory slots, we want to reject them upfront rather than
deferring to the fault handler. An example here is file mmap() passed as
standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
I'd only relax this for VM_PFNMAP mappings further down in this
function (and move the VM_PFNMAP check above; see Suzuki's internal
patch, unless he posted it publicly already).

-- 
Catalin



* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-10 18:20   ` Catalin Marinas
@ 2025-01-11 13:19     ` Aneesh Kumar K.V
  2025-01-13 19:09       ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Aneesh Kumar K.V @ 2025-01-11 13:19 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-kernel, linux-arm-kernel, kvmarm, Suzuki K Poulose,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu

Catalin Marinas <catalin.marinas@arm.com> writes:

> On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> Currently, the kernel won't start a guest if the MTE feature is enabled

...

>> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>  		if (!vma)
>>  			break;
>>  
>> -		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
>> +		if (kvm_has_mte(kvm) &&
>> +		    !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
>>  			ret = -EINVAL;
>>  			break;
>>  		}
>
> I don't think we should change this, or at least not how it's done above
> (Suzuki raised a related issue internally relaxing this for VM_PFNMAP).
>
> For standard memory slots, we want to reject them upfront rather than
> deferring to the fault handler. An example here is file mmap() passed as
> standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
> I'd only relax this for VM_PFNMAP mappings further down in this
> function (and move the VM_PFNMAP check above; see Suzuki's internal
> patch, unless he posted it publicly already).
>

But we want to handle memslots backed by pagecache pages for virtio-shm
here (virtiofs dax use case). With MTE_PERM, we can essentially skip the
kvm_vma_mte_allowed(vma) check because we handle all types in the fault
handler.


-aneesh



* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-11 13:19     ` Aneesh Kumar K.V
@ 2025-01-13 19:09       ` Catalin Marinas
  2025-01-13 20:47         ` Peter Collingbourne
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-01-13 19:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-kernel, linux-arm-kernel, kvmarm, Suzuki K Poulose,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu, Peter Collingbourne

On Sat, Jan 11, 2025 at 06:49:55PM +0530, Aneesh Kumar K.V wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> > On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> Currently, the kernel won't start a guest if the MTE feature is enabled
> 
> ...
> 
> >> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >>  		if (!vma)
> >>  			break;
> >>  
> >> -		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
> >> +		if (kvm_has_mte(kvm) &&
> >> +		    !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
> >>  			ret = -EINVAL;
> >>  			break;
> >>  		}
> >
> > I don't think we should change this, or at least not how it's done above
> > (Suzuki raised a related issue internally relaxing this for VM_PFNMAP).
> >
> > For standard memory slots, we want to reject them upfront rather than
> > deferring to the fault handler. An example here is file mmap() passed as
> > standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
> > I'd only relax this for VM_PFNMAP mappings further down in this
> > function (and move the VM_PFNMAP check above; see Suzuki's internal
> > patch, unless he posted it publicly already).
> 
> But we want to handle memslots backed by pagecache pages for virtio-shm
> here (virtiofs dax use case).

Ah, I forgot about this use case. So with virtiofs DAX, does a host page
cache page (host VMM mmap()) get mapped directly into the guest as a
separate memory slot? In this case, the host vma would not have
VM_MTE_ALLOWED set.

> With MTE_PERM, we can essentially skip the
> kvm_vma_mte_allowed(vma) check because we handle all types in the fault
> handler.

This was pretty much the early behaviour when we added KVM support for
MTE, allow !VM_MTE_ALLOWED and trap them later. However, we disallowed
VM_SHARED because of some non-trivial race. Commit d89585fbb308 ("KVM:
arm64: unify the tests for VMAs in memslots when MTE is enabled")
changed this behaviour and the VM_MTE_ALLOWED check happens upfront. A
subsequent commit removed the VM_SHARED check.

It's a minor ABI change but I'm trying to figure out why we needed this
upfront check rather than simply dropping the VM_SHARED check. Adding
Peter in case he remembers. I can't see any race if we simply skipped
this check altogether, irrespective of FEAT_MTE_PERM.

-- 
Catalin



* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-13 19:09       ` Catalin Marinas
@ 2025-01-13 20:47         ` Peter Collingbourne
  2025-01-14  9:55           ` Suzuki K Poulose
  2025-01-15 13:15           ` Catalin Marinas
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Collingbourne @ 2025-01-13 20:47 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Aneesh Kumar K.V, linux-kernel, linux-arm-kernel, kvmarm,
	Suzuki K Poulose, Steven Price, Will Deacon, Marc Zyngier,
	Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu

On Mon, Jan 13, 2025 at 11:09 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> On Sat, Jan 11, 2025 at 06:49:55PM +0530, Aneesh Kumar K.V wrote:
> > Catalin Marinas <catalin.marinas@arm.com> writes:
> > > On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
> > >> Currently, the kernel won't start a guest if the MTE feature is enabled
> >
> > ...
> >
> > >> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> > >>            if (!vma)
> > >>                    break;
> > >>
> > >> -          if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
> > >> +          if (kvm_has_mte(kvm) &&
> > >> +              !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
> > >>                    ret = -EINVAL;
> > >>                    break;
> > >>            }
> > >
> > > I don't think we should change this, or at least not how it's done above
> > > (Suzuki raised a related issue internally relaxing this for VM_PFNMAP).
> > >
> > > For standard memory slots, we want to reject them upfront rather than
> > > deferring to the fault handler. An example here is file mmap() passed as
> > > standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
> > > I'd only relax this for VM_PFNMAP mappings further down in this
> > > function (and move the VM_PFNMAP check above; see Suzuki's internal
> > > patch, unless he posted it publicly already).
> >
> > But we want to handle memslots backed by pagecache pages for virtio-shm
> > here (virtiofs dax use case).
>
> Ah, I forgot about this use case. So with virtiofs DAX, does a host page
> cache page (host VMM mmap()) get mapped directly into the guest as a
> separate memory slot? In this case, the host vma would not have
> VM_MTE_ALLOWED set.
>
> > With MTE_PERM, we can essentially skip the
> > kvm_vma_mte_allowed(vma) check because we handle all types in the fault
> > handler.
>
> This was pretty much the early behaviour when we added KVM support for
> MTE, allow !VM_MTE_ALLOWED and trap them later. However, we disallowed
> VM_SHARED because of some non-trivial race. Commit d89585fbb308 ("KVM:
> arm64: unify the tests for VMAs in memslots when MTE is enabled")
> changed this behaviour and the VM_MTE_ALLOWED check happens upfront. A
> subsequent commit removed the VM_SHARED check.
>
> It's a minor ABI change but I'm trying to figure out why we needed this
> upfront check rather than simply dropping the VM_SHARED check. Adding
> Peter in case he remembers. I can't see any race if we simply skipped
> this check altogether, irrespective of FEAT_MTE_PERM.

I don't see a problem with removing the upfront check. The reason I
kept the check was IIRC just that there was already a check there and
its logic needed to be adjusted for my VM_SHARED changes.

Peter



* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-13 20:47         ` Peter Collingbourne
@ 2025-01-14  9:55           ` Suzuki K Poulose
  2025-01-15 13:15           ` Catalin Marinas
  1 sibling, 0 replies; 19+ messages in thread
From: Suzuki K Poulose @ 2025-01-14  9:55 UTC (permalink / raw)
  To: Peter Collingbourne, Catalin Marinas
  Cc: Aneesh Kumar K.V, linux-kernel, linux-arm-kernel, kvmarm,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu

Hi,


On 13/01/2025 20:47, Peter Collingbourne wrote:
> On Mon, Jan 13, 2025 at 11:09 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
>>
>> On Sat, Jan 11, 2025 at 06:49:55PM +0530, Aneesh Kumar K.V wrote:
>>> Catalin Marinas <catalin.marinas@arm.com> writes:
>>>> On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
>>>>> Currently, the kernel won't start a guest if the MTE feature is enabled
>>>
>>> ...
>>>
>>>>> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>>>>             if (!vma)
>>>>>                     break;
>>>>>
>>>>> -          if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
>>>>> +          if (kvm_has_mte(kvm) &&
>>>>> +              !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
>>>>>                     ret = -EINVAL;
>>>>>                     break;
>>>>>             }
>>>>
>>>> I don't think we should change this, or at least not how it's done above
>>>> (Suzuki raised a related issue internally relaxing this for VM_PFNMAP).
>>>>
>>>> For standard memory slots, we want to reject them upfront rather than
>>>> deferring to the fault handler. An example here is file mmap() passed as
>>>> standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
>>>> I'd only relax this for VM_PFNMAP mappings further down in this
>>>> function (and move the VM_PFNMAP check above; see Suzuki's internal
>>>> patch, unless he posted it publicly already).

For the record, here is the patch Catalin was referring to.


--->8---

kvm: arm64: Allow device mappings with MTE

When a VM is configured to use MTE, we prevent mapping a "Device" region
into the VM.

e.g. with kvmtool (with additional debugging to dump the error code and
addresses):

$ lkvm run ... --vfio 0000:01:00.0 ...

  Info: Using IOMMU type 3 for VFIO container
  Error: 0000:01:00.0: failed to register region with KVM 50020000-50022000: -22
  Warning: [0abc:aced] Error activating emulation for BAR 0
  Error: 0000:01:00.0: failed to configure regions
  Warning: Failed init: vfio__init

Only check whether MTE is allowed for a VMA if it is not backed by a
"device" mapping.

Fixes: ea7fc1bb1cd1b ("KVM: arm64: Introduce MTE VM feature")
Cc: Steven Price <steven.price@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
  arch/arm64/kvm/mmu.c | 8 +++-----
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8b2d803e558a..7e084f22e185 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2295,17 +2295,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
  		if (!vma)
  			break;

-		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
-			ret = -EINVAL;
-			break;
-		}
-
  		if (vma->vm_flags & VM_PFNMAP) {
  			/* IO region dirty page logging not allowed */
  			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
  				ret = -EINVAL;
  				break;
  			}
+		} else if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
+			ret = -EINVAL;
+			break;
  		}
  		hva = min(reg_end, vma->vm_end);
  	} while (hva < reg_end);
-- 
2.34.1


Suzuki


>>>
>>> But we want to handle memslots backed by pagecache pages for virtio-shm
>>> here (virtiofs dax use case).
>>
>> Ah, I forgot about this use case. So with virtiofs DAX, does a host page
>> cache page (host VMM mmap()) get mapped directly into the guest as a
>> separate memory slot? In this case, the host vma would not have
>> VM_MTE_ALLOWED set.
>>
>>> With MTE_PERM, we can essentially skip the
>>> kvm_vma_mte_allowed(vma) check because we handle all types in the fault
>>> handler.
>>
>> This was pretty much the early behaviour when we added KVM support for
>> MTE, allow !VM_MTE_ALLOWED and trap them later. However, we disallowed
>> VM_SHARED because of some non-trivial race. Commit d89585fbb308 ("KVM:
>> arm64: unify the tests for VMAs in memslots when MTE is enabled")
>> changed this behaviour and the VM_MTE_ALLOWED check happens upfront. A
>> subsequent commit removed the VM_SHARED check.
>>
>> It's a minor ABI change but I'm trying to figure out why we needed this
>> upfront check rather than simply dropping the VM_SHARED check. Adding
>> Peter in case he remembers. I can't see any race if we simply skipped
>> this check altogether, irrespective of FEAT_MTE_PERM.
> 
> I don't see a problem with removing the upfront check. The reason I
> kept the check was IIRC just that there was already a check there and
> its logic needed to be adjusted for my VM_SHARED changes.
> 
> Peter




* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-13 20:47         ` Peter Collingbourne
  2025-01-14  9:55           ` Suzuki K Poulose
@ 2025-01-15 13:15           ` Catalin Marinas
  2025-01-28 10:31             ` Aneesh Kumar K.V
  1 sibling, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-01-15 13:15 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Aneesh Kumar K.V, linux-kernel, linux-arm-kernel, kvmarm,
	Suzuki K Poulose, Steven Price, Will Deacon, Marc Zyngier,
	Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu

On Mon, Jan 13, 2025 at 12:47:54PM -0800, Peter Collingbourne wrote:
> On Mon, Jan 13, 2025 at 11:09 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Sat, Jan 11, 2025 at 06:49:55PM +0530, Aneesh Kumar K.V wrote:
> > > Catalin Marinas <catalin.marinas@arm.com> writes:
> > > > On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
> > > >> Currently, the kernel won't start a guest if the MTE feature is enabled
> > >
> > > ...
> > >
> > > >> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> > > >>            if (!vma)
> > > >>                    break;
> > > >>
> > > >> -          if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
> > > >> +          if (kvm_has_mte(kvm) &&
> > > >> +              !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
> > > >>                    ret = -EINVAL;
> > > >>                    break;
> > > >>            }
> > > >
> > > > I don't think we should change this, or at least not how it's done above
> > > > (Suzuki raised a related issue internally relaxing this for VM_PFNMAP).
> > > >
> > > > For standard memory slots, we want to reject them upfront rather than
> > > > deferring to the fault handler. An example here is file mmap() passed as
> > > > standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
> > > > I'd only relax this for VM_PFNMAP mappings further down in this
> > > > function (and move the VM_PFNMAP check above; see Suzuki's internal
> > > > patch, unless he posted it publicly already).
> > >
> > > But we want to handle memslots backed by pagecache pages for virtio-shm
> > > here (virtiofs dax use case).
> >
> > Ah, I forgot about this use case. So with virtiofs DAX, does a host page
> > cache page (host VMM mmap()) get mapped directly into the guest as a
> > separate memory slot? In this case, the host vma would not have
> > VM_MTE_ALLOWED set.
> >
> > > With MTE_PERM, we can essentially skip the
> > > kvm_vma_mte_allowed(vma) check because we handle all types in the fault
> > > handler.
> >
> > This was pretty much the early behaviour when we added KVM support for
> > MTE, allow !VM_MTE_ALLOWED and trap them later. However, we disallowed
> > VM_SHARED because of some non-trivial race. Commit d89585fbb308 ("KVM:
> > arm64: unify the tests for VMAs in memslots when MTE is enabled")
> > changed this behaviour and the VM_MTE_ALLOWED check happens upfront. A
> > subsequent commit removed the VM_SHARED check.
> >
> > It's a minor ABI change but I'm trying to figure out why we needed this
> > upfront check rather than simply dropping the VM_SHARED check. Adding
> > Peter in case he remembers. I can't see any race if we simply skipped
> > this check altogether, irrespective of FEAT_MTE_PERM.
> 
> I don't see a problem with removing the upfront check. The reason I
> kept the check was IIRC just that there was already a check there and
> its logic needed to be adjusted for my VM_SHARED changes.

Prior to commit d89585fbb308, kvm_arch_prepare_memory_region() only
rejected a memory slot if VM_SHARED was set. This commit unified the
checking with user_mem_abort(), with slots being rejected if
(!VM_MTE_ALLOWED || VM_SHARED). A subsequent commit dropped the
VM_SHARED check, so we ended up with memory slots being rejected only if
!VM_MTE_ALLOWED (of course, if kvm_has_mte()). This wasn't the case
before the VM_SHARED relaxation.

So if you don't remember any strong reason for this change, I think we
should go back to the original behaviour of deferring the VM_MTE_ALLOWED
check to user_mem_abort() (and still permitting VM_SHARED).
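
Spelled out as pseudo-code (a sketch of the description above, not
verbatim kernel code), the memslot-time check evolved as:

  /* before d89585fbb308: reject only shared mappings */
  if (kvm_has_mte(kvm) && (vma->vm_flags & VM_SHARED))
  	ret = -EINVAL;

  /* d89585fbb308: unified with user_mem_abort() */
  if (kvm_has_mte(kvm) &&
      (!(vma->vm_flags & VM_MTE_ALLOWED) || (vma->vm_flags & VM_SHARED)))
  	ret = -EINVAL;

  /* after c911f0d46879: only !VM_MTE_ALLOWED rejected (current) */
  if (kvm_has_mte(kvm) && !(vma->vm_flags & VM_MTE_ALLOWED))
  	ret = -EINVAL;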

-- 
Catalin



* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-15 13:15           ` Catalin Marinas
@ 2025-01-28 10:31             ` Aneesh Kumar K.V
  2025-01-29 14:38               ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Aneesh Kumar K.V @ 2025-01-28 10:31 UTC (permalink / raw)
  To: Catalin Marinas, Peter Collingbourne
  Cc: linux-kernel, linux-arm-kernel, kvmarm, Suzuki K Poulose,
	Steven Price, Will Deacon, Marc Zyngier, Mark Rutland,
	Oliver Upton, Joey Gouly, Zenghui Yu

Catalin Marinas <catalin.marinas@arm.com> writes:

> On Mon, Jan 13, 2025 at 12:47:54PM -0800, Peter Collingbourne wrote:
>> On Mon, Jan 13, 2025 at 11:09 AM Catalin Marinas
>> <catalin.marinas@arm.com> wrote:
>> > On Sat, Jan 11, 2025 at 06:49:55PM +0530, Aneesh Kumar K.V wrote:
>> > > Catalin Marinas <catalin.marinas@arm.com> writes:
>> > > > On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> > > >> Currently, the kernel won't start a guest if the MTE feature is enabled
>> > >
>> > > ...
>> > >
>> > > >> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>> > > >>            if (!vma)
>> > > >>                    break;
>> > > >>
>> > > >> -          if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
>> > > >> +          if (kvm_has_mte(kvm) &&
>> > > >> +              !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
>> > > >>                    ret = -EINVAL;
>> > > >>                    break;
>> > > >>            }
>> > > >
>> > > > I don't think we should change this, or at least not how it's done above
>> > > > (Suzuki raised a related issue internally relaxing this for VM_PFNMAP).
>> > > >
>> > > > For standard memory slots, we want to reject them upfront rather than
>> > > > deferring to the fault handler. An example here is file mmap() passed as
>> > > > standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
>> > > > I'd only relax this for VM_PFNMAP mappings further down in this
>> > > > function (and move the VM_PFNMAP check above; see Suzuki's internal
>> > > > patch, unless he posted it publicly already).
>> > >
>> > > But we want to handle memslots backed by pagecache pages for virtio-shm
>> > > here (virtiofs dax use case).
>> >
>> > Ah, I forgot about this use case. So with virtiofs DAX, does a host page
>> > cache page (host VMM mmap()) get mapped directly into the guest as a
>> > separate memory slot? In this case, the host vma would not have
>> > VM_MTE_ALLOWED set.
>> >
>> > > With MTE_PERM, we can essentially skip the
>> > > kvm_vma_mte_allowed(vma) check because we handle all types in the fault
>> > > handler.
>> >
>> > This was pretty much the early behaviour when we added KVM support for
>> > MTE, allow !VM_MTE_ALLOWED and trap them later. However, we disallowed
>> > VM_SHARED because of some non-trivial race. Commit d89585fbb308 ("KVM:
>> > arm64: unify the tests for VMAs in memslots when MTE is enabled")
>> > changed this behaviour and the VM_MTE_ALLOWED check happens upfront. A
>> > subsequent commit removed the VM_SHARED check.
>> >
>> > It's a minor ABI change but I'm trying to figure out why we needed this
>> > upfront check rather than simply dropping the VM_SHARED check. Adding
>> > Peter in case he remembers. I can't see any race if we simply skipped
>> > this check altogether, irrespective of FEAT_MTE_PERM.
>> 
>> I don't see a problem with removing the upfront check. The reason I
>> kept the check was IIRC just that there was already a check there and
>> its logic needed to be adjusted for my VM_SHARED changes.
>
> Prior to commit d89585fbb308, kvm_arch_prepare_memory_region() only
> rejected a memory slot if VM_SHARED was set. This commit unified the
> checking with user_mem_abort(), with slots being rejected if
> (!VM_MTE_ALLOWED || VM_SHARED). A subsequent commit dropped the
> VM_SHARED check, so we ended up with memory slots being rejected only if
> !VM_MTE_ALLOWED (of course, if kvm_has_mte()). This wasn't the case
> before the VM_SHARED relaxation.
>
> So if you don't remember any strong reason for this change, I think we
> should go back to the original behaviour of deferring the VM_MTE_ALLOWED
> check to user_mem_abort() (and still permitting VM_SHARED).
>

Something as below?

From 466237a6f0a165152c157ab4a73f34c400cffe34 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
Date: Tue, 28 Jan 2025 14:21:52 +0530
Subject: [PATCH] KVM: arm64: Drop mte_allowed check during memslot creation

Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in
memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only
rejected a memory slot if VM_SHARED was set. That commit unified the
checking with user_mem_abort(), with slots being rejected if either
VM_MTE_ALLOWED is not set or VM_SHARED is set. A subsequent commit
c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE
enabled") dropped the VM_SHARED check, so we ended up with memory slots
being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before
commit d89585fbb308. The rejection of memory slots with VM_SHARED set
was done to avoid a race condition on the test/set of the PG_mte_tagged
flag: before commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
initialization"), the kernel avoided allowing MTE with shared pages,
thereby preventing two tasks sharing a page from racing to set the
PG_mte_tagged flag.

Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
initialization") updated the locking so that the kernel allows VM_SHARED
mappings with MTE. With that commit, we can enable memslot creation with
VM_SHARED VMA mappings.

This patch results in a minor tweak to the ABI. We now allow creating
memslots that don't have the VM_MTE_ALLOWED flag set. If the guest uses
such a memslot with Allocation Tags, the kernel will generate -EFAULT,
i.e. instead of failing early, we now fail later during KVM_RUN.

This change is needed because, without it, users are not able to use MTE
with VFIO passthrough, as shown below (kvmtool VMM).

[  617.921030] vfio-pci 0000:01:00.0: resetting
[  618.024719] vfio-pci 0000:01:00.0: reset done
  Error: 0000:01:00.0: failed to register region with KVM
  Warning: [0abc:aced] Error activating emulation for BAR 0
  Error: 0000:01:00.0: failed to configure regions
  Warning: Failed init: vfio__init

  Fatal: Initialisation failed

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/kvm/mmu.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 007dda958eab..610becd8574e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2146,11 +2146,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		if (!vma)
 			break;
 
-		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
-			ret = -EINVAL;
-			break;
-		}
-
 		if (vma->vm_flags & VM_PFNMAP) {
 			/* IO region dirty page logging not allowed */
 			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-- 
2.43.0
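
For illustration, with the upfront check removed the failure would
surface at run time instead; a hypothetical VMM-side sketch (vcpu_fd and
the teardown policy are assumptions, not part of this patch):

  /* KVM_RUN fails with -EFAULT when the guest accesses allocation
   * tags on a memslot whose backing vma lacks VM_MTE_ALLOWED.
   */
  if (ioctl(vcpu_fd, KVM_RUN, 0) < 0 && errno == EFAULT) {
  	/* tag access to non-MTE-capable memory; tear the guest down */
  }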





* Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
  2025-01-28 10:31             ` Aneesh Kumar K.V
@ 2025-01-29 14:38               ` Catalin Marinas
  0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2025-01-29 14:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Peter Collingbourne, linux-kernel, linux-arm-kernel, kvmarm,
	Suzuki K Poulose, Steven Price, Will Deacon, Marc Zyngier,
	Mark Rutland, Oliver Upton, Joey Gouly, Zenghui Yu

On Tue, Jan 28, 2025 at 04:01:18PM +0530, Aneesh Kumar K.V wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> > So if you don't remember any strong reason for this change, I think we
> > should go back to the original behaviour of deferring the VM_MTE_ALLOWED
> > check to user_mem_abort() (and still permitting VM_SHARED).
> 
> Something as below?
> 
> From 466237a6f0a165152c157ab4a73f34c400cffe34 Mon Sep 17 00:00:00 2001
> From: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
> Date: Tue, 28 Jan 2025 14:21:52 +0530
> Subject: [PATCH] KVM: arm64: Drop mte_allowed check during memslot creation
> 
> Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in
> memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only
> rejected a memory slot if VM_SHARED was set. That commit unified the
> checking with user_mem_abort(), with slots being rejected if either
> VM_MTE_ALLOWED is not set or VM_SHARED is set. A subsequent commit
> c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE
> enabled") dropped the VM_SHARED check, so we ended up with memory slots
> being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before
> commit d89585fbb308. The rejection of memory slots with VM_SHARED set
> was done to avoid a race condition on the test/set of the PG_mte_tagged
> flag: before commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
> initialization"), the kernel avoided allowing MTE with shared pages,
> thereby preventing two tasks sharing a page from racing to set the
> PG_mte_tagged flag.
> 
> Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
> initialization") updated the locking so that the kernel allows VM_SHARED
> mappings with MTE. With that commit, we can enable memslot creation with
> VM_SHARED VMA mappings.
> 
> This patch results in a minor tweak to the ABI. We now allow creating
> memslots that don't have the VM_MTE_ALLOWED flag set. If the guest uses
> such a memslot with Allocation Tags, the kernel will generate -EFAULT,
> i.e. instead of failing early, we now fail later during KVM_RUN.
> 
> This change is needed because, without it, users are not able to use MTE
> with VFIO passthrough, as shown below (kvmtool VMM).

You might want to clearly state that the VFIO passthrough mapping is
Device or NonCacheable (for the time being; there is a parallel series
from Ankit on allowing cacheable).

> [  617.921030] vfio-pci 0000:01:00.0: resetting
> [  618.024719] vfio-pci 0000:01:00.0: reset done
>   Error: 0000:01:00.0: failed to register region with KVM
>   Warning: [0abc:aced] Error activating emulation for BAR 0
>   Error: 0000:01:00.0: failed to configure regions
>   Warning: Failed init: vfio__init
> 
>   Fatal: Initialisation failed
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 007dda958eab..610becd8574e 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -2146,11 +2146,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  		if (!vma)
>  			break;
>  
> -		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
> -			ret = -EINVAL;
> -			break;
> -		}

Works for me. It's up to the KVM maintainers whether they are ok with
this ABI change.

-- 
Catalin

