linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2
@ 2024-02-08 15:18 Shameer Kolothum
  2024-02-08 15:18 ` [RFC PATCH v2 1/7] KVM: Add generic infrastructure to support pinned VMIDs Shameer Kolothum
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

Hi,

On an ARM64 system with a SMMUv3 implementation that fully supports
Broadcast TLB Maintenance(BTM) feature as part of the Distributed
Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are
received by SMMUv3. This is very useful when the SMMUv3 shares the
page tables with the CPU(eg: Guest SVA use case). For this to work,
the SMMUv3 must use the same VMID that is allocated by KVM to configure
the nested stage 2(S2) translations.

An earlier proposal sent out[1] a while back resulted in changing the
ARM64/KVM VMID allocator similar to the ASID allocator to make it
better suited for this.

This RFC adds,
 -Support for pinned KVM VMID.
 -Support associating KVM pointer and iommufd ctx.
 -Changes to domain_alloc_user() to receive a kvm pointer.
 -Configure SMMUV3 S2 using KVM VMID
 -Finally enable BTM only if SMMUV3 supports S1 translation. This
  is to make sure that PAGING domains always use S1 and S2 is only
  used for nested domains with a valid KVM. The idea is to make sure
  when BTM is enabled in Guest, we use KVM VMID for S2.

Not sure I miss any explicit TLB invalidations with any use case
that may configure a S2 with a private VMID that matches a KVM
one.

This is based on Jason's ongoing SMMUv3 refactor series[2].

Please take a look and let me know.

Thanks,
Shameer

1. https://lore.kernel.org/linux-arm-kernel/20210222155338.26132-1-shameerali.kolothum.thodi@huawei.com/
2. https://lore.kernel.org/linux-arm-kernel/0-v5-cd1be8dd9c71+3fa-smmuv3_newapi_p1_jgg@nvidia.com/

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Enable broadcast TLB maintenance

Shameer Kolothum (6):
  KVM: Add generic infrastructure to support pinned VMIDs
  KVM: arm64: Introduce support to pin VMIDs
  KVM: arm64: Add interfaces for pinned VMID support
  iommufd: Associate kvm pointer to iommufd ctx
  iommu: Pass in kvm pointer to domain_alloc_user
  iommu/arm-smmu-v3: Use KVM VMID for s2 stage

 arch/arm64/include/asm/kvm_host.h           |  3 +
 arch/arm64/kvm/Kconfig                      |  1 +
 arch/arm64/kvm/arm.c                        | 14 ++++
 arch/arm64/kvm/vmid.c                       | 84 ++++++++++++++++++++-
 drivers/iommu/amd/iommu.c                   |  1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +
 drivers/iommu/intel/iommu.c                 |  1 +
 drivers/iommu/iommufd/hw_pagetable.c        |  5 +-
 drivers/iommu/iommufd/iommufd_private.h     |  3 +
 drivers/iommu/iommufd/main.c                | 14 ++++
 drivers/iommu/iommufd/selftest.c            |  1 +
 drivers/vfio/device_cdev.c                  |  3 +
 include/linux/iommu.h                       |  3 +-
 include/linux/iommufd.h                     |  7 ++
 include/linux/kvm_host.h                    | 18 +++++
 virt/kvm/Kconfig                            |  3 +
 virt/kvm/kvm_main.c                         | 23 ++++++
 18 files changed, 218 insertions(+), 11 deletions(-)

-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 1/7] KVM: Add generic infrastructure to support pinned VMIDs
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
@ 2024-02-08 15:18 ` Shameer Kolothum
  2024-06-24 15:48   ` Sean Christopherson
  2024-02-08 15:18 ` [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs Shameer Kolothum
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

Provide generic helper functions to get/put pinned VMIDs if the arch
supports it.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 include/linux/kvm_host.h | 18 ++++++++++++++++++
 virt/kvm/Kconfig         |  3 +++
 virt/kvm/kvm_main.c      | 23 +++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..610e239bea46 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2311,6 +2311,24 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
+#ifdef CONFIG_HAVE_KVM_PINNED_VMID
+int kvm_arch_pinned_vmid_get(struct kvm *kvm);
+void kvm_arch_pinned_vmid_put(struct kvm *kvm);
+#endif
+
+#ifdef CONFIG_HAVE_KVM_PINNED_VMID
+int kvm_pinned_vmid_get(struct kvm *kvm);
+void kvm_pinned_vmid_put(struct kvm *kvm);
+#else
+static inline int kvm_pinned_vmid_get(struct kvm *kvm)
+{
+	return -EINVAL;
+}
+
+static inline void kvm_pinned_vmid_put(struct kvm *kvm)
+{
+}
+#endif
 /*
  * If more than one page is being (un)accounted, @virt must be the address of
  * the first page of a block of pages what were allocated together (i.e
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 184dab4ee871..a3052c8e3ac4 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -108,3 +108,6 @@ config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_PRIVATE_MEM
        bool
+
+config HAVE_KVM_PINNED_VMID
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88a69f7..f84d6da5f464 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3918,6 +3918,29 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
 
+#ifdef CONFIG_HAVE_KVM_PINNED_VMID
+int kvm_pinned_vmid_get(struct kvm *kvm)
+{
+	int ret;
+
+	if (!kvm_get_kvm_safe(kvm))
+		return -ENOENT;
+	ret  = kvm_arch_pinned_vmid_get(kvm);
+	if (ret < 0)
+		kvm_put_kvm(kvm);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_pinned_vmid_get);
+
+void kvm_pinned_vmid_put(struct kvm *kvm)
+{
+	kvm_arch_pinned_vmid_put(kvm);
+	kvm_put_kvm(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_pinned_vmid_put);
+#endif
+
 #ifndef CONFIG_S390
 /*
  * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
  2024-02-08 15:18 ` [RFC PATCH v2 1/7] KVM: Add generic infrastructure to support pinned VMIDs Shameer Kolothum
@ 2024-02-08 15:18 ` Shameer Kolothum
  2024-06-24 19:22   ` Oliver Upton
  2024-02-08 15:18 ` [RFC PATCH v2 3/7] KVM: arm64: Add interfaces for pinned VMID support Shameer Kolothum
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

Introduce kvm_arm_pinned_vmid_get() and kvm_arm_pinned_vmid_put(), to pin
a VMID associated with a KVM instance. This will guarantee that VMID
remains the same after a rollover.

This is in preparation of introducing support in the SMMUv3 driver to use
the KVM VMID for S2 stage configuration in nested mode.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kvm/vmid.c             | 84 ++++++++++++++++++++++++++++++-
 2 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..20fb00d29f48 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -141,6 +141,7 @@ int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
 
 struct kvm_vmid {
 	atomic64_t id;
+	refcount_t pinned;
 };
 
 struct kvm_s2_mmu {
@@ -1097,6 +1098,8 @@ int __init kvm_arm_vmid_alloc_init(void);
 void __init kvm_arm_vmid_alloc_free(void);
 bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
 void kvm_arm_vmid_clear_active(void);
+unsigned long kvm_arm_pinned_vmid_get(struct kvm_vmid *kvm_vmid);
+void kvm_arm_pinned_vmid_put(struct kvm_vmid *kvm_vmid);
 
 static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
 {
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 806223b7022a..0ffe24683071 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -25,6 +25,10 @@ static unsigned long *vmid_map;
 static DEFINE_PER_CPU(atomic64_t, active_vmids);
 static DEFINE_PER_CPU(u64, reserved_vmids);
 
+static unsigned long max_pinned_vmids;
+static unsigned long nr_pinned_vmids;
+static unsigned long *pinned_vmid_map;
+
 #define VMID_MASK		(~GENMASK(kvm_arm_vmid_bits - 1, 0))
 #define VMID_FIRST_VERSION	(1UL << kvm_arm_vmid_bits)
 
@@ -47,7 +51,10 @@ static void flush_context(void)
 	int cpu;
 	u64 vmid;
 
-	bitmap_zero(vmid_map, NUM_USER_VMIDS);
+	if (pinned_vmid_map)
+		bitmap_copy(vmid_map, pinned_vmid_map, NUM_USER_VMIDS);
+	else
+		bitmap_zero(vmid_map, NUM_USER_VMIDS);
 
 	for_each_possible_cpu(cpu) {
 		vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
@@ -103,6 +110,14 @@ static u64 new_vmid(struct kvm_vmid *kvm_vmid)
 			return newvmid;
 		}
 
+		/*
+		 * If it is pinned, we can keep using it. Note that reserved
+		 * takes priority, because even if it is also pinned, we need to
+		 * update the generation into the reserved_vmids.
+		 */
+		if (refcount_read(&kvm_vmid->pinned))
+			return newvmid;
+
 		if (!__test_and_set_bit(vmid2idx(vmid), vmid_map)) {
 			atomic64_set(&kvm_vmid->id, newvmid);
 			return newvmid;
@@ -174,6 +189,63 @@ bool kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
 	return updated;
 }
 
+unsigned long kvm_arm_pinned_vmid_get(struct kvm_vmid *kvm_vmid)
+{
+	unsigned long flags;
+	u64 vmid;
+
+	if (!pinned_vmid_map)
+		return 0;
+
+	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
+
+	vmid = atomic64_read(&kvm_vmid->id);
+
+	if (refcount_inc_not_zero(&kvm_vmid->pinned))
+		goto out_unlock;
+
+	if (nr_pinned_vmids >= max_pinned_vmids) {
+		vmid = 0;
+		goto out_unlock;
+	}
+
+	/*
+	 * If we went through one or more rollover since that VMID was
+	 * used, make sure it is still valid, or generate a new one.
+	 */
+	if (!vmid_gen_match(vmid))
+		vmid = new_vmid(kvm_vmid);
+
+	nr_pinned_vmids++;
+	__set_bit(vmid2idx(vmid), pinned_vmid_map);
+	refcount_set(&kvm_vmid->pinned, 1);
+
+out_unlock:
+	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
+
+	vmid &= ~VMID_MASK;
+
+	return vmid;
+}
+
+void kvm_arm_pinned_vmid_put(struct kvm_vmid *kvm_vmid)
+{
+	unsigned long flags;
+	u64 vmid = atomic64_read(&kvm_vmid->id);
+
+	if (!pinned_vmid_map)
+		return;
+
+	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
+
+	if (refcount_dec_and_test(&kvm_vmid->pinned)) {
+		__clear_bit(vmid2idx(vmid), pinned_vmid_map);
+		nr_pinned_vmids--;
+	}
+
+	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
+}
+
 /*
  * Initialize the VMID allocator
  */
@@ -191,10 +263,20 @@ int __init kvm_arm_vmid_alloc_init(void)
 	if (!vmid_map)
 		return -ENOMEM;
 
+	pinned_vmid_map = bitmap_zalloc(NUM_USER_VMIDS, GFP_KERNEL);
+	nr_pinned_vmids = 0;
+
+	/*
+	 * Ensure we have at least one emty slot available after rollover
+	 * and maximum number of VMIDs are pinned. VMID#0 is reserved.
+	 */
+	max_pinned_vmids = NUM_USER_VMIDS - num_possible_cpus() - 2;
+
 	return 0;
 }
 
 void __init kvm_arm_vmid_alloc_free(void)
 {
+	bitmap_free(pinned_vmid_map);
 	bitmap_free(vmid_map);
 }
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 3/7] KVM: arm64: Add interfaces for pinned VMID support
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
  2024-02-08 15:18 ` [RFC PATCH v2 1/7] KVM: Add generic infrastructure to support pinned VMIDs Shameer Kolothum
  2024-02-08 15:18 ` [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs Shameer Kolothum
@ 2024-02-08 15:18 ` Shameer Kolothum
  2024-02-08 15:18 ` [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx Shameer Kolothum
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

Provide interfaces to get/put pinned VMIDs from KVM VMID allocator.
This will be used by SMMUv3 driver in subsequent patch.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/kvm/Kconfig |  1 +
 arch/arm64/kvm/arm.c   | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 6c3c8ca73e7f..29ff79f112ba 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -40,6 +40,7 @@ menuconfig KVM
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
 	select XARRAY_MULTI
+	select HAVE_KVM_PINNED_VMID
 	help
 	  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a25265aca432..ed905da6e9ab 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -711,6 +711,20 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+int kvm_arch_pinned_vmid_get(struct kvm *kvm)
+{
+	int vmid;
+
+	vmid = kvm_arm_pinned_vmid_get(&kvm->arch.mmu.vmid);
+
+	return (vmid == 0) ? -EINVAL : vmid;
+}
+
+void kvm_arch_pinned_vmid_put(struct kvm *kvm)
+{
+	kvm_arm_pinned_vmid_put(&kvm->arch.mmu.vmid);
+}
+
 bool kvm_arch_intc_initialized(struct kvm *kvm)
 {
 	return vgic_initialized(kvm);
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
                   ` (2 preceding siblings ...)
  2024-02-08 15:18 ` [RFC PATCH v2 3/7] KVM: arm64: Add interfaces for pinned VMID support Shameer Kolothum
@ 2024-02-08 15:18 ` Shameer Kolothum
  2024-02-08 15:42   ` Jason Gunthorpe
  2024-02-08 15:18 ` [RFC PATCH v2 5/7] iommu: Pass in kvm pointer to domain_alloc_user Shameer Kolothum
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

Introduce an API to set the KVM pointer to the iommufd ctx
and set the same when a vfio dev is bind to iommufd.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  3 +++
 drivers/iommu/iommufd/main.c            | 14 ++++++++++++++
 drivers/vfio/device_cdev.c              |  3 +++
 include/linux/iommufd.h                 |  7 +++++++
 4 files changed, 27 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..28ede82bb1a6 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -16,6 +16,7 @@ struct iommu_domain;
 struct iommu_group;
 struct iommu_option;
 struct iommufd_device;
+struct kvm;
 
 struct iommufd_ctx {
 	struct file *file;
@@ -27,6 +28,8 @@ struct iommufd_ctx {
 	/* Compatibility with VFIO no iommu */
 	u8 no_iommu_mode;
 	struct iommufd_ioas *vfio_ioas;
+	/* Associated KVM pointer */
+	struct kvm *kvm;
 };
 
 /*
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39b32932c61e..28272510fba4 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -495,6 +495,20 @@ void iommufd_ctx_put(struct iommufd_ctx *ictx)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_ctx_put, IOMMUFD);
 
+/**
+ * iommufd_ctx_set_kvm - Called to set a KVM pointer to iommufd context
+ * @ictx: Context to operate on
+ * @kvm: KVM pointer with a reference taken using kvm_get_kvm_safe()
+ */
+void iommufd_ctx_set_kvm(struct iommufd_ctx *ictx, struct kvm *kvm)
+{
+	xa_lock(&ictx->objects);
+	if (!ictx->kvm)
+		ictx->kvm = kvm;
+	xa_unlock(&ictx->objects);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_set_kvm, IOMMUFD);
+
 static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_ACCESS] = {
 		.destroy = iommufd_access_destroy_object,
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index e75da0a70d1f..e75e96fb57cb 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -101,6 +101,9 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 	 */
 	vfio_df_get_kvm_safe(df);
 
+	if (df->kvm)
+		iommufd_ctx_set_kvm(df->iommufd, df->kvm);
+
 	ret = vfio_df_open(df);
 	if (ret)
 		goto out_put_kvm;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index ffc3a949f837..7408b620d0b8 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -17,6 +17,7 @@ struct iommufd_ctx;
 struct iommufd_access;
 struct file;
 struct iommu_group;
+struct kvm;
 
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
@@ -59,6 +60,7 @@ struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
 struct iommufd_ctx *iommufd_ctx_from_fd(int fd);
 void iommufd_ctx_put(struct iommufd_ctx *ictx);
 bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
+void iommufd_ctx_set_kvm(struct iommufd_ctx *ictx, struct kvm *kvm);
 
 int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
@@ -80,6 +82,11 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
 {
 }
 
+static inline void iommufd_ctx_set_kvm(struct iommufd_ctx *ictx,
+				       struct kvm *kvm)
+{
+}
+
 static inline int iommufd_access_pin_pages(struct iommufd_access *access,
 					   unsigned long iova,
 					   unsigned long length,
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 5/7] iommu: Pass in kvm pointer to domain_alloc_user
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
                   ` (3 preceding siblings ...)
  2024-02-08 15:18 ` [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx Shameer Kolothum
@ 2024-02-08 15:18 ` Shameer Kolothum
  2024-02-08 15:43   ` Jason Gunthorpe
  2024-02-08 15:18 ` [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage Shameer Kolothum
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

No functional changes.

This will be used in a later patch to add support to use
KVM VMID in ARM SMMUv3 s2 stage configuration.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/amd/iommu.c                   | 1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
 drivers/iommu/intel/iommu.c                 | 1 +
 drivers/iommu/iommufd/hw_pagetable.c        | 5 +++--
 drivers/iommu/iommufd/selftest.c            | 1 +
 include/linux/iommu.h                       | 3 ++-
 6 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4283dd8191f0..75e0f4e9253a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2244,6 +2244,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
 static struct iommu_domain *
 amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
 			    struct iommu_domain *parent,
+			    struct kvm *kvm,
 			    const struct iommu_user_data *user_data)
 
 {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0606166a8781..b41d77787a2f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3190,6 +3190,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
 static struct iommu_domain *
 arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
 			   struct iommu_domain *parent,
+			   struct kvm *kvm,
 			   const struct iommu_user_data *user_data)
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6fb5f6fceea1..992a2f233198 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3877,6 +3877,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 static struct iommu_domain *
 intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 			      struct iommu_domain *parent,
+			      struct kvm *kvm,
 			      const struct iommu_user_data *user_data)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 3f3f1fa1a0a9..1efc4ef863ac 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -129,7 +129,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 
 	if (ops->domain_alloc_user) {
 		hwpt->domain = ops->domain_alloc_user(idev->dev, flags, NULL,
-						      user_data);
+						      ictx->kvm, user_data);
 		if (IS_ERR(hwpt->domain)) {
 			rc = PTR_ERR(hwpt->domain);
 			hwpt->domain = NULL;
@@ -228,7 +228,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	hwpt_nested->parent = parent;
 
 	hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
-					      parent->common.domain, user_data);
+					      parent->common.domain,
+					      ictx->kvm, user_data);
 	if (IS_ERR(hwpt->domain)) {
 		rc = PTR_ERR(hwpt->domain);
 		hwpt->domain = NULL;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index d9e9920c7eba..483bb7216235 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -269,6 +269,7 @@ __mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent,
 static struct iommu_domain *
 mock_domain_alloc_user(struct device *dev, u32 flags,
 		       struct iommu_domain *parent,
+		       struct kvm *kvm,
 		       const struct iommu_user_data *user_data)
 {
 	struct mock_iommu_domain *mock_parent;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1446e3718642..ad14cdc22863 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -43,6 +43,7 @@ struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
 struct iommu_dma_cookie;
+struct kvm;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
@@ -458,7 +459,7 @@ struct iommu_ops {
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
 	struct iommu_domain *(*domain_alloc_user)(
 		struct device *dev, u32 flags, struct iommu_domain *parent,
-		const struct iommu_user_data *user_data);
+		struct kvm *kvm, const struct iommu_user_data *user_data);
 	struct iommu_domain *(*domain_alloc_paging)(struct device *dev);
 	struct iommu_domain *(*domain_alloc_sva)(struct device *dev,
 						 struct mm_struct *mm);
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
                   ` (4 preceding siblings ...)
  2024-02-08 15:18 ` [RFC PATCH v2 5/7] iommu: Pass in kvm pointer to domain_alloc_user Shameer Kolothum
@ 2024-02-08 15:18 ` Shameer Kolothum
  2024-02-08 15:59   ` Jason Gunthorpe
  2024-02-08 15:18 ` [RFC PATCH v2 7/7] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Shameer Kolothum
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

If kvm is available make use of kvm pinned VMID interfaces to
set the s2 stage VMID for nested domains.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b41d77787a2f..18e3e04b50f4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -18,6 +18,7 @@
 #include <linux/interrupt.h>
 #include <linux/io-pgtable.h>
 #include <linux/iopoll.h>
+#include <linux/kvm_host.h>
 #include <linux/module.h>
 #include <linux/msi.h>
 #include <linux/of.h>
@@ -2399,9 +2400,13 @@ int arm_smmu_domain_alloc_id(struct arm_smmu_device *smmu,
 	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) {
 		int vmid;
 
-		/* Reserve VMID 0 for stage-2 bypass STEs */
-		vmid = ida_alloc_range(&smmu->vmid_map, 1,
-				       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
+		if (smmu_domain->kvm) {
+			vmid = kvm_pinned_vmid_get(smmu_domain->kvm);
+		} else {
+			/* Reserve VMID 0 for stage-2 bypass STEs */
+			vmid = ida_alloc_range(&smmu->vmid_map, 1,
+					       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
+		}
 		if (vmid < 0)
 			return vmid;
 		smmu_domain->vmid = vmid;
@@ -2431,7 +2436,10 @@ void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain)
 	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
 		   smmu_domain->vmid) {
 		arm_smmu_tlb_inv_all_s2(smmu_domain);
-		ida_free(&smmu->vmid_map, smmu_domain->vmid);
+		if (smmu_domain->kvm)
+			kvm_pinned_vmid_put(smmu_domain->kvm);
+		else
+			ida_free(&smmu->vmid_map, smmu_domain->vmid);
 	}
 }
 
@@ -3217,7 +3225,7 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
 			goto err_free;
 		}
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
-		smmu_domain->nesting_parent = true;
+		smmu_domain->kvm = kvm;
 	}
 
 	smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 45bcd72fcda4..7d732ea2a6ee 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -758,6 +758,8 @@ struct arm_smmu_domain {
 	struct mmu_notifier		mmu_notifier;
 	bool				btm_invalidation : 1;
 	bool				nesting_parent : 1;
+
+	struct kvm			*kvm;
 };
 
 struct arm_smmu_nested_domain {
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 7/7] iommu/arm-smmu-v3: Enable broadcast TLB maintenance
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
                   ` (5 preceding siblings ...)
  2024-02-08 15:18 ` [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage Shameer Kolothum
@ 2024-02-08 15:18 ` Shameer Kolothum
  2024-02-08 15:35 ` [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Jason Gunthorpe
  2024-02-09 11:58 ` Jean-Philippe Brucker
  8 siblings, 0 replies; 35+ messages in thread
From: Shameer Kolothum @ 2024-02-08 15:18 UTC (permalink / raw)
  To: kvmarm, iommu, linux-arm-kernel, linuxarm
  Cc: kevin.tian, jgg, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

The SMMUv3 can handle invalidation targeted at TLB entries with shared
ASIDs. If the implementation supports broadcast TLB maintenance, enable it
and keep track of it in a feature bit. The SMMU will then be affected by
inner-shareable TLB invalidations from other agents.

In order to avoid over invalidation with stage-2 translation contexts,
enable BTM only when SMMUv3 supports eiher S1 or both S1 & S2 transaltion
contexts. In this way the default domain will use stage-1 and stage 2 will
be only used for NESTED Domain setup.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
[Shameer: Enable BTM only if S1 is supported]
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 18e3e04b50f4..f40912de9e9f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4060,11 +4060,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID;
+	reg = CR2_RECINVSID;
 
 	if (smmu->features & ARM_SMMU_FEAT_E2H)
 		reg |= CR2_E2H;
 
+	if (!(smmu->features & ARM_SMMU_FEAT_BTM))
+		reg |= CR2_PTM;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -4209,6 +4212,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
 	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
+	bool vhe = cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);
 
 	/* IDR0 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
@@ -4261,7 +4265,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
-		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+		if (vhe)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
@@ -4286,6 +4290,21 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_S2P)
 		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
+	/*
+	 * If S1 is supported, check we can enable BTM. This means if S2 is available,
+	 * we will use S2 for nested domain only with a KVM VMID. BTM is useful when
+	 * CPU shares the page tables with SMMUv3(eg: vSVA)
+	 */
+	if (reg & IDR0_S1P) {
+		/*
+		 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
+		 * will create TLB entries for NH-EL1 world and will miss the
+		 * broadcasted TLB invalidations that target EL2-E2H world. Don't enable
+		 * BTM in that case.
+		 */
+		if (reg & IDR0_BTM && (!vhe || reg & IDR0_HYP))
+			smmu->features |= ARM_SMMU_FEAT_BTM;
+	}
 
 	if (!(reg & (IDR0_S1P | IDR0_S2P))) {
 		dev_err(smmu->dev, "no translation support!\n");
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 7d732ea2a6ee..ff3de784d1be 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -33,6 +33,7 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
 #define IDR0_TTF_AARCH64		2
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
                   ` (6 preceding siblings ...)
  2024-02-08 15:18 ` [RFC PATCH v2 7/7] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Shameer Kolothum
@ 2024-02-08 15:35 ` Jason Gunthorpe
  2024-02-08 15:49   ` Shameerali Kolothum Thodi
  2024-02-09 11:58 ` Jean-Philippe Brucker
  8 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 15:35 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvmarm, iommu, linux-arm-kernel, linuxarm, kevin.tian,
	alex.williamson, maz, oliver.upton, will, robin.murphy,
	jean-philippe, jonathan.cameron

On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote:
> Hi,
> 
> On an ARM64 system with a SMMUv3 implementation that fully supports
> Broadcast TLB Maintenance(BTM) feature as part of the Distributed
> Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are
> received by SMMUv3. This is very useful when the SMMUv3 shares the
> page tables with the CPU(eg: Guest SVA use case). For this to work,
> the SMMUv3 must use the same VMID that is allocated by KVM to configure
> the nested stage 2(S2) translations.

Ah so you are going all the way and looking to enable BTM within a
vSVA environment too?

> An earlier proposal sent out[1] a while back resulted in changing the
> ARM64/KVM VMID allocator similar to the ASID allocator to make it
> better suited for this.
> 
> This RFC adds,
>  -Support for pinned KVM VMID.
>  -Support associating KVM pointer and iommufd ctx.
>  -Changes to domain_alloc_user() to receive a kvm pointer.
>  -Configure SMMUV3 S2 using KVM VMID
>  -Finally enable BTM only if SMMUV3 supports S1 translation. This
>   is to make sure that PAGING domains always use S1 and S2 is only
>   used for nested domains with a valid KVM. The idea is to make sure
>   when BTM is enabled in Guest, we use KVM VMID for S2.
> 
> Not sure I miss any explicit TLB invalidations with any use case
> that may configure a S2 with a private VMID that matches a KVM
> one.
> 
> This is based on Jason's ongoing SMMUv3 refactor series[2].

I'm glad to see this, thank you for finishing the BTM stuff!

If someone is using it I wonder if we need to get a more solid answer
on the races with invalidation an ASID reassign.. I added some notes
in comments in part 2 after I audited all of it.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-02-08 15:18 ` [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx Shameer Kolothum
@ 2024-02-08 15:42   ` Jason Gunthorpe
  2024-06-24 16:53     ` Sean Christopherson
  2024-06-25  1:53     ` Tian, Kevin
  0 siblings, 2 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 15:42 UTC (permalink / raw)
  To: Shameer Kolothum, Sean Christopherson
  Cc: kvmarm, iommu, linux-arm-kernel, linuxarm, kevin.tian,
	alex.williamson, maz, oliver.upton, will, robin.murphy,
	jean-philippe, jonathan.cameron

On Thu, Feb 08, 2024 at 03:18:34PM +0000, Shameer Kolothum wrote:

> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 991f864d1f9b..28ede82bb1a6 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -16,6 +16,7 @@ struct iommu_domain;
>  struct iommu_group;
>  struct iommu_option;
>  struct iommufd_device;
> +struct kvm;
>  
>  struct iommufd_ctx {
>  	struct file *file;
> @@ -27,6 +28,8 @@ struct iommufd_ctx {
>  	/* Compatibility with VFIO no iommu */
>  	u8 no_iommu_mode;
>  	struct iommufd_ioas *vfio_ioas;
> +	/* Associated KVM pointer */
> +	struct kvm *kvm;
>  };

Associating the KVM with the entire iommufd is a big hammer, is this
what we want to do?

I know it has to be linked to domain allocation and the coming
"viommu" object, and it is already linked to VFIO.

It means we support one KVM per iommufd (which doesn't seem
unreasonable, but also the first time we've had such a limitation)

The other option would be to pass in the kvm to the individual sub
objects.

Kevin?

Sean would you be OK with this approach considering your other series
to try to make more of this private?

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 5/7] iommu: Pass in kvm pointer to domain_alloc_user
  2024-02-08 15:18 ` [RFC PATCH v2 5/7] iommu: Pass in kvm pointer to domain_alloc_user Shameer Kolothum
@ 2024-02-08 15:43   ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 15:43 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvmarm, iommu, linux-arm-kernel, linuxarm, kevin.tian,
	alex.williamson, maz, oliver.upton, will, robin.murphy,
	jean-philippe, jonathan.cameron

On Thu, Feb 08, 2024 at 03:18:35PM +0000, Shameer Kolothum wrote:
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 4283dd8191f0..75e0f4e9253a 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2244,6 +2244,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
>  static struct iommu_domain *
>  amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
>  			    struct iommu_domain *parent,
> +			    struct kvm *kvm,
>  			    const struct iommu_user_data *user_data)

It would also be fine to stick this in iommu_user_data - since it is
fully optional to any driver. 

The other parameters are things the drivers must check and do
something with.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2
  2024-02-08 15:35 ` [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Jason Gunthorpe
@ 2024-02-08 15:49   ` Shameerali Kolothum Thodi
  2024-02-08 16:07     ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-02-08 15:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Shameerali Kolothum Thodi
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, kevin.tian@intel.com,
	alex.williamson@redhat.com, maz@kernel.org,
	oliver.upton@linux.dev, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, February 8, 2024 3:36 PM
> To: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Cc: kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm-
> kernel@lists.infradead.org; linuxarm@huawei.com; kevin.tian@intel.com;
> alex.williamson@redhat.com; maz@kernel.org; oliver.upton@linux.dev;
> will@kernel.org; robin.murphy@arm.com; jean-philippe@linaro.org;
> jonathan.cameron@huawei.com
> Subject: Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID
> for stage 2
> 
> On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote:
> > Hi,
> >
> > On an ARM64 system with a SMMUv3 implementation that fully supports
> > Broadcast TLB Maintenance(BTM) feature as part of the Distributed
> > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are
> > received by SMMUv3. This is very useful when the SMMUv3 shares the
> > page tables with the CPU(eg: Guest SVA use case). For this to work,
> > the SMMUv3 must use the same VMID that is allocated by KVM to configure
> > the nested stage 2(S2) translations.
> 
> Ah so you are going all the way and looking to enable BTM within a
> vSVA environment too?

Yes, eager to get the vSVA support 😊 Its been ages and tired of rebasing for our
private git for the vSVA cases!.

Also for Host SVA, since we are using S1 now, I don’t think this matters that much.

(I do have some questions on the iommufd based vSVA , but I will come back to it,
once I cleanup my Qemu branch for that.)

> 
> > An earlier proposal sent out[1] a while back resulted in changing the
> > ARM64/KVM VMID allocator similar to the ASID allocator to make it
> > better suited for this.
> >
> > This RFC adds,
> >  -Support for pinned KVM VMID.
> >  -Support associating KVM pointer and iommufd ctx.
> >  -Changes to domain_alloc_user() to receive a kvm pointer.
> >  -Configure SMMUV3 S2 using KVM VMID
> >  -Finally enable BTM only if SMMUV3 supports S1 translation. This
> >   is to make sure that PAGING domains always use S1 and S2 is only
> >   used for nested domains with a valid KVM. The idea is to make sure
> >   when BTM is enabled in Guest, we use KVM VMID for S2.
> >
> > Not sure I miss any explicit TLB invalidations with any use case
> > that may configure a S2 with a private VMID that matches a KVM
> > one.
> >
> > This is based on Jason's ongoing SMMUv3 refactor series[2].
> 
> I'm glad to see this, thank you for finishing the BTM stuff!
> 
> If someone is using it I wonder if we need to get a more solid answer
> on the races with invalidation an ASID reassign.. I added some notes
> in comments in part 2 after I audited all of it.

I saw that one. The main point here is, if we guarantee that BTM is only used
for a nested S2 case and there will be a pinned KVM VMID for that, I am 
not sure we need to worry about any S2 related conflicts in TLBIs.

Thanks,
Shameer
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage
  2024-02-08 15:18 ` [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage Shameer Kolothum
@ 2024-02-08 15:59   ` Jason Gunthorpe
  2024-02-08 16:14     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 15:59 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvmarm, iommu, linux-arm-kernel, linuxarm, kevin.tian,
	alex.williamson, maz, oliver.upton, will, robin.murphy,
	jean-philippe, jonathan.cameron

On Thu, Feb 08, 2024 at 03:18:36PM +0000, Shameer Kolothum wrote:
> If kvm is available make use of kvm pinned VMID interfaces to
> set the s2 stage VMID for nested domains.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++-----
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index b41d77787a2f..18e3e04b50f4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -18,6 +18,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/io-pgtable.h>
>  #include <linux/iopoll.h>
> +#include <linux/kvm_host.h>
>  #include <linux/module.h>
>  #include <linux/msi.h>
>  #include <linux/of.h>
> @@ -2399,9 +2400,13 @@ int arm_smmu_domain_alloc_id(struct arm_smmu_device *smmu,
>  	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) {
>  		int vmid;
>  
> -		/* Reserve VMID 0 for stage-2 bypass STEs */
> -		vmid = ida_alloc_range(&smmu->vmid_map, 1,
> -				       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
> +		if (smmu_domain->kvm) {
> +			vmid = kvm_pinned_vmid_get(smmu_domain->kvm);
> +		} else {
> +			/* Reserve VMID 0 for stage-2 bypass STEs */
> +			vmid = ida_alloc_range(&smmu->vmid_map, 1,
> +					       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
> +		}

We cannot allow the two different STEs to be programmed with the same
VMID but different translations, so somehow the two allocators have to
work together.

This is why the SVA BTM code has that complex ASID reassignment logic
so it can get away with two allocators. However ASID also has SMMU HW
ASET support to opt-in to the BTM broadcast.

My suggestion is to avoid two allocators and make iommu instances that
support BTM always use the KVM owned VMID allocator by forbidding a S2
domain from being created with a NULL KVM.

IOW all the DMA API/etc will use S1 domains and the only way to get to
a S2 is to allocate a nesting parent via iommufd - for BTM systems
only.

Since the S2 can't opt-out of the BTM broadcast this means the VMIDs
are cleanly assigned and we never get an issue where the KVM TLBI's
are flushing an unrelated S2.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2
  2024-02-08 15:49   ` Shameerali Kolothum Thodi
@ 2024-02-08 16:07     ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 16:07 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, kevin.tian@intel.com,
	alex.williamson@redhat.com, maz@kernel.org,
	oliver.upton@linux.dev, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron

On Thu, Feb 08, 2024 at 03:49:20PM +0000, Shameerali Kolothum Thodi wrote:
> > On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote:
> > > Hi,
> > >
> > > On an ARM64 system with a SMMUv3 implementation that fully supports
> > > Broadcast TLB Maintenance(BTM) feature as part of the Distributed
> > > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are
> > > received by SMMUv3. This is very useful when the SMMUv3 shares the
> > > page tables with the CPU(eg: Guest SVA use case). For this to work,
> > > the SMMUv3 must use the same VMID that is allocated by KVM to configure
> > > the nested stage 2(S2) translations.
> > 
> > Ah so you are going all the way and looking to enable BTM within a
> > vSVA environment too?
> 
> Yes, eager to get the vSVA support 😊 Its been ages and tired of rebasing for our
> private git for the vSVA cases!.
> 
> Also for Host SVA, since we are using S1 now, I don’t think this matters that much.
> 
> (I do have some questions on the iommufd based vSVA , but I will come back to it,
> once I cleanup my Qemu branch for that.)

Oh, I think I left a comment someplace that you also need to signal to the
VMM that the kernel supports vBTM, probably via new flag in the
iommu_hw_info_arm_smmuv3

+ *
+ * Note that these values reflect the raw HW capability, without any insight if
+ * any required kernel driver support is present. Bits may be set indicating the
+ * HW has functionality that is lacking kernel software support, such as BTM. If
+ * a VMM is using this information to construct emulated copies of these
+ * registers it should only forward bits that it knows it can support.
+ *
+ * In future, presence of required kernel support will be indicated in flags.
+ */

> > If someone is using it I wonder if we need to get a more solid answer
> > on the races with invalidation an ASID reassign.. I added some notes
> > in comments in part 2 after I audited all of it.
> 
> I saw that one. The main point here is, if we guarantee that BTM is only used
> for a nested S2 case and there will be a pinned KVM VMID for that, I am 
> not sure we need to worry about any S2 related conflicts in TLBIs.

Sure for the S2, aside my other note, but my remark here was about the
S1 ASID reassign logic. I was sort of OK to leave it as it was under
the idea that BTM support wasn't turned on.

I'm concerned it is another iommufd triggerable security problem..

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage
  2024-02-08 15:59   ` Jason Gunthorpe
@ 2024-02-08 16:14     ` Shameerali Kolothum Thodi
  2024-02-08 16:26       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-02-08 16:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Linuxarm,
	kevin.tian@intel.com, alex.williamson@redhat.com, maz@kernel.org,
	oliver.upton@linux.dev, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, February 8, 2024 3:59 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm-
> kernel@lists.infradead.org; Linuxarm <linuxarm@huawei.com>;
> kevin.tian@intel.com; alex.williamson@redhat.com; maz@kernel.org;
> oliver.upton@linux.dev; will@kernel.org; robin.murphy@arm.com; jean-
> philippe@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>
> Subject: Re: [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2
> stage
> 
> On Thu, Feb 08, 2024 at 03:18:36PM +0000, Shameer Kolothum wrote:
> > If kvm is available make use of kvm pinned VMID interfaces to
> > set the s2 stage VMID for nested domains.
> >
> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++-----
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
> >  2 files changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index b41d77787a2f..18e3e04b50f4 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -18,6 +18,7 @@
> >  #include <linux/interrupt.h>
> >  #include <linux/io-pgtable.h>
> >  #include <linux/iopoll.h>
> > +#include <linux/kvm_host.h>
> >  #include <linux/module.h>
> >  #include <linux/msi.h>
> >  #include <linux/of.h>
> > @@ -2399,9 +2400,13 @@ int arm_smmu_domain_alloc_id(struct
> arm_smmu_device *smmu,
> >  	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) {
> >  		int vmid;
> >
> > -		/* Reserve VMID 0 for stage-2 bypass STEs */
> > -		vmid = ida_alloc_range(&smmu->vmid_map, 1,
> > -				       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
> > +		if (smmu_domain->kvm) {
> > +			vmid = kvm_pinned_vmid_get(smmu_domain->kvm);
> > +		} else {
> > +			/* Reserve VMID 0 for stage-2 bypass STEs */
> > +			vmid = ida_alloc_range(&smmu->vmid_map, 1,
> > +					       (1 << smmu->vmid_bits) - 1,
> GFP_KERNEL);
> > +		}
> 
> We cannot allow the two different STEs to be programmed with the same
> VMID but different translations, so somehow the two allocators have to
> work together.

My idea was, we only set smmu_domain->kvm in the nested parent case
and that means SMMUv3 supports both S1 & S2. So all the non-nested domains
are using S1.

> 
> This is why the SVA BTM code has that complex ASID reassignment logic
> so it can get away with two allocators. However ASID also has SMMU HW
> ASET support to opt-in to the BTM broadcast.
> 
> My suggestion is to avoid two allocators and make iommu instances that
> support BTM always use the KVM owned VMID allocator by forbidding a S2
> domain from being created with a NULL KVM.

This is what I was trying to do. But not sure we have systems out there where
only S2  is supported and that may require a private VMID without KVM.

> 
> IOW all the DMA API/etc will use S1 domains and the only way to get to
> a S2 is to allocate a nesting parent via iommufd - for BTM systems
> only.
> 
> Since the S2 can't opt-out of the BTM broadcast this means the VMIDs
> are cleanly assigned and we never get an issue where the KVM TLBI's
> are flushing an unrelated S2.

In the last patch we only enable BTM if only S1 is supported. Meaning we will
always have a KVM VMID associated for S2(if it supports that). 

Did I miss something here?

Thanks,
Shameer

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage
  2024-02-08 16:14     ` Shameerali Kolothum Thodi
@ 2024-02-08 16:26       ` Jason Gunthorpe
  2024-02-08 16:36         ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 16:26 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Linuxarm,
	kevin.tian@intel.com, alex.williamson@redhat.com, maz@kernel.org,
	oliver.upton@linux.dev, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron

On Thu, Feb 08, 2024 at 04:14:07PM +0000, Shameerali Kolothum Thodi wrote:
> > >  #include <linux/of.h>
> > > @@ -2399,9 +2400,13 @@ int arm_smmu_domain_alloc_id(struct
> > arm_smmu_device *smmu,
> > >  	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) {
> > >  		int vmid;
> > >
> > > -		/* Reserve VMID 0 for stage-2 bypass STEs */
> > > -		vmid = ida_alloc_range(&smmu->vmid_map, 1,
> > > -				       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
> > > +		if (smmu_domain->kvm) {
> > > +			vmid = kvm_pinned_vmid_get(smmu_domain->kvm);
> > > +		} else {
> > > +			/* Reserve VMID 0 for stage-2 bypass STEs */
> > > +			vmid = ida_alloc_range(&smmu->vmid_map, 1,
> > > +					       (1 << smmu->vmid_bits) - 1,
> > GFP_KERNEL);
> > > +		}
> > 
> > We cannot allow the two different STEs to be programmed with the same
> > VMID but different translations, so somehow the two allocators have to
> > work together.
> 
> My idea was, we only set smmu_domain->kvm in the nested parent case
> and that means SMMUv3 supports both S1 & S2. So all the non-nested domains
> are using S1.

Okay, that is what I was thinking to get to.

The logic needs to reflect this directly though:

/*
 * There can only be one allocator for VMIDs active at once. If BTM is
 * turned on then KVM's allocator always supplies the VMID, and the
 * VMID is matched by CPU invalidation of the KVM S2. Right now there
 * is no API to get an unused VMID from KVM so this also means BTM systems
 * cannot support S2 without an associated KVM.
 */
if ((smmu->feature & ARM_SMMU_FEAT_BTM)) {
   if (!smmu_domain->kvm)
      vmid = -EOPNOTSUPP;
   else
       vmid = kvm_pinned_vmid_get(smmu_domain->kvm);
} else {
	/* Reserve VMID 0 for stage-2 bypass STEs */
	vmid = ida_alloc_range(&smmu->vmid_map, 1,
		       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
}

(and matching change to the free side)

> This is what I was trying to do. But not sure we have systems out there where
> only S2  is supported and that may require a private VMID without KVM.

iommufd can ask for a nesting parent without supplying a KVM fd, which
would cause vmid aliasing between the two allocators. That needs to be
blocked as above

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage
  2024-02-08 16:26       ` Jason Gunthorpe
@ 2024-02-08 16:36         ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 35+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-02-08 16:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Linuxarm,
	kevin.tian@intel.com, alex.williamson@redhat.com, maz@kernel.org,
	oliver.upton@linux.dev, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, February 8, 2024 4:26 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm-
> kernel@lists.infradead.org; Linuxarm <linuxarm@huawei.com>;
> kevin.tian@intel.com; alex.williamson@redhat.com; maz@kernel.org;
> oliver.upton@linux.dev; will@kernel.org; robin.murphy@arm.com; jean-
> philippe@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>
> Subject: Re: [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2
> stage
> 
> On Thu, Feb 08, 2024 at 04:14:07PM +0000, Shameerali Kolothum Thodi wrote:
> > > >  #include <linux/of.h>
> > > > @@ -2399,9 +2400,13 @@ int arm_smmu_domain_alloc_id(struct
> > > arm_smmu_device *smmu,
> > > >  	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) {
> > > >  		int vmid;
> > > >
> > > > -		/* Reserve VMID 0 for stage-2 bypass STEs */
> > > > -		vmid = ida_alloc_range(&smmu->vmid_map, 1,
> > > > -				       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
> > > > +		if (smmu_domain->kvm) {
> > > > +			vmid = kvm_pinned_vmid_get(smmu_domain->kvm);
> > > > +		} else {
> > > > +			/* Reserve VMID 0 for stage-2 bypass STEs */
> > > > +			vmid = ida_alloc_range(&smmu->vmid_map, 1,
> > > > +					       (1 << smmu->vmid_bits) - 1,
> > > GFP_KERNEL);
> > > > +		}
> > >
> > > We cannot allow the two different STEs to be programmed with the same
> > > VMID but different translations, so somehow the two allocators have to
> > > work together.
> >
> > My idea was, we only set smmu_domain->kvm in the nested parent case
> > and that means SMMUv3 supports both S1 & S2. So all the non-nested
> domains
> > are using S1.
> 
> Okay, that is what I was thinking to get to.
> 
> The logic needs to reflect this directly though:
> 
> /*
>  * There can only be one allocator for VMIDs active at once. If BTM is
>  * turned on then KVM's allocator always supplies the VMID, and the
>  * VMID is matched by CPU invalidation of the KVM S2. Right now there
>  * is no API to get an unused VMID from KVM so this also means BTM systems
>  * cannot support S2 without an associated KVM.
>  */
> if ((smmu->feature & ARM_SMMU_FEAT_BTM)) {
>    if (!smmu_domain->kvm)
>       vmid = -EOPNOTSUPP;
>    else
>        vmid = kvm_pinned_vmid_get(smmu_domain->kvm);
> } else {
> 	/* Reserve VMID 0 for stage-2 bypass STEs */
> 	vmid = ida_alloc_range(&smmu->vmid_map, 1,
> 		       (1 << smmu->vmid_bits) - 1, GFP_KERNEL);
> }
> 
> (and matching change to the free side)
> 
> > This is what I was trying to do. But not sure we have systems out there where
> > only S2  is supported and that may require a private VMID without KVM.
> 
> iommufd can ask for a nesting parent without supplying a KVM fd, which
> would cause vmid aliasing between the two allocators. That needs to be
> blocked as above

Ah..That's right. Missed that case where KVM can be NULL, but still there is a 
request for nested domain.

Thanks,
Shameer

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2
  2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
                   ` (7 preceding siblings ...)
  2024-02-08 15:35 ` [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Jason Gunthorpe
@ 2024-02-09 11:58 ` Jean-Philippe Brucker
  2024-02-09 12:48   ` Jason Gunthorpe
  2024-02-09 13:54   ` Shameerali Kolothum Thodi
  8 siblings, 2 replies; 35+ messages in thread
From: Jean-Philippe Brucker @ 2024-02-09 11:58 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvmarm, iommu, linux-arm-kernel, linuxarm, kevin.tian, jgg,
	alex.williamson, maz, oliver.upton, will, robin.murphy,
	jonathan.cameron

Hi Shameer,

On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote:
> Hi,
> 
> On an ARM64 system with a SMMUv3 implementation that fully supports
> Broadcast TLB Maintenance(BTM) feature as part of the Distributed
> Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are
> received by SMMUv3. This is very useful when the SMMUv3 shares the
> page tables with the CPU(eg: Guest SVA use case). For this to work,
> the SMMUv3 must use the same VMID that is allocated by KVM to configure
> the nested stage 2(S2) translations.

The series makes sense to me. Maybe a little more detail to help the KVM
maintainers understand why we need something like this even though the s2
page tables aren't shared between CPU and SMMU:

* When enabling BTM in the SMMU, all TLB invalidations to the
  inner-shareable domain issued by the CPU are taken into account by the
  SMMU. That includes for example the TLBI IPAS2E1IS from
  __kvm_tlb_flush_vmid_range().

* BTM is enabled globally in the SMMU CR2 register. If we enable BTM for
  host SVA, then it also affects KVM.

* Stage-1 TLB entries in the SMMU have a bit (ASET) saying "this entry
  is private and does not participate in BTM", which we set for private
  SMMU address spaces.

  Annoyingly, the stage-2 TLB entries do not have it. With BTM all VMIDs
  are shared between CPU and SMMU.

* So, if the SMMU driver allocates VMID privately and we enable BTM, then
  CPU invalidations will remove unrelated SMMU TLB entries. Instead, the
  SMMU driver needs to coordinate with KVM on VMID allocation.

* Private stage-2 address spaces in the SMMU would need to allocate VMIDs
  that aren't used by KVM, but that's not a use-case at the moment:

  - For assigning devices to a host process or to a VM, we use private
    stage-1 mappings. stage-2 will be used to enable nesting translation,
    and will typically mirror the KVM stage-2 since it pins the guest
    address space.

  - If the SMMU doesn't support stage-1, the driver falls back to stage-2
    for private address spaces. For such an implementation we disable BTM.

  - The old VFIO_TYPE1_NESTING_IOMMU lets userspace allocate a private
    stage-2, and has only been used for testing as far as I know. I don't
    think I ever found a program that used it in the wild, but haven't
    checked recently.

    The effects of using it with BTM enabled is performance degradation:
    TLB entries of that VFIO container get invalidated by unrelated KVM
    activity, and maybe that can be used in a side-channel attack. 
 
    It needs to be deprecated over a few releases (starting with a
    warning maybe?), and the replacement API shouldn't allow creating a
    stage-2 without a KVM context.

Thanks,
Jean


> 
> An earlier proposal sent out[1] a while back resulted in changing the
> ARM64/KVM VMID allocator similar to the ASID allocator to make it
> better suited for this.
> 
> This RFC adds,
>  -Support for pinned KVM VMID.
>  -Support associating KVM pointer and iommufd ctx.
>  -Changes to domain_alloc_user() to receive a kvm pointer.
>  -Configure SMMUV3 S2 using KVM VMID
>  -Finally enable BTM only if SMMUV3 supports S1 translation. This
>   is to make sure that PAGING domains always use S1 and S2 is only
>   used for nested domains with a valid KVM. The idea is to make sure
>   when BTM is enabled in Guest, we use KVM VMID for S2.
> 
> Not sure I miss any explicit TLB invalidations with any use case
> that may configure a S2 with a private VMID that matches a KVM
> one.
> 
> This is based on Jason's ongoing SMMUv3 refactor series[2].
> 
> Please take a look and let me know.
> 
> Thanks,
> Shameer
> 
> 1. https://lore.kernel.org/linux-arm-kernel/20210222155338.26132-1-shameerali.kolothum.thodi@huawei.com/
> 2. https://lore.kernel.org/linux-arm-kernel/0-v5-cd1be8dd9c71+3fa-smmuv3_newapi_p1_jgg@nvidia.com/
> 
> Jean-Philippe Brucker (1):
>   iommu/arm-smmu-v3: Enable broadcast TLB maintenance
> 
> Shameer Kolothum (6):
>   KVM: Add generic infrastructure to support pinned VMIDs
>   KVM: arm64: Introduce support to pin VMIDs
>   KVM: arm64: Add interfaces for pinned VMID support
>   iommufd: Associate kvm pointer to iommufd ctx
>   iommu: Pass in kvm pointer to domain_alloc_user
>   iommu/arm-smmu-v3: Use KVM VMID for s2 stage
> 
>  arch/arm64/include/asm/kvm_host.h           |  3 +
>  arch/arm64/kvm/Kconfig                      |  1 +
>  arch/arm64/kvm/arm.c                        | 14 ++++
>  arch/arm64/kvm/vmid.c                       | 84 ++++++++++++++++++++-
>  drivers/iommu/amd/iommu.c                   |  1 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +
>  drivers/iommu/intel/iommu.c                 |  1 +
>  drivers/iommu/iommufd/hw_pagetable.c        |  5 +-
>  drivers/iommu/iommufd/iommufd_private.h     |  3 +
>  drivers/iommu/iommufd/main.c                | 14 ++++
>  drivers/iommu/iommufd/selftest.c            |  1 +
>  drivers/vfio/device_cdev.c                  |  3 +
>  include/linux/iommu.h                       |  3 +-
>  include/linux/iommufd.h                     |  7 ++
>  include/linux/kvm_host.h                    | 18 +++++
>  virt/kvm/Kconfig                            |  3 +
>  virt/kvm/kvm_main.c                         | 23 ++++++
>  18 files changed, 218 insertions(+), 11 deletions(-)
> 
> -- 
> 2.34.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2
  2024-02-09 11:58 ` Jean-Philippe Brucker
@ 2024-02-09 12:48   ` Jason Gunthorpe
  2024-02-09 13:54   ` Shameerali Kolothum Thodi
  1 sibling, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2024-02-09 12:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Shameer Kolothum, kvmarm, iommu, linux-arm-kernel, linuxarm,
	kevin.tian, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jonathan.cameron

On Fri, Feb 09, 2024 at 11:58:24AM +0000, Jean-Philippe Brucker wrote:

> * Stage-1 TLB entries in the SMMU have a bit (ASET) saying "this entry
>   is private and does not participate in BTM", which we set for private
>   SMMU address spaces.
> 
>   Annoyingly, the stage-2 TLB entries do not have it. With BTM all VMIDs
>   are shared between CPU and SMMU.

Right, the spec justified this decision like this:

 Note: Arm expects that SMMU stage 2 address spaces are generally
 shared with their respective PE virtual machine stage 2
 configuration. If broadcast invalidation is required to be avoided
 for a particular SMMU stage 2 address space, Arm recommends that a
 hypervisor configures the STE with a VMID that is not allocated for
 virtual machine use on the PEs.

Which doesn't match how Linux works and I think after the recent KVM
PUCK call on this topic we can say it does not match how Linux will
work going into the future. Assuming the KVM S2 and IOMMU S2 are
shared is not true.

So unfortuntely this creates a waste as the BTM will generate
worthless invalidation workload on the related IOMMU S2. We cannot do
as the spec suggests to avoid broadcast invalidation here with a
unique VMID as that will break vSVA vBTM invalidation to the S1.

I do wonder how much of a performance negative this will create. At
least the S1 isn't flushed so perhaps the performace hit is
small.

Anyhow, I view it as a defect that the HW doesn't have a BTM ignore
bit at the S2 level so that we can use the same VMID to make vBTM work
but not participate in CPU originated invalidations for a non-shared
S2. A global bit to disable S2 BTM would have been fine for Linux.

>   - The old VFIO_TYPE1_NESTING_IOMMU lets userspace allocate a private
>     stage-2, and has only been used for testing as far as I know. I don't
>     think I ever found a program that used it in the wild, but haven't
>     checked recently.

There isn't, it is useless and cannot do anything. A patch has been
waiting to remove it for a while now, I've got it in my part 3 right
now.

>     It needs to be deprecated over a few releases (starting with a
>     warning maybe?),

No need, we just NOP'd it. It has no user visible side effect,
replacing the S2 with a S1 is fine.

>     and the replacement API shouldn't allow creating a
>     stage-2 without a KVM context.

See my remarks yesterday.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2
  2024-02-09 11:58 ` Jean-Philippe Brucker
  2024-02-09 12:48   ` Jason Gunthorpe
@ 2024-02-09 13:54   ` Shameerali Kolothum Thodi
  1 sibling, 0 replies; 35+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-02-09 13:54 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Linuxarm,
	kevin.tian@intel.com, jgg@ziepe.ca, alex.williamson@redhat.com,
	maz@kernel.org, oliver.upton@linux.dev, will@kernel.org,
	robin.murphy@arm.com, Jonathan Cameron



> -----Original Message-----
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Friday, February 9, 2024 11:58 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm-
> kernel@lists.infradead.org; Linuxarm <linuxarm@huawei.com>;
> kevin.tian@intel.com; jgg@ziepe.ca; alex.williamson@redhat.com;
> maz@kernel.org; oliver.upton@linux.dev; will@kernel.org;
> robin.murphy@arm.com; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Subject: Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM
> VMID for stage 2
> 
> Hi Shameer,
> 
> On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote:
> > Hi,
> >
> > On an ARM64 system with a SMMUv3 implementation that fully supports
> > Broadcast TLB Maintenance(BTM) feature as part of the Distributed
> > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are
> > received by SMMUv3. This is very useful when the SMMUv3 shares the
> > page tables with the CPU(eg: Guest SVA use case). For this to work,
> > the SMMUv3 must use the same VMID that is allocated by KVM to
> configure
> > the nested stage 2(S2) translations.
> 
> The series makes sense to me. Maybe a little more detail to help the KVM
> maintainers understand why we need something like this even though the s2
> page tables aren't shared between CPU and SMMU:

Thanks Jean for the nice write up. I will use this for next revisions of this series.

Shameer

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 1/7] KVM: Add generic infrastructure to support pinned VMIDs
  2024-02-08 15:18 ` [RFC PATCH v2 1/7] KVM: Add generic infrastructure to support pinned VMIDs Shameer Kolothum
@ 2024-06-24 15:48   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2024-06-24 15:48 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvmarm, iommu, linux-arm-kernel, linuxarm, kevin.tian, jgg,
	alex.williamson, maz, oliver.upton, will, robin.murphy,
	jean-philippe, jonathan.cameron

On Thu, Feb 08, 2024, Shameer Kolothum wrote:
> Provide generic helper functions to get/put pinned VMIDs if the arch
> supports it.

IMO, this has no business being in generic KVM.  Multiple architectures have
constructs that are _similar_ ARM's VMID, but AFAICT the exact semantics are very
ARM specific.  I.e. odds are very good that the only thing that can actually use
kvm_pinned_vmid_get() is the SMMU, because the concept won't fit any other
architecture.

The other issue is that other architectures support building KVM as a module,
which makes it much more difficult to guarantee the safety of these hooks.

> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  include/linux/kvm_host.h | 18 ++++++++++++++++++
>  virt/kvm/Kconfig         |  3 +++
>  virt/kvm/kvm_main.c      | 23 +++++++++++++++++++++++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7e7fd25b09b3..610e239bea46 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2311,6 +2311,24 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>  }
>  #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
>  
> +#ifdef CONFIG_HAVE_KVM_PINNED_VMID
> +int kvm_arch_pinned_vmid_get(struct kvm *kvm);
> +void kvm_arch_pinned_vmid_put(struct kvm *kvm);
> +#endif
> +
> +#ifdef CONFIG_HAVE_KVM_PINNED_VMID
> +int kvm_pinned_vmid_get(struct kvm *kvm);
> +void kvm_pinned_vmid_put(struct kvm *kvm);
> +#else
> +static inline int kvm_pinned_vmid_get(struct kvm *kvm)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline void kvm_pinned_vmid_put(struct kvm *kvm)
> +{
> +}
> +#endif
>  /*
>   * If more than one page is being (un)accounted, @virt must be the address of
>   * the first page of a block of pages what were allocated together (i.e
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 184dab4ee871..a3052c8e3ac4 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -108,3 +108,6 @@ config KVM_GENERIC_PRIVATE_MEM
>         select KVM_GENERIC_MEMORY_ATTRIBUTES
>         select KVM_PRIVATE_MEM
>         bool
> +
> +config HAVE_KVM_PINNED_VMID
> +       bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 10bfc88a69f7..f84d6da5f464 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3918,6 +3918,29 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
>  
> +#ifdef CONFIG_HAVE_KVM_PINNED_VMID
> +int kvm_pinned_vmid_get(struct kvm *kvm)
> +{
> +	int ret;
> +
> +	if (!kvm_get_kvm_safe(kvm))
> +		return -ENOENT;
> +	ret  = kvm_arch_pinned_vmid_get(kvm);
> +	if (ret < 0)
> +		kvm_put_kvm(kvm);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_pinned_vmid_get);
> +
> +void kvm_pinned_vmid_put(struct kvm *kvm)
> +{
> +	kvm_arch_pinned_vmid_put(kvm);
> +	kvm_put_kvm(kvm);
> +}
> +EXPORT_SYMBOL_GPL(kvm_pinned_vmid_put);
> +#endif
> +
>  #ifndef CONFIG_S390
>  /*
>   * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-02-08 15:42   ` Jason Gunthorpe
@ 2024-06-24 16:53     ` Sean Christopherson
  2024-06-24 17:07       ` Jason Gunthorpe
  2024-06-25  1:53     ` Tian, Kevin
  1 sibling, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2024-06-24 16:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameer Kolothum, kvmarm, iommu, linux-arm-kernel, linuxarm,
	kevin.tian, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

On Thu, Feb 08, 2024, Jason Gunthorpe wrote:
> On Thu, Feb 08, 2024 at 03:18:34PM +0000, Shameer Kolothum wrote:
> 
> > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> > index 991f864d1f9b..28ede82bb1a6 100644
> > --- a/drivers/iommu/iommufd/iommufd_private.h
> > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > @@ -16,6 +16,7 @@ struct iommu_domain;
> >  struct iommu_group;
> >  struct iommu_option;
> >  struct iommufd_device;
> > +struct kvm;
> >  
> >  struct iommufd_ctx {
> >  	struct file *file;
> > @@ -27,6 +28,8 @@ struct iommufd_ctx {
> >  	/* Compatibility with VFIO no iommu */
> >  	u8 no_iommu_mode;
> >  	struct iommufd_ioas *vfio_ioas;
> > +	/* Associated KVM pointer */
> > +	struct kvm *kvm;
> >  };
> 
> Associating the KVM with the entire iommufd is a big hammer, is this
> what we want to do?
> 
> I know it has to be linked to domain allocation and the coming
> "viommu" object, and it is already linked to VFIO.
> 
> It means we support one KVM per iommufd (which doesn't seem
> unreasonable, but also the first time we've had such a limitation)

And if KVM+iommufd come as pairs, wouldn't iommufd_ctx_set_kvm() need to reject
attempts to bind devices associated with different KVMs, as opposed to silently
ignoring that case?  E.g. something like the below?  Or is there magic elsewhere
in the stack that prevents such a scenario?

void iommufd_ctx_set_kvm(struct iommufd_ctx *ictx, struct kvm *kvm)
{
	int r = 0;

	xa_lock(&ictx->objects);
	if (!ictx->kvm)
		ictx->kvm = kvm;
	else if (icxt->kvm != kvm)
		r = -EINVAL;
	xa_unlock(&ictx->objects);
}

> The other option would be to pass in the kvm to the individual sub
> objects.
> 
> Kevin?
> 
> Sean would you be OK with this approach considering your other series
> to try to make more of this private?

Sorry, I completely missed this.

If kvm_pinned_vmid_{get,put}() are implemented directly by KVM ARM, then I don't
have any immediate concerns, as KVM ARM is a long, long way from being able to
isolate KVM from the core kernel.  And if we ever want/need to provide generic
APIs for propagating state from KVM into iommufd, bundling the KVM file pointer[*]
with a set of function pointers wouldn't be terribly difficult.

That said, I find the on-demand pinning to be very odd.  IIUC, if KVM runs out
of pinnable VMIDs, attaching a device to the KVM+iommu will fail.  Failing an
iommufd operation because of a (potentially transient) KVM resource issue is
rather unpleasant.

And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
suprise me if someone wanted to add cgroup integration, e.g. similar to the
misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
is analagous to an ARM VMID).

Rather than on-demand pinning, would it make sense to have KVM provide an ioctl()
(or capability, or VM type) to let userspace pin a VM's VMID?  That would allow
for a much saner failure mode, and I suspect would be cleaner in general for iommufd.

[*] https://lore.kernel.org/all/ZXkVSKULLivrMkBl@google.com


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 16:53     ` Sean Christopherson
@ 2024-06-24 17:07       ` Jason Gunthorpe
  2024-06-24 17:54         ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2024-06-24 17:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Shameer Kolothum, kvmarm, iommu, linux-arm-kernel, linuxarm,
	kevin.tian, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

On Mon, Jun 24, 2024 at 09:53:00AM -0700, Sean Christopherson wrote:
> > Associating the KVM with the entire iommufd is a big hammer, is this
> > what we want to do?
> > 
> > I know it has to be linked to domain allocation and the coming
> > "viommu" object, and it is already linked to VFIO.
> > 
> > It means we support one KVM per iommufd (which doesn't seem
> > unreasonable, but also the first time we've had such a limitation)
> 
> And if KVM+iommufd come as pairs, wouldn't iommufd_ctx_set_kvm() need to reject
> attempts to bind devices associated with different KVMs, as opposed to silently
> ignoring that case?  E.g. something like the below?  Or is there magic elsewhere
> in the stack that prevents such a scenario?

Yes, it would need things like that

But I think based on other discussions we are likely to tie to the KVM
to the coming IOMMUFD VIOMMU object, and the KVM will probably be
provided at object creation time to avoid this issue.

> > Sean would you be OK with this approach considering your other series
> > to try to make more of this private?
> 
> Sorry, I completely missed this.
> 
> If kvm_pinned_vmid_{get,put}() are implemented directly by KVM ARM, then I don't
> have any immediate concerns, as KVM ARM is a long, long way from being able to
> isolate KVM from the core kernel.  

I think that is a reasonable thing, I also don't really see VMID as
being general. We will have to figure out how to ensure that the KVM
FD we got is an ARM KVM FD..

> That said, I find the on-demand pinning to be very odd.  IIUC, if KVM runs out
> of pinnable VMIDs, attaching a device to the KVM+iommu will fail.  Failing an
> iommufd operation because of a (potentially transient) KVM resource issue is
> rather unpleasant.

It is kind of subtle, but the only thing that will consume VMIDs is
IOMMUFD operations that are working with nested translation but not
providing KVMs. This is a pretty small blast radius - ie a specific
qemu will fail to start - that I think we can tolerate it.

More normal iommu operation will not require VMIDs so things like
driver attaching/etc is fine.

> And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
> suprise me if someone wanted to add cgroup integration, e.g. similar to the
> misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
> is analagous to an ARM VMID).

Yeah, but if someone is using such a cgroup then I expect they will
also have an up to date VMM that doesn't trigger this VMID allocation
in the first place...
 
> Rather than on-demand pinning, would it make sense to have KVM provide an ioctl()
> (or capability, or VM type) to let userspace pin a VM's VMID?  That would allow
> for a much saner failure mode, and I suspect would be cleaner in general for iommufd.

The point of this mechanism is to support using this iommufd feature
without a KVM at all. We could instead prevent this directly 100% of
the time, but it means that HW with this BTM capability would not run
the legacy VMMs at all, so I'm not that keen on it..

When a KVM is present then the iommu needs to adopt the VMID of KVM,
and that should have a mechanism to ensure the VMID is valid so long
as the IOMMU is using it (eg because the KVM FD is open)

Jason


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 17:07       ` Jason Gunthorpe
@ 2024-06-24 17:54         ` Sean Christopherson
  2024-06-24 18:01           ` Jason Gunthorpe
  2024-06-24 19:13           ` Shameerali Kolothum Thodi
  0 siblings, 2 replies; 35+ messages in thread
From: Sean Christopherson @ 2024-06-24 17:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameer Kolothum, kvmarm, iommu, linux-arm-kernel, linuxarm,
	kevin.tian, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

On Mon, Jun 24, 2024, Jason Gunthorpe wrote:
> On Mon, Jun 24, 2024 at 09:53:00AM -0700, Sean Christopherson wrote:
> > If kvm_pinned_vmid_{get,put}() are implemented directly by KVM ARM, then I don't
> > have any immediate concerns, as KVM ARM is a long, long way from being able to
> > isolate KVM from the core kernel.  
> 
> I think that is a reasonable thing, I also don't really see VMID as
> being general. We will have to figure out how to ensure that the KVM
> FD we got is an ARM KVM FD..

Isn't the caller in ARM specific code?  I was assuming kvm_pinned_vmid_{get,put}()
would simply not exist for non-ARM builds.

> > That said, I find the on-demand pinning to be very odd.  IIUC, if KVM runs out
> > of pinnable VMIDs, attaching a device to the KVM+iommu will fail.  Failing an
> > iommufd operation because of a (potentially transient) KVM resource issue is
> > rather unpleasant.
> 
> It is kind of subtle, but the only thing that will consume VMIDs is
> IOMMUFD operations that are working with nested translation but not
> providing KVMs. This is a pretty small blast radius - ie a specific
> qemu will fail to start - that I think we can tolerate it.
> 
> More normal iommu operation will not require VMIDs so things like
> driver attaching/etc is fine.
> 
> > And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
> > suprise me if someone wanted to add cgroup integration, e.g. similar to the
> > misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
> > is analagous to an ARM VMID).
> 
> Yeah, but if someone is using such a cgroup then I expect they will
> also have an up to date VMM that doesn't trigger this VMID allocation
> in the first place...

I suspect we're talking about two different things.  Either that, or I am really
lost.

> > Rather than on-demand pinning, would it make sense to have KVM provide an ioctl()
> > (or capability, or VM type) to let userspace pin a VM's VMID?  That would allow
> > for a much saner failure mode, and I suspect would be cleaner in general for iommufd.
> 
> The point of this mechanism is to support using this iommufd feature
> without a KVM at all. We could instead prevent this directly 100% of
> the time, but it means that HW with this BTM capability would not run
> the legacy VMMs at all, so I'm not that keen on it..
> 
> When a KVM is present then the iommu needs to adopt the VMID of KVM,
> and that should have a mechanism to ensure the VMID is valid so long
> as the IOMMU is using it (eg because the KVM FD is open)

Right, and that's what I'm referring to as "on-demand pinning".  For the IOMMU
to adopt a KVM VMID, the VMID needs to be pinned (or KVM would need to notify
the IOMMU every time the VMID changed), i.e. every KVM+IOMMU pair pins a VMID
that is managed by KVM.

Hmm, kvm_arm_pinned_vmid_get() doesn't fail, it just falls back to VMID=0.  Which
seems odd.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 17:54         ` Sean Christopherson
@ 2024-06-24 18:01           ` Jason Gunthorpe
  2024-06-24 19:12             ` Oliver Upton
  2024-06-24 19:13           ` Shameerali Kolothum Thodi
  1 sibling, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2024-06-24 18:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Shameer Kolothum, kvmarm, iommu, linux-arm-kernel, linuxarm,
	kevin.tian, alex.williamson, maz, oliver.upton, will,
	robin.murphy, jean-philippe, jonathan.cameron

On Mon, Jun 24, 2024 at 10:54:37AM -0700, Sean Christopherson wrote:
> > > And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
> > > suprise me if someone wanted to add cgroup integration, e.g. similar to the
> > > misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
> > > is analagous to an ARM VMID).
> > 
> > Yeah, but if someone is using such a cgroup then I expect they will
> > also have an up to date VMM that doesn't trigger this VMID allocation
> > in the first place...
> 
> I suspect we're talking about two different things.  Either that, or I am really
> lost.

I mean KVM will have already allocated and charged the cgroup for it's
use of the VMID. The IOMMU side just has to match it, no second
allocation of a VMID.

We wouldn't charge a cgroup for iommu and kvm sharing the same vmid.

> > When a KVM is present then the iommu needs to adopt the VMID of KVM,
> > and that should have a mechanism to ensure the VMID is valid so long
> > as the IOMMU is using it (eg because the KVM FD is open)
> 
> Right, and that's what I'm referring to as "on-demand pinning".  For the IOMMU
> to adopt a KVM VMID, the VMID needs to be pinned (or KVM would need to notify
> the IOMMU every time the VMID changed), i.e. every KVM+IOMMU pair pins a VMID
> that is managed by KVM.

Ok, right, yes, the expectation is that KVM allocates a VMID at some
point and it stays fixed for the life of that kvm.

If KVM can change VMID on the fly then that is a further complication
:\

> Hmm, kvm_arm_pinned_vmid_get() doesn't fail, it just falls back to VMID=0.  Which
> seems odd.

I had the impression the KVM always had a fixed VMID, but I don't
really know.

Jason


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 18:01           ` Jason Gunthorpe
@ 2024-06-24 19:12             ` Oliver Upton
  2024-06-24 19:29               ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Upton @ 2024-06-24 19:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sean Christopherson, Shameer Kolothum, kvmarm, iommu,
	linux-arm-kernel, linuxarm, kevin.tian, alex.williamson, maz,
	will, robin.murphy, jean-philippe, jonathan.cameron

On Mon, Jun 24, 2024 at 03:01:48PM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 24, 2024 at 10:54:37AM -0700, Sean Christopherson wrote:
> > > > And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
> > > > suprise me if someone wanted to add cgroup integration, e.g. similar to the
> > > > misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
> > > > is analagous to an ARM VMID).
> > > 
> > > Yeah, but if someone is using such a cgroup then I expect they will
> > > also have an up to date VMM that doesn't trigger this VMID allocation
> > > in the first place...
> > 
> > I suspect we're talking about two different things.  Either that, or I am really
> > lost.
> 
> I mean KVM will have already allocated and charged the cgroup for it's
> use of the VMID. The IOMMU side just has to match it, no second
> allocation of a VMID.
> 
> We wouldn't charge a cgroup for iommu and kvm sharing the same vmid.

I think the concern remains that an operator may want to limit the blast
radius of some runaway VMID allocation in a system. But you're right, a
well-intentioned VMM should wind up with a single charge for all the
stage-2's that used the VMID allocation.

> > > When a KVM is present then the iommu needs to adopt the VMID of KVM,
> > > and that should have a mechanism to ensure the VMID is valid so long
> > > as the IOMMU is using it (eg because the KVM FD is open)
> > 
> > Right, and that's what I'm referring to as "on-demand pinning".  For the IOMMU
> > to adopt a KVM VMID, the VMID needs to be pinned (or KVM would need to notify
> > the IOMMU every time the VMID changed), i.e. every KVM+IOMMU pair pins a VMID
> > that is managed by KVM.
> 
> Ok, right, yes, the expectation is that KVM allocates a VMID at some
> point and it stays fixed for the life of that kvm.
> 
> If KVM can change VMID on the fly then that is a further complication
> :\
> 
> > Hmm, kvm_arm_pinned_vmid_get() doesn't fail, it just falls back to VMID=0.  Which
> > seems odd.

This is bleeding a bit of implementation detail where VMID=0 is known to
be reserved (thus invalid), it'd probably be better if the
implementation actually just returned an error.

VMID=0 is associated with the host's MMU context, which is relevant when
running {n,h}VHE mode, as the VMID tags TLB entries even if stage-2
translation is disabled (HCR_EL2.VM = 0).

-- 
Thanks,
Oliver


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 17:54         ` Sean Christopherson
  2024-06-24 18:01           ` Jason Gunthorpe
@ 2024-06-24 19:13           ` Shameerali Kolothum Thodi
  1 sibling, 0 replies; 35+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-06-24 19:13 UTC (permalink / raw)
  To: Sean Christopherson, Jason Gunthorpe
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Linuxarm,
	kevin.tian@intel.com, alex.williamson@redhat.com, maz@kernel.org,
	oliver.upton@linux.dev, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron



> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Monday, June 24, 2024 6:55 PM
> To: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm-
> kernel@lists.infradead.org; Linuxarm <linuxarm@huawei.com>;
> kevin.tian@intel.com; alex.williamson@redhat.com; maz@kernel.org;
> oliver.upton@linux.dev; will@kernel.org; robin.murphy@arm.com; jean-
> philippe@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>
> Subject: Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd
> ctx
> 
> On Mon, Jun 24, 2024, Jason Gunthorpe wrote:
> > On Mon, Jun 24, 2024 at 09:53:00AM -0700, Sean Christopherson wrote:
> > > If kvm_pinned_vmid_{get,put}() are implemented directly by KVM ARM,
> then I don't
> > > have any immediate concerns, as KVM ARM is a long, long way from
> being able to
> > > isolate KVM from the core kernel.
> >
> > I think that is a reasonable thing, I also don't really see VMID as
> > being general. We will have to figure out how to ensure that the KVM
> > FD we got is an ARM KVM FD..
> 
> Isn't the caller in ARM specific code?  I was assuming
> kvm_pinned_vmid_{get,put}()
> would simply not exist for non-ARM builds.

The caller is in ARM specific code(SMMUv3 driver). But at present, the kvm pointer
is passed to it by IOMMUFD during nested domain allocation time. And IOMMUFD
gets the kvm pointer from VFIO during device bind operation.

Not sure,, how this flow is going to be in  the new IOMMUFD VIOMMU object model
Jason referred. If not through IOMMUFD, SMMUv3 has to have a way to figure out
the KVM associated with the device.  

> 
> > > That said, I find the on-demand pinning to be very odd.  IIUC, if KVM runs
> out
> > > of pinnable VMIDs, attaching a device to the KVM+iommu will fail.  Failing
> an
> > > iommufd operation because of a (potentially transient) KVM resource
> issue is
> > > rather unpleasant.
> >
> > It is kind of subtle, but the only thing that will consume VMIDs is
> > IOMMUFD operations that are working with nested translation but not
> > providing KVMs. This is a pretty small blast radius - ie a specific
> > qemu will fail to start - that I think we can tolerate it.
> >
> > More normal iommu operation will not require VMIDs so things like
> > driver attaching/etc is fine.
> >
> > > And assuming that pinnable VMIDs are a somewhat scarce resource, it
> wouldn't
> > > suprise me if someone wanted to add cgroup integration, e.g. similar to
> the
> > > misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an
> SEV ASID
> > > is analagous to an ARM VMID).
> >
> > Yeah, but if someone is using such a cgroup then I expect they will
> > also have an up to date VMM that doesn't trigger this VMID allocation
> > in the first place...
> 
> I suspect we're talking about two different things.  Either that, or I am really
> lost.
> 
> > > Rather than on-demand pinning, would it make sense to have KVM
> provide an ioctl()
> > > (or capability, or VM type) to let userspace pin a VM's VMID?  That would
> allow
> > > for a much saner failure mode, and I suspect would be cleaner in general
> for iommufd.
> >
> > The point of this mechanism is to support using this iommufd feature
> > without a KVM at all. We could instead prevent this directly 100% of
> > the time, but it means that HW with this BTM capability would not run
> > the legacy VMMs at all, so I'm not that keen on it..
> >
> > When a KVM is present then the iommu needs to adopt the VMID of KVM,
> > and that should have a mechanism to ensure the VMID is valid so long
> > as the IOMMU is using it (eg because the KVM FD is open)
> 
> Right, and that's what I'm referring to as "on-demand pinning".  For the
> IOMMU
> to adopt a KVM VMID, the VMID needs to be pinned (or KVM would need to
> notify
> the IOMMU every time the VMID changed), i.e. every KVM+IOMMU pair pins
> a VMID
> that is managed by KVM.
> 
> Hmm, kvm_arm_pinned_vmid_get() doesn't fail, it just falls back to VMID=0.
> Which
> seems odd.

See patch 3, kvm_arch_pinned_vmid_get() return -EINVAL if VMID == 0. For ARM64/KVM
VMID 0 is always reserved and never allocated for a Guest.

Thanks,
Shameer




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs
  2024-02-08 15:18 ` [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs Shameer Kolothum
@ 2024-06-24 19:22   ` Oliver Upton
  2024-06-24 19:34     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Upton @ 2024-06-24 19:22 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvmarm, iommu, linux-arm-kernel, linuxarm, kevin.tian, jgg,
	alex.williamson, maz, will, robin.murphy, jean-philippe,
	jonathan.cameron

Hi Shameer,

On Thu, Feb 08, 2024 at 03:18:32PM +0000, Shameer Kolothum wrote:

[...]

> +unsigned long kvm_arm_pinned_vmid_get(struct kvm_vmid *kvm_vmid)
> +{
> +	unsigned long flags;
> +	u64 vmid;
> +
> +	if (!pinned_vmid_map)
> +		return 0;

Like I'd mentioned over in the other thread, returning 0 for error
conditions is rather confusing unless one is staring at the VMID
allocator code.

Can you rework this to return an actual error (e.g. -EINVAL) on failure?

> +	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> +
> +	vmid = atomic64_read(&kvm_vmid->id);
> +
> +	if (refcount_inc_not_zero(&kvm_vmid->pinned))
> +		goto out_unlock;
> +
> +	if (nr_pinned_vmids >= max_pinned_vmids) {
> +		vmid = 0;
> +		goto out_unlock;
> +	}

You need to decrement the refcount in this error path.

> +
> +	/*
> +	 * If we went through one or more rollover since that VMID was
> +	 * used, make sure it is still valid, or generate a new one.
> +	 */
> +	if (!vmid_gen_match(vmid))
> +		vmid = new_vmid(kvm_vmid);
> +
> +	nr_pinned_vmids++;
> +	__set_bit(vmid2idx(vmid), pinned_vmid_map);
> +	refcount_set(&kvm_vmid->pinned, 1);
> +
> +out_unlock:
> +	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
> +
> +	vmid &= ~VMID_MASK;
> +
> +	return vmid;
> +}
> +
> +void kvm_arm_pinned_vmid_put(struct kvm_vmid *kvm_vmid)
> +{
> +	unsigned long flags;
> +	u64 vmid = atomic64_read(&kvm_vmid->id);
> +
> +	if (!pinned_vmid_map)
> +		return;
> +
> +	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> +
> +	if (refcount_dec_and_test(&kvm_vmid->pinned)) {
> +		__clear_bit(vmid2idx(vmid), pinned_vmid_map);
> +		nr_pinned_vmids--;
> +	}
> +
> +	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
> +}
> +
>  /*
>   * Initialize the VMID allocator
>   */
> @@ -191,10 +263,20 @@ int __init kvm_arm_vmid_alloc_init(void)
>  	if (!vmid_map)
>  		return -ENOMEM;
>  
> +	pinned_vmid_map = bitmap_zalloc(NUM_USER_VMIDS, GFP_KERNEL);
> +	nr_pinned_vmids = 0;
> +
> +	/*
> +	 * Ensure we have at least one emty slot available after rollover

typo: empty

-- 
Thanks,
Oliver


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 19:12             ` Oliver Upton
@ 2024-06-24 19:29               ` Sean Christopherson
  2024-06-24 19:51                 ` Oliver Upton
  2024-06-25  2:21                 ` Tian, Kevin
  0 siblings, 2 replies; 35+ messages in thread
From: Sean Christopherson @ 2024-06-24 19:29 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Jason Gunthorpe, Shameer Kolothum, kvmarm, iommu,
	linux-arm-kernel, linuxarm, kevin.tian, alex.williamson, maz,
	will, robin.murphy, jean-philippe, jonathan.cameron

On Mon, Jun 24, 2024, Oliver Upton wrote:
> On Mon, Jun 24, 2024 at 03:01:48PM -0300, Jason Gunthorpe wrote:
> > On Mon, Jun 24, 2024 at 10:54:37AM -0700, Sean Christopherson wrote:
> > > > > And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
> > > > > suprise me if someone wanted to add cgroup integration, e.g. similar to the
> > > > > misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
> > > > > is analagous to an ARM VMID).
> > > > 
> > > > Yeah, but if someone is using such a cgroup then I expect they will
> > > > also have an up to date VMM that doesn't trigger this VMID allocation
> > > > in the first place...
> > > 
> > > I suspect we're talking about two different things.  Either that, or I am really
> > > lost.
> > 
> > I mean KVM will have already allocated and charged the cgroup for it's
> > use of the VMID. The IOMMU side just has to match it, no second
> > allocation of a VMID.
> > 
> > We wouldn't charge a cgroup for iommu and kvm sharing the same vmid.
> 
> I think the concern remains that an operator may want to limit the blast
> radius of some runaway VMID allocation in a system. But you're right, a
> well-intentioned VMM should wind up with a single charge for all the
> stage-2's that used the VMID allocation.
> 
> > > > When a KVM is present then the iommu needs to adopt the VMID of KVM,
> > > > and that should have a mechanism to ensure the VMID is valid so long
> > > > as the IOMMU is using it (eg because the KVM FD is open)
> > > 
> > > Right, and that's what I'm referring to as "on-demand pinning".  For the IOMMU
> > > to adopt a KVM VMID, the VMID needs to be pinned (or KVM would need to notify
> > > the IOMMU every time the VMID changed), i.e. every KVM+IOMMU pair pins a VMID
> > > that is managed by KVM.
> > 
> > Ok, right, yes, the expectation is that KVM allocates a VMID at some
> > point and it stays fixed for the life of that kvm.
> > 
> > If KVM can change VMID on the fly then that is a further complication
> > :\

Ya, as written today, KVM doesn't assign a VMID when the VM is created, and instead
allocates VMIDs on-demand when a vCPU is run.

The KVM changes in this series allow "pinning" the currently assigned VMID, i.e.
tries to address that further complication.  But because of the on-demand
allocation, there might not be a currently assigned VMID for VM, or the VMID might
be stale, i.e. re-assigned to a different VM.

Thus, kvm_arm_pinned_vmid_get() can effectively trigger VMID allocations, and
thus cgroup charging and failure.

If I'm reading the ARM code correctly, the intent is to cycle through VMIDs as  
necessary so that it's possible for every actively running VM to have a VMID.
And maybe also to also minimize the number of TLB + I$ invalidations?

> > 
> > > Hmm, kvm_arm_pinned_vmid_get() doesn't fail, it just falls back to VMID=0.  Which
> > > seems odd.
> 
> This is bleeding a bit of implementation detail where VMID=0 is known to
> be reserved (thus invalid), it'd probably be better if the
> implementation actually just returned an error.

Oof, I assumed using VMID=0 just caused a loss of performance, but this makes
it sound like the IOMMU mappings will fault?

> VMID=0 is associated with the host's MMU context, which is relevant when
> running {n,h}VHE mode, as the VMID tags TLB entries even if stage-2
> translation is disabled (HCR_EL2.VM = 0).

Heh, I assumed VMID=0 is the host MMU, Intel and AMD have the same effective
behavior :-)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs
  2024-06-24 19:22   ` Oliver Upton
@ 2024-06-24 19:34     ` Shameerali Kolothum Thodi
  2024-06-24 19:52       ` Oliver Upton
  0 siblings, 1 reply; 35+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-06-24 19:34 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Linuxarm,
	kevin.tian@intel.com, jgg@ziepe.ca, alex.williamson@redhat.com,
	maz@kernel.org, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron



> -----Original Message-----
> From: Oliver Upton <oliver.upton@linux.dev>
> Sent: Monday, June 24, 2024 8:22 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm-
> kernel@lists.infradead.org; Linuxarm <linuxarm@huawei.com>;
> kevin.tian@intel.com; jgg@ziepe.ca; alex.williamson@redhat.com;
> maz@kernel.org; will@kernel.org; robin.murphy@arm.com; jean-
> philippe@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>
> Subject: Re: [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin
> VMIDs
> 
> Hi Shameer,
> 
> On Thu, Feb 08, 2024 at 03:18:32PM +0000, Shameer Kolothum wrote:
> 
> [...]
> 
> > +unsigned long kvm_arm_pinned_vmid_get(struct kvm_vmid *kvm_vmid)
> > +{
> > +	unsigned long flags;
> > +	u64 vmid;
> > +
> > +	if (!pinned_vmid_map)
> > +		return 0;
> 
> Like I'd mentioned over in the other thread, returning 0 for error
> conditions is rather confusing unless one is staring at the VMID
> allocator code.

Agree. It gives the impression VMID 0 is valid.

> 
> Can you rework this to return an actual error (e.g. -EINVAL) on failure?

Ok.

> 
> > +	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> > +
> > +	vmid = atomic64_read(&kvm_vmid->id);
> > +
> > +	if (refcount_inc_not_zero(&kvm_vmid->pinned))
> > +		goto out_unlock;
> > +
> > +	if (nr_pinned_vmids >= max_pinned_vmids) {
> > +		vmid = 0;
> > +		goto out_unlock;
> > +	}
> 
> You need to decrement the refcount in this error path.

Hmm..do we? We are here means refcount is zero, right?

> 
> > +
> > +	/*
> > +	 * If we went through one or more rollover since that VMID was
> > +	 * used, make sure it is still valid, or generate a new one.
> > +	 */
> > +	if (!vmid_gen_match(vmid))
> > +		vmid = new_vmid(kvm_vmid);
> > +
> > +	nr_pinned_vmids++;
> > +	__set_bit(vmid2idx(vmid), pinned_vmid_map);
> > +	refcount_set(&kvm_vmid->pinned, 1);
> > +
> > +out_unlock:
> > +	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
> > +
> > +	vmid &= ~VMID_MASK;
> > +
> > +	return vmid;
> > +}
> > +
> > +void kvm_arm_pinned_vmid_put(struct kvm_vmid *kvm_vmid)
> > +{
> > +	unsigned long flags;
> > +	u64 vmid = atomic64_read(&kvm_vmid->id);
> > +
> > +	if (!pinned_vmid_map)
> > +		return;
> > +
> > +	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> > +
> > +	if (refcount_dec_and_test(&kvm_vmid->pinned)) {
> > +		__clear_bit(vmid2idx(vmid), pinned_vmid_map);
> > +		nr_pinned_vmids--;
> > +	}
> > +
> > +	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
> > +}
> > +
> >  /*
> >   * Initialize the VMID allocator
> >   */
> > @@ -191,10 +263,20 @@ int __init kvm_arm_vmid_alloc_init(void)
> >  	if (!vmid_map)
> >  		return -ENOMEM;
> >
> > +	pinned_vmid_map = bitmap_zalloc(NUM_USER_VMIDS,
> GFP_KERNEL);
> > +	nr_pinned_vmids = 0;
> > +
> > +	/*
> > +	 * Ensure we have at least one emty slot available after rollover
> 
> typo: empty

Thanks,
Shameer


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 19:29               ` Sean Christopherson
@ 2024-06-24 19:51                 ` Oliver Upton
  2024-06-24 22:35                   ` Jason Gunthorpe
  2024-06-25  2:21                 ` Tian, Kevin
  1 sibling, 1 reply; 35+ messages in thread
From: Oliver Upton @ 2024-06-24 19:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jason Gunthorpe, Shameer Kolothum, kvmarm, iommu,
	linux-arm-kernel, linuxarm, kevin.tian, alex.williamson, maz,
	will, robin.murphy, jean-philippe, jonathan.cameron

On Mon, Jun 24, 2024 at 12:29:07PM -0700, Sean Christopherson wrote:
> On Mon, Jun 24, 2024, Oliver Upton wrote:
> > On Mon, Jun 24, 2024 at 03:01:48PM -0300, Jason Gunthorpe wrote:
> > > If KVM can change VMID on the fly then that is a further complication
> > > :\
> 
> Ya, as written today, KVM doesn't assign a VMID when the VM is created, and instead
> allocates VMIDs on-demand when a vCPU is run.
> 
> The KVM changes in this series allow "pinning" the currently assigned VMID, i.e.
> tries to address that further complication.  But because of the on-demand
> allocation, there might not be a currently assigned VMID for VM, or the VMID might
> be stale, i.e. re-assigned to a different VM.
> 
> Thus, kvm_arm_pinned_vmid_get() can effectively trigger VMID allocations, and
> thus cgroup charging and failure.
> 
> If I'm reading the ARM code correctly, the intent is to cycle through VMIDs as  
> necessary so that it's possible for every actively running VM to have a VMID.
> And maybe also to also minimize the number of TLB + I$ invalidations?

The commentary about TLBIs + I$ invalidations is in relation to how
rollover is handled. The kernel's ASID allocator does some deferred
invalidation, the VMID allocator does some eager invalidation at rollover
because it is believed to be a rarer occurrence.

But generally speaking, it's what you expect, we have some structures in
hardware that use VMID to form a tag, and we wnat to avoid blasting them
if at all possible.

> > > 
> > > > Hmm, kvm_arm_pinned_vmid_get() doesn't fail, it just falls back to VMID=0.  Which
> > > > seems odd.
> > 
> > This is bleeding a bit of implementation detail where VMID=0 is known to
> > be reserved (thus invalid), it'd probably be better if the
> > implementation actually just returned an error.
> 
> Oof, I assumed using VMID=0 just caused a loss of performance, but this makes
> it sound like the IOMMU mappings will fault?

Bit worse than that :)

Having the SMMU participate in broadcast TLB maintenance means TLBI
instructions on the CPU invalidate translations in the SMMU, which is
useful for SVA usecases. However, if the host is using different VMIDs
for the CPU and SMMU, then guest TLBIs no longer match the guest's SMMU
context...

Pinning / sharing the VMID between CPU and SMMU is a hard requirement if
you advertise BTM support to the guest.

-- 
Thanks,
Oliver


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs
  2024-06-24 19:34     ` Shameerali Kolothum Thodi
@ 2024-06-24 19:52       ` Oliver Upton
  0 siblings, 0 replies; 35+ messages in thread
From: Oliver Upton @ 2024-06-24 19:52 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Linuxarm,
	kevin.tian@intel.com, jgg@ziepe.ca, alex.williamson@redhat.com,
	maz@kernel.org, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, Jonathan Cameron

On Mon, Jun 24, 2024 at 07:34:42PM +0000, Shameerali Kolothum Thodi wrote:
> > > +	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> > > +
> > > +	vmid = atomic64_read(&kvm_vmid->id);
> > > +
> > > +	if (refcount_inc_not_zero(&kvm_vmid->pinned))
> > > +		goto out_unlock;
> > > +
> > > +	if (nr_pinned_vmids >= max_pinned_vmids) {
> > > +		vmid = 0;
> > > +		goto out_unlock;
> > > +	}
> > 
> > You need to decrement the refcount in this error path.
> 
> Hmm..do we? We are here means refcount is zero, right?

Durr, I need more caffeine. You're right.

-- 
Thanks,
Oliver


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 19:51                 ` Oliver Upton
@ 2024-06-24 22:35                   ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2024-06-24 22:35 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Sean Christopherson, Shameer Kolothum, kvmarm, iommu,
	linux-arm-kernel, linuxarm, kevin.tian, alex.williamson, maz,
	will, robin.murphy, jean-philippe, jonathan.cameron

On Mon, Jun 24, 2024 at 12:51:34PM -0700, Oliver Upton wrote:

> Having the SMMU participate in broadcast TLB maintenance means TLBI
> instructions on the CPU invalidate translations in the SMMU, which is
> useful for SVA usecases. However, if the host is using different VMIDs
> for the CPU and SMMU, then guest TLBIs no longer match the guest's SMMU
> context...
> 
> Pinning / sharing the VMID between CPU and SMMU is a hard requirement if
> you advertise BTM support to the guest.

Yes, and the extra detail that the SMMU *must* always have a unique
VMID for the specific VM, we can't leave it in a state without one.

Meaning there is little purpose to changing the VMID at runtime as you
can't multiplex more VM's than you have IDs once the SMMU is using it
anyhow.

Jason


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-02-08 15:42   ` Jason Gunthorpe
  2024-06-24 16:53     ` Sean Christopherson
@ 2024-06-25  1:53     ` Tian, Kevin
  1 sibling, 0 replies; 35+ messages in thread
From: Tian, Kevin @ 2024-06-25  1:53 UTC (permalink / raw)
  To: Jason Gunthorpe, Shameer Kolothum, Sean Christopherson
  Cc: kvmarm@lists.linux.dev, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, linuxarm@huawei.com,
	alex.williamson@redhat.com, maz@kernel.org,
	oliver.upton@linux.dev, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, jonathan.cameron@huawei.com

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, February 8, 2024 11:42 PM
> 
> On Thu, Feb 08, 2024 at 03:18:34PM +0000, Shameer Kolothum wrote:
> 
> > diff --git a/drivers/iommu/iommufd/iommufd_private.h
> b/drivers/iommu/iommufd/iommufd_private.h
> > index 991f864d1f9b..28ede82bb1a6 100644
> > --- a/drivers/iommu/iommufd/iommufd_private.h
> > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > @@ -16,6 +16,7 @@ struct iommu_domain;
> >  struct iommu_group;
> >  struct iommu_option;
> >  struct iommufd_device;
> > +struct kvm;
> >
> >  struct iommufd_ctx {
> >  	struct file *file;
> > @@ -27,6 +28,8 @@ struct iommufd_ctx {
> >  	/* Compatibility with VFIO no iommu */
> >  	u8 no_iommu_mode;
> >  	struct iommufd_ioas *vfio_ioas;
> > +	/* Associated KVM pointer */
> > +	struct kvm *kvm;
> >  };
> 
> Associating the KVM with the entire iommufd is a big hammer, is this
> what we want to do?
> 
> I know it has to be linked to domain allocation and the coming
> "viommu" object, and it is already linked to VFIO.
> 
> It means we support one KVM per iommufd (which doesn't seem
> unreasonable, but also the first time we've had such a limitation)
> 
> The other option would be to pass in the kvm to the individual sub
> objects.
> 
> Kevin?
> 

I prefer to not imposing such limitation. Passing it in optionally
to the individual sub objects e.g. hwpt or viommu makes more
sense to me.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx
  2024-06-24 19:29               ` Sean Christopherson
  2024-06-24 19:51                 ` Oliver Upton
@ 2024-06-25  2:21                 ` Tian, Kevin
  1 sibling, 0 replies; 35+ messages in thread
From: Tian, Kevin @ 2024-06-25  2:21 UTC (permalink / raw)
  To: Sean Christopherson, Oliver Upton
  Cc: Jason Gunthorpe, Shameer Kolothum, kvmarm@lists.linux.dev,
	iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linuxarm@huawei.com, alex.williamson@redhat.com, maz@kernel.org,
	will@kernel.org, robin.murphy@arm.com, jean-philippe@linaro.org,
	jonathan.cameron@huawei.com

> From: Sean Christopherson <seanjc@google.com>
> Sent: Tuesday, June 25, 2024 3:29 AM
> 
> On Mon, Jun 24, 2024, Oliver Upton wrote:
> > VMID=0 is associated with the host's MMU context, which is relevant when
> > running {n,h}VHE mode, as the VMID tags TLB entries even if stage-2
> > translation is disabled (HCR_EL2.VM = 0).
> 
> Heh, I assumed VMID=0 is the host MMU, Intel and AMD have the same
> effective
> behavior :-)

side note - Intel VT-d spec recommends stage-1 translations using a
VMID different from those associated with stage-2, but not necessarily
to be 0. intel-iommu driver uses '1' so far:

/*
 * Domain ID reserved for pasid entries programmed for first-level
 * only and pass-through transfer modes.
 */
#define FLPT_DEFAULT_DID                1


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2024-06-25  2:21 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-08 15:18 [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Shameer Kolothum
2024-02-08 15:18 ` [RFC PATCH v2 1/7] KVM: Add generic infrastructure to support pinned VMIDs Shameer Kolothum
2024-06-24 15:48   ` Sean Christopherson
2024-02-08 15:18 ` [RFC PATCH v2 2/7] KVM: arm64: Introduce support to pin VMIDs Shameer Kolothum
2024-06-24 19:22   ` Oliver Upton
2024-06-24 19:34     ` Shameerali Kolothum Thodi
2024-06-24 19:52       ` Oliver Upton
2024-02-08 15:18 ` [RFC PATCH v2 3/7] KVM: arm64: Add interfaces for pinned VMID support Shameer Kolothum
2024-02-08 15:18 ` [RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx Shameer Kolothum
2024-02-08 15:42   ` Jason Gunthorpe
2024-06-24 16:53     ` Sean Christopherson
2024-06-24 17:07       ` Jason Gunthorpe
2024-06-24 17:54         ` Sean Christopherson
2024-06-24 18:01           ` Jason Gunthorpe
2024-06-24 19:12             ` Oliver Upton
2024-06-24 19:29               ` Sean Christopherson
2024-06-24 19:51                 ` Oliver Upton
2024-06-24 22:35                   ` Jason Gunthorpe
2024-06-25  2:21                 ` Tian, Kevin
2024-06-24 19:13           ` Shameerali Kolothum Thodi
2024-06-25  1:53     ` Tian, Kevin
2024-02-08 15:18 ` [RFC PATCH v2 5/7] iommu: Pass in kvm pointer to domain_alloc_user Shameer Kolothum
2024-02-08 15:43   ` Jason Gunthorpe
2024-02-08 15:18 ` [RFC PATCH v2 6/7] iommu/arm-smmu-v3: Use KVM VMID for s2 stage Shameer Kolothum
2024-02-08 15:59   ` Jason Gunthorpe
2024-02-08 16:14     ` Shameerali Kolothum Thodi
2024-02-08 16:26       ` Jason Gunthorpe
2024-02-08 16:36         ` Shameerali Kolothum Thodi
2024-02-08 15:18 ` [RFC PATCH v2 7/7] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Shameer Kolothum
2024-02-08 15:35 ` [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 Jason Gunthorpe
2024-02-08 15:49   ` Shameerali Kolothum Thodi
2024-02-08 16:07     ` Jason Gunthorpe
2024-02-09 11:58 ` Jean-Philippe Brucker
2024-02-09 12:48   ` Jason Gunthorpe
2024-02-09 13:54   ` Shameerali Kolothum Thodi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).