public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2)
@ 2023-10-20  9:32 Yi Liu
  2023-10-20  9:32 ` [PATCH v6 1/8] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng

This is the first part to add Intel VT-d nested translation based on IOMMUFD
nesting infrastructure. As the iommufd nesting infrastructure series[1],
iommu core supports new ops to allocate domains with user data. For nesting,
the user data is vendor-specific, IOMMU_HWPT_DATA_VTD_S1 is defined for
the Intel VT-d stage-1 page table, it will be used in the stage-1 domain
allocation path. struct iommu_hwpt_vtd_s1 is defined to pass user_data
for the Intel VT-d stage-1 domain allocation. This series does not have
the cache invalidation path, it would be added in part 2/2.

The first Intel platform supporting nested translation is Sapphire
Rapids which, unfortunately, has a hardware errata [2] requiring special
treatment. This errata happens when a stage-1 page table page (either
level) is located in a stage-2 read-only region. In that case the IOMMU
hardware may ignore the stage-2 RO permission and still set the A/D bit
in stage-1 page table entries during page table walking.

A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
this errata to userspace. With that restriction the user should either
disable nested translation to favor RO stage-2 mappings or ensure no
RO stage-2 mapping to enable nested translation.

Intel-iommu driver is armed with necessary checks to prevent such mix
in patch8 of this series.

Qemu currently does add RO mappings though. The vfio agent in Qemu
simply maps all valid regions in the GPA address space which certainly
includes RO regions e.g. vbios.

In reality we don't know a usage relying on DMA reads from the BIOS
region. Hence finding a way to skip RO regions (e.g. via a discard manager)
in Qemu might be an acceptable tradeoff. The actual change needs more
discussion in Qemu community. For now we just hacked Qemu to test.

Complete code can be found in [3], corresponding QEMU could can be found
in [4].

[1] https://lore.kernel.org/linux-iommu/20231020091946.12173-1-yi.l.liu@intel.com/
[2] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[4] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1

Change log:

v6:
 - Add Kevin's r-b for patch 1 and 8
 - Drop Kevin's r-b for patch 7
 - Address comments from Kevin
 - Split the VT-d nesting series into two parts 1/2 and 2/2

v5: https://lore.kernel.org/linux-iommu/20230921075431.125239-1-yi.l.liu@intel.com/
 - Add Kevin's r-b for patch 2, 3 ,5 8, 10
 - Drop enforce_cache_coherency callback from the nested type domain ops (Kevin)
 - Remove duplicate agaw check in patch 04 (Kevin)
 - Remove duplicate domain_update_iommu_cap() in patch 06 (Kevin)
 - Check parent's force_snooping to set pgsnp in the pasid entry (Kevin)
 - uapi data structure check (Kevin)
 - Simplify the errata handling as user can allocate nested parent domain

v4: https://lore.kernel.org/linux-iommu/20230724111335.107427-1-yi.l.liu@intel.com/
 - Remove ascii art tables (Jason)
 - Drop EMT (Tina, Jason)
 - Drop MTS and related definitions (Kevin)
 - Rename macro IOMMU_VTD_PGTBL_ to IOMMU_VTD_S1_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd_ to iommu_hwpt_vtd_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd to iommu_hwpt_vtd_s1 (Kevin)
 - Put the vendor specific hwpt alloc data structure before enuma iommu_hwpt_type (Kevin)
 - Do not trim the higher page levels of S2 domain in nested domain attachment as the
   S2 domain may have been used independently. (Kevin)
 - Remove the first-stage pgd check against the maximum address of s2_domain as hw
   can check it anyhow. It makes sense to check every pfns used in the stage-1 page
   table. But it cannot make it. So just leave it to hw. (Kevin)
 - Split the iotlb flush part into an order of uapi, helper and callback implementation (Kevin)
 - Change the policy of VT-d nesting errata, disallow RO mapping once a domain is used
   as parent domain of a nested domain. This removes the nested_users counting. (Kevin)
 - Minor fix for "make htmldocs"

v3: https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.com/
 - Further split the patches into an order of adding helpers for nested
   domain, iotlb flush, nested domain attachment and nested domain allocation
   callback, then report the hw_info to userspace.
 - Add batch support in cache invalidation from userspace
 - Disallow nested translation usage if RO mappings exists in stage-2 domain
   due to errata on readonly mappings on Sapphire Rapids platform.

v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.com/
 - The iommufd infrastructure is split to be separate series.

v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Lu Baolu (5):
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper for nested domain allocation
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain allocation
  iommu/vt-d: Disallow read-only mappings to nest parent domain

Yi Liu (3):
  iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  iommu/vt-d: Make domain attach helpers to be extern
  iommu/vt-d: Set the nested domain to a device

 drivers/iommu/intel/Makefile |   2 +-
 drivers/iommu/intel/iommu.c  |  63 +++++++++++++-------
 drivers/iommu/intel/iommu.h  |  46 ++++++++++++--
 drivers/iommu/intel/nested.c | 109 ++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.c  | 112 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h  |   2 +
 include/uapi/linux/iommufd.h |  42 ++++++++++++-
 7 files changed, 348 insertions(+), 28 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v6 1/8] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-20  9:32 ` [PATCH v6 2/8] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng

This adds IOMMU_HWPT_DATA_VTD_S1 for stage-1 hw_pagetable of Intel
VT-d and the corressponding data structure for userspace specified parameter
for the domain allocation.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/uapi/linux/iommufd.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index fc305a48ab81..9b843a197ea8 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,12 +357,42 @@ enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 };
 
+/**
+ * enum iommu_hwpt_vtd_s1_flags - Intel VT-d stage-1 page table
+ *                                entry attributes
+ * @IOMMU_VTD_S1_SRE: Supervisor request
+ * @IOMMU_VTD_S1_EAFE: Extended access enable
+ * @IOMMU_VTD_S1_WPE: Write protect enable
+ */
+enum iommu_hwpt_vtd_s1_flags {
+	IOMMU_VTD_S1_SRE = 1 << 0,
+	IOMMU_VTD_S1_EAFE = 1 << 1,
+	IOMMU_VTD_S1_WPE = 1 << 2,
+};
+
+/**
+ * struct iommu_hwpt_vtd_s1 - Intel VT-d stage-1 page table
+ *                            info (IOMMU_HWPT_DATA_VTD_S1)
+ * @flags: Combination of enum iommu_hwpt_vtd_s1_flags
+ * @pgtbl_addr: The base address of the stage-1 page table.
+ * @addr_width: The address width of the stage-1 page table
+ * @__reserved: Must be 0
+ */
+struct iommu_hwpt_vtd_s1 {
+	__aligned_u64 flags;
+	__aligned_u64 pgtbl_addr;
+	__u32 addr_width;
+	__u32 __reserved;
+};
+
 /**
  * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
  * @IOMMU_HWPT_DATA_NONE: no data
+ * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
  */
 enum iommu_hwpt_data_type {
 	IOMMU_HWPT_DATA_NONE,
+	IOMMU_HWPT_DATA_VTD_S1,
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 2/8] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
  2023-10-20  9:32 ` [PATCH v6 1/8] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-20  9:32 ` [PATCH v6 3/8] iommu/vt-d: Add helper for nested domain allocation Yi Liu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng

From: Lu Baolu <baolu.lu@linux.intel.com>

The nested domain fields are exclusive to those that used for a DMA
remapping domain. Use union to avoid memory waste.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.h | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index c18fb699c87a..a91077a063ee 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -25,6 +25,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
+#include <uapi/linux/iommufd.h>
 
 /*
  * VT-d hardware uses 4KiB page size regardless of host page size.
@@ -597,15 +598,38 @@ struct dmar_domain {
 	struct list_head devices;	/* all devices' list */
 	struct list_head dev_pasids;	/* all attached pasids */
 
-	struct dma_pte	*pgd;		/* virtual address */
-	int		gaw;		/* max guest address width */
-
-	/* adjusted guest address width, 0 is level 2 30-bit */
-	int		agaw;
 	int		iommu_superpage;/* Level of superpages supported:
 					   0 == 4KiB (no superpages), 1 == 2MiB,
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
-	u64		max_addr;	/* maximum mapped address */
+	union {
+		/* DMA remapping domain */
+		struct {
+			/* virtual address */
+			struct dma_pte	*pgd;
+			/* max guest address width */
+			int		gaw;
+			/*
+			 * adjusted guest address width:
+			 *   0: level 2 30-bit
+			 *   1: level 3 39-bit
+			 *   2: level 4 48-bit
+			 *   3: level 5 57-bit
+			 */
+			int		agaw;
+			/* maximum mapped address */
+			u64		max_addr;
+		};
+
+		/* Nested user domain */
+		struct {
+			/* parent page table which the user domain is nested on */
+			struct dmar_domain *s2_domain;
+			/* user page table pointer (in GPA) */
+			unsigned long s1_pgtbl;
+			/* page table attributes */
+			struct iommu_hwpt_vtd_s1 s1_cfg;
+		};
+	};
 
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 3/8] iommu/vt-d: Add helper for nested domain allocation
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
  2023-10-20  9:32 ` [PATCH v6 1/8] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
  2023-10-20  9:32 ` [PATCH v6 2/8] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-20 11:49   ` Baolu Lu
  2023-10-20  9:32 ` [PATCH v6 4/8] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

This adds helper for accepting user parameters and allocate a nested
domain.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/Makefile |  2 +-
 drivers/iommu/intel/iommu.h  |  2 ++
 drivers/iommu/intel/nested.c | 55 ++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/intel/nested.c

diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index 7af3b8a4f2a0..5dabf081a779 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index a91077a063ee..ff55184456dd 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -866,6 +866,8 @@ void *alloc_pgtable_page(int node, gfp_t gfp);
 void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const struct iommu_user_data *user_data);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
new file mode 100644
index 000000000000..5a2920a98e47
--- /dev/null
+++ b/drivers/iommu/intel/nested.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * nested.c - nested mode translation support
+ *
+ * Copyright (C) 2023 Intel Corporation
+ *
+ * Author: Lu Baolu <baolu.lu@linux.intel.com>
+ *         Jacob Pan <jacob.jun.pan@linux.intel.com>
+ *         Yi Liu <yi.l.liu@intel.com>
+ */
+
+#define pr_fmt(fmt)	"DMAR: " fmt
+
+#include <linux/iommu.h>
+
+#include "iommu.h"
+
+static void intel_nested_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_dmar_domain(domain));
+}
+
+static const struct iommu_domain_ops intel_nested_domain_ops = {
+	.free			= intel_nested_domain_free,
+};
+
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const struct iommu_user_data *user_data)
+{
+	struct iommu_hwpt_vtd_s1 vtd;
+	struct dmar_domain *domain;
+	int ret;
+
+	ret = iommu_copy_struct_from_user(&vtd, user_data,
+					  IOMMU_HWPT_DATA_VTD_S1, __reserved);
+	if (ret)
+		return ERR_PTR(ret);
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
+	if (!domain)
+		return NULL;
+
+	domain->use_first_level = true;
+	domain->s2_domain = to_dmar_domain(s2_domain);
+	domain->s1_pgtbl = vtd.pgtbl_addr;
+	domain->s1_cfg = vtd;
+	domain->domain.ops = &intel_nested_domain_ops;
+	domain->domain.type = IOMMU_DOMAIN_NESTED;
+	INIT_LIST_HEAD(&domain->devices);
+	INIT_LIST_HEAD(&domain->dev_pasids);
+	spin_lock_init(&domain->lock);
+	xa_init(&domain->iommu_array);
+
+	return &domain->domain;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 4/8] iommu/vt-d: Add helper to setup pasid nested translation
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
                   ` (2 preceding siblings ...)
  2023-10-20  9:32 ` [PATCH v6 3/8] iommu/vt-d: Add helper for nested domain allocation Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-20  9:32 ` [PATCH v6 5/8] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

The configurations are passed in from the user when the user domain is
allocated. This helper interprets these configurations according to the
data structure defined in uapi/linux/iommufd.h. The EINVAL error will be
returned if any of configurations are not compatible with the hardware
capabilities. The caller can retry with another compatible user domain.
The encoding of fields of each pasid entry is defined in section 9.6 of
the VT-d spec.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/pasid.c | 112 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h |   2 +
 2 files changed, 114 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 8f92b92f3d2a..fbf71f56653d 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -335,6 +335,15 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[0], 1 << 1, 0);
 }
 
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
 /*
  * Setup the WPE(Write Protect Enable) field (Bit 132) of a
  * scalable mode PASID entry.
@@ -402,6 +411,15 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
 	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void pasid_set_eafe(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 				    u16 did, u32 pasid)
@@ -713,3 +731,97 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 	if (!cap_caching_mode(iommu->cap))
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
+ * @iommu:      IOMMU which the device belong to
+ * @dev:        Device to be set up for translation
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @domain:     User stage-1 domain nested on a stage-2 domain
+ *
+ * This is used for nested translation. The input domain should be
+ * nested type and nested on a parent with 'is_nested_parent' flag
+ * set.
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
+	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct dma_pte *pgd = s2_domain->pgd;
+	struct pasid_entry *pte;
+
+	/* Address width should match the address width supported by hardware */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+	case ADDR_WIDTH_5LEVEL:
+		if (!cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+	default:
+		dev_err_ratelimited(dev, "Invalid stage-1 address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+	if (pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EBUSY;
+	}
+
+	pasid_clear_entry(pte);
+
+	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+		pasid_set_flpm(pte, 1);
+
+	pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_SRE) {
+		pasid_set_sre(pte);
+		if (s1_cfg->flags & IOMMU_VTD_S1_WPE)
+			pasid_set_wpe(pte);
+	}
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
+		pasid_set_eafe(pte);
+
+	if (s2_domain->force_snooping)
+		pasid_set_pgsnp(pte);
+
+	pasid_set_slptr(pte, virt_to_phys(pgd));
+	pasid_set_fault_enable(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, s2_domain->agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	spin_unlock(&iommu->lock);
+
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 4e9e68c3c388..7906d73f4ded 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -109,6 +109,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 5/8] iommu/vt-d: Make domain attach helpers to be extern
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
                   ` (3 preceding siblings ...)
  2023-10-20  9:32 ` [PATCH v6 4/8] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-20 12:20   ` Baolu Lu
  2023-10-20  9:32 ` [PATCH v6 6/8] iommu/vt-d: Set the nested domain to a device Yi Liu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng

This makes the helpers visible to nested.c.

Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 15 ++++++---------
 drivers/iommu/intel/iommu.h |  7 +++++++
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8d78149e793e..b47025fbdea4 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -282,7 +282,6 @@ static LIST_HEAD(dmar_satc_units);
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
 
-static void device_block_translation(struct device *dev);
 static void intel_iommu_domain_free(struct iommu_domain *domain);
 
 int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -560,7 +559,7 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 }
 
 /* Some capabilities may be different across iommus */
-static void domain_update_iommu_cap(struct dmar_domain *domain)
+void domain_update_iommu_cap(struct dmar_domain *domain)
 {
 	domain_update_iommu_coherency(domain);
 	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
@@ -1778,8 +1777,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
 	return domain;
 }
 
-static int domain_attach_iommu(struct dmar_domain *domain,
-			       struct intel_iommu *iommu)
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info, *curr;
 	unsigned long ndomains;
@@ -1828,8 +1826,7 @@ static int domain_attach_iommu(struct dmar_domain *domain,
 	return ret;
 }
 
-static void domain_detach_iommu(struct dmar_domain *domain,
-				struct intel_iommu *iommu)
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info;
 
@@ -3974,7 +3971,7 @@ static void dmar_remove_one_dev_info(struct device *dev)
  * all DMA requests without PASID from the device are blocked. If the page
  * table has been set, clean up the data structures.
  */
-static void device_block_translation(struct device *dev)
+void device_block_translation(struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
@@ -4109,8 +4106,8 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
 		domain_exit(to_dmar_domain(domain));
 }
 
-static int prepare_domain_attach_device(struct iommu_domain *domain,
-					struct device *dev)
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index ff55184456dd..b4560983b8b9 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -860,6 +860,13 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
  */
 #define QI_OPT_WAIT_DRAIN		BIT(0)
 
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void device_block_translation(struct device *dev);
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev);
+void domain_update_iommu_cap(struct dmar_domain *domain);
+
 int dmar_ir_support(void);
 
 void *alloc_pgtable_page(int node, gfp_t gfp);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 6/8] iommu/vt-d: Set the nested domain to a device
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
                   ` (4 preceding siblings ...)
  2023-10-20  9:32 ` [PATCH v6 5/8] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-20  9:32 ` [PATCH v6 7/8] iommu/vt-d: Add nested domain allocation Yi Liu
  2023-10-20  9:32 ` [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain Yi Liu
  7 siblings, 0 replies; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng, Jacob Pan

This adds the helper for setting the nested domain to a device hence
enable nested domain usage on Intel VT-d.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/nested.c | 54 ++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 5a2920a98e47..19538fb616db 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -12,8 +12,61 @@
 #define pr_fmt(fmt)	"DMAR: " fmt
 
 #include <linux/iommu.h>
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
 
 #include "iommu.h"
+#include "pasid.h"
+
+static int intel_nested_attach_dev(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_iommu *iommu = info->iommu;
+	unsigned long flags;
+	int ret = 0;
+
+	if (info->domain)
+		device_block_translation(dev);
+
+	if (iommu->agaw < dmar_domain->s2_domain->agaw) {
+		dev_err_ratelimited(dev, "Adjusted guest address width not compatible\n");
+		return -ENODEV;
+	}
+
+	/*
+	 * Stage-1 domain cannot work alone, it is nested on a s2_domain.
+	 * The s2_domain will be used in nested translation, hence needs
+	 * to ensure the s2_domain is compatible with this IOMMU.
+	 */
+	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
+	if (ret) {
+		dev_err_ratelimited(dev, "s2 domain is not compatible\n");
+		return ret;
+	}
+
+	ret = domain_attach_iommu(dmar_domain, iommu);
+	if (ret) {
+		dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
+		return ret;
+	}
+
+	ret = intel_pasid_setup_nested(iommu, dev,
+				       IOMMU_NO_PASID, dmar_domain);
+	if (ret) {
+		domain_detach_iommu(dmar_domain, iommu);
+		dev_err_ratelimited(dev, "Failed to setup pasid entry\n");
+		return ret;
+	}
+
+	info->domain = dmar_domain;
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_add(&info->link, &dmar_domain->devices);
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+
+	return 0;
+}
 
 static void intel_nested_domain_free(struct iommu_domain *domain)
 {
@@ -21,6 +74,7 @@ static void intel_nested_domain_free(struct iommu_domain *domain)
 }
 
 static const struct iommu_domain_ops intel_nested_domain_ops = {
+	.attach_dev		= intel_nested_attach_dev,
 	.free			= intel_nested_domain_free,
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 7/8] iommu/vt-d: Add nested domain allocation
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
                   ` (5 preceding siblings ...)
  2023-10-20  9:32 ` [PATCH v6 6/8] iommu/vt-d: Set the nested domain to a device Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-21  3:07   ` Baolu Lu
  2023-10-20  9:32 ` [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain Yi Liu
  7 siblings, 1 reply; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng

From: Lu Baolu <baolu.lu@linux.intel.com>

This adds the support for IOMMU_HWPT_DATA_VTD_S1 type.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 39 ++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b47025fbdea4..c7704e7efd4a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4076,7 +4076,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 			      struct iommu_domain *parent,
 			      const struct iommu_user_data *user_data)
 {
-	struct iommu_domain *domain;
+	bool request_nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
 	struct intel_iommu *iommu;
 
 	if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT))
@@ -4086,18 +4086,35 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 	if (!iommu)
 		return ERR_PTR(-ENODEV);
 
-	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
+	if (!user_data) { /* Must be PAGING domain */
+		struct iommu_domain *domain;
+
+		if (request_nest_parent && !ecap_nest(iommu->ecap))
+			return ERR_PTR(-EOPNOTSUPP);
+		if (parent)
+			return ERR_PTR(-EINVAL);
+		/*
+		 * domain_alloc_user op needs to fully initialize a domain
+		 * before return, so uses iommu_domain_alloc() here for
+		 * simple.
+		 */
+		domain = iommu_domain_alloc(dev->bus);
+		if (!domain)
+			return ERR_PTR(-ENOMEM);
+		return domain;
+	}
+
+	/* Must be nested domain */
+	if (!ecap_nest(iommu->ecap))
+		return ERR_PTR(-EOPNOTSUPP);
+	if (user_data->type != IOMMU_HWPT_DATA_VTD_S1)
 		return ERR_PTR(-EOPNOTSUPP);
+	if (!parent || parent->ops != intel_iommu_ops.default_domain_ops)
+		return ERR_PTR(-EINVAL);
+	if (request_nest_parent)
+		return ERR_PTR(-EINVAL);
 
-	/*
-	 * domain_alloc_user op needs to fully initialize a domain
-	 * before return, so uses iommu_domain_alloc() here for
-	 * simple.
-	 */
-	domain = iommu_domain_alloc(dev->bus);
-	if (!domain)
-		domain = ERR_PTR(-ENOMEM);
-	return domain;
+	return intel_nested_domain_alloc(parent, user_data);
 }
 
 static void intel_iommu_domain_free(struct iommu_domain *domain)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain
  2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
                   ` (6 preceding siblings ...)
  2023-10-20  9:32 ` [PATCH v6 7/8] iommu/vt-d: Add nested domain allocation Yi Liu
@ 2023-10-20  9:32 ` Yi Liu
  2023-10-21  3:24   ` Baolu Lu
  7 siblings, 1 reply; 18+ messages in thread
From: Yi Liu @ 2023-10-20  9:32 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins, xin.zeng

From: Lu Baolu <baolu.lu@linux.intel.com>

When remapping hardware is configured by system software in scalable mode
as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
in first-stage page-table entries even when second-stage mappings indicate
that corresponding first-stage page-table is Read-Only.

As the result, contents of pages designated by VMM as Read-Only can be
modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of
address translation process due to DMAs issued by Guest.

This disallows read-only mappings in the domain that is supposed to be used
as nested parent. Reference from Sapphire Rapids Specification Update [1],
errata details, SPR17. Userspace should know this limitation by checking
the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO
ioctl.

[1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c  |  9 +++++++++
 drivers/iommu/intel/iommu.h  |  1 +
 include/uapi/linux/iommufd.h | 12 +++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c7704e7efd4a..a0341a069fbf 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2193,6 +2193,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
 		return -EINVAL;
 
+	if (!(prot & DMA_PTE_WRITE) && domain->is_nested_parent) {
+		pr_err_ratelimited("Read-only mapping is disallowed on the domain which serves as the parent in a nested configuration, due to HW errata (ERRATA_772415_SPR17)\n");
+		return -EINVAL;
+	}
+
 	attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
 	attr |= DMA_FL_PTE_PRESENT;
 	if (domain->use_first_level) {
@@ -4101,6 +4106,9 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 		domain = iommu_domain_alloc(dev->bus);
 		if (!domain)
 			return ERR_PTR(-ENOMEM);
+		container_of(domain,
+			     struct dmar_domain,
+			     domain)->is_nested_parent = request_nest_parent;
 		return domain;
 	}
 
@@ -4839,6 +4847,7 @@ static void *intel_iommu_hw_info(struct device *dev, u32 *length, u32 *type)
 	if (!vtd)
 		return ERR_PTR(-ENOMEM);
 
+	vtd->flags = IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17;
 	vtd->cap_reg = iommu->cap;
 	vtd->ecap_reg = iommu->ecap;
 	*length = sizeof(*vtd);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index b4560983b8b9..0539a0f47557 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -593,6 +593,7 @@ struct dmar_domain {
 					 * otherwise, goes through the second
 					 * level.
 					 */
+	u8 is_nested_parent:1;		/* has other domains nested on it */
 
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 9b843a197ea8..c8f523a7bc06 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -439,10 +439,20 @@ struct iommu_hwpt_alloc {
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
+/**
+ * enum iommu_hw_info_vtd_flags - Flags for VT-d hw_info
+ * @IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17: If set, disallow nesting on domains
+ *                                   with read-only mapping.
+ *                                   https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
+ */
+enum iommu_hw_info_vtd_flags {
+	IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 = 1 << 0,
+};
+
 /**
  * struct iommu_hw_info_vtd - Intel VT-d hardware information
  *
- * @flags: Must be 0
+ * @flags: Combination of enum iommu_hw_info_vtd_flags
  * @__reserved: Must be 0
  *
  * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 3/8] iommu/vt-d: Add helper for nested domain allocation
  2023-10-20  9:32 ` [PATCH v6 3/8] iommu/vt-d: Add helper for nested domain allocation Yi Liu
@ 2023-10-20 11:49   ` Baolu Lu
  2023-10-23 11:00     ` Liu, Yi L
  0 siblings, 1 reply; 18+ messages in thread
From: Baolu Lu @ 2023-10-20 11:49 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins,
	xin.zeng, Jacob Pan

On 2023/10/20 17:32, Yi Liu wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> This adds helper for accepting user parameters and allocate a nested
> domain.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/Makefile |  2 +-
>   drivers/iommu/intel/iommu.h  |  2 ++
>   drivers/iommu/intel/nested.c | 55 ++++++++++++++++++++++++++++++++++++
>   3 files changed, 58 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/iommu/intel/nested.c
> 
> diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
> index 7af3b8a4f2a0..5dabf081a779 100644
> --- a/drivers/iommu/intel/Makefile
> +++ b/drivers/iommu/intel/Makefile
> @@ -1,6 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0
>   obj-$(CONFIG_DMAR_TABLE) += dmar.o
> -obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
> +obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
>   obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
>   obj-$(CONFIG_DMAR_PERF) += perf.o
>   obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index a91077a063ee..ff55184456dd 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -866,6 +866,8 @@ void *alloc_pgtable_page(int node, gfp_t gfp);
>   void free_pgtable_page(void *vaddr);
>   void iommu_flush_write_buffer(struct intel_iommu *iommu);
>   struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
> +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
> +					       const struct iommu_user_data *user_data);
>   
>   #ifdef CONFIG_INTEL_IOMMU_SVM
>   void intel_svm_check(struct intel_iommu *iommu);
> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
> new file mode 100644
> index 000000000000..5a2920a98e47
> --- /dev/null
> +++ b/drivers/iommu/intel/nested.c
> @@ -0,0 +1,55 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * nested.c - nested mode translation support
> + *
> + * Copyright (C) 2023 Intel Corporation
> + *
> + * Author: Lu Baolu <baolu.lu@linux.intel.com>
> + *         Jacob Pan <jacob.jun.pan@linux.intel.com>
> + *         Yi Liu <yi.l.liu@intel.com>
> + */
> +
> +#define pr_fmt(fmt)	"DMAR: " fmt
> +
> +#include <linux/iommu.h>
> +
> +#include "iommu.h"
> +
> +static void intel_nested_domain_free(struct iommu_domain *domain)
> +{
> +	kfree(to_dmar_domain(domain));
> +}
> +
> +static const struct iommu_domain_ops intel_nested_domain_ops = {
> +	.free			= intel_nested_domain_free,
> +};
> +
> +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
> +					       const struct iommu_user_data *user_data)
> +{
> +	struct iommu_hwpt_vtd_s1 vtd;
> +	struct dmar_domain *domain;
> +	int ret;
> +
> +	ret = iommu_copy_struct_from_user(&vtd, user_data,
> +					  IOMMU_HWPT_DATA_VTD_S1, __reserved);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
> +	if (!domain)
> +		return NULL;

	return ERR_PTR(-ENOMEM);
?

> +
> +	domain->use_first_level = true;
> +	domain->s2_domain = to_dmar_domain(s2_domain);
> +	domain->s1_pgtbl = vtd.pgtbl_addr;
> +	domain->s1_cfg = vtd;
> +	domain->domain.ops = &intel_nested_domain_ops;
> +	domain->domain.type = IOMMU_DOMAIN_NESTED;
> +	INIT_LIST_HEAD(&domain->devices);
> +	INIT_LIST_HEAD(&domain->dev_pasids);
> +	spin_lock_init(&domain->lock);
> +	xa_init(&domain->iommu_array);
> +
> +	return &domain->domain;
> +}

Best regards,
baolu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 5/8] iommu/vt-d: Make domain attach helpers to be extern
  2023-10-20  9:32 ` [PATCH v6 5/8] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
@ 2023-10-20 12:20   ` Baolu Lu
  0 siblings, 0 replies; 18+ messages in thread
From: Baolu Lu @ 2023-10-20 12:20 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins,
	xin.zeng

On 2023/10/20 17:32, Yi Liu wrote:
> This makes the helpers visible to nested.c.
> 
> Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/iommu.c | 15 ++++++---------
>   drivers/iommu/intel/iommu.h |  7 +++++++
>   2 files changed, 13 insertions(+), 9 deletions(-)

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 7/8] iommu/vt-d: Add nested domain allocation
  2023-10-20  9:32 ` [PATCH v6 7/8] iommu/vt-d: Add nested domain allocation Yi Liu
@ 2023-10-21  3:07   ` Baolu Lu
  2023-10-23 11:11     ` Liu, Yi L
  0 siblings, 1 reply; 18+ messages in thread
From: Baolu Lu @ 2023-10-21  3:07 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins,
	xin.zeng

On 10/20/23 5:32 PM, Yi Liu wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> This adds the support for IOMMU_HWPT_DATA_VTD_S1 type.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/iommu.c | 39 ++++++++++++++++++++++++++-----------
>   1 file changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index b47025fbdea4..c7704e7efd4a 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4076,7 +4076,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
>   			      struct iommu_domain *parent,
>   			      const struct iommu_user_data *user_data)
>   {
> -	struct iommu_domain *domain;
> +	bool request_nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
>   	struct intel_iommu *iommu;
>   
>   	if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT))
> @@ -4086,18 +4086,35 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
>   	if (!iommu)
>   		return ERR_PTR(-ENODEV);
>   
> -	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
> +	if (!user_data) { /* Must be PAGING domain */
> +		struct iommu_domain *domain;
> +
> +		if (request_nest_parent && !ecap_nest(iommu->ecap))

Hardware capability is not sufficient. How about adding below helper:

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index b5f33a7c1973..b04bbabcd696 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -540,6 +540,8 @@ enum {
  #define sm_supported(iommu)    (intel_iommu_sm && 
ecap_smts((iommu)->ecap))
  #define pasid_supported(iommu) (sm_supported(iommu) &&                 \
                                  ecap_pasid((iommu)->ecap))
+#define nested_supported(iommu)        (sm_supported(iommu) && 
        \
+                                ecap_nest((iommu)->ecap))

  struct pasid_entry;
  struct pasid_state_entry;

And, use nested_supported() here and bleow

> +			return ERR_PTR(-EOPNOTSUPP);
> +		if (parent)
> +			return ERR_PTR(-EINVAL);
> +		/*
> +		 * domain_alloc_user op needs to fully initialize a domain
> +		 * before return, so uses iommu_domain_alloc() here for
> +		 * simple.
> +		 */
> +		domain = iommu_domain_alloc(dev->bus);
> +		if (!domain)
> +			return ERR_PTR(-ENOMEM);
> +		return domain;
> +	}
> +
> +	/* Must be nested domain */
> +	if (!ecap_nest(iommu->ecap))

...here.

> +		return ERR_PTR(-EOPNOTSUPP);
> +	if (user_data->type != IOMMU_HWPT_DATA_VTD_S1)
>   		return ERR_PTR(-EOPNOTSUPP);
> +	if (!parent || parent->ops != intel_iommu_ops.default_domain_ops)
> +		return ERR_PTR(-EINVAL);
> +	if (request_nest_parent)
> +		return ERR_PTR(-EINVAL);
>   
> -	/*
> -	 * domain_alloc_user op needs to fully initialize a domain
> -	 * before return, so uses iommu_domain_alloc() here for
> -	 * simple.
> -	 */
> -	domain = iommu_domain_alloc(dev->bus);
> -	if (!domain)
> -		domain = ERR_PTR(-ENOMEM);
> -	return domain;
> +	return intel_nested_domain_alloc(parent, user_data);
>   }
>   
>   static void intel_iommu_domain_free(struct iommu_domain *domain)

Best regards,
baolu

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain
  2023-10-20  9:32 ` [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain Yi Liu
@ 2023-10-21  3:24   ` Baolu Lu
  2023-10-23 11:15     ` Liu, Yi L
  0 siblings, 1 reply; 18+ messages in thread
From: Baolu Lu @ 2023-10-21  3:24 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins,
	xin.zeng

On 10/20/23 5:32 PM, Yi Liu wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> When remapping hardware is configured by system software in scalable mode
> as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
> it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
> in first-stage page-table entries even when second-stage mappings indicate
> that corresponding first-stage page-table is Read-Only.
> 
> As the result, contents of pages designated by VMM as Read-Only can be
> modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of
> address translation process due to DMAs issued by Guest.
> 
> This disallows read-only mappings in the domain that is supposed to be used
> as nested parent. Reference from Sapphire Rapids Specification Update [1],
> errata details, SPR17. Userspace should know this limitation by checking
> the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO
> ioctl.
> 
> [1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/iommu.c  |  9 +++++++++
>   drivers/iommu/intel/iommu.h  |  1 +
>   include/uapi/linux/iommufd.h | 12 +++++++++++-
>   3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index c7704e7efd4a..a0341a069fbf 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2193,6 +2193,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>   	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
>   		return -EINVAL;
>   
> +	if (!(prot & DMA_PTE_WRITE) && domain->is_nested_parent) {
> +		pr_err_ratelimited("Read-only mapping is disallowed on the domain which serves as the parent in a nested configuration, due to HW errata (ERRATA_772415_SPR17)\n");
> +		return -EINVAL;
> +	}
> +
>   	attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
>   	attr |= DMA_FL_PTE_PRESENT;
>   	if (domain->use_first_level) {
> @@ -4101,6 +4106,9 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
>   		domain = iommu_domain_alloc(dev->bus);
>   		if (!domain)
>   			return ERR_PTR(-ENOMEM);
> +		container_of(domain,
> +			     struct dmar_domain,
> +			     domain)->is_nested_parent = request_nest_parent;

How about
		to_dmar_domain(domain)->is_nested_parent = ...;
?

I would also prefer to introduce is_nested_parent_domain to the user
domain allocation patch (patch 7/8). This field should be checked when
allocating a nested user domain.

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8f81a5c9fcc0..d3f6bc1f6590 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4121,6 +4121,8 @@ intel_iommu_domain_alloc_user(struct device *dev, 
u32 flags,
                 return ERR_PTR(-EINVAL);
         if (request_nest_parent)
                 return ERR_PTR(-EINVAL);
+       if (!to_dmar_domain(parent)->is_nested_parent)
+               return ERR_PTR(-EINVAL);

         return intel_nested_domain_alloc(parent, user_data);
  }

Best regards,
baolu

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH v6 3/8] iommu/vt-d: Add helper for nested domain allocation
  2023-10-20 11:49   ` Baolu Lu
@ 2023-10-23 11:00     ` Liu, Yi L
  0 siblings, 0 replies; 18+ messages in thread
From: Liu, Yi L @ 2023-10-23 11:00 UTC (permalink / raw)
  To: Baolu Lu, joro@8bytes.org, alex.williamson@redhat.com,
	jgg@nvidia.com, Tian, Kevin, robin.murphy@arm.com
  Cc: cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
	kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Duan, Zhenzhong, Martins, Joao, Zeng, Xin, Jacob Pan

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, October 20, 2023 7:49 PM
> 
> On 2023/10/20 17:32, Yi Liu wrote:
> > From: Lu Baolu <baolu.lu@linux.intel.com>
> >
> > This adds helper for accepting user parameters and allocate a nested
> > domain.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/intel/Makefile |  2 +-
> >   drivers/iommu/intel/iommu.h  |  2 ++
> >   drivers/iommu/intel/nested.c | 55 ++++++++++++++++++++++++++++++++++++
> >   3 files changed, 58 insertions(+), 1 deletion(-)
> >   create mode 100644 drivers/iommu/intel/nested.c
> >
> > diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
> > index 7af3b8a4f2a0..5dabf081a779 100644
> > --- a/drivers/iommu/intel/Makefile
> > +++ b/drivers/iommu/intel/Makefile
> > @@ -1,6 +1,6 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >   obj-$(CONFIG_DMAR_TABLE) += dmar.o
> > -obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
> > +obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
> >   obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
> >   obj-$(CONFIG_DMAR_PERF) += perf.o
> >   obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index a91077a063ee..ff55184456dd 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -866,6 +866,8 @@ void *alloc_pgtable_page(int node, gfp_t gfp);
> >   void free_pgtable_page(void *vaddr);
> >   void iommu_flush_write_buffer(struct intel_iommu *iommu);
> >   struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
> > +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
> *s2_domain,
> > +					       const struct iommu_user_data *user_data);
> >
> >   #ifdef CONFIG_INTEL_IOMMU_SVM
> >   void intel_svm_check(struct intel_iommu *iommu);
> > diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
> > new file mode 100644
> > index 000000000000..5a2920a98e47
> > --- /dev/null
> > +++ b/drivers/iommu/intel/nested.c
> > @@ -0,0 +1,55 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * nested.c - nested mode translation support
> > + *
> > + * Copyright (C) 2023 Intel Corporation
> > + *
> > + * Author: Lu Baolu <baolu.lu@linux.intel.com>
> > + *         Jacob Pan <jacob.jun.pan@linux.intel.com>
> > + *         Yi Liu <yi.l.liu@intel.com>
> > + */
> > +
> > +#define pr_fmt(fmt)	"DMAR: " fmt
> > +
> > +#include <linux/iommu.h>
> > +
> > +#include "iommu.h"
> > +
> > +static void intel_nested_domain_free(struct iommu_domain *domain)
> > +{
> > +	kfree(to_dmar_domain(domain));
> > +}
> > +
> > +static const struct iommu_domain_ops intel_nested_domain_ops = {
> > +	.free			= intel_nested_domain_free,
> > +};
> > +
> > +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
> *s2_domain,
> > +					       const struct iommu_user_data *user_data)
> > +{
> > +	struct iommu_hwpt_vtd_s1 vtd;
> > +	struct dmar_domain *domain;
> > +	int ret;
> > +
> > +	ret = iommu_copy_struct_from_user(&vtd, user_data,
> > +					  IOMMU_HWPT_DATA_VTD_S1, __reserved);
> > +	if (ret)
> > +		return ERR_PTR(ret);
> > +
> > +	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
> > +	if (!domain)
> > +		return NULL;
> 
> 	return ERR_PTR(-ENOMEM);
> ?

Yes, good catch!

> > +
> > +	domain->use_first_level = true;
> > +	domain->s2_domain = to_dmar_domain(s2_domain);
> > +	domain->s1_pgtbl = vtd.pgtbl_addr;
> > +	domain->s1_cfg = vtd;
> > +	domain->domain.ops = &intel_nested_domain_ops;
> > +	domain->domain.type = IOMMU_DOMAIN_NESTED;
> > +	INIT_LIST_HEAD(&domain->devices);
> > +	INIT_LIST_HEAD(&domain->dev_pasids);
> > +	spin_lock_init(&domain->lock);
> > +	xa_init(&domain->iommu_array);
> > +
> > +	return &domain->domain;
> > +}
> 
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v6 7/8] iommu/vt-d: Add nested domain allocation
  2023-10-21  3:07   ` Baolu Lu
@ 2023-10-23 11:11     ` Liu, Yi L
  0 siblings, 0 replies; 18+ messages in thread
From: Liu, Yi L @ 2023-10-23 11:11 UTC (permalink / raw)
  To: Baolu Lu, joro@8bytes.org, alex.williamson@redhat.com,
	jgg@nvidia.com, Tian, Kevin, robin.murphy@arm.com
  Cc: cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
	kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Duan, Zhenzhong, Martins, Joao, Zeng, Xin

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Saturday, October 21, 2023 11:08 AM
> 
> On 10/20/23 5:32 PM, Yi Liu wrote:
> > From: Lu Baolu <baolu.lu@linux.intel.com>
> >
> > This adds the support for IOMMU_HWPT_DATA_VTD_S1 type.
> >
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/intel/iommu.c | 39 ++++++++++++++++++++++++++-----------
> >   1 file changed, 28 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index b47025fbdea4..c7704e7efd4a 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -4076,7 +4076,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32
> flags,
> >   			      struct iommu_domain *parent,
> >   			      const struct iommu_user_data *user_data)
> >   {
> > -	struct iommu_domain *domain;
> > +	bool request_nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
> >   	struct intel_iommu *iommu;
> >
> >   	if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT))
> > @@ -4086,18 +4086,35 @@ intel_iommu_domain_alloc_user(struct device *dev, u32
> flags,
> >   	if (!iommu)
> >   		return ERR_PTR(-ENODEV);
> >
> > -	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu-
> >ecap))
> > +	if (!user_data) { /* Must be PAGING domain */
> > +		struct iommu_domain *domain;
> > +
> > +		if (request_nest_parent && !ecap_nest(iommu->ecap))
> 
> Hardware capability is not sufficient. How about adding below helper:

Yes. But the sm_supported() seems a bit confusing. Is it? actually, it's
a combined check of the sm capability and software choice (use sm or
not). How about renaming it to be sm_enabled()? Although the best
enable status check is to read the SM bit in root address register. No hurry
in this circle, but may be nice to have.

> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index b5f33a7c1973..b04bbabcd696 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -540,6 +540,8 @@ enum {
>   #define sm_supported(iommu)    (intel_iommu_sm &&
> ecap_smts((iommu)->ecap))
>   #define pasid_supported(iommu) (sm_supported(iommu) &&                 \
>                                   ecap_pasid((iommu)->ecap))
> +#define nested_supported(iommu)        (sm_supported(iommu) &&
>         \
> +                                ecap_nest((iommu)->ecap))
> 
>   struct pasid_entry;
>   struct pasid_state_entry;
> 
> And, use nested_supported() here and bleow
> 
> > +			return ERR_PTR(-EOPNOTSUPP);
> > +		if (parent)
> > +			return ERR_PTR(-EINVAL);
> > +		/*
> > +		 * domain_alloc_user op needs to fully initialize a domain
> > +		 * before return, so uses iommu_domain_alloc() here for
> > +		 * simple.
> > +		 */
> > +		domain = iommu_domain_alloc(dev->bus);
> > +		if (!domain)
> > +			return ERR_PTR(-ENOMEM);
> > +		return domain;
> > +	}
> > +
> > +	/* Must be nested domain */
> > +	if (!ecap_nest(iommu->ecap))
> 
> ...here.
> 
> > +		return ERR_PTR(-EOPNOTSUPP);
> > +	if (user_data->type != IOMMU_HWPT_DATA_VTD_S1)
> >   		return ERR_PTR(-EOPNOTSUPP);
> > +	if (!parent || parent->ops != intel_iommu_ops.default_domain_ops)
> > +		return ERR_PTR(-EINVAL);
> > +	if (request_nest_parent)
> > +		return ERR_PTR(-EINVAL);
> >
> > -	/*
> > -	 * domain_alloc_user op needs to fully initialize a domain
> > -	 * before return, so uses iommu_domain_alloc() here for
> > -	 * simple.
> > -	 */
> > -	domain = iommu_domain_alloc(dev->bus);
> > -	if (!domain)
> > -		domain = ERR_PTR(-ENOMEM);
> > -	return domain;
> > +	return intel_nested_domain_alloc(parent, user_data);
> >   }
> >
> >   static void intel_iommu_domain_free(struct iommu_domain *domain)
> 
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain
  2023-10-21  3:24   ` Baolu Lu
@ 2023-10-23 11:15     ` Liu, Yi L
  2023-10-23 12:18       ` Baolu Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Liu, Yi L @ 2023-10-23 11:15 UTC (permalink / raw)
  To: Baolu Lu, joro@8bytes.org, alex.williamson@redhat.com,
	jgg@nvidia.com, Tian, Kevin, robin.murphy@arm.com
  Cc: cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
	kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Duan, Zhenzhong, Martins, Joao, Zeng, Xin

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Saturday, October 21, 2023 11:24 AM
> 
> On 10/20/23 5:32 PM, Yi Liu wrote:
> > From: Lu Baolu <baolu.lu@linux.intel.com>
> >
> > When remapping hardware is configured by system software in scalable mode
> > as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
> > it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
> > in first-stage page-table entries even when second-stage mappings indicate
> > that corresponding first-stage page-table is Read-Only.
> >
> > As the result, contents of pages designated by VMM as Read-Only can be
> > modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of
> > address translation process due to DMAs issued by Guest.
> >
> > This disallows read-only mappings in the domain that is supposed to be used
> > as nested parent. Reference from Sapphire Rapids Specification Update [1],
> > errata details, SPR17. Userspace should know this limitation by checking
> > the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the
> IOMMU_GET_HW_INFO
> > ioctl.
> >
> > [1] https://www.intel.com/content/www/us/en/content-details/772415/content-
> details.html
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/intel/iommu.c  |  9 +++++++++
> >   drivers/iommu/intel/iommu.h  |  1 +
> >   include/uapi/linux/iommufd.h | 12 +++++++++++-
> >   3 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index c7704e7efd4a..a0341a069fbf 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -2193,6 +2193,11 @@ __domain_mapping(struct dmar_domain *domain,
> unsigned long iov_pfn,
> >   	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
> >   		return -EINVAL;
> >
> > +	if (!(prot & DMA_PTE_WRITE) && domain->is_nested_parent) {
> > +		pr_err_ratelimited("Read-only mapping is disallowed on the domain
> which serves as the parent in a nested configuration, due to HW errata
> (ERRATA_772415_SPR17)\n");
> > +		return -EINVAL;
> > +	}
> > +
> >   	attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
> >   	attr |= DMA_FL_PTE_PRESENT;
> >   	if (domain->use_first_level) {
> > @@ -4101,6 +4106,9 @@ intel_iommu_domain_alloc_user(struct device *dev, u32
> flags,
> >   		domain = iommu_domain_alloc(dev->bus);
> >   		if (!domain)
> >   			return ERR_PTR(-ENOMEM);
> > +		container_of(domain,
> > +			     struct dmar_domain,
> > +			     domain)->is_nested_parent = request_nest_parent;
> 
> How about
> 		to_dmar_domain(domain)->is_nested_parent = ...;
> ?

Yes.

> 
> I would also prefer to introduce is_nested_parent_domain to the user
> domain allocation patch (patch 7/8). This field should be checked when
> allocating a nested user domain.

A ctually, no need. This should be a common check, so iommufd core already
has the check. So the parent should be a nest parent domain, otherwise already
returned in iommufd.

+	if (!parent->nest_parent)
+		return ERR_PTR(-EINVAL);

https://lore.kernel.org/linux-iommu/20231020091946.12173-8-yi.l.liu@intel.com/

> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 8f81a5c9fcc0..d3f6bc1f6590 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4121,6 +4121,8 @@ intel_iommu_domain_alloc_user(struct device *dev,
> u32 flags,
>                  return ERR_PTR(-EINVAL);
>          if (request_nest_parent)
>                  return ERR_PTR(-EINVAL);
> +       if (!to_dmar_domain(parent)->is_nested_parent)
> +               return ERR_PTR(-EINVAL);
> 
>          return intel_nested_domain_alloc(parent, user_data);
>   }
> 
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain
  2023-10-23 11:15     ` Liu, Yi L
@ 2023-10-23 12:18       ` Baolu Lu
  2023-10-24  2:06         ` Liu, Yi L
  0 siblings, 1 reply; 18+ messages in thread
From: Baolu Lu @ 2023-10-23 12:18 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, alex.williamson@redhat.com,
	jgg@nvidia.com, Tian, Kevin, robin.murphy@arm.com
  Cc: baolu.lu, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Duan, Zhenzhong, Martins, Joao, Zeng, Xin

On 2023/10/23 19:15, Liu, Yi L wrote:
>> I would also prefer to introduce is_nested_parent_domain to the user
>> domain allocation patch (patch 7/8). This field should be checked when
>> allocating a nested user domain.
> A ctually, no need. This should be a common check, so iommufd core already
> has the check. So the parent should be a nest parent domain, otherwise already
> returned in iommufd.
> 
> +	if (!parent->nest_parent)
> +		return ERR_PTR(-EINVAL);

I know this will not cause errors in the code. But since you are
introducing is_parent property in the vt-d driver. The integrity of the
property should be ensured. In this way, it will make the code more
readable and maintainable.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain
  2023-10-23 12:18       ` Baolu Lu
@ 2023-10-24  2:06         ` Liu, Yi L
  0 siblings, 0 replies; 18+ messages in thread
From: Liu, Yi L @ 2023-10-24  2:06 UTC (permalink / raw)
  To: Baolu Lu, joro@8bytes.org, alex.williamson@redhat.com,
	jgg@nvidia.com, Tian, Kevin, robin.murphy@arm.com
  Cc: cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
	kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Duan, Zhenzhong, Martins, Joao, Zeng, Xin

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, October 23, 2023 8:18 PM
> 
> On 2023/10/23 19:15, Liu, Yi L wrote:
> >> I would also prefer to introduce is_nested_parent_domain to the user
> >> domain allocation patch (patch 7/8). This field should be checked when
> >> allocating a nested user domain.
> > A ctually, no need. This should be a common check, so iommufd core already
> > has the check. So the parent should be a nest parent domain, otherwise already
> > returned in iommufd.
> >
> > +	if (!parent->nest_parent)
> > +		return ERR_PTR(-EINVAL);
> 
> I know this will not cause errors in the code. But since you are
> introducing is_parent property in the vt-d driver. The integrity of the
> property should be ensured. In this way, it will make the code more
> readable and maintainable.

Ok, if consider it as a property, then it's fine. At first, I just want to
make it as a special flag for this errata. But we cannot predict if there
will be more nested parent special stuffs, then this flag is also needed.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-10-24  2:07 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-20  9:32 [PATCH v6 0/8] Add Intel VT-d nested translation (part 1/2) Yi Liu
2023-10-20  9:32 ` [PATCH v6 1/8] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
2023-10-20  9:32 ` [PATCH v6 2/8] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
2023-10-20  9:32 ` [PATCH v6 3/8] iommu/vt-d: Add helper for nested domain allocation Yi Liu
2023-10-20 11:49   ` Baolu Lu
2023-10-23 11:00     ` Liu, Yi L
2023-10-20  9:32 ` [PATCH v6 4/8] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
2023-10-20  9:32 ` [PATCH v6 5/8] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
2023-10-20 12:20   ` Baolu Lu
2023-10-20  9:32 ` [PATCH v6 6/8] iommu/vt-d: Set the nested domain to a device Yi Liu
2023-10-20  9:32 ` [PATCH v6 7/8] iommu/vt-d: Add nested domain allocation Yi Liu
2023-10-21  3:07   ` Baolu Lu
2023-10-23 11:11     ` Liu, Yi L
2023-10-20  9:32 ` [PATCH v6 8/8] iommu/vt-d: Disallow read-only mappings to nest parent domain Yi Liu
2023-10-21  3:24   ` Baolu Lu
2023-10-23 11:15     ` Liu, Yi L
2023-10-23 12:18       ` Baolu Lu
2023-10-24  2:06         ` Liu, Yi L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox