devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] arm-smmu: select suitable MSI IOVA
@ 2025-09-09 15:45 Shyam Saini
  2025-09-09 15:45 ` [PATCH v4 1/4] arm-smmu: move MSI_IOVA macro definitions Shyam Saini
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Shyam Saini @ 2025-09-09 15:45 UTC (permalink / raw)
  To: thierry.reding, robin.murphy, robh, joro, jgg
  Cc: iommu, linux-arm-kernel, devicetree, virtualization, will,
	jacob.pan, eric.auger, code, eahariha, vijayb, bboscaccy,
	saravanak, krzk+dt, conor+dt, lizhi.hou, clement.leger

Hi,

Currently, the MSI_IOVA_BASE address is hard-coded to 0x80000000,
assuming that all platforms have this address available for MSI IOVA
reservation. However, this is not always the case, as some platforms
reserve this address for other purposes. Consequently, these platforms
cannot reserve the MSI_IOVA_BASE address for MSI.

There was an [1] attempt to fix this problem by passing the MSI IOVA
base as a kernel command line parameter.

The faulty iommu-addresses can be reserved in DTS and suitable MSI_IOVA
can be selected (Suggested by Jason), However, for individual PCI
devices of_iommu driver can't reserve iommu-addresses as these devices
doesn't inherit parent DTS properties.

There are at least 3 ways to reserve iommu-addresses for individual PCI
devices,
 - 1) By dynamically adding DTS nodes for individual PCI devices using
   [2] CONFIG_PCI_DYNAMIC_OF_NODES, this requires hardcoding PCI device
   IDs in DECLARE_PCI_FIXUP_FINAL

 - 2) By adding PCI devices nodes either in DTS or by modifying FDT at
   boot time in the firmware, eg [3] However, of_iommu driver doesn't
   seem to handle individual PCI devices, additionally this approach
   doesn't seem to much scalable for the complex PCI hierarchy

 - 3) By configuring PCI host controller DTS node for PCI device so that it
   can inherit iommu-addresses defined in the parent node.

This patch series aims to address both the problems described above: 
- select suitable MSI IOVA when the default MSI IOVA conflicts with
  reserved iommu addresses 
- Using 3) approach appropriately handle iommu-addresses for individual
  PCI devices during IOMMU configuration


This patch series accommodates platforms that do not have the default MSI
base address available for MSI reservation by implementing both solutions
described above.

[1]: https://lore.kernel.org/lkml/20200914181307.117792-1-vemegava@linux.microsoft.com/
[2]: https://lwn.net/Articles/939317/
[3]: https://elixir.bootlin.com/linux/v6.16/source/arch/arm64/boot/dts/nvidia/tegra186.dtsi#L1388

Thanks,
Shyam

---
v4:
- Add warn message to indicate MSI IOVA reservation issue
- Fix loop condition in arm_smmu_get_resv_regions()
- Refactor arm_smmu_get_resv_regions() by introducing iommu_set_sw_msi()
  helper function

v3:
- Drop "arm,smmu-faulty-msi-iova" property change
- Fix iommu driver device tree configuration for PCI devices
- Use "iommu-addresses" property to identify reserved MSI IOVA regions,
  and introduce an additional MSI_IOVA_BASE2 macro to select a suitable
  MSI IOVA base if the default overlaps with a reserved region (suggested by Jason)
Link: https://lore.kernel.org/linux-iommu/20250806215539.1240561-1-shyamsaini@linux.microsoft.com/

v2:
- add new dts property to hold faulty MSI IOVA and select appropriate
  MSI IOVA address
Link: https://lore.kernel.org/linux-iommu/20250410225030.2528385-1-shyamsaini@linux.microsoft.com/

v1:
Link: https://lore.kernel.org/linux-iommu/20250116232307.1436693-1-shyamsaini@linux.microsoft.com/

Shyam Saini (4):
  arm-smmu: move MSI_IOVA macro definitions
  iommu/of: fix device tree configuration for PCI devices
  arm-smmu: select suitable MSI IOVA
  drivers: iommu: refactor arm_smmu_get_resv_regions

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 --
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 16 ++------
 drivers/iommu/iommu.c                       | 37 ++++++++++++++++++
 drivers/iommu/of_iommu.c                    | 11 ++++++
 drivers/iommu/virtio-iommu.c                |  2 -
 include/linux/iommu.h                       | 42 +++++++++++++++++++++
 7 files changed, 96 insertions(+), 28 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/4] arm-smmu: move MSI_IOVA macro definitions
  2025-09-09 15:45 [PATCH v4 0/4] arm-smmu: select suitable MSI IOVA Shyam Saini
@ 2025-09-09 15:45 ` Shyam Saini
  2025-09-09 15:45 ` [PATCH v4 2/4] iommu/of: fix device tree configuration for PCI devices Shyam Saini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Shyam Saini @ 2025-09-09 15:45 UTC (permalink / raw)
  To: thierry.reding, robin.murphy, robh, joro, jgg
  Cc: iommu, linux-arm-kernel, devicetree, virtualization, will,
	jacob.pan, eric.auger, code, eahariha, vijayb, bboscaccy,
	saravanak, krzk+dt, conor+dt, lizhi.hou, clement.leger

MSI_IOVA* are common among different iommu/smu drivers,
so move them to common iommu.h header file.

Suggested-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 ---
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 3 ---
 drivers/iommu/virtio-iommu.c                | 2 --
 include/linux/iommu.h                       | 3 +++
 4 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ae23aacc38402..16f5856b4c0e2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -505,9 +505,6 @@ static inline unsigned int arm_smmu_cdtab_l2_idx(unsigned int ssid)
 #define ARM_SMMU_POLL_TIMEOUT_US	1000000 /* 1s! */
 #define ARM_SMMU_POLL_SPIN_COUNT	10
 
-#define MSI_IOVA_BASE			0x8000000
-#define MSI_IOVA_LENGTH			0x100000
-
 enum pri_resp {
 	PRI_RESP_DENY = 0,
 	PRI_RESP_FAIL = 1,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4ced4b5bee4df..4a07650911991 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -50,9 +50,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define MSI_IOVA_BASE			0x8000000
-#define MSI_IOVA_LENGTH			0x100000
-
 static int force_stage;
 module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index b39d6f134ab28..f3d830656cf29 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -24,8 +24,6 @@
 
 #include "dma-iommu.h"
 
-#define MSI_IOVA_BASE			0x8000000
-#define MSI_IOVA_LENGTH			0x100000
 
 #define VIOMMU_REQUEST_VQ		0
 #define VIOMMU_EVENT_VQ			1
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c30d12e16473d..09a35af5a545d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1554,6 +1554,9 @@ static inline void iommu_debugfs_setup(void) {}
 #endif
 
 #ifdef CONFIG_IOMMU_DMA
+#define MSI_IOVA_BASE        0x8000000
+#define MSI_IOVA_LENGTH      0x100000
+
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 #else /* CONFIG_IOMMU_DMA */
 static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 2/4] iommu/of: fix device tree configuration for PCI devices
  2025-09-09 15:45 [PATCH v4 0/4] arm-smmu: select suitable MSI IOVA Shyam Saini
  2025-09-09 15:45 ` [PATCH v4 1/4] arm-smmu: move MSI_IOVA macro definitions Shyam Saini
@ 2025-09-09 15:45 ` Shyam Saini
  2025-09-24 17:44   ` Robin Murphy
  2025-09-09 15:45 ` [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA Shyam Saini
  2025-09-09 15:46 ` [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions Shyam Saini
  3 siblings, 1 reply; 17+ messages in thread
From: Shyam Saini @ 2025-09-09 15:45 UTC (permalink / raw)
  To: thierry.reding, robin.murphy, robh, joro, jgg
  Cc: iommu, linux-arm-kernel, devicetree, virtualization, will,
	jacob.pan, eric.auger, code, eahariha, vijayb, bboscaccy,
	saravanak, krzk+dt, conor+dt, lizhi.hou, clement.leger

Individual PCI devices lack dedicated device tree nodes, but
their IOMMU configuration (including reserved IOVA regions) is often
defined at the PCI host controller level in the device tree. The
"iommu-addresses" property in reserved-memory nodes specifies IOVA
ranges that should be reserved for specific devices.

Currently, PCI devices cannot access these configurations because their
dev->of_node is NULL, preventing of_iommu_get_resv_regions() from
discovering reserved regions defined in the device tree.

There are at least 3 ways to reserve iommu-addresses for individual PCI
devices,
 - 1) By dynamically adding DTS nodes for individual PCI devices using
   [2] CONFIG_PCI_DYNAMIC_OF_NODES, this requires hardcoding PCI device
   IDs in DECLARE_PCI_FIXUP_FINAL

 - 2) By adding PCI devices nodes either in DTS or by modifying FDT at
   boot time in the firmware, eg [3] However, of_iommu driver doesn't
   seem to handle individual PCI devices, additionally this approach
   doesn't seem to much scalable for the complex PCI hierarchy

 - 3) By configuring PCI host controller DTS node for PCI device so
   that it can inherit iommu-addresses defined in the parent node.

This commit addresses the problem using approach 3) by assigning the
PCI host controller's device tree node to PCI devices during IOMMU
configuration, enabling them to inherit the host controller's device
tree properties. This allows PCI devices to properly discover and
reserve IOVA regions specified in the device tree.

Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
---
 drivers/iommu/of_iommu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 6b989a62def20..077482917e3e8 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -145,6 +145,17 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     of_pci_iommu_init, &info);
 		of_pci_check_device_ats(dev, master_np);
+
+		/*
+		 * For PCI devices, ensure the device's of_node points to the
+		 * PCI host controller's device tree node so that reserved regions
+		 * and other DT-specific IOMMU configuration can be found.
+		 * PCI devices typically don't have individual DT nodes, but
+		 * their configuration (including reserved regions) is defined
+		 * at the PCI host controller level.
+		 */
+		if (!err && master_np && !dev->of_node)
+			dev->of_node = of_node_get(master_np);
 	} else {
 		err = of_iommu_configure_device(master_np, dev, id);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-09 15:45 [PATCH v4 0/4] arm-smmu: select suitable MSI IOVA Shyam Saini
  2025-09-09 15:45 ` [PATCH v4 1/4] arm-smmu: move MSI_IOVA macro definitions Shyam Saini
  2025-09-09 15:45 ` [PATCH v4 2/4] iommu/of: fix device tree configuration for PCI devices Shyam Saini
@ 2025-09-09 15:45 ` Shyam Saini
  2025-09-18 16:49   ` Will Deacon
  2025-09-09 15:46 ` [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions Shyam Saini
  3 siblings, 1 reply; 17+ messages in thread
From: Shyam Saini @ 2025-09-09 15:45 UTC (permalink / raw)
  To: thierry.reding, robin.murphy, robh, joro, jgg
  Cc: iommu, linux-arm-kernel, devicetree, virtualization, will,
	jacob.pan, eric.auger, code, eahariha, vijayb, bboscaccy,
	saravanak, krzk+dt, conor+dt, lizhi.hou, clement.leger

Currently ARM SMMU drivers hardcode PCI MSI IOVA address.
Not all the platform have same memory mappings and some platform
could have this address already being mapped for something else.
This can lead to collision and as a consequence the MSI IOVA addr
range is never reserved.

Fix this by reserving faulty IOVA range and selecting alternate MSI_IOVA
suitable for the intended platform.

Example of reserving faulty IOVA range for PCIE device in the DTS:

reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	faulty_iova: resv_faulty {
		iommu-addresses = <&pcieX 0x0 0x8000000 0x0 0x100000>;
	};
};

&pcieX {
	memory-region = <&faulty_iova>;
}

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++++++++-----
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 27 ++++++++++++-----
 include/linux/iommu.h                       | 33 +++++++++++++++++++++
 3 files changed, 75 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2a8b46b948f05..748a5513c5dbb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3642,17 +3642,32 @@ static int arm_smmu_of_xlate(struct device *dev,
 static void arm_smmu_get_resv_regions(struct device *dev,
 				      struct list_head *head)
 {
-	struct iommu_resv_region *region;
 	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
-	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
-	if (!region)
-		return;
-
-	list_add_tail(&region->list, head);
+	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
 
 	iommu_dma_get_resv_regions(dev, head);
+
+	/*
+	 * Use the first msi_base that does not intersect with a platform
+	 * reserved region. The SW MSI base selection is entirely arbitrary.
+	 */
+	for (int i = 0; i != ARRAY_SIZE(msi_bases); i++) {
+		struct iommu_resv_region *region;
+
+		if (resv_region_intersects(msi_bases[i], MSI_IOVA_LENGTH, head))
+			continue;
+
+		region = iommu_alloc_resv_region(msi_bases[i], MSI_IOVA_LENGTH, prot,
+						 IOMMU_RESV_SW_MSI, GFP_KERNEL);
+		if (!region) {
+			pr_warn("IOMMU: Failed to reserve MSI IOVA: No suitable MSI IOVA range available");
+			return;
+		}
+
+		list_add_tail(&region->list, head);
+		return;
+	}
 }
 
 /*
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4a07650911991..84b74b8519386 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1600,17 +1600,30 @@ static int arm_smmu_of_xlate(struct device *dev,
 static void arm_smmu_get_resv_regions(struct device *dev,
 				      struct list_head *head)
 {
-	struct iommu_resv_region *region;
 	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
-	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
-	if (!region)
-		return;
-
-	list_add_tail(&region->list, head);
+	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
 
 	iommu_dma_get_resv_regions(dev, head);
+
+	/*
+	 * Use the first msi_base that does not intersect with a platform
+	 * reserved region. The SW MSI base selection is entirely arbitrary.
+	 */
+	for (int i = 0; i != ARRAY_SIZE(msi_bases); i++) {
+		struct iommu_resv_region *region;
+
+		if (resv_region_intersects(msi_bases[i], MSI_IOVA_LENGTH, head))
+			continue;
+
+		region = iommu_alloc_resv_region(msi_bases[i], MSI_IOVA_LENGTH, prot,
+						 IOMMU_RESV_SW_MSI, GFP_KERNEL);
+		if (!region)
+			return;
+
+		list_add_tail(&region->list, head);
+		return;
+	}
 }
 
 static int arm_smmu_def_domain_type(struct device *dev)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 09a35af5a545d..ce9d008b91ab5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1555,14 +1555,47 @@ static inline void iommu_debugfs_setup(void) {}
 
 #ifdef CONFIG_IOMMU_DMA
 #define MSI_IOVA_BASE        0x8000000
+#define MSI_IOVA_BASE2       0xA0000000
 #define MSI_IOVA_LENGTH      0x100000
 
+/**
+ * resv_region_intersects - Check if address range overlaps with reserved regions
+ * @msi_base: Start address of the range to check
+ * @length: Length of the range to check
+ * @resv_region_list: List of reserved regions to check against
+ *
+ * Returns true if the specified address range overlaps with any reserved region
+ * in the list, false otherwise.
+ */
+static inline bool resv_region_intersects(phys_addr_t msi_base, size_t length,
+					  struct list_head *resv_region_list)
+{
+	struct iommu_resv_region *region;
+	phys_addr_t start, end, resv_region_end;
+
+	start = msi_base;
+	end = start + length - 1;
+	list_for_each_entry(region, resv_region_list, list) {
+		resv_region_end = region->start + region->length - 1;
+		if (!(start > resv_region_end || end < region->start))
+			return true; /* overlap detected */
+	}
+
+	return false;
+}
+
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 #else /* CONFIG_IOMMU_DMA */
 static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 {
 	return -ENODEV;
 }
+
+static inline bool resv_region_intersects(phys_addr_t msi_base, size_t length,
+					  struct list_head *resv_region_list)
+{
+	return false;
+}
 #endif	/* CONFIG_IOMMU_DMA */
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions
  2025-09-09 15:45 [PATCH v4 0/4] arm-smmu: select suitable MSI IOVA Shyam Saini
                   ` (2 preceding siblings ...)
  2025-09-09 15:45 ` [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA Shyam Saini
@ 2025-09-09 15:46 ` Shyam Saini
  2025-09-09 15:58   ` Jason Gunthorpe
  2025-09-13  0:23   ` kernel test robot
  3 siblings, 2 replies; 17+ messages in thread
From: Shyam Saini @ 2025-09-09 15:46 UTC (permalink / raw)
  To: thierry.reding, robin.murphy, robh, joro, jgg
  Cc: iommu, linux-arm-kernel, devicetree, virtualization, will,
	jacob.pan, eric.auger, code, eahariha, vijayb, bboscaccy,
	saravanak, krzk+dt, conor+dt, lizhi.hou, clement.leger

Both ARM SMMU v2/v3 drivers have common set operations for
arm_smmu_get_resv_regions(), except iommu_dma_get_resv_regions()
call all other operations can be clubed into common code block.
So to avoid code duplication put common operations in a new helper
function iommu_set_sw_msi() and call this helper function from
arm_smmu_get_resv_regions() instead.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++-------------
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 24 ++-----------
 drivers/iommu/iommu.c                       | 37 +++++++++++++++++++++
 include/linux/iommu.h                       |  6 ++++
 4 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 748a5513c5dbb..f4bcd2a8133b6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3642,32 +3642,10 @@ static int arm_smmu_of_xlate(struct device *dev,
 static void arm_smmu_get_resv_regions(struct device *dev,
 				      struct list_head *head)
 {
-	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
-
-	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
-
+	/* Consider reserved regions before setting MSI IOVA */
 	iommu_dma_get_resv_regions(dev, head);
 
-	/*
-	 * Use the first msi_base that does not intersect with a platform
-	 * reserved region. The SW MSI base selection is entirely arbitrary.
-	 */
-	for (int i = 0; i != ARRAY_SIZE(msi_bases); i++) {
-		struct iommu_resv_region *region;
-
-		if (resv_region_intersects(msi_bases[i], MSI_IOVA_LENGTH, head))
-			continue;
-
-		region = iommu_alloc_resv_region(msi_bases[i], MSI_IOVA_LENGTH, prot,
-						 IOMMU_RESV_SW_MSI, GFP_KERNEL);
-		if (!region) {
-			pr_warn("IOMMU: Failed to reserve MSI IOVA: No suitable MSI IOVA range available");
-			return;
-		}
-
-		list_add_tail(&region->list, head);
-		return;
-	}
+	iommu_set_sw_msi(head);
 }
 
 /*
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 84b74b8519386..2ede5d7d89a93 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1600,30 +1600,10 @@ static int arm_smmu_of_xlate(struct device *dev,
 static void arm_smmu_get_resv_regions(struct device *dev,
 				      struct list_head *head)
 {
-	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
-
-	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
-
+	/* Consider reserved regions before setting MSI IOVA */
 	iommu_dma_get_resv_regions(dev, head);
 
-	/*
-	 * Use the first msi_base that does not intersect with a platform
-	 * reserved region. The SW MSI base selection is entirely arbitrary.
-	 */
-	for (int i = 0; i != ARRAY_SIZE(msi_bases); i++) {
-		struct iommu_resv_region *region;
-
-		if (resv_region_intersects(msi_bases[i], MSI_IOVA_LENGTH, head))
-			continue;
-
-		region = iommu_alloc_resv_region(msi_bases[i], MSI_IOVA_LENGTH, prot,
-						 IOMMU_RESV_SW_MSI, GFP_KERNEL);
-		if (!region)
-			return;
-
-		list_add_tail(&region->list, head);
-		return;
-	}
+	iommu_set_sw_msi(head);
 }
 
 static int arm_smmu_def_domain_type(struct device *dev)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 060ebe330ee16..4fff01616cb5e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3829,3 +3829,40 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 	return ret;
 }
 #endif /* CONFIG_IRQ_MSI_IOMMU */
+
+#if IS_ENABLED(CONFIG_IOMMU_DMA)
+/*
+ * iommu_set_sw_msi - Set up software managed MSI IOVA region
+ * @head: List of reserved IOVA regions for a device
+ *
+ * This creates a software MSI region by selecting a non-conflicting
+ * MSI_IOVA base address.
+ */
+void iommu_set_sw_msi(struct list_head *head)
+{
+	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
+
+	/*
+	 * Use the first msi_base that does not intersect with a platform
+	 * reserved region. The SW MSI base selection is entirely arbitrary.
+	 */
+	for (int i = 0; i < ARRAY_SIZE(msi_bases); i++) {
+		struct iommu_resv_region *region;
+
+		if (resv_region_intersects(msi_bases[i], MSI_IOVA_LENGTH, head))
+			continue;
+
+		region = iommu_alloc_resv_region(msi_bases[i], MSI_IOVA_LENGTH, prot,
+						 IOMMU_RESV_SW_MSI, GFP_KERNEL);
+		if (!region) {
+			pr_warn("IOMMU: Failed to reserve MSI IOVA: No suitable MSI IOVA range available");
+			return;
+		}
+
+		list_add_tail(&region->list, head);
+		return;
+	}
+}
+#endif /* CONFIG_IOMMU_DMA */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ce9d008b91ab5..ac1708fc20ba9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1558,6 +1558,8 @@ static inline void iommu_debugfs_setup(void) {}
 #define MSI_IOVA_BASE2       0xA0000000
 #define MSI_IOVA_LENGTH      0x100000
 
+void iommu_set_sw_msi(struct list_head *head);
+
 /**
  * resv_region_intersects - Check if address range overlaps with reserved regions
  * @msi_base: Start address of the range to check
@@ -1596,6 +1598,10 @@ static inline bool resv_region_intersects(phys_addr_t msi_base, size_t length,
 {
 	return false;
 }
+
+static inline void iommu_set_sw_msi(struct list_head *head)
+{
+}
 #endif	/* CONFIG_IOMMU_DMA */
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions
  2025-09-09 15:46 ` [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions Shyam Saini
@ 2025-09-09 15:58   ` Jason Gunthorpe
  2025-09-15 16:28     ` Shyam Saini
  2025-09-13  0:23   ` kernel test robot
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2025-09-09 15:58 UTC (permalink / raw)
  To: Shyam Saini
  Cc: thierry.reding, robin.murphy, robh, joro, iommu, linux-arm-kernel,
	devicetree, virtualization, will, jacob.pan, eric.auger, code,
	eahariha, vijayb, bboscaccy, saravanak, krzk+dt, conor+dt,
	lizhi.hou, clement.leger

On Tue, Sep 09, 2025 at 08:46:00AM -0700, Shyam Saini wrote:
> Both ARM SMMU v2/v3 drivers have common set operations for
> arm_smmu_get_resv_regions(), except iommu_dma_get_resv_regions()
> call all other operations can be clubed into common code block.
> So to avoid code duplication put common operations in a new helper
> function iommu_set_sw_msi() and call this helper function from
> arm_smmu_get_resv_regions() instead.
> 
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++-------------
>  drivers/iommu/arm/arm-smmu/arm-smmu.c       | 24 ++-----------
>  drivers/iommu/iommu.c                       | 37 +++++++++++++++++++++
>  include/linux/iommu.h                       |  6 ++++
>  4 files changed, 47 insertions(+), 46 deletions(-)

I would do this first, not after you made all the changes to both
drivers.

And then once you do it:

> -	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };

Can just be constants written here, nothing else in the kernel should
refer to them any more because they are dynamic. The only way to learn
the value is to read the IOMMU_RESV_SW_MSI. Thus don't write them out
in public headers to prevent mis-use.

I'm comfortable with this side from the iommu perspective once you get
agreement on the DT representation.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions
  2025-09-09 15:46 ` [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions Shyam Saini
  2025-09-09 15:58   ` Jason Gunthorpe
@ 2025-09-13  0:23   ` kernel test robot
  1 sibling, 0 replies; 17+ messages in thread
From: kernel test robot @ 2025-09-13  0:23 UTC (permalink / raw)
  To: Shyam Saini, thierry.reding, robin.murphy, robh, joro, jgg
  Cc: oe-kbuild-all, iommu, linux-arm-kernel, devicetree,
	virtualization, will, jacob.pan, eric.auger, code, eahariha,
	vijayb, bboscaccy, saravanak, krzk+dt, conor+dt, lizhi.hou,
	clement.leger

Hi Shyam,

kernel test robot noticed the following build errors:

[auto build test ERROR on robh/for-next]
[also build test ERROR on linus/master v6.17-rc5 next-20250912]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shyam-Saini/arm-smmu-move-MSI_IOVA-macro-definitions/20250909-235141
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
patch link:    https://lore.kernel.org/r/20250909154600.910110-5-shyamsaini%40linux.microsoft.com
patch subject: [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions
config: arm64-randconfig-002-20250913 (https://download.01.org/0day-ci/archive/20250913/202509130836.idXoj3z8-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 14.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250913/202509130836.idXoj3z8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509130836.idXoj3z8-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "iommu_set_sw_msi" [drivers/iommu/arm/arm-smmu/arm_smmu.ko] undefined!

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions
  2025-09-09 15:58   ` Jason Gunthorpe
@ 2025-09-15 16:28     ` Shyam Saini
  2025-09-15 22:59       ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Shyam Saini @ 2025-09-15 16:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: thierry.reding, robin.murphy, robh, joro, iommu, linux-arm-kernel,
	devicetree, virtualization, will, jacob.pan, eric.auger, code,
	eahariha, vijayb, bboscaccy, saravanak, krzk+dt, conor+dt,
	lizhi.hou, clement.leger

Hi Jason,

On 09 Sep 2025 12:58, Jason Gunthorpe wrote:
> On Tue, Sep 09, 2025 at 08:46:00AM -0700, Shyam Saini wrote:
> > Both ARM SMMU v2/v3 drivers have common set operations for
> > arm_smmu_get_resv_regions(), except iommu_dma_get_resv_regions()
> > call all other operations can be clubed into common code block.
> > So to avoid code duplication put common operations in a new helper
> > function iommu_set_sw_msi() and call this helper function from
> > arm_smmu_get_resv_regions() instead.
> > 
> > Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> > Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++-------------
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c       | 24 ++-----------
> >  drivers/iommu/iommu.c                       | 37 +++++++++++++++++++++
> >  include/linux/iommu.h                       |  6 ++++
> >  4 files changed, 47 insertions(+), 46 deletions(-)
> 
> I would do this first, not after you made all the changes to both
> drivers.
> 
> And then once you do it:
> 
> > -	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
> 
> Can just be constants written here, nothing else in the kernel should
> refer to them any more because they are dynamic. The only way to learn
> the value is to read the IOMMU_RESV_SW_MSI. Thus don't write them out
> in public headers to prevent mis-use.

I wil address these in the next version.
 
> I'm comfortable with this side from the iommu perspective once you get
> agreement on the DT representation.

We haven’t heard back from the DT folks or Thierry. Do you think we
should wait a few more days for their feedback, or go ahead and send
v5 as an RFC instead?

Thanks,
Shyam

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions
  2025-09-15 16:28     ` Shyam Saini
@ 2025-09-15 22:59       ` Jason Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 22:59 UTC (permalink / raw)
  To: Shyam Saini
  Cc: thierry.reding, robin.murphy, robh, joro, iommu, linux-arm-kernel,
	devicetree, virtualization, will, jacob.pan, eric.auger, code,
	eahariha, vijayb, bboscaccy, saravanak, krzk+dt, conor+dt,
	lizhi.hou, clement.leger

On Mon, Sep 15, 2025 at 09:28:42AM -0700, Shyam Saini wrote:
> > I'm comfortable with this side from the iommu perspective once you get
> > agreement on the DT representation.
> 
> We haven’t heard back from the DT folks or Thierry. Do you think we
> should wait a few more days for their feedback, or go ahead and send
> v5 as an RFC instead?

It is about normal to wait about another week and then ping them for
feedback.

Don't send a RFC, you intend your work to be merged.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-09 15:45 ` [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA Shyam Saini
@ 2025-09-18 16:49   ` Will Deacon
  2025-09-18 22:43     ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2025-09-18 16:49 UTC (permalink / raw)
  To: Shyam Saini
  Cc: thierry.reding, robin.murphy, robh, joro, jgg, iommu,
	linux-arm-kernel, devicetree, virtualization, jacob.pan,
	eric.auger, code, eahariha, vijayb, bboscaccy, saravanak, krzk+dt,
	conor+dt, lizhi.hou, clement.leger

On Tue, Sep 09, 2025 at 08:45:59AM -0700, Shyam Saini wrote:
> Currently ARM SMMU drivers hardcode PCI MSI IOVA address.
> Not all the platform have same memory mappings and some platform
> could have this address already being mapped for something else.
> This can lead to collision and as a consequence the MSI IOVA addr
> range is never reserved.
> 
> Fix this by reserving faulty IOVA range and selecting alternate MSI_IOVA
> suitable for the intended platform.
> 
> Example of reserving faulty IOVA range for PCIE device in the DTS:
> 
> reserved-memory {
> 	#address-cells = <2>;
> 	#size-cells = <2>;
> 	faulty_iova: resv_faulty {
> 		iommu-addresses = <&pcieX 0x0 0x8000000 0x0 0x100000>;
> 	};
> };
> 
> &pcieX {
> 	memory-region = <&faulty_iova>;
> }
> 
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++++++++-----
>  drivers/iommu/arm/arm-smmu/arm-smmu.c       | 27 ++++++++++++-----
>  include/linux/iommu.h                       | 33 +++++++++++++++++++++
>  3 files changed, 75 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 2a8b46b948f05..748a5513c5dbb 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3642,17 +3642,32 @@ static int arm_smmu_of_xlate(struct device *dev,
>  static void arm_smmu_get_resv_regions(struct device *dev,
>  				      struct list_head *head)
>  {
> -	struct iommu_resv_region *region;
>  	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>  
> -	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
> -					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
> -	if (!region)
> -		return;
> -
> -	list_add_tail(&region->list, head);
> +	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
>  
>  	iommu_dma_get_resv_regions(dev, head);
> +
> +	/*
> +	 * Use the first msi_base that does not intersect with a platform
> +	 * reserved region. The SW MSI base selection is entirely arbitrary.
> +	 */
> +	for (int i = 0; i != ARRAY_SIZE(msi_bases); i++) {
> +		struct iommu_resv_region *region;
> +
> +		if (resv_region_intersects(msi_bases[i], MSI_IOVA_LENGTH, head))
> +			continue;
> +
> +		region = iommu_alloc_resv_region(msi_bases[i], MSI_IOVA_LENGTH, prot,
> +						 IOMMU_RESV_SW_MSI, GFP_KERNEL);
> +		if (!region) {
> +			pr_warn("IOMMU: Failed to reserve MSI IOVA: No suitable MSI IOVA range available");
> +			return;
> +		}
> +
> +		list_add_tail(&region->list, head);
> +		return;
> +	}
>  }
>  
>  /*
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 4a07650911991..84b74b8519386 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1600,17 +1600,30 @@ static int arm_smmu_of_xlate(struct device *dev,
>  static void arm_smmu_get_resv_regions(struct device *dev,
>  				      struct list_head *head)
>  {
> -	struct iommu_resv_region *region;
>  	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>  
> -	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
> -					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
> -	if (!region)
> -		return;
> -
> -	list_add_tail(&region->list, head);
> +	static const u64 msi_bases[] = { MSI_IOVA_BASE, MSI_IOVA_BASE2 };
>  
>  	iommu_dma_get_resv_regions(dev, head);
> +
> +	/*
> +	 * Use the first msi_base that does not intersect with a platform
> +	 * reserved region. The SW MSI base selection is entirely arbitrary.
> +	 */
> +	for (int i = 0; i != ARRAY_SIZE(msi_bases); i++) {
> +		struct iommu_resv_region *region;
> +
> +		if (resv_region_intersects(msi_bases[i], MSI_IOVA_LENGTH, head))
> +			continue;
> +
> +		region = iommu_alloc_resv_region(msi_bases[i], MSI_IOVA_LENGTH, prot,
> +						 IOMMU_RESV_SW_MSI, GFP_KERNEL);
> +		if (!region)
> +			return;
> +
> +		list_add_tail(&region->list, head);
> +		return;
> +	}

Given that we're walking over the reserved regions to see if we have a
collision with MSI_IOVA_BASE, why not allocate the base address
dynamically if we detect a collision rather than having yet another
hard-coded address which we can't guarantee won't be problematic in future?

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-18 16:49   ` Will Deacon
@ 2025-09-18 22:43     ` Jason Gunthorpe
  2025-09-19  7:33       ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2025-09-18 22:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Shyam Saini, thierry.reding, robin.murphy, robh, joro, iommu,
	linux-arm-kernel, devicetree, virtualization, jacob.pan,
	eric.auger, code, eahariha, vijayb, bboscaccy, saravanak, krzk+dt,
	conor+dt, lizhi.hou, clement.leger

On Thu, Sep 18, 2025 at 05:49:39PM +0100, Will Deacon wrote:
> Given that we're walking over the reserved regions to see if we have a
> collision with MSI_IOVA_BASE, why not allocate the base address
> dynamically if we detect a collision rather than having yet another
> hard-coded address which we can't guarantee won't be problematic in future?

I'm nervous about this. Right now the  MSI_IOVA is actually UAPI and
things like VFIO/qmeu need to accomodate it's placement in their own
memory maps.

Nicolin has some patches on the iommufd side to let userspace select
the MSI address instead, but they are not done yet.

So, randomly picking an address sounds like a bad idea to me, it would
almost certainly unpredictably break qemu..

Hopefully by the time we need a 3rd one we have the userspace MSI
control and then we could make reasonably make the kernel
automatically select.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-18 22:43     ` Jason Gunthorpe
@ 2025-09-19  7:33       ` Will Deacon
  2025-09-19 12:08         ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2025-09-19  7:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shyam Saini, thierry.reding, robin.murphy, robh, joro, iommu,
	linux-arm-kernel, devicetree, virtualization, jacob.pan,
	eric.auger, code, eahariha, vijayb, bboscaccy, saravanak, krzk+dt,
	conor+dt, lizhi.hou, clement.leger

On Thu, Sep 18, 2025 at 07:43:22PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 18, 2025 at 05:49:39PM +0100, Will Deacon wrote:
> > Given that we're walking over the reserved regions to see if we have a
> > collision with MSI_IOVA_BASE, why not allocate the base address
> > dynamically if we detect a collision rather than having yet another
> > hard-coded address which we can't guarantee won't be problematic in future?
> 
> I'm nervous about this. Right now the  MSI_IOVA is actually UAPI and
> things like VFIO/qmeu need to accomodate it's placement in their own
> memory maps.

*shrug*

That's only the case for broken systems where the existing MSI_IOVA
can't be made to work. As far as I'm concerned, they get to keep the
pieces and will need to work on the userspace side. It's not like
MSI_IOVA2 is magically going to work (and I bet it won't be tested).

> Nicolin has some patches on the iommufd side to let userspace select
> the MSI address instead, but they are not done yet.

Maybe we should just wait for that? Carrying a temporary hack with ABI
implications to support broken hardware isn't particularly compelling
to me.

> So, randomly picking an address sounds like a bad idea to me, it would
> almost certainly unpredictably break qemu..

It's not really "random", in the sense that it would be deterministic
for a given set of reserved regions.

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-19  7:33       ` Will Deacon
@ 2025-09-19 12:08         ` Jason Gunthorpe
  2025-09-23 15:56           ` Shyam Saini
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2025-09-19 12:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: Shyam Saini, thierry.reding, robin.murphy, robh, joro, iommu,
	linux-arm-kernel, devicetree, virtualization, jacob.pan,
	eric.auger, code, eahariha, vijayb, bboscaccy, saravanak, krzk+dt,
	conor+dt, lizhi.hou, clement.leger

On Fri, Sep 19, 2025 at 08:33:23AM +0100, Will Deacon wrote:
> pieces and will need to work on the userspace side. It's not like
> MSI_IOVA2 is magically going to work (and I bet it won't be tested).

It could, if someone checks the default memory map a second constant
could be selected that works.

> > Nicolin has some patches on the iommufd side to let userspace select
> > the MSI address instead, but they are not done yet.
> 
> Maybe we should just wait for that? Carrying a temporary hack with ABI
> implications to support broken hardware isn't particularly compelling
> to me.

This patch would still be needed for kernel users.

Arguably the kernel users should just be using the iova allocator from
dma-iommu.c. This whole hard coded constant/sneaky uapi is just a hack
to make vfio work..

So maybe if the single constant doesn't work we could set some
indication that the caller must allocate the MSI iova, the kernel can
use the dma-iommu allocator and VFIO can just refuse to use the device
for now.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-19 12:08         ` Jason Gunthorpe
@ 2025-09-23 15:56           ` Shyam Saini
  2025-09-23 16:19             ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Shyam Saini @ 2025-09-23 15:56 UTC (permalink / raw)
  To: Jason Gunthorpe, Will Deacon
  Cc: thierry.reding, robin.murphy, robh, joro, iommu, linux-arm-kernel,
	devicetree, virtualization, jacob.pan, eric.auger, code, eahariha,
	vijayb, bboscaccy, saravanak, krzk+dt, conor+dt, lizhi.hou,
	clement.leger

Hi Jason, Will,

On 19 Sep 2025 09:08, Jason Gunthorpe wrote:
> On Fri, Sep 19, 2025 at 08:33:23AM +0100, Will Deacon wrote:
> > pieces and will need to work on the userspace side. It's not like
> > MSI_IOVA2 is magically going to work (and I bet it won't be tested).
> 
> It could, if someone checks the default memory map a second constant
> could be selected that works.
> 
> > > Nicolin has some patches on the iommufd side to let userspace select
> > > the MSI address instead, but they are not done yet.
> > 
> > Maybe we should just wait for that? Carrying a temporary hack with ABI
> > implications to support broken hardware isn't particularly compelling
> > to me.
> 
> This patch would still be needed for kernel users.
> 
> Arguably the kernel users should just be using the iova allocator from
> dma-iommu.c. This whole hard coded constant/sneaky uapi is just a hack
> to make vfio work..
> 
> So maybe if the single constant doesn't work we could set some
> indication that the caller must allocate the MSI iova, the kernel can
> use the dma-iommu allocator and VFIO can just refuse to use the device
> for now.

So, are we settling on having two predefined MSI IOVA base constants,
and if both of those conflict with reserved regions on a given platform,
falling back to dynamic allocation via the IOVA allocator? Just checking
if that's the consensus we're reaching.

Thanks,
Shyam

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-23 15:56           ` Shyam Saini
@ 2025-09-23 16:19             ` Jason Gunthorpe
  2025-09-24 18:59               ` Robin Murphy
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2025-09-23 16:19 UTC (permalink / raw)
  To: Shyam Saini
  Cc: Will Deacon, thierry.reding, robin.murphy, robh, joro, iommu,
	linux-arm-kernel, devicetree, virtualization, jacob.pan,
	eric.auger, code, eahariha, vijayb, bboscaccy, saravanak, krzk+dt,
	conor+dt, lizhi.hou, clement.leger

On Tue, Sep 23, 2025 at 08:56:47AM -0700, Shyam Saini wrote:
> Hi Jason, Will,
> 
> On 19 Sep 2025 09:08, Jason Gunthorpe wrote:
> > On Fri, Sep 19, 2025 at 08:33:23AM +0100, Will Deacon wrote:
> > > pieces and will need to work on the userspace side. It's not like
> > > MSI_IOVA2 is magically going to work (and I bet it won't be tested).
> > 
> > It could, if someone checks the default memory map a second constant
> > could be selected that works.
> > 
> > > > Nicolin has some patches on the iommufd side to let userspace select
> > > > the MSI address instead, but they are not done yet.
> > > 
> > > Maybe we should just wait for that? Carrying a temporary hack with ABI
> > > implications to support broken hardware isn't particularly compelling
> > > to me.
> > 
> > This patch would still be needed for kernel users.
> > 
> > Arguably the kernel users should just be using the iova allocator from
> > dma-iommu.c. This whole hard coded constant/sneaky uapi is just a hack
> > to make vfio work..
> > 
> > So maybe if the single constant doesn't work we could set some
> > indication that the caller must allocate the MSI iova, the kernel can
> > use the dma-iommu allocator and VFIO can just refuse to use the device
> > for now.
> 
> So, are we settling on having two predefined MSI IOVA base constants,
> and if both of those conflict with reserved regions on a given platform,
> falling back to dynamic allocation via the IOVA allocator? Just checking
> if that's the consensus we're reaching.

I think Will is arguing against introducing a new constant..

Yesterday I was looking at the SW_MSI code again.. What specific
problem is it you have?

It looks to me like dma-iommu.c is already allocating MSI addresses
using its built in IOVA allocator. So if your DT is marking that space
reserved then it should Just Work right now as dma-iommu.c already
processes the reserved ranges and will allocate MSI addresses around
them?

The base value of the SW_MSI is only used by VFIO - are you trying to
use VFIO with this device, or have I misunderstood the dma-iommu.c
logic?

If it is only VFIO at issue then perhaps we should solve this by
completing the work Nicolin started to allow VFIO userspace to specify
the MSI Aperture?

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] iommu/of: fix device tree configuration for PCI devices
  2025-09-09 15:45 ` [PATCH v4 2/4] iommu/of: fix device tree configuration for PCI devices Shyam Saini
@ 2025-09-24 17:44   ` Robin Murphy
  0 siblings, 0 replies; 17+ messages in thread
From: Robin Murphy @ 2025-09-24 17:44 UTC (permalink / raw)
  To: Shyam Saini, thierry.reding, robh, joro, jgg
  Cc: iommu, linux-arm-kernel, devicetree, virtualization, will,
	jacob.pan, eric.auger, code, eahariha, vijayb, bboscaccy,
	saravanak, krzk+dt, conor+dt, lizhi.hou, clement.leger

On 2025-09-09 4:45 pm, Shyam Saini wrote:
> Individual PCI devices lack dedicated device tree nodes, but
> their IOMMU configuration (including reserved IOVA regions) is often
> defined at the PCI host controller level in the device tree. The
> "iommu-addresses" property in reserved-memory nodes specifies IOVA
> ranges that should be reserved for specific devices.
> 
> Currently, PCI devices cannot access these configurations because their
> dev->of_node is NULL, preventing of_iommu_get_resv_regions() from
> discovering reserved regions defined in the device tree.
> 
> There are at least 3 ways to reserve iommu-addresses for individual PCI
> devices,
>   - 1) By dynamically adding DTS nodes for individual PCI devices using
>     [2] CONFIG_PCI_DYNAMIC_OF_NODES, this requires hardcoding PCI device
>     IDs in DECLARE_PCI_FIXUP_FINAL
> 
>   - 2) By adding PCI devices nodes either in DTS or by modifying FDT at
>     boot time in the firmware, eg [3] However, of_iommu driver doesn't
>     seem to handle individual PCI devices, additionally this approach
>     doesn't seem to much scalable for the complex PCI hierarchy
> 
>   - 3) By configuring PCI host controller DTS node for PCI device so
>     that it can inherit iommu-addresses defined in the parent node.
> 
> This commit addresses the problem using approach 3) by assigning the
> PCI host controller's device tree node to PCI devices during IOMMU
> configuration, enabling them to inherit the host controller's device
> tree properties. This allows PCI devices to properly discover and
> reserve IOVA regions specified in the device tree.
> 
> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
> ---
>   drivers/iommu/of_iommu.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 6b989a62def20..077482917e3e8 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -145,6 +145,17 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>   		err = pci_for_each_dma_alias(to_pci_dev(dev),
>   					     of_pci_iommu_init, &info);
>   		of_pci_check_device_ats(dev, master_np);
> +
> +		/*
> +		 * For PCI devices, ensure the device's of_node points to the
> +		 * PCI host controller's device tree node so that reserved regions
> +		 * and other DT-specific IOMMU configuration can be found.
> +		 * PCI devices typically don't have individual DT nodes, but
> +		 * their configuration (including reserved regions) is defined
> +		 * at the PCI host controller level.
> +		 */
> +		if (!err && master_np && !dev->of_node)
> +			dev->of_node = of_node_get(master_np);

This is just wrong. Disregarding the fiddly aspects of node reuse that 
are completely ignored here, an endpoint device is not the host 
bridge/root complex device, so it is wildly inappropriate to associate 
one with the other's DT node and all its properties, resources, etc.

If it truly is the case that boot firmware has somehow "reserved" some 
small amount of *IOVA* address space for specific endpoints (but without 
any endpoint or SMMU configuration, given that those both get reset by 
VFIO?) then frankly it *should* populate the PCI hierarchy in DT so it 
can accurately and truthfully describe what it has done.

On the other hand, if as I suspect it is simply the case that the host 
bridge has limited windows into system *physical* address space, like 
plenty of other systems do, then just like those other systems that 
should be described as standard "dma-ranges" instead of trying to wave 
silly hacks about.

Thanks,
Robin.

>   	} else {
>   		err = of_iommu_configure_device(master_np, dev, id);
>   	}


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA
  2025-09-23 16:19             ` Jason Gunthorpe
@ 2025-09-24 18:59               ` Robin Murphy
  0 siblings, 0 replies; 17+ messages in thread
From: Robin Murphy @ 2025-09-24 18:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Shyam Saini
  Cc: Will Deacon, thierry.reding, robh, joro, iommu, linux-arm-kernel,
	devicetree, virtualization, jacob.pan, eric.auger, code, eahariha,
	vijayb, bboscaccy, saravanak, krzk+dt, conor+dt, lizhi.hou,
	clement.leger

On 2025-09-23 5:19 pm, Jason Gunthorpe wrote:
> On Tue, Sep 23, 2025 at 08:56:47AM -0700, Shyam Saini wrote:
>> Hi Jason, Will,
>>
>> On 19 Sep 2025 09:08, Jason Gunthorpe wrote:
>>> On Fri, Sep 19, 2025 at 08:33:23AM +0100, Will Deacon wrote:
>>>> pieces and will need to work on the userspace side. It's not like
>>>> MSI_IOVA2 is magically going to work (and I bet it won't be tested).
>>>
>>> It could, if someone checks the default memory map a second constant
>>> could be selected that works.
>>>
>>>>> Nicolin has some patches on the iommufd side to let userspace select
>>>>> the MSI address instead, but they are not done yet.
>>>>
>>>> Maybe we should just wait for that? Carrying a temporary hack with ABI
>>>> implications to support broken hardware isn't particularly compelling
>>>> to me.
>>>
>>> This patch would still be needed for kernel users.
>>>
>>> Arguably the kernel users should just be using the iova allocator from
>>> dma-iommu.c. This whole hard coded constant/sneaky uapi is just a hack
>>> to make vfio work..
>>>
>>> So maybe if the single constant doesn't work we could set some
>>> indication that the caller must allocate the MSI iova, the kernel can
>>> use the dma-iommu allocator and VFIO can just refuse to use the device
>>> for now.
>>
>> So, are we settling on having two predefined MSI IOVA base constants,
>> and if both of those conflict with reserved regions on a given platform,
>> falling back to dynamic allocation via the IOVA allocator? Just checking
>> if that's the consensus we're reaching.
> 
> I think Will is arguing against introducing a new constant..
> 
> Yesterday I was looking at the SW_MSI code again.. What specific
> problem is it you have?
> 
> It looks to me like dma-iommu.c is already allocating MSI addresses
> using its built in IOVA allocator. So if your DT is marking that space
> reserved then it should Just Work right now as dma-iommu.c already
> processes the reserved ranges and will allocate MSI addresses around
> them?
> 
> The base value of the SW_MSI is only used by VFIO - are you trying to
> use VFIO with this device, or have I misunderstood the dma-iommu.c
> logic?

Indeed the sole user of the entire 
IOMMU_RESV_SW_MSI/iommu_dma_get_msi_cookie() mechanism is that one place 
in vfio_iommu_type1. iommu-dma itself treats MSIs just like any other 
DMA mapping, so if address space limitations are not correctly described 
then any breakage will be to DMA in general.
> If it is only VFIO at issue then perhaps we should solve this by
> completing the work Nicolin started to allow VFIO userspace to specify
> the MSI Aperture?

+1 to that - the arbitrary fake MSI reserved region was only ever meant 
to be a first step to get existing VMMs working with bare minimal 
"squint and pretend it's like x86" changes; MSI_IOVA_BASE was literally 
just picked to fit the standard Qemu virt machine memory map nicely. It 
was always intended that we'd eventually have more 
Arm-system-architecture-aware VMMs that would understand it's just a 
notional hole that needs punching in VFIO address space _somewhere_, and 
we'd figure out some interface for negotiating it. There has also always 
been at least one platform where this MSI_IOVA_BASE knowingly could 
never work, but that one (Arm Juno) also has sufficient other 
impediments to realistic VFIO usage (I've had it working, but it's 
definitely no more than a novelty) that it was never going to justify 
any upstream investment itself.

If we do now have a "serious" VFIO-capable system where the basic bodge 
no longer suffices, that surely does justify it finally being time to do 
the right thing.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-09-24 18:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-09 15:45 [PATCH v4 0/4] arm-smmu: select suitable MSI IOVA Shyam Saini
2025-09-09 15:45 ` [PATCH v4 1/4] arm-smmu: move MSI_IOVA macro definitions Shyam Saini
2025-09-09 15:45 ` [PATCH v4 2/4] iommu/of: fix device tree configuration for PCI devices Shyam Saini
2025-09-24 17:44   ` Robin Murphy
2025-09-09 15:45 ` [PATCH v4 3/4] arm-smmu: select suitable MSI IOVA Shyam Saini
2025-09-18 16:49   ` Will Deacon
2025-09-18 22:43     ` Jason Gunthorpe
2025-09-19  7:33       ` Will Deacon
2025-09-19 12:08         ` Jason Gunthorpe
2025-09-23 15:56           ` Shyam Saini
2025-09-23 16:19             ` Jason Gunthorpe
2025-09-24 18:59               ` Robin Murphy
2025-09-09 15:46 ` [PATCH v4 4/4] drivers: iommu: refactor arm_smmu_get_resv_regions Shyam Saini
2025-09-09 15:58   ` Jason Gunthorpe
2025-09-15 16:28     ` Shyam Saini
2025-09-15 22:59       ` Jason Gunthorpe
2025-09-13  0:23   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).