patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
@ 2023-11-15 14:05 Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 01/17] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops() Jason Gunthorpe
                   ` (18 more replies)
  0 siblings, 19 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

[Several people have tested this now, so it is something that should sit in
linux-next for a while]

The iommu subsystem uses dev->iommu to store bits of information about the
attached iommu driver. This has been co-opted by the ACPI/OF code to also
be a place to pass around the iommu_fwspec before a driver is probed.

Since both are using the same pointers without any locking it triggers
races if there is concurrent driver loading:

     CPU0                                     CPU1
of_iommu_configure()                iommu_device_register()
 ..                                   bus_iommu_probe()
  iommu_fwspec_of_xlate()              __iommu_probe_device()
                                        iommu_init_device()
   dev_iommu_get()
                                          .. ops->probe fails, no fwspec ..
                                          dev_iommu_free()
   dev->iommu->fwspec    *crash*

My first attempt get correct locking here was to use the device_lock to
protect the entire *_iommu_configure() and iommu_probe() paths. This
allowed safe use of dev->iommu within those paths. Unfortuately enough
drivers abuse the of_iommu_configure() flow without proper locking and
this approach failed.

This approach removes touches of dev->iommu from the *_iommu_configure()
code. The few remaining required touches are moved into iommu.c and
protected with the existing iommu_probe_device_lock.

To do this we change *_iommu_configure() to hold the iommu_fwspec on the
stack while it is being built. Once it is fully formed the core code will
install it into the dev->iommu when it calls probe.

This also removes all the touches of iommu_ops from
the *_iommu_configure() paths and makes that mechanism private to the
iommu core.

A few more lockdep assertions are added to discourage future mis-use.

This is on github: https://github.com/jgunthorpe/linux/commits/iommu_fwspec

v2:
 - Fix all the kconfig randomization 0-day stuff
 - Add missing kdoc parameters
 - Remove NO_IOMMU, replace it with ENODEV
 - Use PTR_ERR to print errno in the new/moved logging
v1: https://lore.kernel.org/r/0-v1-5f734af130a3+34f-iommu_fwspec_jgg@nvidia.com

Jason Gunthorpe (17):
  iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()
  iommmu/of: Do not return struct iommu_ops from of_iommu_configure()
  iommu/of: Use -ENODEV consistently in of_iommu_configure()
  acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
  iommu: Make iommu_fwspec->ids a distinct allocation
  iommu: Add iommu_fwspec_alloc/dealloc()
  iommu: Add iommu_probe_device_fwspec()
  iommu/of: Do not use dev->iommu within of_iommu_configure()
  iommu: Add iommu_fwspec_append_ids()
  acpi: Do not use dev->iommu within acpi_iommu_configure()
  iommu: Hold iommu_probe_device_lock while calling ops->of_xlate
  iommu: Make iommu_ops_from_fwnode() static
  iommu: Remove dev_iommu_fwspec_set()
  iommu: Remove pointless iommu_fwspec_free()
  iommu: Add ops->of_xlate_fwspec()
  iommu: Mark dev_iommu_get() with lockdep
  iommu: Mark dev_iommu_priv_set() with a lockdep

 arch/arc/mm/dma.c                           |   2 +-
 arch/arm/mm/dma-mapping-nommu.c             |   2 +-
 arch/arm/mm/dma-mapping.c                   |  10 +-
 arch/arm64/mm/dma-mapping.c                 |   4 +-
 arch/mips/mm/dma-noncoherent.c              |   2 +-
 arch/riscv/mm/dma-noncoherent.c             |   2 +-
 drivers/acpi/arm64/iort.c                   |  42 ++--
 drivers/acpi/scan.c                         | 104 +++++----
 drivers/acpi/viot.c                         |  45 ++--
 drivers/hv/hv_common.c                      |   2 +-
 drivers/iommu/amd/iommu.c                   |   2 -
 drivers/iommu/apple-dart.c                  |   1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |  23 +-
 drivers/iommu/intel/iommu.c                 |   2 -
 drivers/iommu/iommu.c                       | 227 +++++++++++++++-----
 drivers/iommu/of_iommu.c                    | 133 +++++-------
 drivers/iommu/omap-iommu.c                  |   1 -
 drivers/iommu/tegra-smmu.c                  |   1 -
 drivers/iommu/virtio-iommu.c                |   8 +-
 drivers/of/device.c                         |  24 ++-
 include/acpi/acpi_bus.h                     |   8 +-
 include/linux/acpi_iort.h                   |   8 +-
 include/linux/acpi_viot.h                   |   5 +-
 include/linux/dma-map-ops.h                 |   4 +-
 include/linux/iommu.h                       |  47 ++--
 include/linux/of_iommu.h                    |  13 +-
 27 files changed, 424 insertions(+), 307 deletions(-)


base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
-- 
2.42.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 01/17] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 02/17] iommmu/of: Do not return struct iommu_ops from of_iommu_configure() Jason Gunthorpe
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

This is not being used to pass ops, it is just a way to tell if an
iommu driver was probed. These days this can be detected directly via
device_iommu_mapped(). Call device_iommu_mapped() in the two places that
need to check it and remove the iommu parameter everywhere.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 arch/arc/mm/dma.c               |  2 +-
 arch/arm/mm/dma-mapping-nommu.c |  2 +-
 arch/arm/mm/dma-mapping.c       | 10 +++++-----
 arch/arm64/mm/dma-mapping.c     |  4 ++--
 arch/mips/mm/dma-noncoherent.c  |  2 +-
 arch/riscv/mm/dma-noncoherent.c |  2 +-
 drivers/acpi/scan.c             |  3 +--
 drivers/hv/hv_common.c          |  2 +-
 drivers/of/device.c             |  2 +-
 include/linux/dma-map-ops.h     |  4 ++--
 10 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 2a7fbbb83b7056..197707bc765889 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -91,7 +91,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
  * Plug in direct dma map ops.
  */
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			const struct iommu_ops *iommu, bool coherent)
+			bool coherent)
 {
 	/*
 	 * IOC hardware snoops all DMA traffic keeping the caches consistent
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index cfd9c933d2f09c..b94850b579952a 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -34,7 +34,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			const struct iommu_ops *iommu, bool coherent)
+			bool coherent)
 {
 	if (IS_ENABLED(CONFIG_CPU_V7M)) {
 		/*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 5409225b4abc06..6c359a3af8d9c7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1713,7 +1713,7 @@ void arm_iommu_detach_device(struct device *dev)
 EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
 
 static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    const struct iommu_ops *iommu, bool coherent)
+				    bool coherent)
 {
 	struct dma_iommu_mapping *mapping;
 
@@ -1748,7 +1748,7 @@ static void arm_teardown_iommu_dma_ops(struct device *dev)
 #else
 
 static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    const struct iommu_ops *iommu, bool coherent)
+				    bool coherent)
 {
 }
 
@@ -1757,7 +1757,7 @@ static void arm_teardown_iommu_dma_ops(struct device *dev) { }
 #endif	/* CONFIG_ARM_DMA_USE_IOMMU */
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			const struct iommu_ops *iommu, bool coherent)
+			bool coherent)
 {
 	/*
 	 * Due to legacy code that sets the ->dma_coherent flag from a bus
@@ -1776,8 +1776,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (dev->dma_ops)
 		return;
 
-	if (iommu)
-		arm_setup_iommu_dma_ops(dev, dma_base, size, iommu, coherent);
+	if (device_iommu_mapped(dev))
+		arm_setup_iommu_dma_ops(dev, dma_base, size, coherent);
 
 	xen_setup_dma_ops(dev);
 	dev->archdata.dma_ops_setup = true;
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 3cb101e8cb29ba..61886e43e3a10f 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -47,7 +47,7 @@ void arch_teardown_dma_ops(struct device *dev)
 #endif
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			const struct iommu_ops *iommu, bool coherent)
+			bool coherent)
 {
 	int cls = cache_line_size_of_cpu();
 
@@ -58,7 +58,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		   ARCH_DMA_MINALIGN, cls);
 
 	dev->dma_coherent = coherent;
-	if (iommu)
+	if (device_iommu_mapped(dev))
 		iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
 
 	xen_setup_dma_ops(dev);
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 3c4fc97b9f394b..0f3cec663a12cd 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -138,7 +138,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-		const struct iommu_ops *iommu, bool coherent)
+		bool coherent)
 {
 	dev->dma_coherent = coherent;
 }
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 4e4e469b8dd66c..843107f834b231 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -129,7 +129,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-		const struct iommu_ops *iommu, bool coherent)
+			bool coherent)
 {
 	WARN_TAINT(!coherent && riscv_cbom_block_size > ARCH_DMA_MINALIGN,
 		   TAINT_CPU_OUT_OF_SPEC,
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index fa5dd71a80fad9..9682291188c49c 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1636,8 +1636,7 @@ int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
 	if (PTR_ERR(iommu) == -EPROBE_DEFER)
 		return -EPROBE_DEFER;
 
-	arch_setup_dma_ops(dev, 0, U64_MAX,
-				iommu, attr == DEV_DMA_COHERENT);
+	arch_setup_dma_ops(dev, 0, U64_MAX, attr == DEV_DMA_COHERENT);
 
 	return 0;
 }
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 4372f5d146ab22..0285a74363b3d1 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -488,7 +488,7 @@ void hv_setup_dma_ops(struct device *dev, bool coherent)
 	 * Hyper-V does not offer a vIOMMU in the guest
 	 * VM, so pass 0/NULL for the IOMMU settings
 	 */
-	arch_setup_dma_ops(dev, 0, 0, NULL, coherent);
+	arch_setup_dma_ops(dev, 0, 0, coherent);
 }
 EXPORT_SYMBOL_GPL(hv_setup_dma_ops);
 
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 1ca42ad9dd159d..65c71be71a8d45 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -193,7 +193,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 	dev_dbg(dev, "device is%sbehind an iommu\n",
 		iommu ? " " : " not ");
 
-	arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
+	arch_setup_dma_ops(dev, dma_start, size, coherent);
 
 	if (!iommu)
 		of_dma_set_restricted_buffer(dev, np);
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f2fc203fb8a1a2..2cb98a12c50348 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -426,10 +426,10 @@ bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
 
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-		const struct iommu_ops *iommu, bool coherent);
+		bool coherent);
 #else
 static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base,
-		u64 size, const struct iommu_ops *iommu, bool coherent)
+		u64 size, bool coherent)
 {
 }
 #endif /* CONFIG_ARCH_HAS_SETUP_DMA_OPS */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 02/17] iommmu/of: Do not return struct iommu_ops from of_iommu_configure()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 01/17] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops() Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 03/17] iommu/of: Use -ENODEV consistently in of_iommu_configure() Jason Gunthorpe
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/of_iommu.c | 31 +++++++++++++++++++------------
 drivers/of/device.c      | 22 +++++++++++++++-------
 include/linux/of_iommu.h | 13 ++++++-------
 3 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 157b286e36bf3a..b47dcb66cde98d 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -107,20 +107,26 @@ static int of_iommu_configure_device(struct device_node *master_np,
 		      of_iommu_configure_dev(master_np, dev);
 }
 
-const struct iommu_ops *of_iommu_configure(struct device *dev,
-					   struct device_node *master_np,
-					   const u32 *id)
+/*
+ * Returns:
+ *  0 on success, an iommu was configured
+ *  -ENODEV if the device does not have any IOMMU
+ *  -EPROBEDEFER if probing should be tried again
+ *  -errno fatal errors
+ */
+int of_iommu_configure(struct device *dev, struct device_node *master_np,
+		       const u32 *id)
 {
 	const struct iommu_ops *ops = NULL;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	int err = NO_IOMMU;
 
 	if (!master_np)
-		return NULL;
+		return -ENODEV;
 
 	if (fwspec) {
 		if (fwspec->ops)
-			return fwspec->ops;
+			return 0;
 
 		/* In the deferred case, start again from scratch */
 		iommu_fwspec_free(dev);
@@ -163,14 +169,15 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 		err = iommu_probe_device(dev);
 
 	/* Ignore all other errors apart from EPROBE_DEFER */
-	if (err == -EPROBE_DEFER) {
-		ops = ERR_PTR(err);
-	} else if (err < 0) {
-		dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
-		ops = NULL;
+	if (err < 0) {
+		if (err == -EPROBE_DEFER)
+			return err;
+		dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+		return err;
 	}
-
-	return ops;
+	if (!ops)
+		return -ENODEV;
+	return 0;
 }
 
 static enum iommu_resv_type __maybe_unused
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 65c71be71a8d45..873d933e8e6d1d 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -93,12 +93,12 @@ of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
 int of_dma_configure_id(struct device *dev, struct device_node *np,
 			bool force_dma, const u32 *id)
 {
-	const struct iommu_ops *iommu;
 	const struct bus_dma_region *map = NULL;
 	struct device_node *bus_np;
 	u64 dma_start = 0;
 	u64 mask, end, size = 0;
 	bool coherent;
+	int iommu_ret;
 	int ret;
 
 	if (np == dev->of_node)
@@ -181,21 +181,29 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 	dev_dbg(dev, "device is%sdma coherent\n",
 		coherent ? " " : " not ");
 
-	iommu = of_iommu_configure(dev, np, id);
-	if (PTR_ERR(iommu) == -EPROBE_DEFER) {
+	iommu_ret = of_iommu_configure(dev, np, id);
+	if (iommu_ret == -EPROBE_DEFER) {
 		/* Don't touch range map if it wasn't set from a valid dma-ranges */
 		if (!ret)
 			dev->dma_range_map = NULL;
 		kfree(map);
 		return -EPROBE_DEFER;
-	}
+	} else if (iommu_ret == -ENODEV) {
+		dev_dbg(dev, "device is not behind an iommu\n");
+	} else if (iommu_ret) {
+		dev_err(dev, "iommu configuration for device failed with %pe\n",
+			ERR_PTR(iommu_ret));
 
-	dev_dbg(dev, "device is%sbehind an iommu\n",
-		iommu ? " " : " not ");
+		/*
+		 * Historically this routine doesn't fail driver probing
+		 * due to errors in of_iommu_configure()
+		 */
+	} else
+		dev_dbg(dev, "device is behind an iommu\n");
 
 	arch_setup_dma_ops(dev, dma_start, size, coherent);
 
-	if (!iommu)
+	if (iommu_ret)
 		of_dma_set_restricted_buffer(dev, np);
 
 	return 0;
diff --git a/include/linux/of_iommu.h b/include/linux/of_iommu.h
index 9a5e6b410dd2fb..e61cbbe12dac6f 100644
--- a/include/linux/of_iommu.h
+++ b/include/linux/of_iommu.h
@@ -8,20 +8,19 @@ struct iommu_ops;
 
 #ifdef CONFIG_OF_IOMMU
 
-extern const struct iommu_ops *of_iommu_configure(struct device *dev,
-					struct device_node *master_np,
-					const u32 *id);
+extern int of_iommu_configure(struct device *dev, struct device_node *master_np,
+			      const u32 *id);
 
 extern void of_iommu_get_resv_regions(struct device *dev,
 				      struct list_head *list);
 
 #else
 
-static inline const struct iommu_ops *of_iommu_configure(struct device *dev,
-					 struct device_node *master_np,
-					 const u32 *id)
+static inline int of_iommu_configure(struct device *dev,
+				     struct device_node *master_np,
+				     const u32 *id)
 {
-	return NULL;
+	return -ENODEV;
 }
 
 static inline void of_iommu_get_resv_regions(struct device *dev,
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 03/17] iommu/of: Use -ENODEV consistently in of_iommu_configure()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 01/17] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops() Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 02/17] iommmu/of: Do not return struct iommu_ops from of_iommu_configure() Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-15 14:42   ` Jerry Snitselaar
  2023-11-15 14:05 ` [PATCH v2 04/17] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id() Jason Gunthorpe
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

Instead of returning 1 and trying to handle positive error codes just
stick to the convention of returning -ENODEV. Remove references to ops
from of_iommu_configure(), a NULL ops will already generate an error code.

There is no reason to check dev->bus, if err=0 at this point then the
called configure functions thought there was an iommu and we should try to
probe it. Remove it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/of_iommu.c | 48 +++++++++++++---------------------------
 1 file changed, 15 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index b47dcb66cde98d..a68a4d1dc0725c 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -17,8 +17,6 @@
 #include <linux/slab.h>
 #include <linux/fsl/mc.h>
 
-#define NO_IOMMU	1
-
 static int of_iommu_xlate(struct device *dev,
 			  struct of_phandle_args *iommu_spec)
 {
@@ -29,7 +27,7 @@ static int of_iommu_xlate(struct device *dev,
 	ops = iommu_ops_from_fwnode(fwnode);
 	if ((ops && !ops->of_xlate) ||
 	    !of_device_is_available(iommu_spec->np))
-		return NO_IOMMU;
+		return -ENODEV;
 
 	ret = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
 	if (ret)
@@ -61,7 +59,7 @@ static int of_iommu_configure_dev_id(struct device_node *master_np,
 			 "iommu-map-mask", &iommu_spec.np,
 			 iommu_spec.args);
 	if (err)
-		return err == -ENODEV ? NO_IOMMU : err;
+		return err;
 
 	err = of_iommu_xlate(dev, &iommu_spec);
 	of_node_put(iommu_spec.np);
@@ -72,7 +70,7 @@ static int of_iommu_configure_dev(struct device_node *master_np,
 				  struct device *dev)
 {
 	struct of_phandle_args iommu_spec;
-	int err = NO_IOMMU, idx = 0;
+	int err = -ENODEV, idx = 0;
 
 	while (!of_parse_phandle_with_args(master_np, "iommus",
 					   "#iommu-cells",
@@ -117,9 +115,8 @@ static int of_iommu_configure_device(struct device_node *master_np,
 int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		       const u32 *id)
 {
-	const struct iommu_ops *ops = NULL;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-	int err = NO_IOMMU;
+	int err;
 
 	if (!master_np)
 		return -ENODEV;
@@ -150,34 +147,19 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		err = of_iommu_configure_device(master_np, dev, id);
 	}
 
-	/*
-	 * Two success conditions can be represented by non-negative err here:
-	 * >0 : there is no IOMMU, or one was unavailable for non-fatal reasons
-	 *  0 : we found an IOMMU, and dev->fwspec is initialised appropriately
-	 * <0 : any actual error
-	 */
-	if (!err) {
-		/* The fwspec pointer changed, read it again */
-		fwspec = dev_iommu_fwspec_get(dev);
-		ops    = fwspec->ops;
-	}
-	/*
-	 * If we have reason to believe the IOMMU driver missed the initial
-	 * probe for dev, replay it to get things in order.
-	 */
-	if (!err && dev->bus)
-		err = iommu_probe_device(dev);
-
-	/* Ignore all other errors apart from EPROBE_DEFER */
-	if (err < 0) {
-		if (err == -EPROBE_DEFER)
-			return err;
-		dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+	if (err == -ENODEV || err == -EPROBE_DEFER)
 		return err;
-	}
-	if (!ops)
-		return -ENODEV;
+	if (err)
+		goto err_log;
+
+	err = iommu_probe_device(dev);
+	if (err)
+		goto err_log;
 	return 0;
+
+err_log:
+	dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+	return err;
 }
 
 static enum iommu_resv_type __maybe_unused
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 04/17] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2023-11-15 14:05 ` [PATCH v2 03/17] iommu/of: Use -ENODEV consistently in of_iommu_configure() Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-15 14:45   ` Jerry Snitselaar
  2023-11-15 14:05 ` [PATCH v2 05/17] iommu: Make iommu_fwspec->ids a distinct allocation Jason Gunthorpe
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/acpi/scan.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 9682291188c49c..d171d193f2a51c 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1562,8 +1562,7 @@ static inline const struct iommu_ops *acpi_iommu_fwspec_ops(struct device *dev)
 	return fwspec ? fwspec->ops : NULL;
 }
 
-static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
-						       const u32 *id_in)
+static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
 	int err;
 	const struct iommu_ops *ops;
@@ -1574,7 +1573,7 @@ static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
 	 */
 	ops = acpi_iommu_fwspec_ops(dev);
 	if (ops)
-		return ops;
+		return 0;
 
 	err = iort_iommu_configure_id(dev, id_in);
 	if (err && err != -EPROBE_DEFER)
@@ -1589,12 +1588,14 @@ static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
 
 	/* Ignore all other errors apart from EPROBE_DEFER */
 	if (err == -EPROBE_DEFER) {
-		return ERR_PTR(err);
+		return err;
 	} else if (err) {
 		dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
-		return NULL;
+		return -ENODEV;
 	}
-	return acpi_iommu_fwspec_ops(dev);
+	if (!acpi_iommu_fwspec_ops(dev))
+		return -ENODEV;
+	return 0;
 }
 
 #else /* !CONFIG_IOMMU_API */
@@ -1623,7 +1624,7 @@ static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
 int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
 			  const u32 *input_id)
 {
-	const struct iommu_ops *iommu;
+	int ret;
 
 	if (attr == DEV_DMA_NOT_SUPPORTED) {
 		set_dma_ops(dev, &dma_dummy_ops);
@@ -1632,10 +1633,15 @@ int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
 
 	acpi_arch_dma_setup(dev);
 
-	iommu = acpi_iommu_configure_id(dev, input_id);
-	if (PTR_ERR(iommu) == -EPROBE_DEFER)
+	ret = acpi_iommu_configure_id(dev, input_id);
+	if (ret == -EPROBE_DEFER)
 		return -EPROBE_DEFER;
 
+	/*
+	 * Historically this routine doesn't fail driver probing due to errors
+	 * in acpi_iommu_configure_id()
+	 */
+
 	arch_setup_dma_ops(dev, 0, U64_MAX, attr == DEV_DMA_COHERENT);
 
 	return 0;
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 05/17] iommu: Make iommu_fwspec->ids a distinct allocation
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2023-11-15 14:05 ` [PATCH v2 04/17] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id() Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc() Jason Gunthorpe
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

The optimization of kreallocing the entire fwspec only works if the fwspec
pointer is always stored in the dev->iommu. Since we want to change this
remove the optimization and make the ids array a distinct allocation.

Allow a single id to be stored inside the iommu_fwspec as a common case
optimization.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 20 ++++++++++++--------
 include/linux/iommu.h |  3 ++-
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f17a1113f3d6a3..18a82a20934d53 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2931,8 +2931,7 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 	if (!dev_iommu_get(dev))
 		return -ENOMEM;
 
-	/* Preallocate for the overwhelmingly common case of 1 ID */
-	fwspec = kzalloc(struct_size(fwspec, ids, 1), GFP_KERNEL);
+	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
 	if (!fwspec)
 		return -ENOMEM;
 
@@ -2965,13 +2964,18 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 		return -EINVAL;
 
 	new_num = fwspec->num_ids + num_ids;
-	if (new_num > 1) {
-		fwspec = krealloc(fwspec, struct_size(fwspec, ids, new_num),
-				  GFP_KERNEL);
-		if (!fwspec)
+	if (new_num <= 1) {
+		if (fwspec->ids != &fwspec->single_id)
+			kfree(fwspec->ids);
+		fwspec->ids = &fwspec->single_id;
+	} else if (new_num > fwspec->num_ids) {
+		ids = krealloc_array(
+			fwspec->ids != &fwspec->single_id ? fwspec->ids : NULL,
+			new_num, sizeof(fwspec->ids[0]),
+			GFP_KERNEL | __GFP_ZERO);
+		if (!ids)
 			return -ENOMEM;
-
-		dev_iommu_fwspec_set(dev, fwspec);
+		fwspec->ids = ids;
 	}
 
 	for (i = 0; i < num_ids; i++)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ec289c1016f5f2..e98a4ca8f536b7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -798,7 +798,8 @@ struct iommu_fwspec {
 	struct fwnode_handle	*iommu_fwnode;
 	u32			flags;
 	unsigned int		num_ids;
-	u32			ids[];
+	u32			single_id;
+	u32			*ids;
 };
 
 /* ATS is supported */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2023-11-15 14:05 ` [PATCH v2 05/17] iommu: Make iommu_fwspec->ids a distinct allocation Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-19  8:10   ` Hector Martin
  2023-11-15 14:05 ` [PATCH v2 07/17] iommu: Add iommu_probe_device_fwspec() Jason Gunthorpe
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

Allow fwspec to exist independently from the dev->iommu by providing
functions to allow allocating and freeing the raw struct iommu_fwspec.

Reflow the existing paths to call the new alloc/dealloc functions.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 82 ++++++++++++++++++++++++++++++++-----------
 include/linux/iommu.h | 11 +++++-
 2 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 18a82a20934d53..86bbb9e75c7e03 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -361,10 +361,8 @@ static void dev_iommu_free(struct device *dev)
 	struct dev_iommu *param = dev->iommu;
 
 	dev->iommu = NULL;
-	if (param->fwspec) {
-		fwnode_handle_put(param->fwspec->iommu_fwnode);
-		kfree(param->fwspec);
-	}
+	if (param->fwspec)
+		iommu_fwspec_dealloc(param->fwspec);
 	kfree(param);
 }
 
@@ -2920,10 +2918,61 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return ops;
 }
 
+static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
+				     struct device *dev,
+				     struct fwnode_handle *iommu_fwnode)
+{
+	const struct iommu_ops *ops;
+
+	if (fwspec->iommu_fwnode) {
+		/*
+		 * fwspec->iommu_fwnode is the first iommu's fwnode. In the rare
+		 * case of multiple iommus for one device they must point to the
+		 * same driver, checked via same ops.
+		 */
+		ops = iommu_ops_from_fwnode(iommu_fwnode);
+		if (fwspec->ops != ops)
+			return -EINVAL;
+		return 0;
+	}
+
+	if (!fwspec->ops) {
+		ops = iommu_ops_from_fwnode(iommu_fwnode);
+		if (!ops)
+			return driver_deferred_probe_check_state(dev);
+		fwspec->ops = ops;
+	}
+
+	of_node_get(to_of_node(iommu_fwnode));
+	fwspec->iommu_fwnode = iommu_fwnode;
+	return 0;
+}
+
+struct iommu_fwspec *iommu_fwspec_alloc(void)
+{
+	struct iommu_fwspec *fwspec;
+
+	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
+	if (!fwspec)
+		return ERR_PTR(-ENOMEM);
+	return fwspec;
+}
+
+void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec)
+{
+	if (!fwspec)
+		return;
+
+	if (fwspec->iommu_fwnode)
+		fwnode_handle_put(fwspec->iommu_fwnode);
+	kfree(fwspec);
+}
+
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		      const struct iommu_ops *ops)
 {
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	int ret;
 
 	if (fwspec)
 		return ops == fwspec->ops ? 0 : -EINVAL;
@@ -2931,29 +2980,22 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 	if (!dev_iommu_get(dev))
 		return -ENOMEM;
 
-	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
-	if (!fwspec)
-		return -ENOMEM;
+	fwspec = iommu_fwspec_alloc();
+	if (IS_ERR(fwspec))
+		return PTR_ERR(fwspec);
 
-	of_node_get(to_of_node(iommu_fwnode));
-	fwspec->iommu_fwnode = iommu_fwnode;
 	fwspec->ops = ops;
+	ret = iommu_fwspec_assign_iommu(fwspec, dev, iommu_fwnode);
+	if (ret) {
+		iommu_fwspec_dealloc(fwspec);
+		return ret;
+	}
+
 	dev_iommu_fwspec_set(dev, fwspec);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_init);
 
-void iommu_fwspec_free(struct device *dev)
-{
-	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-	if (fwspec) {
-		fwnode_handle_put(fwspec->iommu_fwnode);
-		kfree(fwspec);
-		dev_iommu_fwspec_set(dev, NULL);
-	}
-}
-EXPORT_SYMBOL_GPL(iommu_fwspec_free);
 
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e98a4ca8f536b7..c7c68cb59aa4dc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -813,9 +813,18 @@ struct iommu_sva {
 	struct iommu_domain		*domain;
 };
 
+struct iommu_fwspec *iommu_fwspec_alloc(void);
+void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec);
+
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		      const struct iommu_ops *ops);
-void iommu_fwspec_free(struct device *dev);
+static inline void iommu_fwspec_free(struct device *dev)
+{
+	if (!dev->iommu)
+		return;
+	iommu_fwspec_dealloc(dev->iommu->fwspec);
+	dev->iommu->fwspec = NULL;
+}
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 07/17] iommu: Add iommu_probe_device_fwspec()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2023-11-15 14:05 ` [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc() Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-15 14:05 ` [PATCH v2 08/17] iommu/of: Do not use dev->iommu within of_iommu_configure() Jason Gunthorpe
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

Instead of obtaining an iommu_fwspec from dev->iommu allow a caller
allocated fwspec to be passed into the probe logic. To keep the driver ops
APIs the same the fwspec is stored in dev->iommu under the
iommu_probe_device_lock.

If a fwspec is available use it to provide the ops instead of the bus.

The lifecycle logic is a bit tortured because of how the existing driver
code works. The new routine unconditionally takes ownership, even for
failure. This could be simplified we can get rid of the remaining
iommu_fwspec_init() callers someday.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 53 +++++++++++++++++++++++++++++++------------
 include/linux/iommu.h |  6 ++++-
 2 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 86bbb9e75c7e03..667495faa461f7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -386,16 +386,24 @@ static u32 dev_iommu_get_max_pasids(struct device *dev)
 
 /*
  * Init the dev->iommu and dev->iommu_group in the struct device and get the
- * driver probed
+ * driver probed. Take ownership of fwspec, it always freed on error
+ * or freed by iommu_deinit_device().
  */
-static int iommu_init_device(struct device *dev, const struct iommu_ops *ops)
+static int iommu_init_device(struct device *dev, struct iommu_fwspec *fwspec,
+			     const struct iommu_ops *ops)
 {
 	struct iommu_device *iommu_dev;
 	struct iommu_group *group;
 	int ret;
 
-	if (!dev_iommu_get(dev))
+	if (!dev_iommu_get(dev)) {
+		iommu_fwspec_dealloc(fwspec);
 		return -ENOMEM;
+	}
+
+	if (dev->iommu->fwspec && dev->iommu->fwspec != fwspec)
+		iommu_fwspec_dealloc(dev->iommu->fwspec);
+	dev->iommu->fwspec = fwspec;
 
 	if (!try_module_get(ops->owner)) {
 		ret = -EINVAL;
@@ -483,16 +491,17 @@ static void iommu_deinit_device(struct device *dev)
 	dev_iommu_free(dev);
 }
 
-static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
+static int __iommu_probe_device(struct device *dev,
+				struct iommu_fwspec *caller_fwspec,
+				struct list_head *group_list)
 {
-	const struct iommu_ops *ops = dev->bus->iommu_ops;
+	struct iommu_fwspec *fwspec = caller_fwspec;
+	const struct iommu_ops *ops;
 	struct iommu_group *group;
 	static DEFINE_MUTEX(iommu_probe_device_lock);
 	struct group_device *gdev;
 	int ret;
 
-	if (!ops)
-		return -ENODEV;
 	/*
 	 * Serialise to avoid races between IOMMU drivers registering in
 	 * parallel and/or the "replay" calls from ACPI/OF code via client
@@ -502,13 +511,25 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 	 */
 	mutex_lock(&iommu_probe_device_lock);
 
-	/* Device is probed already if in a group */
-	if (dev->iommu_group) {
-		ret = 0;
+	if (!fwspec && dev->iommu)
+		fwspec = dev->iommu->fwspec;
+	if (fwspec)
+		ops = fwspec->ops;
+	else
+		ops = dev->bus->iommu_ops;
+	if (!ops) {
+		ret = -ENODEV;
 		goto out_unlock;
 	}
 
-	ret = iommu_init_device(dev, ops);
+	/* Device is probed already if in a group */
+	if (dev->iommu_group) {
+		ret = 0;
+		iommu_fwspec_dealloc(caller_fwspec);
+		goto out_unlock;
+	}
+
+	ret = iommu_init_device(dev, fwspec, ops);
 	if (ret)
 		goto out_unlock;
 
@@ -566,12 +587,16 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 	return ret;
 }
 
-int iommu_probe_device(struct device *dev)
+/*
+ * Ownership of fwspec always transfers to iommu_probe_device_fwspec(), it will
+ * be free'd even on failure.
+ */
+int iommu_probe_device_fwspec(struct device *dev, struct iommu_fwspec *fwspec)
 {
 	const struct iommu_ops *ops;
 	int ret;
 
-	ret = __iommu_probe_device(dev, NULL);
+	ret = __iommu_probe_device(dev, fwspec, NULL);
 	if (ret)
 		return ret;
 
@@ -1820,7 +1845,7 @@ static int probe_iommu_group(struct device *dev, void *data)
 	struct list_head *group_list = data;
 	int ret;
 
-	ret = __iommu_probe_device(dev, group_list);
+	ret = __iommu_probe_device(dev, NULL, group_list);
 	if (ret == -ENODEV)
 		ret = 0;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c7c68cb59aa4dc..ca86cd3fe50a82 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -855,7 +855,11 @@ static inline void dev_iommu_priv_set(struct device *dev, void *priv)
 	dev->iommu->priv = priv;
 }
 
-int iommu_probe_device(struct device *dev);
+int iommu_probe_device_fwspec(struct device *dev, struct iommu_fwspec *fwspec);
+static inline int iommu_probe_device(struct device *dev)
+{
+	return iommu_probe_device_fwspec(dev, NULL);
+}
 
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 08/17] iommu/of: Do not use dev->iommu within of_iommu_configure()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2023-11-15 14:05 ` [PATCH v2 07/17] iommu: Add iommu_probe_device_fwspec() Jason Gunthorpe
@ 2023-11-15 14:05 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 09/17] iommu: Add iommu_fwspec_append_ids() Jason Gunthorpe
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:05 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

This call chain is using dev->iommu->fwspec to pass around the fwspec
between the three parts (of_iommu_configure(), of_iommu_xlate(),
iommu_probe_device()).

However there is no locking around the accesses to dev->iommu, so this is
all racy.

Allocate a clean, local, fwspec at the start of of_iommu_configure(), pass
it through all functions on the stack to fill it with data, and finally
pass it into iommu_probe_device_fwspec() which will load it into
dev->iommu under a lock.

Move the actual call to ops->of_xlate into the core code under
iommu_fwspec_of_xlate().

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c    | 29 ++++++++++++++
 drivers/iommu/of_iommu.c | 82 +++++++++++++++++-----------------------
 include/linux/iommu.h    |  3 ++
 3 files changed, 67 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 667495faa461f7..108922829698e9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2973,6 +2973,35 @@ static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
 	return 0;
 }
 
+int iommu_fwspec_of_xlate(struct iommu_fwspec *fwspec, struct device *dev,
+			  struct fwnode_handle *iommu_fwnode,
+			  struct of_phandle_args *iommu_spec)
+{
+	int ret;
+
+	ret = iommu_fwspec_assign_iommu(fwspec, dev, iommu_fwnode);
+	if (ret)
+		return ret;
+
+	if (!fwspec->ops->of_xlate)
+		return -ENODEV;
+
+	if (!dev_iommu_get(dev))
+		return -ENOMEM;
+
+	/*
+	 * ops->of_xlate() requires the fwspec to be passed through dev->iommu,
+	 * set it temporarily.
+	 */
+	if (dev->iommu->fwspec && dev->iommu->fwspec != fwspec)
+		iommu_fwspec_dealloc(dev->iommu->fwspec);
+	dev->iommu->fwspec = fwspec;
+	ret = fwspec->ops->of_xlate(dev, iommu_spec);
+	if (dev->iommu->fwspec == fwspec)
+		dev->iommu->fwspec = NULL;
+	return ret;
+}
+
 struct iommu_fwspec *iommu_fwspec_alloc(void)
 {
 	struct iommu_fwspec *fwspec;
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a68a4d1dc0725c..e611cb7455417f 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -17,40 +17,19 @@
 #include <linux/slab.h>
 #include <linux/fsl/mc.h>
 
-static int of_iommu_xlate(struct device *dev,
+static int of_iommu_xlate(struct iommu_fwspec *fwspec, struct device *dev,
 			  struct of_phandle_args *iommu_spec)
 {
-	const struct iommu_ops *ops;
-	struct fwnode_handle *fwnode = &iommu_spec->np->fwnode;
-	int ret;
-
-	ops = iommu_ops_from_fwnode(fwnode);
-	if ((ops && !ops->of_xlate) ||
-	    !of_device_is_available(iommu_spec->np))
+	if (!of_device_is_available(iommu_spec->np))
 		return -ENODEV;
 
-	ret = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
-	if (ret)
-		return ret;
-	/*
-	 * The otherwise-empty fwspec handily serves to indicate the specific
-	 * IOMMU device we're waiting for, which will be useful if we ever get
-	 * a proper probe-ordering dependency mechanism in future.
-	 */
-	if (!ops)
-		return driver_deferred_probe_check_state(dev);
-
-	if (!try_module_get(ops->owner))
-		return -ENODEV;
-
-	ret = ops->of_xlate(dev, iommu_spec);
-	module_put(ops->owner);
-	return ret;
+	return iommu_fwspec_of_xlate(fwspec, dev, &iommu_spec->np->fwnode,
+				     iommu_spec);
 }
 
-static int of_iommu_configure_dev_id(struct device_node *master_np,
-				     struct device *dev,
-				     const u32 *id)
+static int of_iommu_configure_dev_id(struct iommu_fwspec *fwspec,
+				     struct device_node *master_np,
+				     struct device *dev, const u32 *id)
 {
 	struct of_phandle_args iommu_spec = { .args_count = 1 };
 	int err;
@@ -61,12 +40,13 @@ static int of_iommu_configure_dev_id(struct device_node *master_np,
 	if (err)
 		return err;
 
-	err = of_iommu_xlate(dev, &iommu_spec);
+	err = of_iommu_xlate(fwspec, dev, &iommu_spec);
 	of_node_put(iommu_spec.np);
 	return err;
 }
 
-static int of_iommu_configure_dev(struct device_node *master_np,
+static int of_iommu_configure_dev(struct iommu_fwspec *fwspec,
+				  struct device_node *master_np,
 				  struct device *dev)
 {
 	struct of_phandle_args iommu_spec;
@@ -75,7 +55,7 @@ static int of_iommu_configure_dev(struct device_node *master_np,
 	while (!of_parse_phandle_with_args(master_np, "iommus",
 					   "#iommu-cells",
 					   idx, &iommu_spec)) {
-		err = of_iommu_xlate(dev, &iommu_spec);
+		err = of_iommu_xlate(fwspec, dev, &iommu_spec);
 		of_node_put(iommu_spec.np);
 		idx++;
 		if (err)
@@ -88,6 +68,7 @@ static int of_iommu_configure_dev(struct device_node *master_np,
 struct of_pci_iommu_alias_info {
 	struct device *dev;
 	struct device_node *np;
+	struct iommu_fwspec *fwspec;
 };
 
 static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
@@ -95,14 +76,16 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
 	struct of_pci_iommu_alias_info *info = data;
 	u32 input_id = alias;
 
-	return of_iommu_configure_dev_id(info->np, info->dev, &input_id);
+	return of_iommu_configure_dev_id(info->fwspec, info->np, info->dev,
+					 &input_id);
 }
 
-static int of_iommu_configure_device(struct device_node *master_np,
+static int of_iommu_configure_device(struct iommu_fwspec *fwspec,
+				     struct device_node *master_np,
 				     struct device *dev, const u32 *id)
 {
-	return (id) ? of_iommu_configure_dev_id(master_np, dev, id) :
-		      of_iommu_configure_dev(master_np, dev);
+	return (id) ? of_iommu_configure_dev_id(fwspec, master_np, dev, id) :
+		      of_iommu_configure_dev(fwspec, master_np, dev);
 }
 
 /*
@@ -115,19 +98,15 @@ static int of_iommu_configure_device(struct device_node *master_np,
 int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		       const u32 *id)
 {
-	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct iommu_fwspec *fwspec;
 	int err;
 
 	if (!master_np)
 		return -ENODEV;
 
-	if (fwspec) {
-		if (fwspec->ops)
-			return 0;
-
-		/* In the deferred case, start again from scratch */
-		iommu_fwspec_free(dev);
-	}
+	fwspec = iommu_fwspec_alloc();
+	if (IS_ERR(fwspec))
+		return PTR_ERR(fwspec);
 
 	/*
 	 * We don't currently walk up the tree looking for a parent IOMMU.
@@ -138,27 +117,36 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		struct of_pci_iommu_alias_info info = {
 			.dev = dev,
 			.np = master_np,
+			.fwspec = fwspec,
 		};
 
 		pci_request_acs();
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     of_pci_iommu_init, &info);
 	} else {
-		err = of_iommu_configure_device(master_np, dev, id);
+		err = of_iommu_configure_device(fwspec, master_np, dev, id);
 	}
 
 	if (err == -ENODEV || err == -EPROBE_DEFER)
-		return err;
+		goto err_free;
 	if (err)
 		goto err_log;
 
-	err = iommu_probe_device(dev);
-	if (err)
+	err = iommu_probe_device_fwspec(dev, fwspec);
+	if (err) {
+		/*
+		 * Ownership for fwspec always passes into
+		 * iommu_probe_device_fwspec()
+		 */
+		fwspec = NULL;
 		goto err_log;
+	}
 	return 0;
 
 err_log:
 	dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+err_free:
+	iommu_fwspec_dealloc(fwspec);
 	return err;
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ca86cd3fe50a82..cea65461eed01c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -815,6 +815,9 @@ struct iommu_sva {
 
 struct iommu_fwspec *iommu_fwspec_alloc(void);
 void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec);
+int iommu_fwspec_of_xlate(struct iommu_fwspec *fwspec, struct device *dev,
+			  struct fwnode_handle *iommu_fwnode,
+			  struct of_phandle_args *iommu_spec);
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		      const struct iommu_ops *ops);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 09/17] iommu: Add iommu_fwspec_append_ids()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (7 preceding siblings ...)
  2023-11-15 14:05 ` [PATCH v2 08/17] iommu/of: Do not use dev->iommu within of_iommu_configure() Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 10/17] acpi: Do not use dev->iommu within acpi_iommu_configure() Jason Gunthorpe
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

This is a version of iommu_fwspec_add_ids() that takes in the fwspec as an
argument instead of getting it through dev->iommu.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 17 +++++++++++------
 include/linux/iommu.h |  1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 108922829698e9..2076345d0d6653 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3050,15 +3050,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_init);
 
-
-int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
+int iommu_fwspec_append_ids(struct iommu_fwspec *fwspec, u32 *ids, int num_ids)
 {
-	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	int i, new_num;
 
-	if (!fwspec)
-		return -EINVAL;
-
 	new_num = fwspec->num_ids + num_ids;
 	if (new_num <= 1) {
 		if (fwspec->ids != &fwspec->single_id)
@@ -3080,6 +3075,16 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 	fwspec->num_ids = new_num;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(iommu_fwspec_append_ids);
+
+int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
+{
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+	if (!fwspec)
+		return -EINVAL;
+	return iommu_fwspec_append_ids(fwspec, ids, num_ids);
+}
 EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
 
 /*
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index cea65461eed01c..bbbba7d0a159ba 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -830,6 +830,7 @@ static inline void iommu_fwspec_free(struct device *dev)
 }
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
+int iommu_fwspec_append_ids(struct iommu_fwspec *fwspec, u32 *ids, int num_ids);
 
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 10/17] acpi: Do not use dev->iommu within acpi_iommu_configure()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (8 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 09/17] iommu: Add iommu_fwspec_append_ids() Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 11/17] iommu: Hold iommu_probe_device_lock while calling ops->of_xlate Jason Gunthorpe
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

This call chain is using dev->iommu->fwspec to pass around the fwspec
between the three parts (acpi_iommu_configure_id(),
acpi_iommu_fwspec_init(), iommu_probe_device()).

However there is no locking around the accesses to dev->iommu, so this is
all racy.

Allocate a clean, local, fwspec at the start of acpi_iommu_configure_id(),
pass it through all functions on the stack to fill it with data, and
finally pass it into iommu_probe_device_fwspec() which will load it into
dev->iommu under a lock.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/acpi/arm64/iort.c | 42 +++++++++---------
 drivers/acpi/scan.c       | 89 ++++++++++++++++++---------------------
 drivers/acpi/viot.c       | 45 +++++++++++---------
 drivers/iommu/iommu.c     |  5 +--
 include/acpi/acpi_bus.h   |  8 ++--
 include/linux/acpi_iort.h |  8 +++-
 include/linux/acpi_viot.h |  5 ++-
 include/linux/iommu.h     |  2 +
 8 files changed, 103 insertions(+), 101 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 6496ff5a6ba20d..b08682282ee5a7 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1218,10 +1218,9 @@ static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
 	return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
 }
 
-static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
-			    u32 streamid)
+static int iort_iommu_xlate(struct iommu_fwspec *fwspec, struct device *dev,
+			    struct acpi_iort_node *node, u32 streamid)
 {
-	const struct iommu_ops *ops;
 	struct fwnode_handle *iort_fwnode;
 
 	if (!node)
@@ -1239,17 +1238,14 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
 	 * in the kernel or not, defer the IOMMU configuration
 	 * or just abort it.
 	 */
-	ops = iommu_ops_from_fwnode(iort_fwnode);
-	if (!ops)
-		return iort_iommu_driver_enabled(node->type) ?
-		       -EPROBE_DEFER : -ENODEV;
-
-	return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
+	return acpi_iommu_fwspec_init(fwspec, dev, streamid, iort_fwnode,
+				      iort_iommu_driver_enabled(node->type));
 }
 
 struct iort_pci_alias_info {
 	struct device *dev;
 	struct acpi_iort_node *node;
+	struct iommu_fwspec *fwspec;
 };
 
 static int iort_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
@@ -1260,7 +1256,7 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
 
 	parent = iort_node_map_id(info->node, alias, &streamid,
 				  IORT_IOMMU_TYPE);
-	return iort_iommu_xlate(info->dev, parent, streamid);
+	return iort_iommu_xlate(info->fwspec, info->dev, parent, streamid);
 }
 
 static void iort_named_component_init(struct device *dev,
@@ -1280,7 +1276,8 @@ static void iort_named_component_init(struct device *dev,
 		dev_warn(dev, "Could not add device properties\n");
 }
 
-static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node)
+static int iort_nc_iommu_map(struct iommu_fwspec *fwspec, struct device *dev,
+			     struct acpi_iort_node *node)
 {
 	struct acpi_iort_node *parent;
 	int err = -ENODEV, i = 0;
@@ -1293,13 +1290,13 @@ static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node)
 						   i++);
 
 		if (parent)
-			err = iort_iommu_xlate(dev, parent, streamid);
+			err = iort_iommu_xlate(fwspec, dev, parent, streamid);
 	} while (parent && !err);
 
 	return err;
 }
 
-static int iort_nc_iommu_map_id(struct device *dev,
+static int iort_nc_iommu_map_id(struct iommu_fwspec *fwspec, struct device *dev,
 				struct acpi_iort_node *node,
 				const u32 *in_id)
 {
@@ -1308,7 +1305,7 @@ static int iort_nc_iommu_map_id(struct device *dev,
 
 	parent = iort_node_map_id(node, *in_id, &streamid, IORT_IOMMU_TYPE);
 	if (parent)
-		return iort_iommu_xlate(dev, parent, streamid);
+		return iort_iommu_xlate(fwspec, dev, parent, streamid);
 
 	return -ENODEV;
 }
@@ -1317,20 +1314,22 @@ static int iort_nc_iommu_map_id(struct device *dev,
 /**
  * iort_iommu_configure_id - Set-up IOMMU configuration for a device.
  *
+ * @fwspec: The iommu_fwspec to fill in with fw information about the device
  * @dev: device to configure
  * @id_in: optional input id const value pointer
  *
  * Returns: 0 on success, <0 on failure
  */
-int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
+int iort_iommu_configure_id(struct iommu_fwspec *fwspec, struct device *dev,
+			    const u32 *id_in)
 {
 	struct acpi_iort_node *node;
 	int err = -ENODEV;
 
 	if (dev_is_pci(dev)) {
-		struct iommu_fwspec *fwspec;
 		struct pci_bus *bus = to_pci_dev(dev)->bus;
-		struct iort_pci_alias_info info = { .dev = dev };
+		struct iort_pci_alias_info info = { .dev = dev,
+						    .fwspec = fwspec };
 
 		node = iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX,
 				      iort_match_node_callback, &bus->dev);
@@ -1341,8 +1340,7 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     iort_pci_iommu_init, &info);
 
-		fwspec = dev_iommu_fwspec_get(dev);
-		if (fwspec && iort_pci_rc_supports_ats(node))
+		if (iort_pci_rc_supports_ats(node))
 			fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
 	} else {
 		node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
@@ -1350,8 +1348,8 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 		if (!node)
 			return -ENODEV;
 
-		err = id_in ? iort_nc_iommu_map_id(dev, node, id_in) :
-			      iort_nc_iommu_map(dev, node);
+		err = id_in ? iort_nc_iommu_map_id(fwspec, dev, node, id_in) :
+			      iort_nc_iommu_map(fwspec, dev, node);
 
 		if (!err)
 			iort_named_component_init(dev, node);
@@ -1363,8 +1361,6 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 #else
 void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head)
 { }
-int iort_iommu_configure_id(struct device *dev, const u32 *input_id)
-{ return -ENODEV; }
 #endif
 
 static int nc_dma_get_range(struct device *dev, u64 *size)
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index d171d193f2a51c..5d467ff58ff24b 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1543,74 +1543,67 @@ int acpi_dma_get_range(struct device *dev, const struct bus_dma_region **map)
 }
 
 #ifdef CONFIG_IOMMU_API
-int acpi_iommu_fwspec_init(struct device *dev, u32 id,
-			   struct fwnode_handle *fwnode,
-			   const struct iommu_ops *ops)
+int acpi_iommu_fwspec_init(struct iommu_fwspec *fwspec, struct device *dev,
+			   u32 id, struct fwnode_handle *fwnode,
+			   bool iommu_driver_available)
 {
-	int ret = iommu_fwspec_init(dev, fwnode, ops);
+	int ret;
 
-	if (!ret)
-		ret = iommu_fwspec_add_ids(dev, &id, 1);
-
-	return ret;
-}
-
-static inline const struct iommu_ops *acpi_iommu_fwspec_ops(struct device *dev)
-{
-	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-	return fwspec ? fwspec->ops : NULL;
+	ret = iommu_fwspec_assign_iommu(fwspec, dev, fwnode);
+	if (ret) {
+		if (ret == -EPROBE_DEFER && !iommu_driver_available)
+			return -ENODEV;
+		return ret;
+	}
+	return iommu_fwspec_append_ids(fwspec, &id, 1);
 }
 
 static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
 	int err;
-	const struct iommu_ops *ops;
+	struct iommu_fwspec *fwspec;
 
-	/*
-	 * If we already translated the fwspec there is nothing left to do,
-	 * return the iommu_ops.
-	 */
-	ops = acpi_iommu_fwspec_ops(dev);
-	if (ops)
-		return 0;
+	fwspec = iommu_fwspec_alloc();
+	if (IS_ERR(fwspec))
+		return PTR_ERR(fwspec);
 
-	err = iort_iommu_configure_id(dev, id_in);
-	if (err && err != -EPROBE_DEFER)
-		err = viot_iommu_configure(dev);
+	err = iort_iommu_configure_id(fwspec, dev, id_in);
+	if (err == -ENODEV)
+		err = viot_iommu_configure(fwspec, dev);
+	if (err == -ENODEV || err == -EPROBE_DEFER)
+		goto err_free;
+	if (err)
+		goto err_log;
 
-	/*
-	 * If we have reason to believe the IOMMU driver missed the initial
-	 * iommu_probe_device() call for dev, replay it to get things in order.
-	 */
-	if (!err && dev->bus)
-		err = iommu_probe_device(dev);
-
-	/* Ignore all other errors apart from EPROBE_DEFER */
-	if (err == -EPROBE_DEFER) {
-		return err;
-	} else if (err) {
-		dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
-		return -ENODEV;
+	err = iommu_probe_device_fwspec(dev, fwspec);
+	if (err) {
+		/*
+		 * Ownership for fwspec always passes into
+		 * iommu_probe_device_fwspec()
+		 */
+		fwspec = NULL;
+		goto err_log;
 	}
-	if (!acpi_iommu_fwspec_ops(dev))
-		return -ENODEV;
-	return 0;
+
+err_log:
+	dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
+err_free:
+	iommu_fwspec_dealloc(fwspec);
+	return err;
 }
 
 #else /* !CONFIG_IOMMU_API */
 
-int acpi_iommu_fwspec_init(struct device *dev, u32 id,
-			   struct fwnode_handle *fwnode,
-			   const struct iommu_ops *ops)
+int acpi_iommu_fwspec_init(struct iommu_fwspec *fwspec, struct device *dev,
+			   u32 id, struct fwnode_handle *fwnode,
+			   bool iommu_driver_available)
 {
 	return -ENODEV;
 }
 
-static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
-						       const u32 *id_in)
+static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
-	return NULL;
+	return -ENODEV;
 }
 
 #endif /* !CONFIG_IOMMU_API */
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
index c8025921c129b2..1d0da99e65e6af 100644
--- a/drivers/acpi/viot.c
+++ b/drivers/acpi/viot.c
@@ -304,11 +304,9 @@ void __init acpi_viot_init(void)
 	acpi_put_table(hdr);
 }
 
-static int viot_dev_iommu_init(struct device *dev, struct viot_iommu *viommu,
-			       u32 epid)
+static int viot_dev_iommu_init(struct iommu_fwspec *fwspec, struct device *dev,
+			       struct viot_iommu *viommu, u32 epid)
 {
-	const struct iommu_ops *ops;
-
 	if (!viommu)
 		return -ENODEV;
 
@@ -316,19 +314,20 @@ static int viot_dev_iommu_init(struct device *dev, struct viot_iommu *viommu,
 	if (device_match_fwnode(dev, viommu->fwnode))
 		return -EINVAL;
 
-	ops = iommu_ops_from_fwnode(viommu->fwnode);
-	if (!ops)
-		return IS_ENABLED(CONFIG_VIRTIO_IOMMU) ?
-			-EPROBE_DEFER : -ENODEV;
-
-	return acpi_iommu_fwspec_init(dev, epid, viommu->fwnode, ops);
+	return acpi_iommu_fwspec_init(fwspec, dev, epid, viommu->fwnode,
+				      IS_ENABLED(CONFIG_VIRTIO_IOMMU));
 }
 
+struct viot_pci_alias_info {
+	struct device *dev;
+	struct iommu_fwspec *fwspec;
+};
+
 static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
 {
 	u32 epid;
 	struct viot_endpoint *ep;
-	struct device *aliased_dev = data;
+	struct viot_pci_alias_info *info = data;
 	u32 domain_nr = pci_domain_nr(pdev->bus);
 
 	list_for_each_entry(ep, &viot_pci_ranges, list) {
@@ -339,14 +338,15 @@ static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
 			epid = ((domain_nr - ep->segment_start) << 16) +
 				dev_id - ep->bdf_start + ep->endpoint_id;
 
-			return viot_dev_iommu_init(aliased_dev, ep->viommu,
-						   epid);
+			return viot_dev_iommu_init(info->fwspec, info->dev,
+						   ep->viommu, epid);
 		}
 	}
 	return -ENODEV;
 }
 
-static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
+static int viot_mmio_dev_iommu_init(struct iommu_fwspec *fwspec,
+				    struct platform_device *pdev)
 {
 	struct resource *mem;
 	struct viot_endpoint *ep;
@@ -357,24 +357,29 @@ static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
 
 	list_for_each_entry(ep, &viot_mmio_endpoints, list) {
 		if (ep->address == mem->start)
-			return viot_dev_iommu_init(&pdev->dev, ep->viommu,
-						   ep->endpoint_id);
+			return viot_dev_iommu_init(fwspec, &pdev->dev,
+						   ep->viommu, ep->endpoint_id);
 	}
 	return -ENODEV;
 }
 
 /**
  * viot_iommu_configure - Setup IOMMU ops for an endpoint described by VIOT
+ * @fwspec: The iommu_fwspec to fill in with fw information about the device
  * @dev: the endpoint
  *
  * Return: 0 on success, <0 on failure
  */
-int viot_iommu_configure(struct device *dev)
+int viot_iommu_configure(struct iommu_fwspec *fwspec, struct device *dev)
 {
-	if (dev_is_pci(dev))
+	if (dev_is_pci(dev)) {
+		struct viot_pci_alias_info info = { .dev = dev,
+						    .fwspec = fwspec };
 		return pci_for_each_dma_alias(to_pci_dev(dev),
-					      viot_pci_dev_iommu_init, dev);
+					      viot_pci_dev_iommu_init, &info);
+	}
 	else if (dev_is_platform(dev))
-		return viot_mmio_dev_iommu_init(to_platform_device(dev));
+		return viot_mmio_dev_iommu_init(fwspec,
+						to_platform_device(dev));
 	return -ENODEV;
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2076345d0d6653..f7bda1c0959d34 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2943,9 +2943,8 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return ops;
 }
 
-static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
-				     struct device *dev,
-				     struct fwnode_handle *iommu_fwnode)
+int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec, struct device *dev,
+			      struct fwnode_handle *iommu_fwnode)
 {
 	const struct iommu_ops *ops;
 
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index afeed6e72049e4..4197c0004a30b0 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -12,6 +12,8 @@
 #include <linux/device.h>
 #include <linux/property.h>
 
+struct iommu_fwspec;
+
 struct acpi_handle_list {
 	u32 count;
 	acpi_handle* handles;
@@ -628,9 +630,9 @@ struct acpi_pci_root {
 
 bool acpi_dma_supported(const struct acpi_device *adev);
 enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
-int acpi_iommu_fwspec_init(struct device *dev, u32 id,
-			   struct fwnode_handle *fwnode,
-			   const struct iommu_ops *ops);
+int acpi_iommu_fwspec_init(struct iommu_fwspec *fwspec, struct device *dev,
+			   u32 id, struct fwnode_handle *fwnode,
+			   bool iommu_driver_available);
 int acpi_dma_get_range(struct device *dev, const struct bus_dma_region **map);
 int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
 			   const u32 *input_id);
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 1cb65592c95dd3..7644922ecb0705 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -11,6 +11,8 @@
 #include <linux/fwnode.h>
 #include <linux/irqdomain.h>
 
+struct iommu_fwspec;
+
 #define IORT_IRQ_MASK(irq)		(irq & 0xffffffffULL)
 #define IORT_IRQ_TRIGGER_MASK(irq)	((irq >> 32) & 0xffffffffULL)
 
@@ -40,7 +42,8 @@ void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode,
 		       struct list_head *head);
 /* IOMMU interface */
 int iort_dma_get_ranges(struct device *dev, u64 *size);
-int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
+int iort_iommu_configure_id(struct iommu_fwspec *fwspec, struct device *dev,
+			    const u32 *id_in);
 void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head);
 phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
 #else
@@ -57,7 +60,8 @@ void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode, struct list_head *hea
 /* IOMMU interface */
 static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
 { return -ENODEV; }
-static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
+static inline int iort_iommu_configure_id(struct iommu_fwspec *fwspec,
+					  struct device *dev, const u32 *id_in)
 { return -ENODEV; }
 static inline
 void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head)
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
index a5a12243156377..f1874cb6d43c09 100644
--- a/include/linux/acpi_viot.h
+++ b/include/linux/acpi_viot.h
@@ -8,11 +8,12 @@
 #ifdef CONFIG_ACPI_VIOT
 void __init acpi_viot_early_init(void);
 void __init acpi_viot_init(void);
-int viot_iommu_configure(struct device *dev);
+int viot_iommu_configure(struct iommu_fwspec *fwspec, struct device *dev);
 #else
 static inline void acpi_viot_early_init(void) {}
 static inline void acpi_viot_init(void) {}
-static inline int viot_iommu_configure(struct device *dev)
+static inline int viot_iommu_configure(struct iommu_fwspec *fwspec,
+				       struct device *dev)
 {
 	return -ENODEV;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index bbbba7d0a159ba..72ec71bd31a376 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -818,6 +818,8 @@ void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec);
 int iommu_fwspec_of_xlate(struct iommu_fwspec *fwspec, struct device *dev,
 			  struct fwnode_handle *iommu_fwnode,
 			  struct of_phandle_args *iommu_spec);
+int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec, struct device *dev,
+			      struct fwnode_handle *iommu_fwnode);
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		      const struct iommu_ops *ops);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 11/17] iommu: Hold iommu_probe_device_lock while calling ops->of_xlate
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (9 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 10/17] acpi: Do not use dev->iommu within acpi_iommu_configure() Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static Jason Gunthorpe
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

This resolves the race around touching dev->iommu while generating the OF
fwspec on the of_iommu_configure() flow:

     CPU0                                     CPU1
of_iommu_configure()                iommu_device_register()
 ..                                   bus_iommu_probe()
  iommu_fwspec_of_xlate()              __iommu_probe_device()
                                        iommu_init_device()
   dev_iommu_get()
                                          .. ops->probe fails, no fwspec ..
                                          dev_iommu_free()
   dev->iommu->fwspec    *crash*

CPU1 is holding the iommu_probe_device_lock for iommu_init_device(),
holding it around the of_xlate() and its related manipulation of
dev->iommu will close it.

The approach also closes a similar race for what should be a successful
probe where the above basic construction results in ops->probe observing a
partially initialized fwspec.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Closes: https://lore.kernel.org/linux-arm-kernel/20231017163337.GE282036@ziepe.ca/T/#mee0d7bdc375541934a571ae69f43b9660f8e7312
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f7bda1c0959d34..5af98cad06f9ef 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -41,6 +41,7 @@
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 static DEFINE_IDA(iommu_global_pasid_ida);
+static DEFINE_MUTEX(iommu_probe_device_lock);
 
 static unsigned int iommu_def_domain_type __read_mostly;
 static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT);
@@ -498,7 +499,6 @@ static int __iommu_probe_device(struct device *dev,
 	struct iommu_fwspec *fwspec = caller_fwspec;
 	const struct iommu_ops *ops;
 	struct iommu_group *group;
-	static DEFINE_MUTEX(iommu_probe_device_lock);
 	struct group_device *gdev;
 	int ret;
 
@@ -2985,8 +2985,11 @@ int iommu_fwspec_of_xlate(struct iommu_fwspec *fwspec, struct device *dev,
 	if (!fwspec->ops->of_xlate)
 		return -ENODEV;
 
-	if (!dev_iommu_get(dev))
+	mutex_lock(&iommu_probe_device_lock);
+	if (!dev_iommu_get(dev)) {
+		mutex_unlock(&iommu_probe_device_lock);
 		return -ENOMEM;
+	}
 
 	/*
 	 * ops->of_xlate() requires the fwspec to be passed through dev->iommu,
@@ -2998,6 +3001,7 @@ int iommu_fwspec_of_xlate(struct iommu_fwspec *fwspec, struct device *dev,
 	ret = fwspec->ops->of_xlate(dev, iommu_spec);
 	if (dev->iommu->fwspec == fwspec)
 		dev->iommu->fwspec = NULL;
+	mutex_unlock(&iommu_probe_device_lock);
 	return ret;
 }
 
@@ -3027,6 +3031,8 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	int ret;
 
+	lockdep_assert_held(&iommu_probe_device_lock);
+
 	if (fwspec)
 		return ops == fwspec->ops ? 0 : -EINVAL;
 
@@ -3080,6 +3086,8 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 {
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 
+	lockdep_assert_held(&iommu_probe_device_lock);
+
 	if (!fwspec)
 		return -EINVAL;
 	return iommu_fwspec_append_ids(fwspec, ids, num_ids);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (10 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 11/17] iommu: Hold iommu_probe_device_lock while calling ops->of_xlate Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 15:09   ` Jerry Snitselaar
  2023-11-16 14:36   ` Moritz Fischer
  2023-11-15 14:06 ` [PATCH v2 13/17] iommu: Remove dev_iommu_fwspec_set() Jason Gunthorpe
                   ` (6 subsequent siblings)
  18 siblings, 2 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

There are no external callers now.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 3 ++-
 include/linux/iommu.h | 7 -------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5af98cad06f9ef..ea6aede326131e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2928,7 +2928,8 @@ bool iommu_default_passthrough(void)
 }
 EXPORT_SYMBOL_GPL(iommu_default_passthrough);
 
-const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
+static const struct iommu_ops *
+iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 {
 	const struct iommu_ops *ops = NULL;
 	struct iommu_device *iommu;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 72ec71bd31a376..05c5ad6bad6339 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -831,7 +831,6 @@ static inline void iommu_fwspec_free(struct device *dev)
 	dev->iommu->fwspec = NULL;
 }
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
-const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
 int iommu_fwspec_append_ids(struct iommu_fwspec *fwspec, u32 *ids, int num_ids);
 
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
@@ -1187,12 +1186,6 @@ static inline int iommu_fwspec_add_ids(struct device *dev, u32 *ids,
 	return -ENODEV;
 }
 
-static inline
-const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
-{
-	return NULL;
-}
-
 static inline int
 iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
 {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 13/17] iommu: Remove dev_iommu_fwspec_set()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (11 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 14/17] iommu: Remove pointless iommu_fwspec_free() Jason Gunthorpe
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

This is only used internally to iommu.c now, get rid of it to discourage
things outside iommu.c from trying to manipulate dev->iommu->fwspec.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 2 +-
 include/linux/iommu.h | 6 ------
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ea6aede326131e..8fc3d0ff881260 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3051,7 +3051,7 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		return ret;
 	}
 
-	dev_iommu_fwspec_set(dev, fwspec);
+	dev->iommu->fwspec = fwspec;
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_init);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 05c5ad6bad6339..352070c3ab3126 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -841,12 +841,6 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 		return NULL;
 }
 
-static inline void dev_iommu_fwspec_set(struct device *dev,
-					struct iommu_fwspec *fwspec)
-{
-	dev->iommu->fwspec = fwspec;
-}
-
 static inline void *dev_iommu_priv_get(struct device *dev)
 {
 	if (dev->iommu)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 14/17] iommu: Remove pointless iommu_fwspec_free()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (12 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 13/17] iommu: Remove dev_iommu_fwspec_set() Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 15/17] iommu: Add ops->of_xlate_fwspec() Jason Gunthorpe
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

These days the core code will free the fwspec if probe fails, no reason
for any driver to call this on a probe failure path.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 14 +++++---------
 drivers/iommu/tegra-smmu.c            |  1 -
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d6d1a2a55cc069..854efcb1b84ddf 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1348,6 +1348,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 
 	if (using_legacy_binding) {
 		ret = arm_smmu_register_legacy_master(dev, &smmu);
+		if (ret)
+			return ERR_PTR(ret);
 
 		/*
 		 * If dev->iommu_fwspec is initally NULL, arm_smmu_register_legacy_master()
@@ -1355,15 +1357,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 		 * later use.
 		 */
 		fwspec = dev_iommu_fwspec_get(dev);
-		if (ret)
-			goto out_free;
 	} else if (fwspec && fwspec->ops == &arm_smmu_ops) {
 		smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
 	} else {
 		return ERR_PTR(-ENODEV);
 	}
 
-	ret = -EINVAL;
 	for (i = 0; i < fwspec->num_ids; i++) {
 		u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
 		u16 mask = FIELD_GET(ARM_SMMU_SMR_MASK, fwspec->ids[i]);
@@ -1371,20 +1370,19 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 		if (sid & ~smmu->streamid_mask) {
 			dev_err(dev, "stream ID 0x%x out of range for SMMU (0x%x)\n",
 				sid, smmu->streamid_mask);
-			goto out_free;
+			return ERR_PTR(-EINVAL);
 		}
 		if (mask & ~smmu->smr_mask_mask) {
 			dev_err(dev, "SMR mask 0x%x out of range for SMMU (0x%x)\n",
 				mask, smmu->smr_mask_mask);
-			goto out_free;
+			return ERR_PTR(-EINVAL);
 		}
 	}
 
-	ret = -ENOMEM;
 	cfg = kzalloc(offsetof(struct arm_smmu_master_cfg, smendx[i]),
 		      GFP_KERNEL);
 	if (!cfg)
-		goto out_free;
+		return ERR_PTR(-ENOMEM);
 
 	cfg->smmu = smmu;
 	dev_iommu_priv_set(dev, cfg);
@@ -1408,8 +1406,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 
 out_cfg_free:
 	kfree(cfg);
-out_free:
-	iommu_fwspec_free(dev);
 	return ERR_PTR(ret);
 }
 
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 310871728ab4b6..e3101aa2f35689 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -844,7 +844,6 @@ static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
 	err = ops->of_xlate(dev, args);
 	if (err < 0) {
 		dev_err(dev, "failed to parse SW group ID: %d\n", err);
-		iommu_fwspec_free(dev);
 		return err;
 	}
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 15/17] iommu: Add ops->of_xlate_fwspec()
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (13 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 14/17] iommu: Remove pointless iommu_fwspec_free() Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 16/17] iommu: Mark dev_iommu_get() with lockdep Jason Gunthorpe
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

The new callback takes in the fwspec instead of retrieving it from the
dev->iommu. Provide iommu_fwspec_append_ids() to work directly on the
fwspec.

Convert SMMU, SMMUv3, and virtio to use iommu_fwspec_append_ids() and the
new entry point.

This avoids having to touch dev->iommu at all, and doesn't require the
iommu_probe_device_lock.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +++++---
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 8 +++++---
 drivers/iommu/iommu.c                       | 3 +++
 drivers/iommu/virtio-iommu.c                | 8 +++++---
 include/linux/iommu.h                       | 3 +++
 5 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7445454c2af244..b1309f04ebc0d9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2748,9 +2748,11 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain)
 	return ret;
 }
 
-static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int arm_smmu_of_xlate_fwspec(struct iommu_fwspec *fwspec,
+				    struct device *dev,
+				    struct of_phandle_args *args)
 {
-	return iommu_fwspec_add_ids(dev, args->args, 1);
+	return iommu_fwspec_append_ids(fwspec, args->args, 1);
 }
 
 static void arm_smmu_get_resv_regions(struct device *dev,
@@ -2858,7 +2860,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
 	.device_group		= arm_smmu_device_group,
-	.of_xlate		= arm_smmu_of_xlate,
+	.of_xlate_fwspec	= arm_smmu_of_xlate_fwspec,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.remove_dev_pasid	= arm_smmu_remove_dev_pasid,
 	.dev_enable_feat	= arm_smmu_dev_enable_feature,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 854efcb1b84ddf..8c4a60d8e5d522 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1510,7 +1510,9 @@ static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
 	return ret;
 }
 
-static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int arm_smmu_of_xlate_fwspec(struct iommu_fwspec *fwspec,
+				    struct device *dev,
+				    struct of_phandle_args *args)
 {
 	u32 mask, fwid = 0;
 
@@ -1522,7 +1524,7 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 	else if (!of_property_read_u32(args->np, "stream-match-mask", &mask))
 		fwid |= FIELD_PREP(ARM_SMMU_SMR_MASK, mask);
 
-	return iommu_fwspec_add_ids(dev, &fwid, 1);
+	return iommu_fwspec_append_ids(fwspec, &fwid, 1);
 }
 
 static void arm_smmu_get_resv_regions(struct device *dev,
@@ -1562,7 +1564,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.release_device		= arm_smmu_release_device,
 	.probe_finalize		= arm_smmu_probe_finalize,
 	.device_group		= arm_smmu_device_group,
-	.of_xlate		= arm_smmu_of_xlate,
+	.of_xlate_fwspec	= arm_smmu_of_xlate_fwspec,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.def_domain_type	= arm_smmu_def_domain_type,
 	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8fc3d0ff881260..de6dcb244bff4a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2983,6 +2983,9 @@ int iommu_fwspec_of_xlate(struct iommu_fwspec *fwspec, struct device *dev,
 	if (ret)
 		return ret;
 
+	if (fwspec->ops->of_xlate_fwspec)
+		return fwspec->ops->of_xlate_fwspec(fwspec, dev, iommu_spec);
+
 	if (!fwspec->ops->of_xlate)
 		return -ENODEV;
 
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 379ebe03efb6d4..2283f1d1155981 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1027,9 +1027,11 @@ static struct iommu_group *viommu_device_group(struct device *dev)
 		return generic_device_group(dev);
 }
 
-static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int viommu_of_xlate_fwspec(struct iommu_fwspec *fwspec,
+				  struct device *dev,
+				  struct of_phandle_args *args)
 {
-	return iommu_fwspec_add_ids(dev, args->args, 1);
+	return iommu_fwspec_append_ids(fwspec, args->args, 1);
 }
 
 static bool viommu_capable(struct device *dev, enum iommu_cap cap)
@@ -1050,7 +1052,7 @@ static struct iommu_ops viommu_ops = {
 	.release_device		= viommu_release_device,
 	.device_group		= viommu_device_group,
 	.get_resv_regions	= viommu_get_resv_regions,
-	.of_xlate		= viommu_of_xlate,
+	.of_xlate_fwspec	= viommu_of_xlate_fwspec,
 	.owner			= THIS_MODULE,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev		= viommu_attach_dev,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 352070c3ab3126..3495db0c3e4631 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -43,6 +43,7 @@ struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
 struct iommu_dma_cookie;
+struct iommu_fwspec;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
@@ -395,6 +396,8 @@ struct iommu_ops {
 	/* Request/Free a list of reserved regions for a device */
 	void (*get_resv_regions)(struct device *dev, struct list_head *list);
 
+	int (*of_xlate_fwspec)(struct iommu_fwspec *fwspec, struct device *dev,
+			       struct of_phandle_args *args);
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 	bool (*is_attach_deferred)(struct device *dev);
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 16/17] iommu: Mark dev_iommu_get() with lockdep
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (14 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 15/17] iommu: Add ops->of_xlate_fwspec() Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:06 ` [PATCH v2 17/17] iommu: Mark dev_iommu_priv_set() with a lockdep Jason Gunthorpe
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

Allocation of dev->iommu must be done under the
iommu_probe_device_lock. Mark this with lockdep to discourage future
mistakes.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index de6dcb244bff4a..34c4b07a6aafae 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -345,6 +345,8 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
 {
 	struct dev_iommu *param = dev->iommu;
 
+	lockdep_assert_held(&iommu_probe_device_lock);
+
 	if (param)
 		return param;
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 17/17] iommu: Mark dev_iommu_priv_set() with a lockdep
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (15 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 16/17] iommu: Mark dev_iommu_get() with lockdep Jason Gunthorpe
@ 2023-11-15 14:06 ` Jason Gunthorpe
  2023-11-15 14:54 ` [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jerry Snitselaar
  2023-11-15 15:22 ` Robin Murphy
  18 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 14:06 UTC (permalink / raw)
  To: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

A perfect driver would only call dev_iommu_priv_set() from its probe
callback. We've made it functionally correct to call it from the of_xlate
by adding a lock around that call.

lockdep assert that iommu_probe_device_lock is held to discourage misuse.

Exclude PPC kernels with CONFIG_FSL_PAMU turned on because FSL_PAMU uses a
global static for its priv and abuses priv for its domain.

Remove the pointless stores of NULL, all these are on paths where the core
code will free dev->iommu after the op returns.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/iommu.c                   | 2 --
 drivers/iommu/apple-dart.c                  | 1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 1 -
 drivers/iommu/intel/iommu.c                 | 2 --
 drivers/iommu/iommu.c                       | 9 +++++++++
 drivers/iommu/omap-iommu.c                  | 1 -
 include/linux/iommu.h                       | 5 +----
 8 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index fcc987f5d4edc3..8199c678c2dc2a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -551,8 +551,6 @@ static void amd_iommu_uninit_device(struct device *dev)
 	if (dev_data->domain)
 		detach_device(dev);
 
-	dev_iommu_priv_set(dev, NULL);
-
 	/*
 	 * We keep dev_data around for unplugged devices and reuse it when the
 	 * device is re-plugged - not doing so would introduce a ton of races.
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index ee05f4824bfad1..56cfc33042e0b5 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -740,7 +740,6 @@ static void apple_dart_release_device(struct device *dev)
 {
 	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
 
-	dev_iommu_priv_set(dev, NULL);
 	kfree(cfg);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b1309f04ebc0d9..df81fcd25a75b0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2698,7 +2698,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 
 err_free_master:
 	kfree(master);
-	dev_iommu_priv_set(dev, NULL);
 	return ERR_PTR(ret);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 8c4a60d8e5d522..6fc040a4168aa3 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1423,7 +1423,6 @@ static void arm_smmu_release_device(struct device *dev)
 
 	arm_smmu_rpm_put(cfg->smmu);
 
-	dev_iommu_priv_set(dev, NULL);
 	kfree(cfg);
 }
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 3531b956556c7d..4a5792888e6433 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4457,7 +4457,6 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 		ret = intel_pasid_alloc_table(dev);
 		if (ret) {
 			dev_err(dev, "PASID table allocation failed\n");
-			dev_iommu_priv_set(dev, NULL);
 			kfree(info);
 			return ERR_PTR(ret);
 		}
@@ -4475,7 +4474,6 @@ static void intel_iommu_release_device(struct device *dev)
 	dmar_remove_one_dev_info(dev);
 	intel_pasid_free_table(dev);
 	intel_iommu_debugfs_remove_dev(info);
-	dev_iommu_priv_set(dev, NULL);
 	kfree(info);
 	set_dma_ops(dev, NULL);
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 34c4b07a6aafae..fbfb9ba4da0ee2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -387,6 +387,15 @@ static u32 dev_iommu_get_max_pasids(struct device *dev)
 	return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
 }
 
+void dev_iommu_priv_set(struct device *dev, void *priv)
+{
+	/* FSL_PAMU does something weird */
+	if (!IS_ENABLED(CONFIG_FSL_PAMU))
+		lockdep_assert_held(&iommu_probe_device_lock);
+	dev->iommu->priv = priv;
+}
+EXPORT_SYMBOL_GPL(dev_iommu_priv_set);
+
 /*
  * Init the dev->iommu and dev->iommu_group in the struct device and get the
  * driver probed. Take ownership of fwspec, it always freed on error
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index c66b070841dd41..c9528065a59afa 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1719,7 +1719,6 @@ static void omap_iommu_release_device(struct device *dev)
 	if (!dev->of_node || !arch_data)
 		return;
 
-	dev_iommu_priv_set(dev, NULL);
 	kfree(arch_data);
 
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3495db0c3e4631..8be153a54c8ca2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -852,10 +852,7 @@ static inline void *dev_iommu_priv_get(struct device *dev)
 		return NULL;
 }
 
-static inline void dev_iommu_priv_set(struct device *dev, void *priv)
-{
-	dev->iommu->priv = priv;
-}
+void dev_iommu_priv_set(struct device *dev, void *priv);
 
 int iommu_probe_device_fwspec(struct device *dev, struct iommu_fwspec *fwspec);
 static inline int iommu_probe_device(struct device *dev)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 03/17] iommu/of: Use -ENODEV consistently in of_iommu_configure()
  2023-11-15 14:05 ` [PATCH v2 03/17] iommu/of: Use -ENODEV consistently in of_iommu_configure() Jason Gunthorpe
@ 2023-11-15 14:42   ` Jerry Snitselaar
  0 siblings, 0 replies; 35+ messages in thread
From: Jerry Snitselaar @ 2023-11-15 14:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Moritz Fischer, Zhenhua Huang,
	Rafael J. Wysocki, Rob Herring

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 04/17] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
  2023-11-15 14:05 ` [PATCH v2 04/17] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id() Jason Gunthorpe
@ 2023-11-15 14:45   ` Jerry Snitselaar
  0 siblings, 0 replies; 35+ messages in thread
From: Jerry Snitselaar @ 2023-11-15 14:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Moritz Fischer, Zhenhua Huang,
	Rafael J. Wysocki, Rob Herring

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (16 preceding siblings ...)
  2023-11-15 14:06 ` [PATCH v2 17/17] iommu: Mark dev_iommu_priv_set() with a lockdep Jason Gunthorpe
@ 2023-11-15 14:54 ` Jerry Snitselaar
  2023-11-15 15:22 ` Robin Murphy
  18 siblings, 0 replies; 35+ messages in thread
From: Jerry Snitselaar @ 2023-11-15 14:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Moritz Fischer, Zhenhua Huang,
	Rafael J. Wysocki, Rob Herring

Did patch 12 v2 get sent? I'm not seeing it locally, nor in lore, and b4 doesn't find it when pulling then thread.

Regards,
Jerry


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static
  2023-11-15 14:06 ` [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static Jason Gunthorpe
@ 2023-11-15 15:09   ` Jerry Snitselaar
  2023-11-16 14:36   ` Moritz Fischer
  1 sibling, 0 replies; 35+ messages in thread
From: Jerry Snitselaar @ 2023-11-15 15:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Moritz Fischer, Zhenhua Huang,
	Rafael J. Wysocki, Rob Herring

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
  2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
                   ` (17 preceding siblings ...)
  2023-11-15 14:54 ` [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jerry Snitselaar
@ 2023-11-15 15:22 ` Robin Murphy
  2023-11-15 15:36   ` Jason Gunthorpe
  18 siblings, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2023-11-15 15:22 UTC (permalink / raw)
  To: Jason Gunthorpe, acpica-devel, Alyssa Rosenzweig, Albert Ou,
	asahi, Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:
> [Several people have tested this now, so it is something that should sit in
> linux-next for a while]

What's the aim here? This is obviously far, far too much for a stable 
fix, but then it's also not the refactoring we want for the future 
either, since it's moving in the wrong direction of cementing the 
fundamental brokenness further in place rather than getting any closer 
to removing it.

Thanks,
Robin.

> The iommu subsystem uses dev->iommu to store bits of information about the
> attached iommu driver. This has been co-opted by the ACPI/OF code to also
> be a place to pass around the iommu_fwspec before a driver is probed.
> 
> Since both are using the same pointers without any locking it triggers
> races if there is concurrent driver loading:
> 
>       CPU0                                     CPU1
> of_iommu_configure()                iommu_device_register()
>   ..                                   bus_iommu_probe()
>    iommu_fwspec_of_xlate()              __iommu_probe_device()
>                                          iommu_init_device()
>     dev_iommu_get()
>                                            .. ops->probe fails, no fwspec ..
>                                            dev_iommu_free()
>     dev->iommu->fwspec    *crash*
> 
> My first attempt get correct locking here was to use the device_lock to
> protect the entire *_iommu_configure() and iommu_probe() paths. This
> allowed safe use of dev->iommu within those paths. Unfortuately enough
> drivers abuse the of_iommu_configure() flow without proper locking and
> this approach failed.
> 
> This approach removes touches of dev->iommu from the *_iommu_configure()
> code. The few remaining required touches are moved into iommu.c and
> protected with the existing iommu_probe_device_lock.
> 
> To do this we change *_iommu_configure() to hold the iommu_fwspec on the
> stack while it is being built. Once it is fully formed the core code will
> install it into the dev->iommu when it calls probe.
> 
> This also removes all the touches of iommu_ops from
> the *_iommu_configure() paths and makes that mechanism private to the
> iommu core.
> 
> A few more lockdep assertions are added to discourage future mis-use.
> 
> This is on github: https://github.com/jgunthorpe/linux/commits/iommu_fwspec
> 
> v2:
>   - Fix all the kconfig randomization 0-day stuff
>   - Add missing kdoc parameters
>   - Remove NO_IOMMU, replace it with ENODEV
>   - Use PTR_ERR to print errno in the new/moved logging
> v1: https://lore.kernel.org/r/0-v1-5f734af130a3+34f-iommu_fwspec_jgg@nvidia.com
> 
> Jason Gunthorpe (17):
>    iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()
>    iommmu/of: Do not return struct iommu_ops from of_iommu_configure()
>    iommu/of: Use -ENODEV consistently in of_iommu_configure()
>    acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
>    iommu: Make iommu_fwspec->ids a distinct allocation
>    iommu: Add iommu_fwspec_alloc/dealloc()
>    iommu: Add iommu_probe_device_fwspec()
>    iommu/of: Do not use dev->iommu within of_iommu_configure()
>    iommu: Add iommu_fwspec_append_ids()
>    acpi: Do not use dev->iommu within acpi_iommu_configure()
>    iommu: Hold iommu_probe_device_lock while calling ops->of_xlate
>    iommu: Make iommu_ops_from_fwnode() static
>    iommu: Remove dev_iommu_fwspec_set()
>    iommu: Remove pointless iommu_fwspec_free()
>    iommu: Add ops->of_xlate_fwspec()
>    iommu: Mark dev_iommu_get() with lockdep
>    iommu: Mark dev_iommu_priv_set() with a lockdep
> 
>   arch/arc/mm/dma.c                           |   2 +-
>   arch/arm/mm/dma-mapping-nommu.c             |   2 +-
>   arch/arm/mm/dma-mapping.c                   |  10 +-
>   arch/arm64/mm/dma-mapping.c                 |   4 +-
>   arch/mips/mm/dma-noncoherent.c              |   2 +-
>   arch/riscv/mm/dma-noncoherent.c             |   2 +-
>   drivers/acpi/arm64/iort.c                   |  42 ++--
>   drivers/acpi/scan.c                         | 104 +++++----
>   drivers/acpi/viot.c                         |  45 ++--
>   drivers/hv/hv_common.c                      |   2 +-
>   drivers/iommu/amd/iommu.c                   |   2 -
>   drivers/iommu/apple-dart.c                  |   1 -
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   9 +-
>   drivers/iommu/arm/arm-smmu/arm-smmu.c       |  23 +-
>   drivers/iommu/intel/iommu.c                 |   2 -
>   drivers/iommu/iommu.c                       | 227 +++++++++++++++-----
>   drivers/iommu/of_iommu.c                    | 133 +++++-------
>   drivers/iommu/omap-iommu.c                  |   1 -
>   drivers/iommu/tegra-smmu.c                  |   1 -
>   drivers/iommu/virtio-iommu.c                |   8 +-
>   drivers/of/device.c                         |  24 ++-
>   include/acpi/acpi_bus.h                     |   8 +-
>   include/linux/acpi_iort.h                   |   8 +-
>   include/linux/acpi_viot.h                   |   5 +-
>   include/linux/dma-map-ops.h                 |   4 +-
>   include/linux/iommu.h                       |  47 ++--
>   include/linux/of_iommu.h                    |  13 +-
>   27 files changed, 424 insertions(+), 307 deletions(-)
> 
> 
> base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
  2023-11-15 15:22 ` Robin Murphy
@ 2023-11-15 15:36   ` Jason Gunthorpe
  2023-11-15 20:23     ` Robin Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-15 15:36 UTC (permalink / raw)
  To: Robin Murphy
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Jerry Snitselaar, Moritz Fischer,
	Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On Wed, Nov 15, 2023 at 03:22:09PM +0000, Robin Murphy wrote:
> On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:
> > [Several people have tested this now, so it is something that should sit in
> > linux-next for a while]
> 
> What's the aim here? This is obviously far, far too much for a
> stable fix,

To fix the locking bug and ugly abuse of dev->iommu?

I wouldn't say that, it is up to the people who care about this to
decide. It seems alot of people are hitting it so maybe it should be
backported in some situations. Regardless, we should not continue to
have this locking bug in v6.8.

> but then it's also not the refactoring we want for the future either, since
> it's moving in the wrong direction of cementing the fundamental brokenness
> further in place rather than getting any closer to removing it.

I haven't seen patches or an outline on what you have in mind though?

In my view I would like to get rid of of_xlate(), at a minimum. It is
a micro-optimization I don't think we need. I see a pretty
straightforward path to get there from here.

Do you also want to get rid of iommu_fwspec, or at least thin it out?
That seems reasonable too, I think that becomes within reach once
of_xlate is gone.

What do you see as "cemeting"?

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
  2023-11-15 15:36   ` Jason Gunthorpe
@ 2023-11-15 20:23     ` Robin Murphy
  2023-11-16  4:17       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2023-11-15 20:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Jerry Snitselaar, Moritz Fischer,
	Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On 2023-11-15 3:36 pm, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 03:22:09PM +0000, Robin Murphy wrote:
>> On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:
>>> [Several people have tested this now, so it is something that should sit in
>>> linux-next for a while]
>>
>> What's the aim here? This is obviously far, far too much for a
>> stable fix,
> 
> To fix the locking bug and ugly abuse of dev->iommu?

Fixing the locking can be achieved by fixing the locking, as I have now 
demonstrated.

> I wouldn't say that, it is up to the people who care about this to
> decide. It seems alot of people are hitting it so maybe it should be
> backported in some situations. Regardless, we should not continue to
> have this locking bug in v6.8.
> 
>> but then it's also not the refactoring we want for the future either, since
>> it's moving in the wrong direction of cementing the fundamental brokenness
>> further in place rather than getting any closer to removing it.
> 
> I haven't seen patches or an outline on what you have in mind though?
> 
> In my view I would like to get rid of of_xlate(), at a minimum. It is
> a micro-optimization I don't think we need. I see a pretty
> straightforward path to get there from here.

Micro-optimisation!? OK, I think I have to say it. Please stop trying to 
rewrite code you don't understand.

> Do you also want to get rid of iommu_fwspec, or at least thin it out?
> That seems reasonable too, I think that becomes within reach once
> of_xlate is gone.
> 
> What do you see as "cemeting"?

Most of this series constitutes a giant sweeping redesign of a whole 
bunch of internal machinery to permit it to be used concurrently, where 
that concurrency should still not exist in the first place because the 
thing that allows it to happen also causes other problems like groups 
being broken. Once the real problem is fixed there will be no need for 
any of this, and at worst some of it will then actually get in the way.

I feel like I've explained it many times already, but what needs to 
happen is for the firmware parsing and of_xlate stage to be initiated by 
__iommu_probe_device() itself. The first step is my bus ops series (if 
I'm ever allowed to get it landed...) which gets to the state of 
expecting to start from a fwspec. Then it's a case of shuffling around 
what's currently in the bus_type dma_configure methods such that point 
is where the fwspec is created as well, and the driver-probe-time work 
is almost removed except for still deferring if a device is waiting for 
its IOMMU instance (since that instance turning up and registering will 
retrigger the rest itself). And there at last, a trivial lifecycle and 
access pattern for dev->iommu (with the overlapping bits of iommu_fwspec 
finally able to be squashed as well), and finally an end to 8 long and 
unfortunate years of calling things in the wrong order in ways they were 
never supposed to be.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
  2023-11-15 20:23     ` Robin Murphy
@ 2023-11-16  4:17       ` Jason Gunthorpe
  2023-11-21 16:06         ` Robin Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-16  4:17 UTC (permalink / raw)
  To: Robin Murphy
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Jerry Snitselaar, Moritz Fischer,
	Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On Wed, Nov 15, 2023 at 08:23:54PM +0000, Robin Murphy wrote:
> On 2023-11-15 3:36 pm, Jason Gunthorpe wrote:
> > On Wed, Nov 15, 2023 at 03:22:09PM +0000, Robin Murphy wrote:
> > > On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:
> > > > [Several people have tested this now, so it is something that should sit in
> > > > linux-next for a while]
> > > 
> > > What's the aim here? This is obviously far, far too much for a
> > > stable fix,
> > 
> > To fix the locking bug and ugly abuse of dev->iommu?
> 
> Fixing the locking can be achieved by fixing the locking, as I have now
> demonstrated.

Obviously. I rejected that right away because of how incredibly
wrongly layered and hacky it is to do something like that.

> > I haven't seen patches or an outline on what you have in mind though?
> > 
> > In my view I would like to get rid of of_xlate(), at a minimum. It is
> > a micro-optimization I don't think we need. I see a pretty
> > straightforward path to get there from here.
> 
> Micro-optimisation!? OK, I think I have to say it. Please stop trying to
> rewrite code you don't understand.

I understand it fine. The list of (fwnode_handle, of_phandle_args)
tuples doesn't change between when of_xlate is callled and when probe
is called. Probe can have the same list. As best I can tell the extra
ops avoids maybe some memory allocation, maybe an extra iteration.

What it does do is screw up alot of the drivers that seem to want to
allocate the per-device data in of_xlate and make it convoluted and
memory leaking buggy on error paths.

So, I would move toward having the driver's probe invoke a helper like:

   iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);

Which generates the same list of (fwnode_handle, of_phandle_args) that
was passed to of_xlate today, but is ordered sensibly within the
sequence of probe for what many drivers seem to want to do.

So, it is not so much that that the idea of of_xlate goes away, but
the specific op->of_xlate does, it gets shifted into a helper that
invokes the same function in a more logical spot.

The per-device data can be allocated at the top of probe and passed
through args to fix the lifetime bugs.

It is pretty simple to do.

> Most of this series constitutes a giant sweeping redesign of a whole bunch
> of internal machinery to permit it to be used concurrently, where that
> concurrency should still not exist in the first place because the thing that
> allows it to happen also causes other problems like groups being broken.
> Once the real problem is fixed there will be no need for any of this, and at
> worst some of it will then actually get in the way.

Not quite. This decouples two unrelated things into seperate
concerns. It is not so much about the concurrency but removing the
abuse of dev->iommu by code that has no need to touch it at all.

Decoupling makes moving code around easier since the relationships are
easier to reason about.

You can still allocated a fwnode, populate it, and do the rest of the
flow under a probe function just fine.
 
> I feel like I've explained it many times already, but what needs to happen
> is for the firmware parsing and of_xlate stage to be initiated by
> __iommu_probe_device() itself.

Yes, OK I see. I don't see a problem, I think this still a good
improvement even in that world it is undesirable to use dev->iommu as
a temporary, even if the locking can work.

> ever allowed to get it landed...) which gets to the state of
> expecting to

Repost it? Rc1 is out and you need to add one hunk to the new user
domain creation in iommufd.

> start from a fwspec. Then it's a case of shuffling around what's currently
> in the bus_type dma_configure methods such that point is where the fwspec is
> created as well, and the driver-probe-time work is almost removed except for
> still deferring if a device is waiting for its IOMMU instance (since that
> instance turning up and registering will retrigger the rest itself). And
> there at last, a trivial lifecycle and access pattern for dev->iommu (with
> the overlapping bits of iommu_fwspec finally able to be squashed as well),
> and finally an end to 8 long and unfortunate years of calling things in the
> wrong order in ways they were never supposed to be.

Having just gone through this all in detail I don't think it is as
entirely straightforward as this, the open coded callers to
of_dma_configure() are not going to be so nice to unravel.

Regardless, I don't see any worrying incompatibility with that
direction.

Cheers,
Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static
  2023-11-15 14:06 ` [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static Jason Gunthorpe
  2023-11-15 15:09   ` Jerry Snitselaar
@ 2023-11-16 14:36   ` Moritz Fischer
  1 sibling, 0 replies; 35+ messages in thread
From: Moritz Fischer @ 2023-11-16 14:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Robin Murphy,
	Sudeep Holla, Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Jerry Snitselaar, Moritz Fischer,
	Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On Wed, Nov 15, 2023 at 10:06:03AM -0400, Jason Gunthorpe wrote:
> There are no external callers now.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
> ---
>  drivers/iommu/iommu.c | 3 ++-
>  include/linux/iommu.h | 7 -------
>  2 files changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5af98cad06f9ef..ea6aede326131e 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2928,7 +2928,8 @@ bool iommu_default_passthrough(void)
>  }
>  EXPORT_SYMBOL_GPL(iommu_default_passthrough);
>  
> -const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
> +static const struct iommu_ops *
> +iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>  {
>  	const struct iommu_ops *ops = NULL;
>  	struct iommu_device *iommu;
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 72ec71bd31a376..05c5ad6bad6339 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -831,7 +831,6 @@ static inline void iommu_fwspec_free(struct device *dev)
>  	dev->iommu->fwspec = NULL;
>  }
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
> -const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>  int iommu_fwspec_append_ids(struct iommu_fwspec *fwspec, u32 *ids, int num_ids);
>  
>  static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
> @@ -1187,12 +1186,6 @@ static inline int iommu_fwspec_add_ids(struct device *dev, u32 *ids,
>  	return -ENODEV;
>  }
>  
> -static inline
> -const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
> -{
> -	return NULL;
> -}
> -
>  static inline int
>  iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
>  {
> -- 
> 2.42.0
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc()
  2023-11-15 14:05 ` [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc() Jason Gunthorpe
@ 2023-11-19  8:10   ` Hector Martin
  2023-11-19  9:19     ` Hector Martin
  0 siblings, 1 reply; 35+ messages in thread
From: Hector Martin @ 2023-11-19  8:10 UTC (permalink / raw)
  To: Jason Gunthorpe, acpica-devel, Alyssa Rosenzweig, Albert Ou,
	asahi, Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Palmer Dabbelt, patches, Paul Walmsley, Rafael J. Wysocki,
	Robert Moore, Rob Herring, Robin Murphy, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon
  Cc: André Draszik, Lu Baolu, Christoph Hellwig, Jerry Snitselaar,
	Moritz Fischer, Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On 2023/11/15 23:05, Jason Gunthorpe wrote:
> Allow fwspec to exist independently from the dev->iommu by providing
> functions to allow allocating and freeing the raw struct iommu_fwspec.
> 
> Reflow the existing paths to call the new alloc/dealloc functions.
> 
> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/iommu.c | 82 ++++++++++++++++++++++++++++++++-----------
>  include/linux/iommu.h | 11 +++++-
>  2 files changed, 72 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 18a82a20934d53..86bbb9e75c7e03 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -361,10 +361,8 @@ static void dev_iommu_free(struct device *dev)
>  	struct dev_iommu *param = dev->iommu;
>  
>  	dev->iommu = NULL;
> -	if (param->fwspec) {
> -		fwnode_handle_put(param->fwspec->iommu_fwnode);
> -		kfree(param->fwspec);
> -	}
> +	if (param->fwspec)
> +		iommu_fwspec_dealloc(param->fwspec);
>  	kfree(param);
>  }
>  
> @@ -2920,10 +2918,61 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>  	return ops;
>  }
>  
> +static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
> +				     struct device *dev,
> +				     struct fwnode_handle *iommu_fwnode)
> +{
> +	const struct iommu_ops *ops;
> +
> +	if (fwspec->iommu_fwnode) {
> +		/*
> +		 * fwspec->iommu_fwnode is the first iommu's fwnode. In the rare
> +		 * case of multiple iommus for one device they must point to the
> +		 * same driver, checked via same ops.
> +		 */
> +		ops = iommu_ops_from_fwnode(iommu_fwnode);

This carries over a related bug from the original code: If a device has
two IOMMUs and the first one probes but the second one defers, ops will
be NULL here and the check will fail with EINVAL.

Adding a check for that case here fixes it:

		if (!ops)
			return driver_deferred_probe_check_state(dev);

With that, for the whole series:

Tested-by: Hector Martin <marcan@marcan.st>

I can't specifically test for the probe races the series intends to fix
though, since that bug we only hit extremely rarely. I'm just testing
that nothing breaks.

> +		if (fwspec->ops != ops)
> +			return -EINVAL;
> +		return 0;
> +	}
> +
> +	if (!fwspec->ops) {
> +		ops = iommu_ops_from_fwnode(iommu_fwnode);
> +		if (!ops)
> +			return driver_deferred_probe_check_state(dev);
> +		fwspec->ops = ops;
> +	}
> +
> +	of_node_get(to_of_node(iommu_fwnode));
> +	fwspec->iommu_fwnode = iommu_fwnode;
> +	return 0;
> +}
> +
> +struct iommu_fwspec *iommu_fwspec_alloc(void)
> +{
> +	struct iommu_fwspec *fwspec;
> +
> +	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
> +	if (!fwspec)
> +		return ERR_PTR(-ENOMEM);
> +	return fwspec;
> +}
> +
> +void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec)
> +{
> +	if (!fwspec)
> +		return;
> +
> +	if (fwspec->iommu_fwnode)
> +		fwnode_handle_put(fwspec->iommu_fwnode);
> +	kfree(fwspec);
> +}
> +
>  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>  		      const struct iommu_ops *ops)
>  {
>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	int ret;
>  
>  	if (fwspec)
>  		return ops == fwspec->ops ? 0 : -EINVAL;
> @@ -2931,29 +2980,22 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>  	if (!dev_iommu_get(dev))
>  		return -ENOMEM;
>  
> -	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
> -	if (!fwspec)
> -		return -ENOMEM;
> +	fwspec = iommu_fwspec_alloc();
> +	if (IS_ERR(fwspec))
> +		return PTR_ERR(fwspec);
>  
> -	of_node_get(to_of_node(iommu_fwnode));
> -	fwspec->iommu_fwnode = iommu_fwnode;
>  	fwspec->ops = ops;
> +	ret = iommu_fwspec_assign_iommu(fwspec, dev, iommu_fwnode);
> +	if (ret) {
> +		iommu_fwspec_dealloc(fwspec);
> +		return ret;
> +	}
> +
>  	dev_iommu_fwspec_set(dev, fwspec);
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_fwspec_init);
>  
> -void iommu_fwspec_free(struct device *dev)
> -{
> -	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> -
> -	if (fwspec) {
> -		fwnode_handle_put(fwspec->iommu_fwnode);
> -		kfree(fwspec);
> -		dev_iommu_fwspec_set(dev, NULL);
> -	}
> -}
> -EXPORT_SYMBOL_GPL(iommu_fwspec_free);
>  
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e98a4ca8f536b7..c7c68cb59aa4dc 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -813,9 +813,18 @@ struct iommu_sva {
>  	struct iommu_domain		*domain;
>  };
>  
> +struct iommu_fwspec *iommu_fwspec_alloc(void);
> +void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec);
> +
>  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>  		      const struct iommu_ops *ops);
> -void iommu_fwspec_free(struct device *dev);
> +static inline void iommu_fwspec_free(struct device *dev)
> +{
> +	if (!dev->iommu)
> +		return;
> +	iommu_fwspec_dealloc(dev->iommu->fwspec);
> +	dev->iommu->fwspec = NULL;
> +}
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>  const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>  

- Hector

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc()
  2023-11-19  8:10   ` Hector Martin
@ 2023-11-19  9:19     ` Hector Martin
  2023-11-19 14:13       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Hector Martin @ 2023-11-19  9:19 UTC (permalink / raw)
  To: Jason Gunthorpe, acpica-devel, Alyssa Rosenzweig, Albert Ou,
	asahi, Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Palmer Dabbelt, patches, Paul Walmsley, Rafael J. Wysocki,
	Robert Moore, Rob Herring, Robin Murphy, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon



On 2023/11/19 17:10, Hector Martin wrote:
> On 2023/11/15 23:05, Jason Gunthorpe wrote:
>> Allow fwspec to exist independently from the dev->iommu by providing
>> functions to allow allocating and freeing the raw struct iommu_fwspec.
>>
>> Reflow the existing paths to call the new alloc/dealloc functions.
>>
>> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> ---
>>  drivers/iommu/iommu.c | 82 ++++++++++++++++++++++++++++++++-----------
>>  include/linux/iommu.h | 11 +++++-
>>  2 files changed, 72 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 18a82a20934d53..86bbb9e75c7e03 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -361,10 +361,8 @@ static void dev_iommu_free(struct device *dev)
>>  	struct dev_iommu *param = dev->iommu;
>>  
>>  	dev->iommu = NULL;
>> -	if (param->fwspec) {
>> -		fwnode_handle_put(param->fwspec->iommu_fwnode);
>> -		kfree(param->fwspec);
>> -	}
>> +	if (param->fwspec)
>> +		iommu_fwspec_dealloc(param->fwspec);
>>  	kfree(param);
>>  }
>>  
>> @@ -2920,10 +2918,61 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>>  	return ops;
>>  }
>>  
>> +static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
>> +				     struct device *dev,
>> +				     struct fwnode_handle *iommu_fwnode)
>> +{
>> +	const struct iommu_ops *ops;
>> +
>> +	if (fwspec->iommu_fwnode) {
>> +		/*
>> +		 * fwspec->iommu_fwnode is the first iommu's fwnode. In the rare
>> +		 * case of multiple iommus for one device they must point to the
>> +		 * same driver, checked via same ops.
>> +		 */
>> +		ops = iommu_ops_from_fwnode(iommu_fwnode);
> 
> This carries over a related bug from the original code: If a device has
> two IOMMUs and the first one probes but the second one defers, ops will
> be NULL here and the check will fail with EINVAL.
> 
> Adding a check for that case here fixes it:
> 
> 		if (!ops)
> 			return driver_deferred_probe_check_state(dev);
> 
> With that, for the whole series:
> 
> Tested-by: Hector Martin <marcan@marcan.st>
> 
> I can't specifically test for the probe races the series intends to fix
> though, since that bug we only hit extremely rarely. I'm just testing
> that nothing breaks.

Actually no, this fix is not sufficient. If the first IOMMU is ready
then the xlate path allocates dev->iommu, which then
__iommu_probe_device takes as a sign that all IOMMUs are ready and does
the device init. Then when the xlate comes along again after suceeding
with the second IOMMU, __iommu_probe_device sees the device is already
in a group and never initializes the second IOMMU, leaving the device
with only one IOMMU.

This patch fixes it, but honestly, at this point I have no idea how to
"properly" fix this. There is *way* too much subtlety in this whole
codepath.

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2477dec29740..2e4baf0572e7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2935,6 +2935,12 @@ int iommu_fwspec_of_xlate(struct iommu_fwspec
*fwspec, struct device *dev,
 	int ret;

 	ret = iommu_fwspec_assign_iommu(fwspec, dev, iommu_fwnode);
+	if (ret == -EPROBE_DEFER) {
+		mutex_lock(&iommu_probe_device_lock);
+		if (dev->iommu)
+			dev_iommu_free(dev);
+		mutex_unlock(&iommu_probe_device_lock);
+	}
 	if (ret)
 		return ret;

> 
>> +		if (fwspec->ops != ops)
>> +			return -EINVAL;
>> +		return 0;
>> +	}
>> +
>> +	if (!fwspec->ops) {
>> +		ops = iommu_ops_from_fwnode(iommu_fwnode);
>> +		if (!ops)
>> +			return driver_deferred_probe_check_state(dev);
>> +		fwspec->ops = ops;
>> +	}
>> +
>> +	of_node_get(to_of_node(iommu_fwnode));
>> +	fwspec->iommu_fwnode = iommu_fwnode;
>> +	return 0;
>> +}
>> +
>> +struct iommu_fwspec *iommu_fwspec_alloc(void)
>> +{
>> +	struct iommu_fwspec *fwspec;
>> +
>> +	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
>> +	if (!fwspec)
>> +		return ERR_PTR(-ENOMEM);
>> +	return fwspec;
>> +}
>> +
>> +void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec)
>> +{
>> +	if (!fwspec)
>> +		return;
>> +
>> +	if (fwspec->iommu_fwnode)
>> +		fwnode_handle_put(fwspec->iommu_fwnode);
>> +	kfree(fwspec);
>> +}
>> +
>>  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>>  		      const struct iommu_ops *ops)
>>  {
>>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> +	int ret;
>>  
>>  	if (fwspec)
>>  		return ops == fwspec->ops ? 0 : -EINVAL;
>> @@ -2931,29 +2980,22 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>>  	if (!dev_iommu_get(dev))
>>  		return -ENOMEM;
>>  
>> -	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
>> -	if (!fwspec)
>> -		return -ENOMEM;
>> +	fwspec = iommu_fwspec_alloc();
>> +	if (IS_ERR(fwspec))
>> +		return PTR_ERR(fwspec);
>>  
>> -	of_node_get(to_of_node(iommu_fwnode));
>> -	fwspec->iommu_fwnode = iommu_fwnode;
>>  	fwspec->ops = ops;
>> +	ret = iommu_fwspec_assign_iommu(fwspec, dev, iommu_fwnode);
>> +	if (ret) {
>> +		iommu_fwspec_dealloc(fwspec);
>> +		return ret;
>> +	}
>> +
>>  	dev_iommu_fwspec_set(dev, fwspec);
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_fwspec_init);
>>  
>> -void iommu_fwspec_free(struct device *dev)
>> -{
>> -	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> -
>> -	if (fwspec) {
>> -		fwnode_handle_put(fwspec->iommu_fwnode);
>> -		kfree(fwspec);
>> -		dev_iommu_fwspec_set(dev, NULL);
>> -	}
>> -}
>> -EXPORT_SYMBOL_GPL(iommu_fwspec_free);
>>  
>>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index e98a4ca8f536b7..c7c68cb59aa4dc 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -813,9 +813,18 @@ struct iommu_sva {
>>  	struct iommu_domain		*domain;
>>  };
>>  
>> +struct iommu_fwspec *iommu_fwspec_alloc(void);
>> +void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec);
>> +
>>  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>>  		      const struct iommu_ops *ops);
>> -void iommu_fwspec_free(struct device *dev);
>> +static inline void iommu_fwspec_free(struct device *dev)
>> +{
>> +	if (!dev->iommu)
>> +		return;
>> +	iommu_fwspec_dealloc(dev->iommu->fwspec);
>> +	dev->iommu->fwspec = NULL;
>> +}
>>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>>  const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>>  
> 
> - Hector
> 
> 

- Hector

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc()
  2023-11-19  9:19     ` Hector Martin
@ 2023-11-19 14:13       ` Jason Gunthorpe
  2023-11-21  6:47         ` Hector Martin
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-19 14:13 UTC (permalink / raw)
  To: Hector Martin
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Palmer Dabbelt, patches, Paul Walmsley, Rafael J. Wysocki,
	Robert Moore, Rob Herring, Robin Murphy, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon

On Sun, Nov 19, 2023 at 06:19:43PM +0900, Hector Martin wrote:
> >> +static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
> >> +				     struct device *dev,
> >> +				     struct fwnode_handle *iommu_fwnode)
> >> +{
> >> +	const struct iommu_ops *ops;
> >> +
> >> +	if (fwspec->iommu_fwnode) {
> >> +		/*
> >> +		 * fwspec->iommu_fwnode is the first iommu's fwnode. In the rare
> >> +		 * case of multiple iommus for one device they must point to the
> >> +		 * same driver, checked via same ops.
> >> +		 */
> >> +		ops = iommu_ops_from_fwnode(iommu_fwnode);
> > 
> > This carries over a related bug from the original code: If a device has
> > two IOMMUs and the first one probes but the second one defers, ops will
> > be NULL here and the check will fail with EINVAL.
> > 
> > Adding a check for that case here fixes it:
> > 
> > 		if (!ops)
> > 			return driver_deferred_probe_check_state(dev);

Yes!

> > With that, for the whole series:
> > 
> > Tested-by: Hector Martin <marcan@marcan.st>
> > 
> > I can't specifically test for the probe races the series intends to fix
> > though, since that bug we only hit extremely rarely. I'm just testing
> > that nothing breaks.
> 
> Actually no, this fix is not sufficient. If the first IOMMU is ready
> then the xlate path allocates dev->iommu, which then
> __iommu_probe_device takes as a sign that all IOMMUs are ready and does
> the device init.

It doesn't.. The code there is:

	if (!fwspec && dev->iommu)
		fwspec = dev->iommu->fwspec;
	if (fwspec)
		ops = fwspec->ops;
	else
		ops = dev->bus->iommu_ops;
	if (!ops) {
		ret = -ENODEV;
		goto out_unlock;
	}

Which is sensitive only to !NULL fwspec, and if EPROBE_DEFER is
returned fwspec will be freed and dev->iommu->fwspec will be NULL
here.

In the NULL case it does a 'bus probe' with a NULL fwspec and all the
fwspec drivers return immediately from their probe functions.

Did I miss something?

> Then when the xlate comes along again after suceeding
> with the second IOMMU, __iommu_probe_device sees the device is already
> in a group and never initializes the second IOMMU, leaving the device
> with only one IOMMU.

This should be fixed by the first hunk to check every iommu and fail?

BTW, do you have a systems with same device attached to multiple
iommus?

I've noticed another bug here, many drivers don't actually support
differing iommu instances and nothing seems to check it..

Thanks,
Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc()
  2023-11-19 14:13       ` Jason Gunthorpe
@ 2023-11-21  6:47         ` Hector Martin
  2023-11-21 16:00           ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Hector Martin @ 2023-11-21  6:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Palmer Dabbelt, patches, Paul Walmsley, Rafael J. Wysocki,
	Robert Moore, Rob Herring, Robin Murphy, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon



On 2023/11/19 23:13, Jason Gunthorpe wrote:
> On Sun, Nov 19, 2023 at 06:19:43PM +0900, Hector Martin wrote:
>>>> +static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
>>>> +				     struct device *dev,
>>>> +				     struct fwnode_handle *iommu_fwnode)
>>>> +{
>>>> +	const struct iommu_ops *ops;
>>>> +
>>>> +	if (fwspec->iommu_fwnode) {
>>>> +		/*
>>>> +		 * fwspec->iommu_fwnode is the first iommu's fwnode. In the rare
>>>> +		 * case of multiple iommus for one device they must point to the
>>>> +		 * same driver, checked via same ops.
>>>> +		 */
>>>> +		ops = iommu_ops_from_fwnode(iommu_fwnode);
>>>
>>> This carries over a related bug from the original code: If a device has
>>> two IOMMUs and the first one probes but the second one defers, ops will
>>> be NULL here and the check will fail with EINVAL.
>>>
>>> Adding a check for that case here fixes it:
>>>
>>> 		if (!ops)
>>> 			return driver_deferred_probe_check_state(dev);
> 
> Yes!
> 
>>> With that, for the whole series:
>>>
>>> Tested-by: Hector Martin <marcan@marcan.st>
>>>
>>> I can't specifically test for the probe races the series intends to fix
>>> though, since that bug we only hit extremely rarely. I'm just testing
>>> that nothing breaks.
>>
>> Actually no, this fix is not sufficient. If the first IOMMU is ready
>> then the xlate path allocates dev->iommu, which then
>> __iommu_probe_device takes as a sign that all IOMMUs are ready and does
>> the device init.
> 
> It doesn't.. The code there is:
> 
> 	if (!fwspec && dev->iommu)
> 		fwspec = dev->iommu->fwspec;
> 	if (fwspec)
> 		ops = fwspec->ops;
> 	else
> 		ops = dev->bus->iommu_ops;
> 	if (!ops) {
> 		ret = -ENODEV;
> 		goto out_unlock;
> 	}
> 
> Which is sensitive only to !NULL fwspec, and if EPROBE_DEFER is
> returned fwspec will be freed and dev->iommu->fwspec will be NULL
> here.
> 
> In the NULL case it does a 'bus probe' with a NULL fwspec and all the
> fwspec drivers return immediately from their probe functions.
> 
> Did I miss something?

apple_dart is not a fwspec driver and doesn't do that :-)

> 
>> Then when the xlate comes along again after suceeding
>> with the second IOMMU, __iommu_probe_device sees the device is already
>> in a group and never initializes the second IOMMU, leaving the device
>> with only one IOMMU.
> 
> This should be fixed by the first hunk to check every iommu and fail?
> 
> BTW, do you have a systems with same device attached to multiple
> iommus?

Yes, Apple ARM64 machines all have multiple ganged IOMMUs for certain
devices (USB and ISP). We also attach all display IOMMUs to the global
virtual display-subsystem device to handle framebuffer mappings, instead
of trying to dynamically map them to a bunch of individual display
controllers (which is a lot more painful). That last one is what
reliably reproduces this problem, display breaks without both previous
patches ever since we started supporting more than one display output.
The first one is not enough.

> I've noticed another bug here, many drivers don't actually support
> differing iommu instances and nothing seems to check it..

apple-dart does (as long as all the IOMMUs are using that driver).

> 
> Thanks,
> Jason
> 
> 

- Hector

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc()
  2023-11-21  6:47         ` Hector Martin
@ 2023-11-21 16:00           ` Jason Gunthorpe
  2023-11-23  9:08             ` Hector Martin
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-21 16:00 UTC (permalink / raw)
  To: Hector Martin
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Palmer Dabbelt, patches, Paul Walmsley, Rafael J. Wysocki,
	Robert Moore, Rob Herring, Robin Murphy, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon

On Tue, Nov 21, 2023 at 03:47:48PM +0900, Hector Martin wrote:
> > Which is sensitive only to !NULL fwspec, and if EPROBE_DEFER is
> > returned fwspec will be freed and dev->iommu->fwspec will be NULL
> > here.
> > 
> > In the NULL case it does a 'bus probe' with a NULL fwspec and all the
> > fwspec drivers return immediately from their probe functions.
> > 
> > Did I miss something?
> 
> apple_dart is not a fwspec driver and doesn't do that :-)

It implements of_xlate that makes it a driver using the fwspec probe
path.

The issue is in apple-dart. Its logic for avoiding bus probe vs
fwspec probe is not correct.

It does:

static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
{
 [..]
 	dev_iommu_priv_set(dev, cfg);


Then:

static struct iommu_device *apple_dart_probe_device(struct device *dev)
{
	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
	struct apple_dart_stream_map *stream_map;
	int i;

	if (!cfg)
		return ERR_PTR(-ENODEV);

Which leaks the cfg memory on rare error cases and wrongly allows the
driver to probe without a fwspec, which I think is what you are
hitting.

It should be

       if (!dev_iommu_fwspec_get(dev) || !cfg)
		return ERR_PTR(-ENODEV);

To ensure the driver never probes on the bus path.

Clearing the dev->iommu in the core code has the side effect of
clearing (and leaking) the cfg which would hide this issue.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
  2023-11-16  4:17       ` Jason Gunthorpe
@ 2023-11-21 16:06         ` Robin Murphy
  2023-11-21 17:55           ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2023-11-21 16:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Jerry Snitselaar, Moritz Fischer,
	Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On 2023-11-16 4:17 am, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 08:23:54PM +0000, Robin Murphy wrote:
>> On 2023-11-15 3:36 pm, Jason Gunthorpe wrote:
>>> On Wed, Nov 15, 2023 at 03:22:09PM +0000, Robin Murphy wrote:
>>>> On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:
>>>>> [Several people have tested this now, so it is something that should sit in
>>>>> linux-next for a while]
>>>>
>>>> What's the aim here? This is obviously far, far too much for a
>>>> stable fix,
>>>
>>> To fix the locking bug and ugly abuse of dev->iommu?
>>
>> Fixing the locking can be achieved by fixing the locking, as I have now
>> demonstrated.
> 
> Obviously. I rejected that right away because of how incredibly
> wrongly layered and hacky it is to do something like that.

What, and dressing up the fundamental layering violation by baking it 
even further into the API flow, while still not actually fixing it or 
any of its *other* symptoms, is somehow better?

Ultimately, this series is still basically doing the same thing my patch 
does - extending the scope of the existing iommu_probe_device_lock hack 
to cover fwspec creation. A hack is a hack, so frankly I'd rather it be 
simple and obvious and look like one, and being easy to remove again is 
an obvious bonus too.

>>> I haven't seen patches or an outline on what you have in mind though?
>>>
>>> In my view I would like to get rid of of_xlate(), at a minimum. It is
>>> a micro-optimization I don't think we need. I see a pretty
>>> straightforward path to get there from here.
>>
>> Micro-optimisation!? OK, I think I have to say it. Please stop trying to
>> rewrite code you don't understand.
> 
> I understand it fine. The list of (fwnode_handle, of_phandle_args)
> tuples doesn't change between when of_xlate is callled and when probe
> is called. Probe can have the same list. As best I can tell the extra
> ops avoids maybe some memory allocation, maybe an extra iteration.
> 
> What it does do is screw up alot of the drivers that seem to want to
> allocate the per-device data in of_xlate and make it convoluted and
> memory leaking buggy on error paths.
> 
> So, I would move toward having the driver's probe invoke a helper like:
> 
>     iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);
> 
> Which generates the same list of (fwnode_handle, of_phandle_args) that
> was passed to of_xlate today, but is ordered sensibly within the
> sequence of probe for what many drivers seem to want to do.

Grep for of_xlate. It is a standard and well-understood callback pattern 
for a subsystem to parse a common DT binding and pass a driver-specific 
specifier to a driver to interpret. Or maybe you just have a peculiar 
definition of what you think "micro-optimisation" means? :/

> So, it is not so much that that the idea of of_xlate goes away, but
> the specific op->of_xlate does, it gets shifted into a helper that
> invokes the same function in a more logical spot.

I'm curious how you imagine an IOMMU driver's ->probe function could be 
called *before* parsing the firmware to work out what, if any, IOMMU, 
and thus driver, a device is associated with. Unless you think we should 
have the horrible perf model of passing the device to *every* registered 
->probe callback in turn until someone claims it. And then every driver 
has to have identical boilerplate to go off and parse the generic 
"iommus" binding... which is the whole damn reason for *not* going down 
that route and instead using an of_xlate mechanism in the first place.

> The per-device data can be allocated at the top of probe and passed
> through args to fix the lifetime bugs.
> 
> It is pretty simple to do.

I believe the kids these days would say "Say you don't understand the 
code without saying you don't understand the code."

>> Most of this series constitutes a giant sweeping redesign of a whole bunch
>> of internal machinery to permit it to be used concurrently, where that
>> concurrency should still not exist in the first place because the thing that
>> allows it to happen also causes other problems like groups being broken.
>> Once the real problem is fixed there will be no need for any of this, and at
>> worst some of it will then actually get in the way.
> 
> Not quite. This decouples two unrelated things into seperate
> concerns. It is not so much about the concurrency but removing the
> abuse of dev->iommu by code that has no need to touch it at all.

Sorry, the "abuse" of storing IOMMU-API-specific data in the place we 
intentionally created to consolidate all the IOMMU-API-specific data 
into? Yes, there is an issue with the circumstances in which this data 
is sometimes accessed, but as I'm starting to tire of repeating, that 
issue fundamentally dates back to 2017, and the implications were 
unfortunately overlooked when dev->iommu was later introduced and fwspec 
moved into it (since the non-DT probing paths still worked as originally 
designed). Pretending that dev->iommu is the issue here is missing the 
point.

> Decoupling makes moving code around easier since the relationships are
> easier to reason about.

Again with the odd definitions of "easier". You know what I think is 
easy? Having a thing be in the obvious place where it should be (but 
used in the way that was intended). What I would consider objectively 
less easy is having a thing sometimes be there but sometimes be 
somewhere else with loads more API machinery to juggle between the two. 
Especially when once again, that machinery is itself prone to new bugs.

Once again you've got hung up on one particular detail of one symptom of 
the *real* issue, so although I can see and follow your chain of 
reasoning, the fact that it starts from the wrong place makes it not 
particularly useful in the bigger picture.

> You can still allocated a fwnode, populate it, and do the rest of the
> flow under a probe function just fine.
>   
>> I feel like I've explained it many times already, but what needs to happen
>> is for the firmware parsing and of_xlate stage to be initiated by
>> __iommu_probe_device() itself.
> 
> Yes, OK I see. I don't see a problem, I think this still a good
> improvement even in that world it is undesirable to use dev->iommu as
> a temporary, even if the locking can work.
> 
>> ever allowed to get it landed...) which gets to the state of
>> expecting to
> 
> Repost it? Rc1 is out and you need to add one hunk to the new user
> domain creation in iommufd.

Well yeah, I'm trying to get that rebase finished (hence why I'm finding 
broken IOMMUFD selftests), but as always I'm also busy with a lot of 
other non-upstream things, and every time I have managed to do it so far 
this year it's ended up being blocked by conflicting changes, so I 
reserve my optimism...

>> start from a fwspec. Then it's a case of shuffling around what's currently
>> in the bus_type dma_configure methods such that point is where the fwspec is
>> created as well, and the driver-probe-time work is almost removed except for
>> still deferring if a device is waiting for its IOMMU instance (since that
>> instance turning up and registering will retrigger the rest itself). And
>> there at last, a trivial lifecycle and access pattern for dev->iommu (with
>> the overlapping bits of iommu_fwspec finally able to be squashed as well),
>> and finally an end to 8 long and unfortunate years of calling things in the
>> wrong order in ways they were never supposed to be.
> 
> Having just gone through this all in detail I don't think it is as
> entirely straightforward as this, the open coded callers to
> of_dma_configure() are not going to be so nice to unravel.

I've only had this turning over in the back of my mind for about the 
last 4 years now, so I think I have a good understanding of what to 
expect, thanks.

Robin.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
  2023-11-21 16:06         ` Robin Murphy
@ 2023-11-21 17:55           ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2023-11-21 17:55 UTC (permalink / raw)
  To: Robin Murphy
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Hector Martin, Palmer Dabbelt, patches, Paul Walmsley,
	Rafael J. Wysocki, Robert Moore, Rob Herring, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon, André Draszik, Lu Baolu,
	Christoph Hellwig, Jerry Snitselaar, Moritz Fischer,
	Zhenhua Huang, Rafael J. Wysocki, Rob Herring

On Tue, Nov 21, 2023 at 04:06:15PM +0000, Robin Murphy wrote:

> > Obviously. I rejected that right away because of how incredibly
> > wrongly layered and hacky it is to do something like that.
> 
> What, and dressing up the fundamental layering violation by baking it even
> further into the API flow, while still not actually fixing it or any of its
> *other* symptoms, is somehow better?

It puts the locks and the controlled data in the right place, in the
right layer.

> Ultimately, this series is still basically doing the same thing my patch
> does - extending the scope of the existing iommu_probe_device_lock hack to
> cover fwspec creation. A hack is a hack, so frankly I'd rather it be simple
> and obvious and look like one, and being easy to remove again is an obvious
> bonus too.

Not entirely, as I've said this series removes the use of the dev->iommu
during fwspec creation. This is different than just extending the lock.

> > So, I would move toward having the driver's probe invoke a helper like:
> > 
> >     iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);
> > 
> > Which generates the same list of (fwnode_handle, of_phandle_args) that
> > was passed to of_xlate today, but is ordered sensibly within the
> > sequence of probe for what many drivers seem to want to do.
> 
> Grep for of_xlate. It is a standard and well-understood callback pattern for
> a subsystem to parse a common DT binding and pass a driver-specific
> specifier to a driver to interpret. Or maybe you just have a peculiar
> definition of what you think "micro-optimisation" means? :/

Don't know about other subsystems, here is making a mess. Look at what
the drivers are doing. The two SMMU drivers are sort of sane, but
everything else has coded half their pobe into of_xlate. It doesn't
make alot of sense.

> > So, it is not so much that that the idea of of_xlate goes away, but
> > the specific op->of_xlate does, it gets shifted into a helper that
> > invokes the same function in a more logical spot.
> 
> I'm curious how you imagine an IOMMU driver's ->probe function could be
> called *before* parsing the firmware to work out what, if any, IOMMU, and
> thus driver, a device is associated with.

You've jumped ahead here, I'm talking about removing ops->of_xlate.

With everything kept the same as today this means we'd scan the FW
description twice. Once to find the ops and once inside the driver to
parse it.

When I say micro-optimization this is what I mean - structuring the
code to try to avoid multiple-scans of the FW table.

Once the drivers are self-parsing I see there are two paths to keep it
at one FW parse:

 1) Have the core code parse and cache common cases in the iommu_fwspec.
    Driver then pulls out the common cases from the iommu_fwspec and
    reparsed in uncommon cases.

 2) Accept we have only a single ops in all real systems and just
    invoke the driver to parse it. That parse will cache enough
    information to decide if EPROBE_DEFER is needed.

In either case the drivers would call the same APIs and have the same
logic. The choice is just infrastructure-side stuff.

It is a different view than trying to do everything up front, but I'm
not seeing that it is so differently efficient as to be a performance
problem.

On the plus side it looks to make the drivers alot simpler and more
logical.

> And then every driver has to have
> identical boilerplate to go off and parse the generic "iommus" binding...
> which is the whole damn reason for *not* going down that route and instead
> using an of_xlate mechanism in the first place.

Let's not guess. I've attached below a sketch conversion for
apple-dart.

Diffstat says it *saves* 1 line. But also we fix several bugs!

 - iommu_of_xlate() will check for NULL fwspec and reject the bus
   probe path

 - iommu_of_xlate() can match the iommu's from the iommu list and
   check that the OF actually points only to iommus with this driver's
   ops (missed sanity check)

 - The of parsing machinery already computes the iommu_driver but throws
   it out. This forces all of the drivers to do their own thing. Pass
   it in and thus apple_dart_of_xlate() doesn't need to mess around
   with of_find_device_by_node() and we fix the bug where it leaks the
   reference on the struct device (woops!)

 - We don't leak the cfg allocation that apple_dart_of_xlate() did on
   various error paths. All error paths logically free in probe. We
   don't have to think about what happens if of_xlate is called twice
   for the same args on error/reprobe paths.

Many drivers follow this pattern of apple-dart and would end up like this.

Drivers that just need the u32 array would be simpler, more like:

   struct iommu_driver *instance;
   unsigned int num_ids;

   instance = iommu_of_get_iommu(..., &num_ids);
   if (IS_ERR(instance))
      return ERR_CAST(instance);
   cfg = kzalloc(struct_size(cfg, sids, num_ids), GFP_KERNEL);
   if (!cfg)
      return -ENOMEM;
   iommu_of_read_u32_ids(..., &cfg->sids);
   [..]
   return instance;

I haven't sketched *every* driver yet, but I've sketched enough to be
confident.

Robin, I know it is not what you have in your head, but you should
stop with the insults and be more open to other perspectives.

> > The per-device data can be allocated at the top of probe and passed
> > through args to fix the lifetime bugs.
> > 
> > It is pretty simple to do.
> 
> I believe the kids these days would say "Say you don't understand the code
> without saying you don't understand the code."

I think you are too fixated on what you have in your mind. It will
take me a bit but I will come with a series to move all the FW parsing
into probe along this model. Once done it is trivial to make bus probe
work like it should.

Regarding this series, I don't really see a problem. There is no
"concrete" or anything like that.

> > Not quite. This decouples two unrelated things into seperate
> > concerns. It is not so much about the concurrency but removing the
> > abuse of dev->iommu by code that has no need to touch it at all.
> 
> Sorry, the "abuse" of storing IOMMU-API-specific data in the place we
> intentionally created to consolidate all the IOMMU-API-specific data
> into?

The global state should not be filled until the driver is probed. It
is an abuse to use a global instead of an on-stack variable when
building it. Publishing only fully initialized data to global
visibility is concurrency 101. :(

Jason

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index ee05f4824bfad1..476938722460b8 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -721,19 +721,29 @@ static struct iommu_domain apple_dart_blocked_domain = {
 
 static struct iommu_device *apple_dart_probe_device(struct device *dev)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart_master_cfg *cfg;
 	struct apple_dart_stream_map *stream_map;
+	int ret;
 	int i;
 
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
 	if (!cfg)
-		return ERR_PTR(-ENODEV);
+		return ERR_PTR(-ENOMEM);
+	ret = iommu_of_xlate(dev_iommu_fwspec_get(dev), &apple_dart_iommu_ops,
+			     1, &apple_dart_of_xlate, cfg);
+	if (ret)
+		goto err_free;
 
 	for_each_stream_map(i, cfg, stream_map)
 		device_link_add(
 			dev, stream_map->dart->dev,
 			DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
 
+	dev_iommu_priv_set(dev, cfg);
 	return &cfg->stream_maps[0].dart->iommu;
+err_free:
+	kfree(cfg);
+	return ret;
 }
 
 static void apple_dart_release_device(struct device *dev)
@@ -777,25 +787,15 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
 	kfree(dart_domain);
 }
 
-static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int apple_dart_of_xlate(struct iommu_device *iommu,
+			       struct of_phandle_args *args, void *priv)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
-	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
-	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);
-	struct apple_dart *cfg_dart;
-	int i, sid;
+	struct apple_dart *dart = container_of(iommu, struct apple_dart, iommu);
+	struct apple_dart_master_cfg *cfg = priv;
+	struct apple_dart *cfg_dart = cfg->stream_maps[0].dart;
+	int sid = args->args[0];
+	int i;
 
-	if (args->args_count != 1)
-		return -EINVAL;
-	sid = args->args[0];
-
-	if (!cfg)
-		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
-	if (!cfg)
-		return -ENOMEM;
-	dev_iommu_priv_set(dev, cfg);
-
-	cfg_dart = cfg->stream_maps[0].dart;
 	if (cfg_dart) {
 		if (cfg_dart->supports_bypass != dart->supports_bypass)
 			return -EINVAL;
@@ -980,7 +980,6 @@ static const struct iommu_ops apple_dart_iommu_ops = {
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
 	.device_group = apple_dart_device_group,
-	.of_xlate = apple_dart_of_xlate,
 	.def_domain_type = apple_dart_def_domain_type,
 	.get_resv_regions = apple_dart_get_resv_regions,
 	.pgsize_bitmap = -1UL, /* Restricted during dart probe */

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc()
  2023-11-21 16:00           ` Jason Gunthorpe
@ 2023-11-23  9:08             ` Hector Martin
  0 siblings, 0 replies; 35+ messages in thread
From: Hector Martin @ 2023-11-23  9:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: acpica-devel, Alyssa Rosenzweig, Albert Ou, asahi,
	Catalin Marinas, Dexuan Cui, devicetree, David Woodhouse,
	Frank Rowand, Hanjun Guo, Haiyang Zhang, iommu,
	Jean-Philippe Brucker, Jonathan Hunter, Joerg Roedel,
	K. Y. Srinivasan, Len Brown, linux-acpi, linux-arm-kernel,
	linux-hyperv, linux-mips, linux-riscv, linux-snps-arc,
	linux-tegra, Russell King, Lorenzo Pieralisi, Marek Szyprowski,
	Palmer Dabbelt, patches, Paul Walmsley, Rafael J. Wysocki,
	Robert Moore, Rob Herring, Robin Murphy, Sudeep Holla,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding,
	Thomas Bogendoerfer, Krishna Reddy, Vineet Gupta, virtualization,
	Wei Liu, Will Deacon

On 2023/11/22 1:00, Jason Gunthorpe wrote:
> On Tue, Nov 21, 2023 at 03:47:48PM +0900, Hector Martin wrote:
>>> Which is sensitive only to !NULL fwspec, and if EPROBE_DEFER is
>>> returned fwspec will be freed and dev->iommu->fwspec will be NULL
>>> here.
>>>
>>> In the NULL case it does a 'bus probe' with a NULL fwspec and all the
>>> fwspec drivers return immediately from their probe functions.
>>>
>>> Did I miss something?
>>
>> apple_dart is not a fwspec driver and doesn't do that :-)
> 
> It implements of_xlate that makes it a driver using the fwspec probe
> path.
> 
> The issue is in apple-dart. Its logic for avoiding bus probe vs
> fwspec probe is not correct.
> 
> It does:
> 
> static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
> {
>  [..]
>  	dev_iommu_priv_set(dev, cfg);
> 
> 
> Then:
> 
> static struct iommu_device *apple_dart_probe_device(struct device *dev)
> {
> 	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
> 	struct apple_dart_stream_map *stream_map;
> 	int i;
> 
> 	if (!cfg)
> 		return ERR_PTR(-ENODEV);
> 
> Which leaks the cfg memory on rare error cases and wrongly allows the
> driver to probe without a fwspec, which I think is what you are
> hitting.
> 
> It should be
> 
>        if (!dev_iommu_fwspec_get(dev) || !cfg)
> 		return ERR_PTR(-ENODEV);
> 
> To ensure the driver never probes on the bus path.
> 
> Clearing the dev->iommu in the core code has the side effect of
> clearing (and leaking) the cfg which would hide this issue.

Aha! Yes, that makes it work with only the first change. I'll throw the
apple-dart fix into our tree (and submit it once I get to clearing out
the branch; the affected consumer driver isn't upstream yet so this
isn't particularly urgent).

- Hector

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2023-11-23  9:08 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 01/17] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops() Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 02/17] iommmu/of: Do not return struct iommu_ops from of_iommu_configure() Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 03/17] iommu/of: Use -ENODEV consistently in of_iommu_configure() Jason Gunthorpe
2023-11-15 14:42   ` Jerry Snitselaar
2023-11-15 14:05 ` [PATCH v2 04/17] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id() Jason Gunthorpe
2023-11-15 14:45   ` Jerry Snitselaar
2023-11-15 14:05 ` [PATCH v2 05/17] iommu: Make iommu_fwspec->ids a distinct allocation Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc() Jason Gunthorpe
2023-11-19  8:10   ` Hector Martin
2023-11-19  9:19     ` Hector Martin
2023-11-19 14:13       ` Jason Gunthorpe
2023-11-21  6:47         ` Hector Martin
2023-11-21 16:00           ` Jason Gunthorpe
2023-11-23  9:08             ` Hector Martin
2023-11-15 14:05 ` [PATCH v2 07/17] iommu: Add iommu_probe_device_fwspec() Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 08/17] iommu/of: Do not use dev->iommu within of_iommu_configure() Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 09/17] iommu: Add iommu_fwspec_append_ids() Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 10/17] acpi: Do not use dev->iommu within acpi_iommu_configure() Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 11/17] iommu: Hold iommu_probe_device_lock while calling ops->of_xlate Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static Jason Gunthorpe
2023-11-15 15:09   ` Jerry Snitselaar
2023-11-16 14:36   ` Moritz Fischer
2023-11-15 14:06 ` [PATCH v2 13/17] iommu: Remove dev_iommu_fwspec_set() Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 14/17] iommu: Remove pointless iommu_fwspec_free() Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 15/17] iommu: Add ops->of_xlate_fwspec() Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 16/17] iommu: Mark dev_iommu_get() with lockdep Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 17/17] iommu: Mark dev_iommu_priv_set() with a lockdep Jason Gunthorpe
2023-11-15 14:54 ` [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jerry Snitselaar
2023-11-15 15:22 ` Robin Murphy
2023-11-15 15:36   ` Jason Gunthorpe
2023-11-15 20:23     ` Robin Murphy
2023-11-16  4:17       ` Jason Gunthorpe
2023-11-21 16:06         ` Robin Murphy
2023-11-21 17:55           ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).