linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] iommu: Fix the longstanding probe issues
@ 2025-02-28 15:46 Robin Murphy
  2025-02-28 15:46 ` [PATCH v2 1/4] iommu: Handle race with default domain setup Robin Murphy
                   ` (4 more replies)
  0 siblings, 5 replies; 44+ messages in thread
From: Robin Murphy @ 2025-02-28 15:46 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

v1: https://lore.kernel.org/linux-iommu/cover.1739486121.git.robin.murphy@arm.com/

Hi all,

This spin irons out a couple of issues which v1 had. Firstly there
should now be no change in behaviour for the weird of_dma_configure()
calls, other than possibly getting the warning if they deserve it.
Secondly I think there was still a possibility for probe to run via
the replay path while its "real" probe was waiting to reacquire the
lock; this is now solved by making dev->iommu a reliable indicator of
the probe lifecycle, with a couple more prep patches.

Thanks,
Robin.


Robin Murphy (4):
  iommu: Handle race with default domain setup
  iommu: Resolve ops in iommu_init_device()
  iommu: Keep dev->iommu state consistent
  iommu: Get DT/ACPI parsing into the proper probe path

 drivers/acpi/arm64/dma.c        |  5 +++
 drivers/acpi/scan.c             |  7 -----
 drivers/amba/bus.c              |  3 +-
 drivers/base/platform.c         |  3 +-
 drivers/bus/fsl-mc/fsl-mc-bus.c |  3 +-
 drivers/cdx/cdx.c               |  3 +-
 drivers/iommu/iommu-priv.h      |  2 ++
 drivers/iommu/iommu.c           | 55 ++++++++++++++++++++++++---------
 drivers/iommu/of_iommu.c        | 13 ++++++--
 drivers/of/device.c             |  7 ++++-
 drivers/pci/pci-driver.c        |  3 +-
 11 files changed, 74 insertions(+), 30 deletions(-)

-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 1/4] iommu: Handle race with default domain setup
  2025-02-28 15:46 [PATCH v2 0/4] iommu: Fix the longstanding probe issues Robin Murphy
@ 2025-02-28 15:46 ` Robin Murphy
  2025-02-28 15:46 ` [PATCH v2 2/4] iommu: Resolve ops in iommu_init_device() Robin Murphy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 44+ messages in thread
From: Robin Murphy @ 2025-02-28 15:46 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla, Jason Gunthorpe

It turns out that deferred default domain creation leaves a subtle
race window during iommu_device_register() wherein a client driver may
asynchronously probe in parallel and get as far as performing DMA API
operations with dma-direct, only to be switched to iommu-dma underfoot
once the default domain attachment finally happens, with obviously
disastrous consequences. Even the wonky of_iommu_configure() path is at
risk, since iommu_fwspec_init() will no longer defer client probe as the
instance ops are (necessarily) already registered, and the "replay"
iommu_probe_device() call can see dev->iommu_group already set and so
think there's nothing to do either.

Fortunately we already have the right tool in the right place in the
form of iommu_device_use_default_domain(), which just needs to ensure
that said default domain is actually ready to *be* used. Deferring the
client probe shouldn't have too much impact, given that this only
happens while the IOMMU driver is probing, and thus due to kick the
deferred probe list again once it finishes.

Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: No change

Note this fixes tag is rather nuanced - historically there was a more
general issue before deac0b3bed26 ("iommu: Split off default domain
allocation from group assignment") set the basis for the current
conditions; 1ea2a07a532b ("iommu: Add DMA ownership management
interfaces") is then the point at which it becomes logical to fix the
current race this way; however only from 98ac73f99bc4 can we rely on all
drivers supporting default domains and so avoid false negatives, thus
even though this might apply to older kernels without conflict it would
not be functionally correct. LTS-wise, prior to 6.6 and commit
f188056352bc ("iommu: Avoid locking/unlocking for iommu_probe_device()")
the impact of this race is merely the historical issue again, but since
deac0b3bed26 that would raise a visible warning if it did lead to a
default domain mismatch, which nobody has ever reported seeing. Thus we
should only need a backport for 6.6, which I tentatively have ready.
---
 drivers/iommu/iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4fca10d107c8..179617bb412d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3107,6 +3107,11 @@ int iommu_device_use_default_domain(struct device *dev)
 		return 0;
 
 	mutex_lock(&group->mutex);
+	/* We may race against bus_iommu_probe() finalising groups here */
+	if (!group->default_domain) {
+		ret = -EPROBE_DEFER;
+		goto unlock_out;
+	}
 	if (group->owner_cnt) {
 		if (group->domain != group->default_domain || group->owner ||
 		    !xa_empty(&group->pasid_array)) {
-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 2/4] iommu: Resolve ops in iommu_init_device()
  2025-02-28 15:46 [PATCH v2 0/4] iommu: Fix the longstanding probe issues Robin Murphy
  2025-02-28 15:46 ` [PATCH v2 1/4] iommu: Handle race with default domain setup Robin Murphy
@ 2025-02-28 15:46 ` Robin Murphy
  2025-03-05 17:55   ` Jason Gunthorpe
  2025-02-28 15:46 ` [PATCH v2 3/4] iommu: Keep dev->iommu state consistent Robin Murphy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-02-28 15:46 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

Since iommu_init_device() was factored out, it is in fact the only
consumer of the ops which __iommu_probe_device() is resolving, so let it
do that itself rather than passing them in. This also puts the ops
lookup at a more logical point relative to the rest of the flow through
__iommu_probe_device().

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: New

 drivers/iommu/iommu.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 179617bb412d..37b7088e129a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -404,14 +404,28 @@ EXPORT_SYMBOL_GPL(dev_iommu_priv_set);
  * Init the dev->iommu and dev->iommu_group in the struct device and get the
  * driver probed
  */
-static int iommu_init_device(struct device *dev, const struct iommu_ops *ops)
+static int iommu_init_device(struct device *dev)
 {
+	const struct iommu_ops *ops;
 	struct iommu_device *iommu_dev;
 	struct iommu_group *group;
 	int ret;
 
 	if (!dev_iommu_get(dev))
 		return -ENOMEM;
+	/*
+	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
+	 * instances with non-NULL fwnodes, and client devices should have been
+	 * identified with a fwspec by this point. Otherwise, we can currently
+	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
+	 * be present, and that any of their registered instances has suitable
+	 * ops for probing, and thus cheekily co-opt the same mechanism.
+	 */
+	ops = iommu_fwspec_ops(dev->iommu->fwspec);
+	if (!ops) {
+		ret = -ENODEV;
+		goto err_free;
+	}
 
 	if (!try_module_get(ops->owner)) {
 		ret = -EINVAL;
@@ -514,22 +528,10 @@ DEFINE_MUTEX(iommu_probe_device_lock);
 
 static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
 {
-	const struct iommu_ops *ops;
 	struct iommu_group *group;
 	struct group_device *gdev;
 	int ret;
 
-	/*
-	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
-	 * instances with non-NULL fwnodes, and client devices should have been
-	 * identified with a fwspec by this point. Otherwise, we can currently
-	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
-	 * be present, and that any of their registered instances has suitable
-	 * ops for probing, and thus cheekily co-opt the same mechanism.
-	 */
-	ops = iommu_fwspec_ops(dev_iommu_fwspec_get(dev));
-	if (!ops)
-		return -ENODEV;
 	/*
 	 * Serialise to avoid races between IOMMU drivers registering in
 	 * parallel and/or the "replay" calls from ACPI/OF code via client
@@ -543,7 +545,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 	if (dev->iommu_group)
 		return 0;
 
-	ret = iommu_init_device(dev, ops);
+	ret = iommu_init_device(dev);
 	if (ret)
 		return ret;
 
-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 3/4] iommu: Keep dev->iommu state consistent
  2025-02-28 15:46 [PATCH v2 0/4] iommu: Fix the longstanding probe issues Robin Murphy
  2025-02-28 15:46 ` [PATCH v2 1/4] iommu: Handle race with default domain setup Robin Murphy
  2025-02-28 15:46 ` [PATCH v2 2/4] iommu: Resolve ops in iommu_init_device() Robin Murphy
@ 2025-02-28 15:46 ` Robin Murphy
  2025-03-05 18:14   ` Jason Gunthorpe
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
  2025-03-10  8:29 ` [PATCH v2 0/4] iommu: Fix the longstanding probe issues Joerg Roedel
  4 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-02-28 15:46 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

At the moment, if of_iommu_configure() allocates dev->iommu itself via
iommu_fwspec_init(), then suffers a DT parsing failure, it cleans up the
fwspec but leaves the empty dev_iommu hanging around. So far this is
benign (if a tiny bit wasteful), but we'd like to be able to reason
about dev->iommu having a consistent and unambiguous lifecycle. Thus
make sure that the of_iommu cleanup undoes precisely whatever it did.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: New

 drivers/iommu/iommu-priv.h | 2 ++
 drivers/iommu/iommu.c      | 2 +-
 drivers/iommu/of_iommu.c   | 6 +++++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 17e245cf17bb..6574c1f97e14 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -17,6 +17,8 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
 	return dev->iommu->iommu_dev->ops;
 }
 
+void dev_iommu_free(struct device *dev);
+
 const struct iommu_ops *iommu_ops_from_fwnode(const struct fwnode_handle *fwnode);
 
 static inline const struct iommu_ops *iommu_fwspec_ops(struct iommu_fwspec *fwspec)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 37b7088e129a..a3b45b84f42b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -352,7 +352,7 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
 	return param;
 }
 
-static void dev_iommu_free(struct device *dev)
+void dev_iommu_free(struct device *dev)
 {
 	struct dev_iommu *param = dev->iommu;
 
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 97987cd78da9..e10a68b5ffde 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -116,6 +116,7 @@ static void of_pci_check_device_ats(struct device *dev, struct device_node *np)
 int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		       const u32 *id)
 {
+	bool dev_iommu_present;
 	int err;
 
 	if (!master_np)
@@ -127,6 +128,7 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		mutex_unlock(&iommu_probe_device_lock);
 		return 0;
 	}
+	dev_iommu_present = dev->iommu;
 
 	/*
 	 * We don't currently walk up the tree looking for a parent IOMMU.
@@ -147,8 +149,10 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		err = of_iommu_configure_device(master_np, dev, id);
 	}
 
-	if (err)
+	if (err && dev_iommu_present)
 		iommu_fwspec_free(dev);
+	else if (err && dev->iommu)
+		dev_iommu_free(dev);
 	mutex_unlock(&iommu_probe_device_lock);
 
 	if (!err && dev->bus)
-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 [PATCH v2 0/4] iommu: Fix the longstanding probe issues Robin Murphy
                   ` (2 preceding siblings ...)
  2025-02-28 15:46 ` [PATCH v2 3/4] iommu: Keep dev->iommu state consistent Robin Murphy
@ 2025-02-28 15:46 ` Robin Murphy
  2025-03-05 18:28   ` Jason Gunthorpe
                     ` (8 more replies)
  2025-03-10  8:29 ` [PATCH v2 0/4] iommu: Fix the longstanding probe issues Joerg Roedel
  4 siblings, 9 replies; 44+ messages in thread
From: Robin Murphy @ 2025-02-28 15:46 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

In hindsight, there were some crucial subtleties overlooked when moving
{of,acpi}_dma_configure() to driver probe time to allow waiting for
IOMMU drivers with -EPROBE_DEFER, and these have become an
ever-increasing source of problems. The IOMMU API has some fundamental
assumptions that iommu_probe_device() is called for every device added
to the system, in the order in which they are added. Calling it in a
random order or not at all dependent on driver binding leads to
malformed groups, a potential lack of isolation for devices with no
driver, and all manner of unexpected concurrency and race conditions.
We've attempted to mitigate the latter with point-fix bodges like
iommu_probe_device_lock, but it's a losing battle and the time has come
to bite the bullet and address the true source of the problem instead.

The crux of the matter is that the firmware parsing actually serves two
distinct purposes; one is identifying the IOMMU instance associated with
a device so we can check its availability, the second is actually
telling that instance about the relevant firmware-provided data for the
device. However the latter also depends on the former, and at the time
there was no good place to defer and retry that separately from the
availability check we also wanted for client driver probe.

Nowadays, though, we have a proper notion of multiple IOMMU instances in
the core API itself, and each one gets a chance to probe its own devices
upon registration, so we can finally make that work as intended for
DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
be able to run the iommu_fwspec machinery currently buried deep in the
wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
surprisingly straightforward to bootstrap this transformation by pretty
much just calling the same path twice. At client driver probe time,
dev->driver is obviously set; conversely at device_add(), or a
subsequent bus_iommu_probe(), any device waiting for an IOMMU really
should *not* have a driver already, so we can use that as a condition to
disambiguate the two cases, and avoid recursing back into the IOMMU core
at the wrong times.

Obviously this isn't the nicest thing, but for now it gives us a
functional baseline to then unpick the layers in between without many
more awkward cross-subsystem patches. There are some minor side-effects
like dma_range_map potentially being created earlier, and some debug
prints being repeated, but these aren't significantly detrimental. Let's
make things work first, then deal with making them nice.

With the basic flow finally in the right order again, the next step is
probably turning the bus->dma_configure paths inside-out, since all we
really need from bus code is its notion of which device and input ID(s)
to parse the common firmware properties with...

Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2:
 - Comment bus driver changes for clarity
 - Use dev->iommu as the now-robust replay condition
 - Drop the device_iommu_mapped() checks in the firmware paths as they
   weren't doing much - we can't replace probe_device_lock just yet...
 
 drivers/acpi/arm64/dma.c        |  5 +++++
 drivers/acpi/scan.c             |  7 -------
 drivers/amba/bus.c              |  3 ++-
 drivers/base/platform.c         |  3 ++-
 drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
 drivers/cdx/cdx.c               |  3 ++-
 drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
 drivers/iommu/of_iommu.c        |  7 ++++++-
 drivers/of/device.c             |  7 ++++++-
 drivers/pci/pci-driver.c        |  3 ++-
 10 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
index 52b2abf88689..f30f138352b7 100644
--- a/drivers/acpi/arm64/dma.c
+++ b/drivers/acpi/arm64/dma.c
@@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
 	else
 		end = (1ULL << 32) - 1;
 
+	if (dev->dma_range_map) {
+		dev_dbg(dev, "dma_range_map already set\n");
+		return;
+	}
+
 	ret = acpi_dma_get_range(dev, &map);
 	if (!ret && map) {
 		end = dma_range_map_max(map);
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 9f4efa8f75a6..fb1fe9f3b1a3 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
 		err = viot_iommu_configure(dev);
 	mutex_unlock(&iommu_probe_device_lock);
 
-	/*
-	 * If we have reason to believe the IOMMU driver missed the initial
-	 * iommu_probe_device() call for dev, replay it to get things in order.
-	 */
-	if (!err && dev->bus)
-		err = iommu_probe_device(dev);
-
 	return err;
 }
 
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 8ef259b4d037..71482d639a6d 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -364,7 +364,8 @@ static int amba_dma_configure(struct device *dev)
 		ret = acpi_dma_configure(dev, attr);
 	}
 
-	if (!ret && !drv->driver_managed_dma) {
+	/* @drv may not be valid when we're called from the IOMMU layer */
+	if (!ret && dev->driver && !drv->driver_managed_dma) {
 		ret = iommu_device_use_default_domain(dev);
 		if (ret)
 			arch_teardown_dma_ops(dev);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 6f2a33722c52..1813cfd0c4bd 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
 		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
 		ret = acpi_dma_configure(dev, attr);
 	}
-	if (ret || drv->driver_managed_dma)
+	/* @drv may not be valid when we're called from the IOMMU layer */
+	if (ret || !dev->driver || drv->driver_managed_dma)
 		return ret;
 
 	ret = iommu_device_use_default_domain(dev);
diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index d1f3d327ddd1..a8be8cf246fb 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device *dev)
 	else
 		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
 
-	if (!ret && !mc_drv->driver_managed_dma) {
+	/* @mc_drv may not be valid when we're called from the IOMMU layer */
+	if (!ret && dev->driver && !mc_drv->driver_managed_dma) {
 		ret = iommu_device_use_default_domain(dev);
 		if (ret)
 			arch_teardown_dma_ops(dev);
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index c573ed2ee71a..780fb0c4adba 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device *dev)
 		return ret;
 	}
 
-	if (!ret && !cdx_drv->driver_managed_dma) {
+	/* @cdx_drv may not be valid when we're called from the IOMMU layer */
+	if (!ret && dev->driver && !cdx_drv->driver_managed_dma) {
 		ret = iommu_device_use_default_domain(dev);
 		if (ret)
 			arch_teardown_dma_ops(dev);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a3b45b84f42b..1cec7074367a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev)
 	if (!dev_iommu_get(dev))
 		return -ENOMEM;
 	/*
-	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
-	 * instances with non-NULL fwnodes, and client devices should have been
-	 * identified with a fwspec by this point. Otherwise, we can currently
+	 * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
+	 * is buried in the bus dma_configure path. Properly unpicking that is
+	 * still a big job, so for now just invoke the whole thing. The device
+	 * already having a driver bound means dma_configure has already run and
+	 * either found no IOMMU to wait for, or we're in its replay call right
+	 * now, so either way there's no point calling it again.
+	 */
+	if (!dev->driver && dev->bus->dma_configure) {
+		mutex_unlock(&iommu_probe_device_lock);
+		dev->bus->dma_configure(dev);
+		mutex_lock(&iommu_probe_device_lock);
+	}
+	/*
+	 * At this point, relevant devices either now have a fwspec which will
+	 * match ops registered with a non-NULL fwnode, or we can reasonably
 	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
 	 * be present, and that any of their registered instances has suitable
 	 * ops for probing, and thus cheekily co-opt the same mechanism.
@@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
 		ret = -ENODEV;
 		goto err_free;
 	}
+	/*
+	 * And if we do now see any replay calls, they would indicate someone
+	 * misusing the dma_configure path outside bus code.
+	 */
+	if (dev->driver)
+		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
 
 	if (!try_module_get(ops->owner)) {
 		ret = -EINVAL;
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e10a68b5ffde..6b989a62def2 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
 		dev_iommu_free(dev);
 	mutex_unlock(&iommu_probe_device_lock);
 
-	if (!err && dev->bus)
+	/*
+	 * If we're not on the iommu_probe_device() path (as indicated by the
+	 * initial dev->iommu) then try to simulate it. This should no longer
+	 * happen unless of_dma_configure() is being misused outside bus code.
+	 */
+	if (!err && dev->bus && !dev_iommu_present)
 		err = iommu_probe_device(dev);
 
 	if (err && err != -EPROBE_DEFER)
diff --git a/drivers/of/device.c b/drivers/of/device.c
index edf3be197265..5053e5d532cc 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 	bool coherent, set_map = false;
 	int ret;
 
+	if (dev->dma_range_map) {
+		dev_dbg(dev, "dma_range_map already set\n");
+		goto skip_map;
+	}
+
 	if (np == dev->of_node)
 		bus_np = __of_get_dma_parent(np);
 	else
@@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 		end = dma_range_map_max(map);
 		set_map = true;
 	}
-
+skip_map:
 	/*
 	 * If @dev is expected to be DMA-capable then the bus code that created
 	 * it should have initialised its dma_mask pointer by this point. For
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f57ea36d125d..4468b7327cab 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device *dev)
 
 	pci_put_host_bridge_device(bridge);
 
-	if (!ret && !driver->driver_managed_dma) {
+	/* @driver may not be valid when we're called from the IOMMU layer */
+	if (!ret && dev->driver && !driver->driver_managed_dma) {
 		ret = iommu_device_use_default_domain(dev);
 		if (ret)
 			arch_teardown_dma_ops(dev);
-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/4] iommu: Resolve ops in iommu_init_device()
  2025-02-28 15:46 ` [PATCH v2 2/4] iommu: Resolve ops in iommu_init_device() Robin Murphy
@ 2025-03-05 17:55   ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2025-03-05 17:55 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On Fri, Feb 28, 2025 at 03:46:31PM +0000, Robin Murphy wrote:
> Since iommu_init_device() was factored out, it is in fact the only
> consumer of the ops which __iommu_probe_device() is resolving, so let it
> do that itself rather than passing them in. This also puts the ops
> lookup at a more logical point relative to the rest of the flow through
> __iommu_probe_device().
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> v2: New
> 
>  drivers/iommu/iommu.c | 30 ++++++++++++++++--------------
>  1 file changed, 16 insertions(+), 14 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/4] iommu: Keep dev->iommu state consistent
  2025-02-28 15:46 ` [PATCH v2 3/4] iommu: Keep dev->iommu state consistent Robin Murphy
@ 2025-03-05 18:14   ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2025-03-05 18:14 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On Fri, Feb 28, 2025 at 03:46:32PM +0000, Robin Murphy wrote:
> @@ -127,6 +128,7 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>  		mutex_unlock(&iommu_probe_device_lock);
>  		return 0;
>  	}
> +	dev_iommu_present = dev->iommu;

I feel like this deserves a comment..

Maybe it is:

/*
 * If of_iommu_configure is called outside the iommu probe path
 * dev->iommu should be NULL and it needs to remain as NULL
 * If it is called within the probe path then the dev->iommu
 * was setup by iommu_init_device() and must not be changed.
 */

And I think the commit message should explain what consistent
means.. AFAICT you are going for !dev->iommu means no probe has
succeed / dev->iommu means a probe is ongoing in this call chain or
it has succeeded?

Otherwise:

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
@ 2025-03-05 18:28   ` Jason Gunthorpe
  2025-03-07 14:24   ` Lorenzo Pieralisi
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2025-03-05 18:28 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
> +	if (!dev->driver && dev->bus->dma_configure) {
> +		mutex_unlock(&iommu_probe_device_lock);
> +		dev->bus->dma_configure(dev);
> +		mutex_lock(&iommu_probe_device_lock);
> +	}

I think it would be very nice to get rid of the lock/unlock.. It makes
me nervous that we continue on assuming dev->iommu was freshly
allocated..

 setup the dev->iommu partially, then drop the lock.

There is only one other caller in:

static int really_probe(struct device *dev, const struct device_driver *drv)
{
	if (dev->bus->dma_configure) {
		ret = dev->bus->dma_configure(dev);
		if (ret)
			goto pinctrl_bind_failed;
	}

Is it feasible to make it so the caller has to hold the
iommu_probe_device_lock prior to calling the op?

That would require moving the locking inside of_dma_configure to less
inside, and using a new iommu_probe_device() wrapper.

However, if you plan to turn this inside out soonish then it would not
be worth the bother.

Anyhow:

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
  2025-03-05 18:28   ` Jason Gunthorpe
@ 2025-03-07 14:24   ` Lorenzo Pieralisi
  2025-03-07 20:20     ` Robin Murphy
  2025-03-11 18:42   ` Joerg Roedel
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Lorenzo Pieralisi @ 2025-03-07 14:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Hanjun Guo, Sudeep Holla, Rafael J. Wysocki, Len Brown,
	Russell King, Greg Kroah-Hartman, Danilo Krummrich, Stuart Yoder,
	Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal, Joerg Roedel,
	Will Deacon, Rob Herring, Saravana Kannan, Bjorn Helgaas,
	linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
> 
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
> 
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
> 
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
> 
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> v2:
>  - Comment bus driver changes for clarity
>  - Use dev->iommu as the now-robust replay condition
>  - Drop the device_iommu_mapped() checks in the firmware paths as they
>    weren't doing much - we can't replace probe_device_lock just yet...
>  
>  drivers/acpi/arm64/dma.c        |  5 +++++
>  drivers/acpi/scan.c             |  7 -------
>  drivers/amba/bus.c              |  3 ++-
>  drivers/base/platform.c         |  3 ++-
>  drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
>  drivers/cdx/cdx.c               |  3 ++-
>  drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
>  drivers/iommu/of_iommu.c        |  7 ++++++-
>  drivers/of/device.c             |  7 ++++++-
>  drivers/pci/pci-driver.c        |  3 ++-
>  10 files changed, 48 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
> index 52b2abf88689..f30f138352b7 100644
> --- a/drivers/acpi/arm64/dma.c
> +++ b/drivers/acpi/arm64/dma.c
> @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
>  	else
>  		end = (1ULL << 32) - 1;
>  
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		return;
> +	}
> +

Why are we checking that dev->dma_range_map exists here rather than
at function entry ? Is that because we want to run the previous code
for some reasons even if dev->dma_range_map is already set ?

Just noticed, the OF counterpart does not seem to take the same
approach, so I am just flagging this up if it matters at all.

Other than that, for acpi/arm64:

Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

>  	ret = acpi_dma_get_range(dev, &map);
>  	if (!ret && map) {
>  		end = dma_range_map_max(map);
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 9f4efa8f75a6..fb1fe9f3b1a3 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
>  		err = viot_iommu_configure(dev);
>  	mutex_unlock(&iommu_probe_device_lock);
>  
> -	/*
> -	 * If we have reason to believe the IOMMU driver missed the initial
> -	 * iommu_probe_device() call for dev, replay it to get things in order.
> -	 */
> -	if (!err && dev->bus)
> -		err = iommu_probe_device(dev);
> -
>  	return err;
>  }
>  
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 8ef259b4d037..71482d639a6d 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -364,7 +364,8 @@ static int amba_dma_configure(struct device *dev)
>  		ret = acpi_dma_configure(dev, attr);
>  	}
>  
> -	if (!ret && !drv->driver_managed_dma) {
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6f2a33722c52..1813cfd0c4bd 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
>  		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>  		ret = acpi_dma_configure(dev, attr);
>  	}
> -	if (ret || drv->driver_managed_dma)
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (ret || !dev->driver || drv->driver_managed_dma)
>  		return ret;
>  
>  	ret = iommu_device_use_default_domain(dev);
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index d1f3d327ddd1..a8be8cf246fb 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device *dev)
>  	else
>  		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
>  
> -	if (!ret && !mc_drv->driver_managed_dma) {
> +	/* @mc_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !mc_drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
> index c573ed2ee71a..780fb0c4adba 100644
> --- a/drivers/cdx/cdx.c
> +++ b/drivers/cdx/cdx.c
> @@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device *dev)
>  		return ret;
>  	}
>  
> -	if (!ret && !cdx_drv->driver_managed_dma) {
> +	/* @cdx_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !cdx_drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index a3b45b84f42b..1cec7074367a 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev)
>  	if (!dev_iommu_get(dev))
>  		return -ENOMEM;
>  	/*
> -	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
> -	 * instances with non-NULL fwnodes, and client devices should have been
> -	 * identified with a fwspec by this point. Otherwise, we can currently
> +	 * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
> +	 * is buried in the bus dma_configure path. Properly unpicking that is
> +	 * still a big job, so for now just invoke the whole thing. The device
> +	 * already having a driver bound means dma_configure has already run and
> +	 * either found no IOMMU to wait for, or we're in its replay call right
> +	 * now, so either way there's no point calling it again.
> +	 */
> +	if (!dev->driver && dev->bus->dma_configure) {
> +		mutex_unlock(&iommu_probe_device_lock);
> +		dev->bus->dma_configure(dev);
> +		mutex_lock(&iommu_probe_device_lock);
> +	}
> +	/*
> +	 * At this point, relevant devices either now have a fwspec which will
> +	 * match ops registered with a non-NULL fwnode, or we can reasonably
>  	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
>  	 * be present, and that any of their registered instances has suitable
>  	 * ops for probing, and thus cheekily co-opt the same mechanism.
> @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
>  		ret = -ENODEV;
>  		goto err_free;
>  	}
> +	/*
> +	 * And if we do now see any replay calls, they would indicate someone
> +	 * misusing the dma_configure path outside bus code.
> +	 */
> +	if (dev->driver)
> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
>  
>  	if (!try_module_get(ops->owner)) {
>  		ret = -EINVAL;
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e10a68b5ffde..6b989a62def2 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>  		dev_iommu_free(dev);
>  	mutex_unlock(&iommu_probe_device_lock);
>  
> -	if (!err && dev->bus)
> +	/*
> +	 * If we're not on the iommu_probe_device() path (as indicated by the
> +	 * initial dev->iommu) then try to simulate it. This should no longer
> +	 * happen unless of_dma_configure() is being misused outside bus code.
> +	 */
> +	if (!err && dev->bus && !dev_iommu_present)
>  		err = iommu_probe_device(dev);
>  
>  	if (err && err != -EPROBE_DEFER)
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index edf3be197265..5053e5d532cc 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>  	bool coherent, set_map = false;
>  	int ret;
>  
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		goto skip_map;
> +	}
> +
>  	if (np == dev->of_node)
>  		bus_np = __of_get_dma_parent(np);
>  	else
> @@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>  		end = dma_range_map_max(map);
>  		set_map = true;
>  	}
> -
> +skip_map:
>  	/*
>  	 * If @dev is expected to be DMA-capable then the bus code that created
>  	 * it should have initialised its dma_mask pointer by this point. For
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f57ea36d125d..4468b7327cab 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device *dev)
>  
>  	pci_put_host_bridge_device(bridge);
>  
> -	if (!ret && !driver->driver_managed_dma) {
> +	/* @driver may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !driver->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> -- 
> 2.39.2.101.g768bb238c484.dirty
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-07 14:24   ` Lorenzo Pieralisi
@ 2025-03-07 20:20     ` Robin Murphy
  0 siblings, 0 replies; 44+ messages in thread
From: Robin Murphy @ 2025-03-07 20:20 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Hanjun Guo, Sudeep Holla, Rafael J. Wysocki, Len Brown,
	Russell King, Greg Kroah-Hartman, Danilo Krummrich, Stuart Yoder,
	Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal, Joerg Roedel,
	Will Deacon, Rob Herring, Saravana Kannan, Bjorn Helgaas,
	linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 2025-03-07 2:24 pm, Lorenzo Pieralisi wrote:
[...]
>> diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
>> index 52b2abf88689..f30f138352b7 100644
>> --- a/drivers/acpi/arm64/dma.c
>> +++ b/drivers/acpi/arm64/dma.c
>> @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
>>   	else
>>   		end = (1ULL << 32) - 1;
>>   
>> +	if (dev->dma_range_map) {
>> +		dev_dbg(dev, "dma_range_map already set\n");
>> +		return;
>> +	}
>> +
> 
> Why are we checking that dev->dma_range_map exists here rather than
> at function entry ? Is that because we want to run the previous code
> for some reasons even if dev->dma_range_map is already set ?
> 
> Just noticed, the OF counterpart does not seem to take the same
> approach, so I am just flagging this up if it matters at all.

The intent is just to ensure it's safe to come through this path more 
than once for the time being - it doesn't really matter whether or not 
we repeat anything apart from allocating and assigning dma_range_map, 
since that leaks the previous allocation. Thus this check is logically 
guarding the acpi_dma_get_range() call in this path, and the 
of_dma_get_range() call in the other.

> Other than that, for acpi/arm64:
> 
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

Thanks!

Robin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/4] iommu: Fix the longstanding probe issues
  2025-02-28 15:46 [PATCH v2 0/4] iommu: Fix the longstanding probe issues Robin Murphy
                   ` (3 preceding siblings ...)
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
@ 2025-03-10  8:29 ` Joerg Roedel
  4 siblings, 0 replies; 44+ messages in thread
From: Joerg Roedel @ 2025-03-10  8:29 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Will Deacon, Rob Herring, Saravana Kannan, Bjorn Helgaas,
	linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On Fri, Feb 28, 2025 at 03:46:29PM +0000, Robin Murphy wrote:
> This spin irons out a couple of issues which v1 had. Firstly there
> should now be no change in behaviour for the weird of_dma_configure()
> calls, other than possibly getting the warning if they deserve it.
> Secondly I think there was still a possibility for probe to run via
> the replay path while its "real" probe was waiting to reacquire the
> lock; this is now solved by making dev->iommu a reliable indicator of
> the probe lifecycle, with a couple more prep patches.
> 
> Thanks,
> Robin.
> 
> 
> Robin Murphy (4):
>   iommu: Handle race with default domain setup
>   iommu: Resolve ops in iommu_init_device()
>   iommu: Keep dev->iommu state consistent
>   iommu: Get DT/ACPI parsing into the proper probe path
> 
>  drivers/acpi/arm64/dma.c        |  5 +++
>  drivers/acpi/scan.c             |  7 -----
>  drivers/amba/bus.c              |  3 +-
>  drivers/base/platform.c         |  3 +-
>  drivers/bus/fsl-mc/fsl-mc-bus.c |  3 +-
>  drivers/cdx/cdx.c               |  3 +-
>  drivers/iommu/iommu-priv.h      |  2 ++
>  drivers/iommu/iommu.c           | 55 ++++++++++++++++++++++++---------
>  drivers/iommu/of_iommu.c        | 13 ++++++--
>  drivers/of/device.c             |  7 ++++-
>  drivers/pci/pci-driver.c        |  3 +-
>  11 files changed, 74 insertions(+), 30 deletions(-)

Applied, thanks Robin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
  2025-03-05 18:28   ` Jason Gunthorpe
  2025-03-07 14:24   ` Lorenzo Pieralisi
@ 2025-03-11 18:42   ` Joerg Roedel
  2025-03-12  7:07     ` Baolu Lu
  2025-03-12 10:10     ` Robin Murphy
       [not found]   ` <CGME20250313095633eucas1p29cb55f2504b4bcf67c16b3bd3fa9b8cd@eucas1p2.samsung.com>
                     ` (5 subsequent siblings)
  8 siblings, 2 replies; 44+ messages in thread
From: Joerg Roedel @ 2025-03-11 18:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Will Deacon, Rob Herring, Saravana Kannan, Bjorn Helgaas,
	linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

Hi Robin,

On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
> +	/*
> +	 * And if we do now see any replay calls, they would indicate someone
> +	 * misusing the dma_configure path outside bus code.
> +	 */
> +	if (dev->driver)
> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");

This warning triggers on my workstation (with an AMD IOMMU), any ideas?

 ------------[ cut here ]------------
 reg-dummy reg-dummy: late IOMMU probe at driver bind, something fishy here!
 WARNING: CPU: 0 PID: 1 at drivers/iommu/iommu.c:449 __iommu_probe_device+0x10b/0x510
 Modules linked in:

 CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6-iommu-next+ #1 1d691d7aebf343bde741cf4c8610d78a2ea2d2d9
 Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5406 11/13/2019
 RIP: 0010:__iommu_probe_device+0x10b/0x510
 Code: 68 00 74 28 48 8b 6b 50 48 85 ed 75 03 48 8b 2b 48 89 df e8 87 71 06 00 48 89 ea 48 c7 c7 90 dd 2c 8b 48 89 c6 e8 35 83 77 ff <0f> 0b 49 8b bd a8 00 00 00 e8 77 ab 85 ff 84 c0 0f 84 ad 01 00 00
 RSP: 0018:ffffafba00047c58 EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffffa00481301c10 RCX: 0000000000000003
 RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
 RBP: ffffa00480ffaee0 R08: 0000000000000000 R09: ffffafba00047ae8
 R10: ffffa0135e93ffa8 R11: 0000000000000003 R12: ffffafba00047d18
 R13: ffffffff8adac1a0 R14: 0000000000000000 R15: ffffa004802c5800
 FS:  0000000000000000(0000) GS:ffffa0135ea00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffa00baca01000 CR3: 000000082b838000 CR4: 00000000003506f0
 Call Trace:
  <TASK>
  ? __iommu_probe_device+0x10b/0x510
  ? __warn.cold+0x93/0xf7
  ? __iommu_probe_device+0x10b/0x510
  ? report_bug+0xff/0x140
  ? prb_read_valid+0x1b/0x30
  ? handle_bug+0x58/0x90
  ? exc_invalid_op+0x17/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? __iommu_probe_device+0x10b/0x510
  ? __iommu_probe_device+0x10b/0x510
  ? __pfx_probe_iommu_group+0x10/0x10
  probe_iommu_group+0x28/0x50
  bus_for_each_dev+0x7e/0xd0
  iommu_device_register+0xee/0x260
  iommu_go_to_state+0xa2a/0x1970
  ? srso_return_thunk+0x5/0x5f
  ? blake2s_update+0x68/0x160
  ? __pfx_pci_iommu_init+0x10/0x10
  amd_iommu_init+0x14/0x50
  ? __pfx_pci_iommu_init+0x10/0x10
  pci_iommu_init+0x12/0x40
  do_one_initcall+0x46/0x300
  kernel_init_freeable+0x23d/0x2a0
  ? __pfx_kernel_init+0x10/0x10
  kernel_init+0x1a/0x140
  ret_from_fork+0x34/0x50
  ? __pfx_kernel_init+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-11 18:42   ` Joerg Roedel
@ 2025-03-12  7:07     ` Baolu Lu
  2025-03-12 10:10     ` Robin Murphy
  1 sibling, 0 replies; 44+ messages in thread
From: Baolu Lu @ 2025-03-12  7:07 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: baolu.lu, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On 2025/3/12 2:42, Joerg Roedel wrote:
> On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
>> +	/*
>> +	 * And if we do now see any replay calls, they would indicate someone
>> +	 * misusing the dma_configure path outside bus code.
>> +	 */
>> +	if (dev->driver)
>> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
> This warning triggers on my workstation (with an AMD IOMMU), any ideas?
> 
>   ------------[ cut here ]------------
>   reg-dummy reg-dummy: late IOMMU probe at driver bind, something fishy here!
>   WARNING: CPU: 0 PID: 1 at drivers/iommu/iommu.c:449 __iommu_probe_device+0x10b/0x510
>   Modules linked in:
> 
>   CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6-iommu-next+ #1 1d691d7aebf343bde741cf4c8610d78a2ea2d2d9
>   Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5406 11/13/2019
>   RIP: 0010:__iommu_probe_device+0x10b/0x510
>   Code: 68 00 74 28 48 8b 6b 50 48 85 ed 75 03 48 8b 2b 48 89 df e8 87 71 06 00 48 89 ea 48 c7 c7 90 dd 2c 8b 48 89 c6 e8 35 83 77 ff <0f> 0b 49 8b bd a8 00 00 00 e8 77 ab 85 ff 84 c0 0f 84 ad 01 00 00
>   RSP: 0018:ffffafba00047c58 EFLAGS: 00010282
>   RAX: 0000000000000000 RBX: ffffa00481301c10 RCX: 0000000000000003
>   RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
>   RBP: ffffa00480ffaee0 R08: 0000000000000000 R09: ffffafba00047ae8
>   R10: ffffa0135e93ffa8 R11: 0000000000000003 R12: ffffafba00047d18
>   R13: ffffffff8adac1a0 R14: 0000000000000000 R15: ffffa004802c5800
>   FS:  0000000000000000(0000) GS:ffffa0135ea00000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: ffffa00baca01000 CR3: 000000082b838000 CR4: 00000000003506f0
>   Call Trace:
>    <TASK>
>    ? __iommu_probe_device+0x10b/0x510
>    ? __warn.cold+0x93/0xf7
>    ? __iommu_probe_device+0x10b/0x510
>    ? report_bug+0xff/0x140
>    ? prb_read_valid+0x1b/0x30
>    ? handle_bug+0x58/0x90
>    ? exc_invalid_op+0x17/0x70
>    ? asm_exc_invalid_op+0x1a/0x20
>    ? __iommu_probe_device+0x10b/0x510
>    ? __iommu_probe_device+0x10b/0x510
>    ? __pfx_probe_iommu_group+0x10/0x10
>    probe_iommu_group+0x28/0x50
>    bus_for_each_dev+0x7e/0xd0
>    iommu_device_register+0xee/0x260
>    iommu_go_to_state+0xa2a/0x1970
>    ? srso_return_thunk+0x5/0x5f
>    ? blake2s_update+0x68/0x160
>    ? __pfx_pci_iommu_init+0x10/0x10
>    amd_iommu_init+0x14/0x50
>    ? __pfx_pci_iommu_init+0x10/0x10
>    pci_iommu_init+0x12/0x40
>    do_one_initcall+0x46/0x300
>    kernel_init_freeable+0x23d/0x2a0
>    ? __pfx_kernel_init+0x10/0x10
>    kernel_init+0x1a/0x140
>    ret_from_fork+0x34/0x50
>    ? __pfx_kernel_init+0x10/0x10
>    ret_from_fork_asm+0x1a/0x30
>    </TASK>
>   ---[ end trace 0000000000000000 ]---

I saw the same issue on Intel platforms.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-11 18:42   ` Joerg Roedel
  2025-03-12  7:07     ` Baolu Lu
@ 2025-03-12 10:10     ` Robin Murphy
  2025-03-12 14:34       ` Baolu Lu
  2025-03-12 15:21       ` Joerg Roedel
  1 sibling, 2 replies; 44+ messages in thread
From: Robin Murphy @ 2025-03-12 10:10 UTC (permalink / raw)
  To: Joerg Roedel, Baolu Lu
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Will Deacon, Rob Herring, Saravana Kannan, Bjorn Helgaas,
	linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 2025-03-11 6:42 pm, Joerg Roedel wrote:
> Hi Robin,
> 
> On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
>> +	/*
>> +	 * And if we do now see any replay calls, they would indicate someone
>> +	 * misusing the dma_configure path outside bus code.
>> +	 */
>> +	if (dev->driver)
>> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
> 
> This warning triggers on my workstation (with an AMD IOMMU), any ideas?

Argh! When I moved the dma_configure call into iommu_init_device() for
v2 I moved the warning with it, but of course that needs to stay where
it was, *after* the point that ops->probe_device has had a chance to
filter out irrelevant devices. Does this make it behave?

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 09798ddbce9d..1da6c55a0d02 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -437,12 +437,6 @@ static int iommu_init_device(struct device *dev)
  		ret = -ENODEV;
  		goto err_free;
  	}
-	/*
-	 * And if we do now see any replay calls, they would indicate someone
-	 * misusing the dma_configure path outside bus code.
-	 */
-	if (dev->driver)
-		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
  
  	if (!try_module_get(ops->owner)) {
  		ret = -EINVAL;
@@ -565,6 +559,12 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
  	ret = iommu_init_device(dev);
  	if (ret)
  		return ret;
+	/*
+	 * And if we do now see any replay calls, they would indicate someone
+	 * misusing the dma_configure path outside bus code.
+	 */
+	if (dev->driver)
+		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
  
  	group = dev->iommu_group;
  	gdev = iommu_group_alloc_device(group, dev);


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-12 10:10     ` Robin Murphy
@ 2025-03-12 14:34       ` Baolu Lu
  2025-03-12 15:21       ` Joerg Roedel
  1 sibling, 0 replies; 44+ messages in thread
From: Baolu Lu @ 2025-03-12 14:34 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel
  Cc: baolu.lu, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On 2025/3/12 18:10, Robin Murphy wrote:
> On 2025-03-11 6:42 pm, Joerg Roedel wrote:
>> Hi Robin,
>>
>> On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
>>> +    /*
>>> +     * And if we do now see any replay calls, they would indicate 
>>> someone
>>> +     * misusing the dma_configure path outside bus code.
>>> +     */
>>> +    if (dev->driver)
>>> +        dev_WARN(dev, "late IOMMU probe at driver bind, something 
>>> fishy here!\n");
>>
>> This warning triggers on my workstation (with an AMD IOMMU), any ideas?
> 
> Argh! When I moved the dma_configure call into iommu_init_device() for
> v2 I moved the warning with it, but of course that needs to stay where
> it was, *after* the point that ops->probe_device has had a chance to
> filter out irrelevant devices. Does this make it behave?

Yes. It works on my end.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-12 10:10     ` Robin Murphy
  2025-03-12 14:34       ` Baolu Lu
@ 2025-03-12 15:21       ` Joerg Roedel
  1 sibling, 0 replies; 44+ messages in thread
From: Joerg Roedel @ 2025-03-12 15:21 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Baolu Lu, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

Hi Robin,

On Wed, Mar 12, 2025 at 10:10:04AM +0000, Robin Murphy wrote:
> Argh! When I moved the dma_configure call into iommu_init_device() for
> v2 I moved the warning with it, but of course that needs to stay where
> it was, *after* the point that ops->probe_device has had a chance to
> filter out irrelevant devices. Does this make it behave?

Okay, thanks for the patch. I am currently building a kernel to test it
and will report back.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
       [not found]   ` <CGME20250313095633eucas1p29cb55f2504b4bcf67c16b3bd3fa9b8cd@eucas1p2.samsung.com>
@ 2025-03-13  9:56     ` Marek Szyprowski
  2025-03-13 11:01       ` Robin Murphy
  0 siblings, 1 reply; 44+ messages in thread
From: Marek Szyprowski @ 2025-03-13  9:56 UTC (permalink / raw)
  To: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

Hi Robin,

On 28.02.2025 16:46, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
>
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
>
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
>
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
>
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

This patch landed in yesterday's linux-next as commit bcb81ac6ae3c 
("iommu: Get DT/ACPI parsing into the proper probe path"). In my tests I 
found it breaks booting of ARM64 RK3568-based Odroid-M1 board 
(arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the 
relevant kernel log:

Unable to handle kernel NULL pointer dereference at virtual address 
00000000000003e8
Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
Data abort info:
   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[00000000000003e8] user address but active_mm is swapper
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : devm_kmalloc+0x2c/0x114
lr : rk_iommu_of_xlate+0x30/0x90
...
Call trace:
  devm_kmalloc+0x2c/0x114 (P)
  rk_iommu_of_xlate+0x30/0x90
  of_iommu_xlate+0x9c/0xc8
  of_iommu_configure+0x1c4/0x23c
  of_dma_configure_id+0x244/0x3b8
  platform_dma_configure+0x74/0xac
  __iommu_probe_device+0x258/0x49c
  probe_iommu_group+0x3c/0x64
  bus_for_each_dev+0x74/0xd4
  iommu_device_register+0xd4/0x230
  rk_iommu_probe+0x1f8/0x350
  platform_probe+0x68/0xdc
  really_probe+0xbc/0x298
  __driver_probe_device+0x78/0x12c
  driver_probe_device+0x40/0x164
  __driver_attach+0x9c/0x1ac
  bus_for_each_dev+0x74/0xd4
  driver_attach+0x24/0x30
  bus_add_driver+0xe4/0x208
  driver_register+0x60/0x128
  __platform_driver_register+0x24/0x30
  rk_iommu_driver_init+0x1c/0x28
  do_one_initcall+0x64/0x308
  kernel_init_freeable+0x288/0x4f0
  kernel_init+0x20/0x1d8
  ret_from_fork+0x10/0x20
Code: d2801017 a90153f3 ab0102e0 aa0103f4 (b943eab8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x100,00000040,00901250,8200720b
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0000000b ]---

Let me know if You want me to test any experimental patch or provide 
more information.

> ---
>
> v2:
>   - Comment bus driver changes for clarity
>   - Use dev->iommu as the now-robust replay condition
>   - Drop the device_iommu_mapped() checks in the firmware paths as they
>     weren't doing much - we can't replace probe_device_lock just yet...
>   
>   drivers/acpi/arm64/dma.c        |  5 +++++
>   drivers/acpi/scan.c             |  7 -------
>   drivers/amba/bus.c              |  3 ++-
>   drivers/base/platform.c         |  3 ++-
>   drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
>   drivers/cdx/cdx.c               |  3 ++-
>   drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
>   drivers/iommu/of_iommu.c        |  7 ++++++-
>   drivers/of/device.c             |  7 ++++++-
>   drivers/pci/pci-driver.c        |  3 ++-
>   10 files changed, 48 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
> index 52b2abf88689..f30f138352b7 100644
> --- a/drivers/acpi/arm64/dma.c
> +++ b/drivers/acpi/arm64/dma.c
> @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
>   	else
>   		end = (1ULL << 32) - 1;
>   
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		return;
> +	}
> +
>   	ret = acpi_dma_get_range(dev, &map);
>   	if (!ret && map) {
>   		end = dma_range_map_max(map);
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 9f4efa8f75a6..fb1fe9f3b1a3 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
>   		err = viot_iommu_configure(dev);
>   	mutex_unlock(&iommu_probe_device_lock);
>   
> -	/*
> -	 * If we have reason to believe the IOMMU driver missed the initial
> -	 * iommu_probe_device() call for dev, replay it to get things in order.
> -	 */
> -	if (!err && dev->bus)
> -		err = iommu_probe_device(dev);
> -
>   	return err;
>   }
>   
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 8ef259b4d037..71482d639a6d 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -364,7 +364,8 @@ static int amba_dma_configure(struct device *dev)
>   		ret = acpi_dma_configure(dev, attr);
>   	}
>   
> -	if (!ret && !drv->driver_managed_dma) {
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !drv->driver_managed_dma) {
>   		ret = iommu_device_use_default_domain(dev);
>   		if (ret)
>   			arch_teardown_dma_ops(dev);
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6f2a33722c52..1813cfd0c4bd 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
>   		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>   		ret = acpi_dma_configure(dev, attr);
>   	}
> -	if (ret || drv->driver_managed_dma)
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (ret || !dev->driver || drv->driver_managed_dma)
>   		return ret;
>   
>   	ret = iommu_device_use_default_domain(dev);
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index d1f3d327ddd1..a8be8cf246fb 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device *dev)
>   	else
>   		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
>   
> -	if (!ret && !mc_drv->driver_managed_dma) {
> +	/* @mc_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !mc_drv->driver_managed_dma) {
>   		ret = iommu_device_use_default_domain(dev);
>   		if (ret)
>   			arch_teardown_dma_ops(dev);
> diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
> index c573ed2ee71a..780fb0c4adba 100644
> --- a/drivers/cdx/cdx.c
> +++ b/drivers/cdx/cdx.c
> @@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device *dev)
>   		return ret;
>   	}
>   
> -	if (!ret && !cdx_drv->driver_managed_dma) {
> +	/* @cdx_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !cdx_drv->driver_managed_dma) {
>   		ret = iommu_device_use_default_domain(dev);
>   		if (ret)
>   			arch_teardown_dma_ops(dev);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index a3b45b84f42b..1cec7074367a 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev)
>   	if (!dev_iommu_get(dev))
>   		return -ENOMEM;
>   	/*
> -	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
> -	 * instances with non-NULL fwnodes, and client devices should have been
> -	 * identified with a fwspec by this point. Otherwise, we can currently
> +	 * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
> +	 * is buried in the bus dma_configure path. Properly unpicking that is
> +	 * still a big job, so for now just invoke the whole thing. The device
> +	 * already having a driver bound means dma_configure has already run and
> +	 * either found no IOMMU to wait for, or we're in its replay call right
> +	 * now, so either way there's no point calling it again.
> +	 */
> +	if (!dev->driver && dev->bus->dma_configure) {
> +		mutex_unlock(&iommu_probe_device_lock);
> +		dev->bus->dma_configure(dev);
> +		mutex_lock(&iommu_probe_device_lock);
> +	}
> +	/*
> +	 * At this point, relevant devices either now have a fwspec which will
> +	 * match ops registered with a non-NULL fwnode, or we can reasonably
>   	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
>   	 * be present, and that any of their registered instances has suitable
>   	 * ops for probing, and thus cheekily co-opt the same mechanism.
> @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
>   		ret = -ENODEV;
>   		goto err_free;
>   	}
> +	/*
> +	 * And if we do now see any replay calls, they would indicate someone
> +	 * misusing the dma_configure path outside bus code.
> +	 */
> +	if (dev->driver)
> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
>   
>   	if (!try_module_get(ops->owner)) {
>   		ret = -EINVAL;
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e10a68b5ffde..6b989a62def2 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>   		dev_iommu_free(dev);
>   	mutex_unlock(&iommu_probe_device_lock);
>   
> -	if (!err && dev->bus)
> +	/*
> +	 * If we're not on the iommu_probe_device() path (as indicated by the
> +	 * initial dev->iommu) then try to simulate it. This should no longer
> +	 * happen unless of_dma_configure() is being misused outside bus code.
> +	 */
> +	if (!err && dev->bus && !dev_iommu_present)
>   		err = iommu_probe_device(dev);
>   
>   	if (err && err != -EPROBE_DEFER)
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index edf3be197265..5053e5d532cc 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>   	bool coherent, set_map = false;
>   	int ret;
>   
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		goto skip_map;
> +	}
> +
>   	if (np == dev->of_node)
>   		bus_np = __of_get_dma_parent(np);
>   	else
> @@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>   		end = dma_range_map_max(map);
>   		set_map = true;
>   	}
> -
> +skip_map:
>   	/*
>   	 * If @dev is expected to be DMA-capable then the bus code that created
>   	 * it should have initialised its dma_mask pointer by this point. For
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f57ea36d125d..4468b7327cab 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device *dev)
>   
>   	pci_put_host_bridge_device(bridge);
>   
> -	if (!ret && !driver->driver_managed_dma) {
> +	/* @driver may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !driver->driver_managed_dma) {
>   		ret = iommu_device_use_default_domain(dev);
>   		if (ret)
>   			arch_teardown_dma_ops(dev);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-13  9:56     ` Marek Szyprowski
@ 2025-03-13 11:01       ` Robin Murphy
  2025-03-13 12:23         ` Marek Szyprowski
  2025-03-13 16:30         ` Anders Roxell
  0 siblings, 2 replies; 44+ messages in thread
From: Robin Murphy @ 2025-03-13 11:01 UTC (permalink / raw)
  To: Marek Szyprowski, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 2025-03-13 9:56 am, Marek Szyprowski wrote:
[...]
> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my tests I
> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
> relevant kernel log:

...and the bug-flushing-out begins!

> Unable to handle kernel NULL pointer dereference at virtual address
> 00000000000003e8
> Mem abort info:
>     ESR = 0x0000000096000004
>     EC = 0x25: DABT (current EL), IL = 32 bits
>     SET = 0, FnV = 0
>     EA = 0, S1PTW = 0
>     FSC = 0x04: level 0 translation fault
> Data abort info:
>     ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>     CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>     GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [00000000000003e8] user address but active_mm is swapper
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
> Hardware name: Hardkernel ODROID-M1 (DT)
> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : devm_kmalloc+0x2c/0x114
> lr : rk_iommu_of_xlate+0x30/0x90
> ...
> Call trace:
>    devm_kmalloc+0x2c/0x114 (P)
>    rk_iommu_of_xlate+0x30/0x90

Yeah, looks like this is doing something a bit questionable which can't
work properly. TBH the whole dma_dev thing could probably be cleaned up
now that we have proper instances, but for now does this work?

(annoyingly none of my Rockchip boards are set up for testing right now, 
but I might have time to dig one out later)

Thanks,
Robin.

----->8-----

Subject: [PATCH] iommu/rockchip: Allocate per-device data sensibly

Now that DT-based probing is finally happening in the right order again,
it reveals an issue in Rockchip's of_xlate, which can now be called
during registration, but is using the global dma_dev which is only
assigned later. However, this makes little sense when we're already
looking up the correct IOMMU device, who should logically be the owner
of the devm allocation anyway.

Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe 
path")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
  drivers/iommu/rockchip-iommu.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 323cc665c357..48826d1ccfd8 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1148,12 +1148,12 @@ static int rk_iommu_of_xlate(struct device *dev,
  	struct platform_device *iommu_dev;
  	struct rk_iommudata *data;

-	data = devm_kzalloc(dma_dev, sizeof(*data), GFP_KERNEL);
+	iommu_dev = of_find_device_by_node(args->np);
+
+	data = devm_kzalloc(&iommu_dev->dev, sizeof(*data), GFP_KERNEL);
  	if (!data)
  		return -ENOMEM;

-	iommu_dev = of_find_device_by_node(args->np);
-
  	data->iommu = platform_get_drvdata(iommu_dev);
  	data->iommu->domain = &rk_identity_domain;
  	dev_iommu_priv_set(dev, data);
-- 
2.39.2.101.g768bb238c484.dirty



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-13 11:01       ` Robin Murphy
@ 2025-03-13 12:23         ` Marek Szyprowski
  2025-03-13 13:06           ` Robin Murphy
  2025-03-13 16:30         ` Anders Roxell
  1 sibling, 1 reply; 44+ messages in thread
From: Marek Szyprowski @ 2025-03-13 12:23 UTC (permalink / raw)
  To: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 13.03.2025 12:01, Robin Murphy wrote:
> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
> [...]
>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my tests I
>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>> relevant kernel log:
>
> ...and the bug-flushing-out begins!
>
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 00000000000003e8
>> Mem abort info:
>>     ESR = 0x0000000096000004
>>     EC = 0x25: DABT (current EL), IL = 32 bits
>>     SET = 0, FnV = 0
>>     EA = 0, S1PTW = 0
>>     FSC = 0x04: level 0 translation fault
>> Data abort info:
>>     ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>     CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>     GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [00000000000003e8] user address but active_mm is swapper
>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>> Hardware name: Hardkernel ODROID-M1 (DT)
>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : devm_kmalloc+0x2c/0x114
>> lr : rk_iommu_of_xlate+0x30/0x90
>> ...
>> Call trace:
>>    devm_kmalloc+0x2c/0x114 (P)
>>    rk_iommu_of_xlate+0x30/0x90
>
> Yeah, looks like this is doing something a bit questionable which can't
> work properly. TBH the whole dma_dev thing could probably be cleaned up
> now that we have proper instances, but for now does this work?

Yes, this patch fixes the problem I've observed.

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

BTW, this dma_dev idea has been borrowed from my exynos_iommu driver and 
I doubt it can be cleaned up.


>
> (annoyingly none of my Rockchip boards are set up for testing right 
> now, but I might have time to dig one out later)
>
> Thanks,
> Robin.
>
> ----->8-----
>
> Subject: [PATCH] iommu/rockchip: Allocate per-device data sensibly
>
> Now that DT-based probing is finally happening in the right order again,
> it reveals an issue in Rockchip's of_xlate, which can now be called
> during registration, but is using the global dma_dev which is only
> assigned later. However, this makes little sense when we're already
> looking up the correct IOMMU device, who should logically be the owner
> of the devm allocation anyway.
>
> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe 
> path")
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/rockchip-iommu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/rockchip-iommu.c 
> b/drivers/iommu/rockchip-iommu.c
> index 323cc665c357..48826d1ccfd8 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -1148,12 +1148,12 @@ static int rk_iommu_of_xlate(struct device *dev,
>      struct platform_device *iommu_dev;
>      struct rk_iommudata *data;
>
> -    data = devm_kzalloc(dma_dev, sizeof(*data), GFP_KERNEL);
> +    iommu_dev = of_find_device_by_node(args->np);
> +
> +    data = devm_kzalloc(&iommu_dev->dev, sizeof(*data), GFP_KERNEL);
>      if (!data)
>          return -ENOMEM;
>
> -    iommu_dev = of_find_device_by_node(args->np);
> -
>      data->iommu = platform_get_drvdata(iommu_dev);
>      data->iommu->domain = &rk_identity_domain;
>      dev_iommu_priv_set(dev, data);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-13 12:23         ` Marek Szyprowski
@ 2025-03-13 13:06           ` Robin Murphy
  2025-03-13 14:12             ` Robin Murphy
  0 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-03-13 13:06 UTC (permalink / raw)
  To: Marek Szyprowski, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
> On 13.03.2025 12:01, Robin Murphy wrote:
>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>> [...]
>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my tests I
>>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>> relevant kernel log:
>>
>> ...and the bug-flushing-out begins!
>>
>>> Unable to handle kernel NULL pointer dereference at virtual address
>>> 00000000000003e8
>>> Mem abort info:
>>>      ESR = 0x0000000096000004
>>>      EC = 0x25: DABT (current EL), IL = 32 bits
>>>      SET = 0, FnV = 0
>>>      EA = 0, S1PTW = 0
>>>      FSC = 0x04: level 0 translation fault
>>> Data abort info:
>>>      ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>> [00000000000003e8] user address but active_mm is swapper
>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>> Modules linked in:
>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> pc : devm_kmalloc+0x2c/0x114
>>> lr : rk_iommu_of_xlate+0x30/0x90
>>> ...
>>> Call trace:
>>>     devm_kmalloc+0x2c/0x114 (P)
>>>     rk_iommu_of_xlate+0x30/0x90
>>
>> Yeah, looks like this is doing something a bit questionable which can't
>> work properly. TBH the whole dma_dev thing could probably be cleaned up
>> now that we have proper instances, but for now does this work?
> 
> Yes, this patch fixes the problem I've observed.
> 
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver and
> I doubt it can be cleaned up.

On the contrary I suspect they both can - it all dates back to when we 
had the single global platform bus iommu_ops and the SoC drivers were 
forced to bodge their own notion of multiple instances, but with the 
modern core code, ops are always called via a valid IOMMU instance or 
domain, so in principle it should always be possible to get at an 
appropriate IOMMU device now. IIRC it was mostly about allocating and 
DMA-mapping the pagetables in domain_alloc, where the private notion of 
instances didn't have enough information, but domain_alloc_paging solves 
that.

Cheers,
Robin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-13 13:06           ` Robin Murphy
@ 2025-03-13 14:12             ` Robin Murphy
  2025-03-17  7:37               ` Marek Szyprowski
  0 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-03-13 14:12 UTC (permalink / raw)
  To: Marek Szyprowski, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 2025-03-13 1:06 pm, Robin Murphy wrote:
> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>> On 13.03.2025 12:01, Robin Murphy wrote:
>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>> [...]
>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my 
>>>> tests I
>>>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>> relevant kernel log:
>>>
>>> ...and the bug-flushing-out begins!
>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>> 00000000000003e8
>>>> Mem abort info:
>>>>      ESR = 0x0000000096000004
>>>>      EC = 0x25: DABT (current EL), IL = 32 bits
>>>>      SET = 0, FnV = 0
>>>>      EA = 0, S1PTW = 0
>>>>      FSC = 0x04: level 0 translation fault
>>>> Data abort info:
>>>>      ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>> [00000000000003e8] user address but active_mm is swapper
>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>> Modules linked in:
>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> pc : devm_kmalloc+0x2c/0x114
>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>> ...
>>>> Call trace:
>>>>     devm_kmalloc+0x2c/0x114 (P)
>>>>     rk_iommu_of_xlate+0x30/0x90
>>>
>>> Yeah, looks like this is doing something a bit questionable which can't
>>> work properly. TBH the whole dma_dev thing could probably be cleaned up
>>> now that we have proper instances, but for now does this work?
>>
>> Yes, this patch fixes the problem I've observed.
>>
>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>
>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver and
>> I doubt it can be cleaned up.
> 
> On the contrary I suspect they both can - it all dates back to when we 
> had the single global platform bus iommu_ops and the SoC drivers were 
> forced to bodge their own notion of multiple instances, but with the 
> modern core code, ops are always called via a valid IOMMU instance or 
> domain, so in principle it should always be possible to get at an 
> appropriate IOMMU device now. IIRC it was mostly about allocating and 
> DMA-mapping the pagetables in domain_alloc, where the private notion of 
> instances didn't have enough information, but domain_alloc_paging solves 
> that.

Bah, in fact I think I am going to have to do that now, since although 
it doesn't crash, rk_domain_alloc_paging() will also be failing for the 
same reason. Time to find a PSU for the RK3399 board, I guess...

(Or maybe just move the dma_dev assignment earlier to match Exynos?)

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-13 11:01       ` Robin Murphy
  2025-03-13 12:23         ` Marek Szyprowski
@ 2025-03-13 16:30         ` Anders Roxell
  1 sibling, 0 replies; 44+ messages in thread
From: Anders Roxell @ 2025-03-13 16:30 UTC (permalink / raw)
  To: robin.murphy, Marek Szyprowski, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Rafael J. Wysocki, Len Brown, Russell King,
	Greg Kroah-Hartman, Danilo Krummrich, Stuart Yoder,
	Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal, Joerg Roedel,
	Will Deacon, Rob Herring, Saravana Kannan, Bjorn Helgaas
  Cc: devicetree, iommu, linux-acpi, linux-arm-kernel, linux-kernel,
	linux-pci, quic_charante, Anders Roxell

From: Robin Murphy <robin.murphy@arm.com>

> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
> [...]
> > This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
> > ("iommu: Get DT/ACPI parsing into the proper probe path"). In my tests I
> > found it breaks booting of ARM64 RK3568-based Odroid-M1 board
> > (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
> > relevant kernel log:
> 
> ...and the bug-flushing-out begins!
> 
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 00000000000003e8
> > Mem abort info:
> >     ESR = 0x0000000096000004
> >     EC = 0x25: DABT (current EL), IL = 32 bits
> >     SET = 0, FnV = 0
> >     EA = 0, S1PTW = 0
> >     FSC = 0x04: level 0 translation fault
> > Data abort info:
> >     ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >     CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >     GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [00000000000003e8] user address but active_mm is swapper
> > Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > Modules linked in:
> > CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
> > Hardware name: Hardkernel ODROID-M1 (DT)
> > pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : devm_kmalloc+0x2c/0x114
> > lr : rk_iommu_of_xlate+0x30/0x90
> > ...
> > Call trace:
> >    devm_kmalloc+0x2c/0x114 (P)
> >    rk_iommu_of_xlate+0x30/0x90
> 
> Yeah, looks like this is doing something a bit questionable which can't
> work properly. TBH the whole dma_dev thing could probably be cleaned up
> now that we have proper instances, but for now does this work?
> 
> (annoyingly none of my Rockchip boards are set up for testing right now, 
> but I might have time to dig one out later)
> 
> Thanks,
> Robin.
> 
> ----->8-----
> 
> Subject: [PATCH] iommu/rockchip: Allocate per-device data sensibly
> 
> Now that DT-based probing is finally happening in the right order again,
> it reveals an issue in Rockchip's of_xlate, which can now be called
> during registration, but is using the global dma_dev which is only
> assigned later. However, this makes little sense when we're already
> looking up the correct IOMMU device, who should logically be the owner
> of the devm allocation anyway.
> 
> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe 
> path")
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

This patch fixed the boot on rockpi4.
Applied it ontop of next-20250313

Tested-by: Anders Roxell <anders.roxell@linaro.org>

Cheers,
Anders

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-13 14:12             ` Robin Murphy
@ 2025-03-17  7:37               ` Marek Szyprowski
  2025-03-17 18:22                 ` Robin Murphy
  0 siblings, 1 reply; 44+ messages in thread
From: Marek Szyprowski @ 2025-03-17  7:37 UTC (permalink / raw)
  To: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 13.03.2025 15:12, Robin Murphy wrote:
> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>> [...]
>>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my 
>>>>> tests I
>>>>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>> relevant kernel log:
>>>>
>>>> ...and the bug-flushing-out begins!
>>>>
>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>> 00000000000003e8
>>>>> Mem abort info:
>>>>>      ESR = 0x0000000096000004
>>>>>      EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>      SET = 0, FnV = 0
>>>>>      EA = 0, S1PTW = 0
>>>>>      FSC = 0x04: level 0 translation fault
>>>>> Data abort info:
>>>>>      ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>> Modules linked in:
>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>> ...
>>>>> Call trace:
>>>>>     devm_kmalloc+0x2c/0x114 (P)
>>>>>     rk_iommu_of_xlate+0x30/0x90
>>>>
>>>> Yeah, looks like this is doing something a bit questionable which 
>>>> can't
>>>> work properly. TBH the whole dma_dev thing could probably be 
>>>> cleaned up
>>>> now that we have proper instances, but for now does this work?
>>>
>>> Yes, this patch fixes the problem I've observed.
>>>
>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>
>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver 
>>> and
>>> I doubt it can be cleaned up.
>>
>> On the contrary I suspect they both can - it all dates back to when 
>> we had the single global platform bus iommu_ops and the SoC drivers 
>> were forced to bodge their own notion of multiple instances, but with 
>> the modern core code, ops are always called via a valid IOMMU 
>> instance or domain, so in principle it should always be possible to 
>> get at an appropriate IOMMU device now. IIRC it was mostly about 
>> allocating and DMA-mapping the pagetables in domain_alloc, where the 
>> private notion of instances didn't have enough information, but 
>> domain_alloc_paging solves that.
>
> Bah, in fact I think I am going to have to do that now, since although 
> it doesn't crash, rk_domain_alloc_paging() will also be failing for 
> the same reason. Time to find a PSU for the RK3399 board, I guess...
>
> (Or maybe just move the dma_dev assignment earlier to match Exynos?)

Well I just found that Exynos IOMMU is also broken on some on my test 
boards. It looks that the runtime pm links are somehow not correctly 
established. I will try to analyze this later in the afternoon.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-17  7:37               ` Marek Szyprowski
@ 2025-03-17 18:22                 ` Robin Murphy
  2025-03-21 12:15                   ` Marek Szyprowski
  0 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-03-17 18:22 UTC (permalink / raw)
  To: Marek Szyprowski, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 17/03/2025 7:37 am, Marek Szyprowski wrote:
> On 13.03.2025 15:12, Robin Murphy wrote:
>> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>>> [...]
>>>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my
>>>>>> tests I
>>>>>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>>> relevant kernel log:
>>>>>
>>>>> ...and the bug-flushing-out begins!
>>>>>
>>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>>> 00000000000003e8
>>>>>> Mem abort info:
>>>>>>       ESR = 0x0000000096000004
>>>>>>       EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>       SET = 0, FnV = 0
>>>>>>       EA = 0, S1PTW = 0
>>>>>>       FSC = 0x04: level 0 translation fault
>>>>>> Data abort info:
>>>>>>       ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>>       CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>>       GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>> Modules linked in:
>>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>>> ...
>>>>>> Call trace:
>>>>>>      devm_kmalloc+0x2c/0x114 (P)
>>>>>>      rk_iommu_of_xlate+0x30/0x90
>>>>>
>>>>> Yeah, looks like this is doing something a bit questionable which
>>>>> can't
>>>>> work properly. TBH the whole dma_dev thing could probably be
>>>>> cleaned up
>>>>> now that we have proper instances, but for now does this work?
>>>>
>>>> Yes, this patch fixes the problem I've observed.
>>>>
>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>
>>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver
>>>> and
>>>> I doubt it can be cleaned up.
>>>
>>> On the contrary I suspect they both can - it all dates back to when
>>> we had the single global platform bus iommu_ops and the SoC drivers
>>> were forced to bodge their own notion of multiple instances, but with
>>> the modern core code, ops are always called via a valid IOMMU
>>> instance or domain, so in principle it should always be possible to
>>> get at an appropriate IOMMU device now. IIRC it was mostly about
>>> allocating and DMA-mapping the pagetables in domain_alloc, where the
>>> private notion of instances didn't have enough information, but
>>> domain_alloc_paging solves that.
>>
>> Bah, in fact I think I am going to have to do that now, since although
>> it doesn't crash, rk_domain_alloc_paging() will also be failing for
>> the same reason. Time to find a PSU for the RK3399 board, I guess...
>>
>> (Or maybe just move the dma_dev assignment earlier to match Exynos?)
> 
> Well I just found that Exynos IOMMU is also broken on some on my test
> boards. It looks that the runtime pm links are somehow not correctly
> established. I will try to analyze this later in the afternoon.

Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable to 
boot my original 6.14-rc3-based branch - even with the IOMMU driver 
disabled, it's consistently dying somewhere near (or just after) init 
with what looks like some catastrophic memory corruption issue - very 
occasionally it's managed to print the first line of various different 
panics.

Before that point though, with the IOMMU driver enabled it does appear 
to show signs of working OK:

[    0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
[    0.654220] platform 14450000.mixer: Adding to iommu group 1
...
[    2.680920] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: 
Attached IOMMU with pgtable 0x42924000
...
[    5.196674] exynos-mixer 14450000.mixer: 
exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable 
0x42924000
[    5.207091] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: 
Attached IOMMU with pgtable 0x42884000


The multi-instance stuff in probe/release does look a bit suspect, 
however - seems like the second instance probe would overwrite the first 
instance's links, and then there would be a double-del() if the device 
were ever actually released again? I may have made that much more likely 
to happen, but I suspect it was already possible with async driver probe...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
                     ` (3 preceding siblings ...)
       [not found]   ` <CGME20250313095633eucas1p29cb55f2504b4bcf67c16b3bd3fa9b8cd@eucas1p2.samsung.com>
@ 2025-03-18 16:37   ` Geert Uytterhoeven
  2025-03-18 17:24     ` Robin Murphy
  2025-03-27  9:47   ` Chen-Yu Tsai
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Geert Uytterhoeven @ 2025-03-18 16:37 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla, Linux-Renesas

Hi Robin,

On Fri, 28 Feb 2025 at 16:51, Robin Murphy <robin.murphy@arm.com> wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
>
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
>
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
>
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
>
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Thanks for your patch, which is now commit bcb81ac6ae3c2ef9 ("iommu:
Get DT/ACPI parsing into the proper probe path") in iommu/next.

This patch triggers two issues on R-Car Gen3 platforms:

1. I am seeing a warning on Renesas Salvator-XS with R-Car M3N
(but not on the similar board with R-Car H3), and only for SATA[1].
Unfortunately commit 73d2f10957f517e5 ("iommu: Don't warn prematurely
about dodgy probes") does not help:

    ------------[ cut here ]------------
    sata_rcar ee300000.sata: late IOMMU probe at driver bind,
something fishy here!
    WARNING: CPU: 1 PID: 13 at drivers/iommu/iommu.c:571
__iommu_probe_device+0x208/0x38c
    Modules linked in:
    CPU: 1 UID: 0 PID: 13 Comm: kworker/u8:1 Not tainted
6.14.0-rc3-rcar3-00020-g73d2f10957f5-dirty #315
    Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
    Workqueue: events_unbound deferred_probe_work_func
    pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : __iommu_probe_device+0x208/0x38c
    lr : __iommu_probe_device+0x208/0x38c
    sp : ffffffc086da39a0
    x29: ffffffc086da39b0 x28: 0000000000000000 x27: 0000000000000000
    x26: 0000000000000001 x25: ffffffc080e0e0ae x24: ffffffc080e0e0bb
    x23: 0000000000000000 x22: ffffff800bd3d090 x21: ffffffc080acf680
    x20: ffffff8008e8f780 x19: ffffff800aca8810 x18: 00000000e9f55e4c
    x17: 6874656d6f73202c x16: 646e696220726576 x15: 0720072007200720
    x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
    x11: 00000000000001ac x10: ffffffc086da3700 x9 : ffffffc083edb140
    x8 : ffffffc086da3698 x7 : ffffffc086da36a0 x6 : 00000000fff7ffff
    x5 : c0000000fff7ffff x4 : 0000000000000000 x3 : 0000000000000001
    x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80083d5380
    Call trace:
     __iommu_probe_device+0x208/0x38c (P)
     iommu_probe_device+0x34/0x74
     of_iommu_configure+0x128/0x200
     of_dma_configure_id+0xdc/0x1d4
     platform_dma_configure+0x48/0x6c
     really_probe+0xf0/0x260
     __driver_probe_device+0xec/0x104
     driver_probe_device+0x3c/0xc0
     __device_attach_driver+0x58/0xcc
     bus_for_each_drv+0xb8/0xe0
     __device_attach+0xdc/0x138
     device_initial_probe+0x10/0x18
     bus_probe_device+0x38/0xa0
     deferred_probe_work_func+0xb4/0xcc
     process_scheduled_works+0x2e4/0x4a8
     worker_thread+0x144/0x1cc
     kthread+0x188/0x198
     ret_from_fork+0x10/0x20
    irq event stamp: 49052
    hardirqs last  enabled at (49051): [<ffffffc0800fb6a8>]
__up_console_sem+0x50/0x74
    hardirqs last disabled at (49052): [<ffffffc0809eb65c>] el1_dbg+0x20/0x6c
    softirqs last  enabled at (49046): [<ffffffc080096c50>]
handle_softirqs+0x1b0/0x3b4
    softirqs last disabled at (48839): [<ffffffc080010168>]
__do_softirq+0x10/0x18
    ---[ end trace 0000000000000000 ]---

I added debug prints to sata_rcar_probe(), and verified SATA is
probed at about the same time on R-Car H3 and M3N, and the probe is
never deferred.

Do you have a clue?


2. The IOMMU driver's iommu_ops.of_xlate() callback is called about
three times as much as before:

    +platform ec700000.dma-controller: ipmmu_of_xlate
    +platform ec720000.dma-controller: ipmmu_of_xlate
    +platform ec700000.dma-controller: ipmmu_of_xlate
    +platform ec720000.dma-controller: ipmmu_of_xlate
    +platform ec700000.dma-controller: ipmmu_of_xlate
    +platform ec720000.dma-controller: ipmmu_of_xlate
    +platform ec700000.dma-controller: ipmmu_of_xlate
    +platform ec720000.dma-controller: ipmmu_of_xlate
    +platform ec700000.dma-controller: ipmmu_of_xlate
    +platform ec720000.dma-controller: ipmmu_of_xlate
    +platform fea27000.fcp: ipmmu_of_xlate
    +platform fea2f000.fcp: ipmmu_of_xlate
    +platform ec700000.dma-controller: ipmmu_of_xlate
    +platform ec720000.dma-controller: ipmmu_of_xlate
    +platform fe950000.fcp: ipmmu_of_xlate
    +platform fe96f000.fcp: ipmmu_of_xlate
    +platform fea27000.fcp: ipmmu_of_xlate
    +platform fea2f000.fcp: ipmmu_of_xlate
    +platform fe9af000.fcp: ipmmu_of_xlate
     rcar-fcp fe950000.fcp: ipmmu_of_xlate
     rcar-fcp fe96f000.fcp: ipmmu_of_xlate
     rcar-fcp fea27000.fcp: ipmmu_of_xlate
     rcar-fcp fea2f000.fcp: ipmmu_of_xlate
     rcar-fcp fe9af000.fcp: ipmmu_of_xlate
     rcar-dmac ec700000.dma-controller: ipmmu_of_xlate
     rcar-dmac ec720000.dma-controller: ipmmu_of_xlate
    +platform e6700000.dma-controller: ipmmu_of_xlate
    +platform e6800000.ethernet: ipmmu_of_xlate
    +platform e6700000.dma-controller: ipmmu_of_xlate
    +platform e7300000.dma-controller: ipmmu_of_xlate
    +platform e7310000.dma-controller: ipmmu_of_xlate
    +platform e6800000.ethernet: ipmmu_of_xlate
    +platform ee100000.mmc: ipmmu_of_xlate
    +platform ee140000.mmc: ipmmu_of_xlate
    +platform ee160000.mmc: ipmmu_of_xlate
    +platform e6700000.dma-controller: ipmmu_of_xlate
    +platform e7300000.dma-controller: ipmmu_of_xlate
    +platform e7310000.dma-controller: ipmmu_of_xlate
    +platform e6800000.ethernet: ipmmu_of_xlate
    +platform ee100000.mmc: ipmmu_of_xlate
    +platform ee140000.mmc: ipmmu_of_xlate
    +platform ee160000.mmc: ipmmu_of_xlate
    +platform ee300000.sata: ipmmu_of_xlate
     sata_rcar ee300000.sata: ipmmu_of_xlate
     ravb e6800000.ethernet: ipmmu_of_xlate
    -renesas_sdhi_internal_dmac ee100000.mmc: ipmmu_of_xlate
    -renesas_sdhi_internal_dmac ee140000.mmc: ipmmu_of_xlate
    -renesas_sdhi_internal_dmac ee160000.mmc: ipmmu_of_xlate
     rcar-dmac e6700000.dma-controller: ipmmu_of_xlate
     rcar-dmac e7300000.dma-controller: ipmmu_of_xlate
     rcar-dmac e7310000.dma-controller: ipmmu_of_xlate

For some devices, it can be called up to 6 times. All of the duplicates
happen before the device is bound (cfr. "platform" instead of the
actual driver name):

      6 platform ec720000.dma-controller: ipmmu_of_xlate
      6 platform ec700000.dma-controller: ipmmu_of_xlate
      3 platform e6800000.ethernet: ipmmu_of_xlate
      3 platform e6700000.dma-controller: ipmmu_of_xlate
      2 platform fea2f000.fcp: ipmmu_of_xlate
      2 platform fea27000.fcp: ipmmu_of_xlate
      2 platform ee160000.mmc: ipmmu_of_xlate
      2 platform ee140000.mmc: ipmmu_of_xlate
      2 platform ee100000.mmc: ipmmu_of_xlate
      2 platform e7310000.dma-controller: ipmmu_of_xlate
      2 platform e7300000.dma-controller: ipmmu_of_xlate
      1 platform fe9af000.fcp: ipmmu_of_xlate
      1 platform fe96f000.fcp: ipmmu_of_xlate
      1 platform fe950000.fcp: ipmmu_of_xlate
      1 platform ee300000.sata: ipmmu_of_xlate
      1 sata_rcar ee300000.sata: ipmmu_of_xlate
      1 rcar-fcp fea2f000.fcp: ipmmu_of_xlate
      1 rcar-fcp fea27000.fcp: ipmmu_of_xlate
      1 rcar-fcp fe9af000.fcp: ipmmu_of_xlate
      1 rcar-fcp fe96f000.fcp: ipmmu_of_xlate
      1 rcar-fcp fe950000.fcp: ipmmu_of_xlate
      1 rcar-dmac ec720000.dma-controller: ipmmu_of_xlate
      1 rcar-dmac ec700000.dma-controller: ipmmu_of_xlate
      1 rcar-dmac e7310000.dma-controller: ipmmu_of_xlate
      1 rcar-dmac e7300000.dma-controller: ipmmu_of_xlate
      1 rcar-dmac e6700000.dma-controller: ipmmu_of_xlate
      1 ravb e6800000.ethernet: ipmmu_of_xlate

Before, the callback was called just once for each DMA slave device.

Is this intentional?

Thanks!

[1] SATA IOMMU on R-Car Gen3 needs an out-of-tree patch to add it to
    drivers/iommu/ipmmu-vmsa.c:devices_allowlist[].

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-18 16:37   ` Geert Uytterhoeven
@ 2025-03-18 17:24     ` Robin Murphy
  2025-03-25 15:32       ` Geert Uytterhoeven
  0 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-03-18 17:24 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla, Linux-Renesas

Hi Geert,

On 18/03/2025 4:37 pm, Geert Uytterhoeven wrote:
[...]
> Thanks for your patch, which is now commit bcb81ac6ae3c2ef9 ("iommu:
> Get DT/ACPI parsing into the proper probe path") in iommu/next.
> 
> This patch triggers two issues on R-Car Gen3 platforms:
> 
> 1. I am seeing a warning on Renesas Salvator-XS with R-Car M3N
> (but not on the similar board with R-Car H3), and only for SATA[1].
> Unfortunately commit 73d2f10957f517e5 ("iommu: Don't warn prematurely
> about dodgy probes") does not help:
[...]
>      Call trace:
>       __iommu_probe_device+0x208/0x38c (P)
>       iommu_probe_device+0x34/0x74
>       of_iommu_configure+0x128/0x200
>       of_dma_configure_id+0xdc/0x1d4
>       platform_dma_configure+0x48/0x6c
>       really_probe+0xf0/0x260
>       __driver_probe_device+0xec/0x104
>       driver_probe_device+0x3c/0xc0

Hurrah, this is the warning doing the correct job - something *is* off
if we're now getting here without the IOMMU configuration being done
already (for a normal device with no other funny business going on).

> 2. The IOMMU driver's iommu_ops.of_xlate() callback is called about
> three times as much as before:

That would suggest that the fwspec gets set up OK, then something later
in the __iommu_probe_device() path fails and tears it down again, so the
next attempt starts from scratch. Do you see the "Cannot attach to
IPMMU" message firing? And similarly to the Rockchip case, does the
below help?

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 074daf1aac4e..5d416262ae5f 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -1081,6 +1081,7 @@ static int ipmmu_probe(struct platform_device *pdev)
  		}
  	}
  
+	platform_set_drvdata(pdev, mmu);
  	/*
  	 * Register the IPMMU to the IOMMU subsystem in the following cases:
  	 * - R-Car Gen2 IPMMU (all devices registered)
@@ -1103,7 +1104,6 @@ static int ipmmu_probe(struct platform_device *pdev)
  	 * ipmmu_init() after the probe function returns.
  	 */
  
-	platform_set_drvdata(pdev, mmu);
  
  	return 0;
  }

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-17 18:22                 ` Robin Murphy
@ 2025-03-21 12:15                   ` Marek Szyprowski
  2025-03-21 16:48                     ` Robin Murphy
  0 siblings, 1 reply; 44+ messages in thread
From: Marek Szyprowski @ 2025-03-21 12:15 UTC (permalink / raw)
  To: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 17.03.2025 19:22, Robin Murphy wrote:
> On 17/03/2025 7:37 am, Marek Szyprowski wrote:
>> On 13.03.2025 15:12, Robin Murphy wrote:
>>> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>>>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>>>> [...]
>>>>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my
>>>>>>> tests I
>>>>>>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>>>> relevant kernel log:
>>>>>>
>>>>>> ...and the bug-flushing-out begins!
>>>>>>
>>>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>>>> 00000000000003e8
>>>>>>> Mem abort info:
>>>>>>>       ESR = 0x0000000096000004
>>>>>>>       EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>       SET = 0, FnV = 0
>>>>>>>       EA = 0, S1PTW = 0
>>>>>>>       FSC = 0x04: level 0 translation fault
>>>>>>> Data abort info:
>>>>>>>       ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>>>       CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>>>       GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>>> Modules linked in:
>>>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>>>> ...
>>>>>>> Call trace:
>>>>>>>      devm_kmalloc+0x2c/0x114 (P)
>>>>>>>      rk_iommu_of_xlate+0x30/0x90
>>>>>>
>>>>>> Yeah, looks like this is doing something a bit questionable which
>>>>>> can't
>>>>>> work properly. TBH the whole dma_dev thing could probably be
>>>>>> cleaned up
>>>>>> now that we have proper instances, but for now does this work?
>>>>>
>>>>> Yes, this patch fixes the problem I've observed.
>>>>>
>>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>
>>>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver
>>>>> and
>>>>> I doubt it can be cleaned up.
>>>>
>>>> On the contrary I suspect they both can - it all dates back to when
>>>> we had the single global platform bus iommu_ops and the SoC drivers
>>>> were forced to bodge their own notion of multiple instances, but with
>>>> the modern core code, ops are always called via a valid IOMMU
>>>> instance or domain, so in principle it should always be possible to
>>>> get at an appropriate IOMMU device now. IIRC it was mostly about
>>>> allocating and DMA-mapping the pagetables in domain_alloc, where the
>>>> private notion of instances didn't have enough information, but
>>>> domain_alloc_paging solves that.
>>>
>>> Bah, in fact I think I am going to have to do that now, since although
>>> it doesn't crash, rk_domain_alloc_paging() will also be failing for
>>> the same reason. Time to find a PSU for the RK3399 board, I guess...
>>>
>>> (Or maybe just move the dma_dev assignment earlier to match Exynos?)
>>
>> Well I just found that Exynos IOMMU is also broken on some on my test
>> boards. It looks that the runtime pm links are somehow not correctly
>> established. I will try to analyze this later in the afternoon.
>
> Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable 
> to boot my original 6.14-rc3-based branch - even with the IOMMU driver 
> disabled, it's consistently dying somewhere near (or just after) init 
> with what looks like some catastrophic memory corruption issue - very 
> occasionally it's managed to print the first line of various different 
> panics.
>
> Before that point though, with the IOMMU driver enabled it does appear 
> to show signs of working OK:
>
> [    0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
> [    0.654220] platform 14450000.mixer: Adding to iommu group 1
> ...
> [    2.680920] exynos-mixer 14450000.mixer: 
> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42924000
> ...
> [    5.196674] exynos-mixer 14450000.mixer: 
> exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable 
> 0x42924000
> [    5.207091] exynos-mixer 14450000.mixer: 
> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42884000
>
>
> The multi-instance stuff in probe/release does look a bit suspect, 
> however - seems like the second instance probe would overwrite the 
> first instance's links, and then there would be a double-del() if the 
> device were ever actually released again? I may have made that much 
> more likely to happen, but I suspect it was already possible with 
> async driver probe...

That is really strange. My Odroid XU3 boots fine from commit 
bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"), 
although the IOMMU seems not to be working correctly. I've tested this 
with 14450000.mixer device (one need to attach HDMI cable to get it 
activated) and it looks that the video data are not being read from 
memory at all (the lack of VSYNC is reported, no IOMMU fault). However, 
from time to time, everything initializes and works properly.

It looks that this is somehow related to the different IOMMU/DMA-mapping 
glue code, as the other boards (ARM64 based) with exactly the same 
Exynos IOMMU driver always work fine. I've tried to figure out what 
actually happens, but so far I didn't get anything for sure. Disabling 
the call to dev->bus->dma_configure(dev) from iommu_init_device() seems 
to be fixing this, but this is almost equal to the revert of the 
$subject patch. I don't get why calling it in iommu_init_device() causes 
problems. It also doesn't look that this is anyhow related to the 
multi-instance stuff, as the same happens if I only leave a single 
exynos-sysmmu instance and its client (only 14450000.mixer device in the 
system).

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-21 12:15                   ` Marek Szyprowski
@ 2025-03-21 16:48                     ` Robin Murphy
  2025-04-01 20:34                       ` Marek Szyprowski
  0 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-03-21 16:48 UTC (permalink / raw)
  To: Marek Szyprowski, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 21/03/2025 12:15 pm, Marek Szyprowski wrote:
> On 17.03.2025 19:22, Robin Murphy wrote:
>> On 17/03/2025 7:37 am, Marek Szyprowski wrote:
>>> On 13.03.2025 15:12, Robin Murphy wrote:
>>>> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>>>>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>>>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>>>>> [...]
>>>>>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my
>>>>>>>> tests I
>>>>>>>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>>>>> relevant kernel log:
>>>>>>>
>>>>>>> ...and the bug-flushing-out begins!
>>>>>>>
>>>>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>>>>> 00000000000003e8
>>>>>>>> Mem abort info:
>>>>>>>>        ESR = 0x0000000096000004
>>>>>>>>        EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>>        SET = 0, FnV = 0
>>>>>>>>        EA = 0, S1PTW = 0
>>>>>>>>        FSC = 0x04: level 0 translation fault
>>>>>>>> Data abort info:
>>>>>>>>        ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>>>>        CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>>>>        GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>>>> Modules linked in:
>>>>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>>>>> ...
>>>>>>>> Call trace:
>>>>>>>>       devm_kmalloc+0x2c/0x114 (P)
>>>>>>>>       rk_iommu_of_xlate+0x30/0x90
>>>>>>>
>>>>>>> Yeah, looks like this is doing something a bit questionable which
>>>>>>> can't
>>>>>>> work properly. TBH the whole dma_dev thing could probably be
>>>>>>> cleaned up
>>>>>>> now that we have proper instances, but for now does this work?
>>>>>>
>>>>>> Yes, this patch fixes the problem I've observed.
>>>>>>
>>>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>
>>>>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver
>>>>>> and
>>>>>> I doubt it can be cleaned up.
>>>>>
>>>>> On the contrary I suspect they both can - it all dates back to when
>>>>> we had the single global platform bus iommu_ops and the SoC drivers
>>>>> were forced to bodge their own notion of multiple instances, but with
>>>>> the modern core code, ops are always called via a valid IOMMU
>>>>> instance or domain, so in principle it should always be possible to
>>>>> get at an appropriate IOMMU device now. IIRC it was mostly about
>>>>> allocating and DMA-mapping the pagetables in domain_alloc, where the
>>>>> private notion of instances didn't have enough information, but
>>>>> domain_alloc_paging solves that.
>>>>
>>>> Bah, in fact I think I am going to have to do that now, since although
>>>> it doesn't crash, rk_domain_alloc_paging() will also be failing for
>>>> the same reason. Time to find a PSU for the RK3399 board, I guess...
>>>>
>>>> (Or maybe just move the dma_dev assignment earlier to match Exynos?)
>>>
>>> Well I just found that Exynos IOMMU is also broken on some on my test
>>> boards. It looks that the runtime pm links are somehow not correctly
>>> established. I will try to analyze this later in the afternoon.
>>
>> Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable
>> to boot my original 6.14-rc3-based branch - even with the IOMMU driver
>> disabled, it's consistently dying somewhere near (or just after) init
>> with what looks like some catastrophic memory corruption issue - very
>> occasionally it's managed to print the first line of various different
>> panics.
>>
>> Before that point though, with the IOMMU driver enabled it does appear
>> to show signs of working OK:
>>
>> [    0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
>> [    0.654220] platform 14450000.mixer: Adding to iommu group 1
>> ...
>> [    2.680920] exynos-mixer 14450000.mixer:
>> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42924000
>> ...
>> [    5.196674] exynos-mixer 14450000.mixer:
>> exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable
>> 0x42924000
>> [    5.207091] exynos-mixer 14450000.mixer:
>> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42884000
>>
>>
>> The multi-instance stuff in probe/release does look a bit suspect,
>> however - seems like the second instance probe would overwrite the
>> first instance's links, and then there would be a double-del() if the
>> device were ever actually released again? I may have made that much
>> more likely to happen, but I suspect it was already possible with
>> async driver probe...
> 
> That is really strange. My Odroid XU3 boots fine from commit
> bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"),
> although the IOMMU seems not to be working correctly. I've tested this
> with 14450000.mixer device (one need to attach HDMI cable to get it
> activated) and it looks that the video data are not being read from
> memory at all (the lack of VSYNC is reported, no IOMMU fault). However,
> from time to time, everything initializes and works properly.

Urgh, seems my mistake was assuming exynos_defconfig was the right thing 
to begin from - bcb81ac6ae3c with that still dies in the same way (this 
time I saw a hint of spin_bug() being hit...), however a 
multi_v7_defconfig build does get to userspace OK again with no obvious 
signs of distress:

[root@alarm ~]# grep -Hr . /sys/kernel/iommu_groups/*/type
/sys/kernel/iommu_groups/0/type:identity
/sys/kernel/iommu_groups/1/type:identity
/sys/kernel/iommu_groups/10/type:identity
/sys/kernel/iommu_groups/2/type:identity
/sys/kernel/iommu_groups/3/type:identity
/sys/kernel/iommu_groups/4/type:identity
/sys/kernel/iommu_groups/5/type:identity
/sys/kernel/iommu_groups/6/type:identity
/sys/kernel/iommu_groups/7/type:identity
/sys/kernel/iommu_groups/8/type:identity
/sys/kernel/iommu_groups/9/type:identity

Annoyingly I do have an adapter for the fiddly micro-HDMI, but it's at 
home :(

> It looks that this is somehow related to the different IOMMU/DMA-mapping
> glue code, as the other boards (ARM64 based) with exactly the same
> Exynos IOMMU driver always work fine. I've tried to figure out what
> actually happens, but so far I didn't get anything for sure. Disabling
> the call to dev->bus->dma_configure(dev) from iommu_init_device() seems
> to be fixing this, but this is almost equal to the revert of the
> $subject patch. I don't get why calling it in iommu_init_device() causes
> problems. It also doesn't look that this is anyhow related to the
> multi-instance stuff, as the same happens if I only leave a single
> exynos-sysmmu instance and its client (only 14450000.mixer device in the
> system).

On a hunch I stuck a print in exynos_iommu_probe_device(), and it looks 
like in fact device_link_add() isn't getting called at all, and indeed 
your symptoms do sound like they could be explained by the IOMMU not 
being reliably resumed... lemme stare at exynos_iommu_of_xlate() a bit 
longer...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-18 17:24     ` Robin Murphy
@ 2025-03-25 15:32       ` Geert Uytterhoeven
  0 siblings, 0 replies; 44+ messages in thread
From: Geert Uytterhoeven @ 2025-03-25 15:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla, Linux-Renesas

Hi Robin,

On Tue, 18 Mar 2025 at 18:24, Robin Murphy <robin.murphy@arm.com> wrote:
> On 18/03/2025 4:37 pm, Geert Uytterhoeven wrote:
> [...]
> > Thanks for your patch, which is now commit bcb81ac6ae3c2ef9 ("iommu:
> > Get DT/ACPI parsing into the proper probe path") in iommu/next.
> >
> > This patch triggers two issues on R-Car Gen3 platforms:
> >
> > 1. I am seeing a warning on Renesas Salvator-XS with R-Car M3N
> > (but not on the similar board with R-Car H3), and only for SATA[1].
> > Unfortunately commit 73d2f10957f517e5 ("iommu: Don't warn prematurely
> > about dodgy probes") does not help:
> [...]
> >      Call trace:
> >       __iommu_probe_device+0x208/0x38c (P)
> >       iommu_probe_device+0x34/0x74
> >       of_iommu_configure+0x128/0x200
> >       of_dma_configure_id+0xdc/0x1d4
> >       platform_dma_configure+0x48/0x6c
> >       really_probe+0xf0/0x260
> >       __driver_probe_device+0xec/0x104
> >       driver_probe_device+0x3c/0xc0
>
> Hurrah, this is the warning doing the correct job - something *is* off
> if we're now getting here without the IOMMU configuration being done
> already (for a normal device with no other funny business going on).
>
> > 2. The IOMMU driver's iommu_ops.of_xlate() callback is called about
> > three times as much as before:
>
> That would suggest that the fwspec gets set up OK, then something later
> in the __iommu_probe_device() path fails and tears it down again, so the
> next attempt starts from scratch. Do you see the "Cannot attach to
> IPMMU" message firing?

I do not see such messages.

> And similarly to the Rockchip case, does the
> below help?

The below is basically the same as your "[PATCH] iommu/ipmmu-vmsa:
Register in a sensible order"[1].  While that fixes my first issue,
it does not fix the second (harmless?) issue.

Note that I only noticed the second issue because I have local debug
code in soc_device_match().  Perhaps it happens, unnoticed, on other
systems too?

Thanks!

[1] https://lore.kernel.org/53be6667544de65a15415b699e38a9a965692e45.1742481687.git.robin.murphy@arm.com/

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
                     ` (4 preceding siblings ...)
  2025-03-18 16:37   ` Geert Uytterhoeven
@ 2025-03-27  9:47   ` Chen-Yu Tsai
  2025-03-27 11:00     ` Louis-Alexis Eyraud
  2025-04-11  8:02   ` Johan Hovold
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Chen-Yu Tsai @ 2025-03-27  9:47 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla, Matthias Brugger,
	AngeloGioacchino Del Regno, linux-mediatek

Hi,

On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
> 
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
> 
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
> 
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
> 
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

This change causes the MediaTek IOMMU driver to panic on probe,
resulting in multiple MediaTek platforms not being able to boot.
This was observed on Linus's tree at commit 1a9239bb4253
("Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")
which just received the IOMMU updates a couple merge commits
prior.

The regression was bisected down to this patch on MT8183 Juniper
Chromebook. The error is a NULL pointer dereference. Here's the
decoded backtrace:

    Unable to handle kernel read from unreadable memory at virtual address 0000000000000000
    Mem abort info:
      ESR = 0x0000000096000005
    usb 1-1.1.2: new high-speed USB device number 6 using xhci-mtk
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x05: level 1 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    [0000000000000000] user address but active_mm is swapper
    Internal error: Oops: 0000000096000005 [#1]  SMP
    Modules linked in:
    CPU: 4 UID: 0 PID: 68 Comm: kworker/u34:1 Tainted: G    B              6.14.0-05877-g1a9239bb4253 #621 PREEMPT  a6631d3f04612a5c23866dc67cf38316c6b023e0
    Tainted: [B]=BAD_PAGE
    Hardware name: Google juniper sku16 board (DT)
    Workqueue: events_unbound deferred_probe_work_func
    pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : mtk_iommu_device_group (drivers/iommu/mtk_iommu.c:368 drivers/iommu/mtk_iommu.c:940)
    lr : mtk_iommu_device_group (drivers/iommu/mtk_iommu.c:368 drivers/iommu/mtk_iommu.c:940)
    sp : ffffffc0809674a0
    x29: ffffffc0809674a0 x28: ffffff80c1f0f078 x27: ffffff80c1f0f460
    x26: ffffff80c1f0f458 x25: ffffffdb1b738788 x24: ffffffc0809676d0
    x23: ffffff80c8fe4130 x22: ffffffffffffffed x21: 0000000000000000
    x20: ffffff80c1f0f010 x19: ffffff80c8fd7d00 x18: 0000000000000000
    x17: 000000040044ffff x16: 00500072b5593510 x15: 0000000000000000
    x14: ffffff80c09fbb80 x13: ffffffa5bf03f000 x12: ffffffbb6399cdf1
    x11: 1ffffffb6399cdf0 x10: ffffffbb6399cdf0 x9 : dfffffc000000000
    x8 : 000000449c663210 x7 : ffffffdb1cce6f87 x6 : 0000000000000001
    x5 : ffffffdb1cce6f80 x4 : ffffffbb6399cdf1 x3 : 0000000000000000
    x2 : 0000000000000020 x1 : ffffff80c22ed940 x0 : 0000000000000001
    Call trace:
    mtk_iommu_device_group (drivers/iommu/mtk_iommu.c:368 drivers/iommu/mtk_iommu.c:940) (P)
    __iommu_probe_device (drivers/iommu/iommu.c:461 drivers/iommu/iommu.c:563)
    probe_iommu_group (drivers/iommu/iommu.c:1722)
    bus_for_each_dev (drivers/base/bus.c:370)
    iommu_device_register (drivers/iommu/iommu.c:1875 drivers/iommu/iommu.c:276)
    mtk_iommu_probe (drivers/iommu/mtk_iommu.c:1380)
    platform_probe (drivers/base/platform.c:1404)
    really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
    __driver_probe_device (drivers/base/dd.c:800)
    driver_probe_device (drivers/base/dd.c:830)
    __device_attach_driver (drivers/base/dd.c:959)
    bus_for_each_drv (drivers/base/bus.c:462)
    __device_attach (drivers/base/dd.c:1032)
    device_initial_probe (drivers/base/dd.c:1080)
    bus_probe_device (drivers/base/bus.c:537)
    deferred_probe_work_func (drivers/base/dd.c:124)
    process_one_work (./arch/arm64/include/asm/jump_label.h:36 ./include/trace/events/workqueue.h:110 kernel/workqueue.c:3243)
    worker_thread (kernel/workqueue.c:3313 (discriminator 2) kernel/workqueue.c:3400 (discriminator 2))
    kthread (kernel/kthread.c:464)
    ret_from_fork (arch/arm64/kernel/entry.S:863)
    Code: 92800256 f940d2b5 aa1503e0 97e8dbd7 (f94002b5)
    All code
    ========
       0:	92800256 	mov	x22, #0xffffffffffffffed    	// #-19
       4:	f940d2b5 	ldr	x21, [x21, #416]
       8:	aa1503e0 	mov	x0, x21
       c:	97e8dbd7 	bl	0xffffffffffa36f68
      10:*	f94002b5 	ldr	x21, [x21]		<-- trapping instruction

    Code starting with the faulting instruction
    ===========================================
       0:	f94002b5 	ldr	x21, [x21]
    ---[ end trace 0000000000000000 ]---


ChenYu

> ---
> 
> v2:
>  - Comment bus driver changes for clarity
>  - Use dev->iommu as the now-robust replay condition
>  - Drop the device_iommu_mapped() checks in the firmware paths as they
>    weren't doing much - we can't replace probe_device_lock just yet...
>  
>  drivers/acpi/arm64/dma.c        |  5 +++++
>  drivers/acpi/scan.c             |  7 -------
>  drivers/amba/bus.c              |  3 ++-
>  drivers/base/platform.c         |  3 ++-
>  drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
>  drivers/cdx/cdx.c               |  3 ++-
>  drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
>  drivers/iommu/of_iommu.c        |  7 ++++++-
>  drivers/of/device.c             |  7 ++++++-
>  drivers/pci/pci-driver.c        |  3 ++-
>  10 files changed, 48 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
> index 52b2abf88689..f30f138352b7 100644
> --- a/drivers/acpi/arm64/dma.c
> +++ b/drivers/acpi/arm64/dma.c
> @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
>  	else
>  		end = (1ULL << 32) - 1;
>  
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		return;
> +	}
> +
>  	ret = acpi_dma_get_range(dev, &map);
>  	if (!ret && map) {
>  		end = dma_range_map_max(map);
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 9f4efa8f75a6..fb1fe9f3b1a3 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
>  		err = viot_iommu_configure(dev);
>  	mutex_unlock(&iommu_probe_device_lock);
>  
> -	/*
> -	 * If we have reason to believe the IOMMU driver missed the initial
> -	 * iommu_probe_device() call for dev, replay it to get things in order.
> -	 */
> -	if (!err && dev->bus)
> -		err = iommu_probe_device(dev);
> -
>  	return err;
>  }
>  
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 8ef259b4d037..71482d639a6d 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -364,7 +364,8 @@ static int amba_dma_configure(struct device *dev)
>  		ret = acpi_dma_configure(dev, attr);
>  	}
>  
> -	if (!ret && !drv->driver_managed_dma) {
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6f2a33722c52..1813cfd0c4bd 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
>  		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>  		ret = acpi_dma_configure(dev, attr);
>  	}
> -	if (ret || drv->driver_managed_dma)
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (ret || !dev->driver || drv->driver_managed_dma)
>  		return ret;
>  
>  	ret = iommu_device_use_default_domain(dev);
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index d1f3d327ddd1..a8be8cf246fb 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device *dev)
>  	else
>  		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
>  
> -	if (!ret && !mc_drv->driver_managed_dma) {
> +	/* @mc_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !mc_drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
> index c573ed2ee71a..780fb0c4adba 100644
> --- a/drivers/cdx/cdx.c
> +++ b/drivers/cdx/cdx.c
> @@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device *dev)
>  		return ret;
>  	}
>  
> -	if (!ret && !cdx_drv->driver_managed_dma) {
> +	/* @cdx_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !cdx_drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index a3b45b84f42b..1cec7074367a 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev)
>  	if (!dev_iommu_get(dev))
>  		return -ENOMEM;
>  	/*
> -	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
> -	 * instances with non-NULL fwnodes, and client devices should have been
> -	 * identified with a fwspec by this point. Otherwise, we can currently
> +	 * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
> +	 * is buried in the bus dma_configure path. Properly unpicking that is
> +	 * still a big job, so for now just invoke the whole thing. The device
> +	 * already having a driver bound means dma_configure has already run and
> +	 * either found no IOMMU to wait for, or we're in its replay call right
> +	 * now, so either way there's no point calling it again.
> +	 */
> +	if (!dev->driver && dev->bus->dma_configure) {
> +		mutex_unlock(&iommu_probe_device_lock);
> +		dev->bus->dma_configure(dev);
> +		mutex_lock(&iommu_probe_device_lock);
> +	}
> +	/*
> +	 * At this point, relevant devices either now have a fwspec which will
> +	 * match ops registered with a non-NULL fwnode, or we can reasonably
>  	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
>  	 * be present, and that any of their registered instances has suitable
>  	 * ops for probing, and thus cheekily co-opt the same mechanism.
> @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
>  		ret = -ENODEV;
>  		goto err_free;
>  	}
> +	/*
> +	 * And if we do now see any replay calls, they would indicate someone
> +	 * misusing the dma_configure path outside bus code.
> +	 */
> +	if (dev->driver)
> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
>  
>  	if (!try_module_get(ops->owner)) {
>  		ret = -EINVAL;
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e10a68b5ffde..6b989a62def2 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>  		dev_iommu_free(dev);
>  	mutex_unlock(&iommu_probe_device_lock);
>  
> -	if (!err && dev->bus)
> +	/*
> +	 * If we're not on the iommu_probe_device() path (as indicated by the
> +	 * initial dev->iommu) then try to simulate it. This should no longer
> +	 * happen unless of_dma_configure() is being misused outside bus code.
> +	 */
> +	if (!err && dev->bus && !dev_iommu_present)
>  		err = iommu_probe_device(dev);
>  
>  	if (err && err != -EPROBE_DEFER)
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index edf3be197265..5053e5d532cc 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>  	bool coherent, set_map = false;
>  	int ret;
>  
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		goto skip_map;
> +	}
> +
>  	if (np == dev->of_node)
>  		bus_np = __of_get_dma_parent(np);
>  	else
> @@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>  		end = dma_range_map_max(map);
>  		set_map = true;
>  	}
> -
> +skip_map:
>  	/*
>  	 * If @dev is expected to be DMA-capable then the bus code that created
>  	 * it should have initialised its dma_mask pointer by this point. For
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f57ea36d125d..4468b7327cab 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device *dev)
>  
>  	pci_put_host_bridge_device(bridge);
>  
> -	if (!ret && !driver->driver_managed_dma) {
> +	/* @driver may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !driver->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> -- 
> 2.39.2.101.g768bb238c484.dirty
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-27  9:47   ` Chen-Yu Tsai
@ 2025-03-27 11:00     ` Louis-Alexis Eyraud
  0 siblings, 0 replies; 44+ messages in thread
From: Louis-Alexis Eyraud @ 2025-03-27 11:00 UTC (permalink / raw)
  To: Chen-Yu Tsai, Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla, Matthias Brugger,
	AngeloGioacchino Del Regno, linux-mediatek

Hi,

On Thu, 2025-03-27 at 17:47 +0800, Chen-Yu Tsai wrote:
> Hi,
> 
> On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
> > In hindsight, there were some crucial subtleties overlooked when
> > moving
> > {of,acpi}_dma_configure() to driver probe time to allow waiting for
> > IOMMU drivers with -EPROBE_DEFER, and these have become an
> > ever-increasing source of problems. The IOMMU API has some
> > fundamental
> > assumptions that iommu_probe_device() is called for every device
> > added
> > to the system, in the order in which they are added. Calling it in
> > a
> > random order or not at all dependent on driver binding leads to
> > malformed groups, a potential lack of isolation for devices with no
> > driver, and all manner of unexpected concurrency and race
> > conditions.
> > We've attempted to mitigate the latter with point-fix bodges like
> > iommu_probe_device_lock, but it's a losing battle and the time has
> > come
> > to bite the bullet and address the true source of the problem
> > instead.
> > 
> > The crux of the matter is that the firmware parsing actually serves
> > two
> > distinct purposes; one is identifying the IOMMU instance associated
> > with
> > a device so we can check its availability, the second is actually
> > telling that instance about the relevant firmware-provided data for
> > the
> > device. However the latter also depends on the former, and at the
> > time
> > there was no good place to defer and retry that separately from the
> > availability check we also wanted for client driver probe.
> > 
> > Nowadays, though, we have a proper notion of multiple IOMMU
> > instances in
> > the core API itself, and each one gets a chance to probe its own
> > devices
> > upon registration, so we can finally make that work as intended for
> > DT/IORT/VIOT platforms too. All we need is for iommu_probe_device()
> > to
> > be able to run the iommu_fwspec machinery currently buried deep in
> > the
> > wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> > surprisingly straightforward to bootstrap this transformation by
> > pretty
> > much just calling the same path twice. At client driver probe time,
> > dev->driver is obviously set; conversely at device_add(), or a
> > subsequent bus_iommu_probe(), any device waiting for an IOMMU
> > really
> > should *not* have a driver already, so we can use that as a
> > condition to
> > disambiguate the two cases, and avoid recursing back into the IOMMU
> > core
> > at the wrong times.
> > 
> > Obviously this isn't the nicest thing, but for now it gives us a
> > functional baseline to then unpick the layers in between without
> > many
> > more awkward cross-subsystem patches. There are some minor side-
> > effects
> > like dma_range_map potentially being created earlier, and some
> > debug
> > prints being repeated, but these aren't significantly detrimental.
> > Let's
> > make things work first, then deal with making them nice.
> > 
> > With the basic flow finally in the right order again, the next step
> > is
> > probably turning the bus->dma_configure paths inside-out, since all
> > we
> > really need from bus code is its notion of which device and input
> > ID(s)
> > to parse the common firmware properties with...
> > 
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> > Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> This change causes the MediaTek IOMMU driver to panic on probe,
> resulting in multiple MediaTek platforms not being able to boot.
> This was observed on Linus's tree at commit 1a9239bb4253
> ("Merge tag 'net-next-6.15' of
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")
> which just received the IOMMU updates a couple merge commits
> prior.
> 
> The regression was bisected down to this patch on MT8183 Juniper
> Chromebook. The error is a NULL pointer dereference. Here's the
> decoded backtrace:
> 
>     Unable to handle kernel read from unreadable memory at virtual
> address 0000000000000000
>     Mem abort info:
>       ESR = 0x0000000096000005
>     usb 1-1.1.2: new high-speed USB device number 6 using xhci-mtk
>       EC = 0x25: DABT (current EL), IL = 32 bits
>       SET = 0, FnV = 0
>       EA = 0, S1PTW = 0
>       FSC = 0x05: level 1 translation fault
>     Data abort info:
>       ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>       CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>       GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>     [0000000000000000] user address but active_mm is swapper
>     Internal error: Oops: 0000000096000005 [#1]  SMP
>     Modules linked in:
>     CPU: 4 UID: 0 PID: 68 Comm: kworker/u34:1 Tainted: G   
> B              6.14.0-05877-g1a9239bb4253 #621 PREEMPT 
> a6631d3f04612a5c23866dc67cf38316c6b023e0
>     Tainted: [B]=BAD_PAGE
>     Hardware name: Google juniper sku16 board (DT)
>     Workqueue: events_unbound deferred_probe_work_func
>     pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>     pc : mtk_iommu_device_group (drivers/iommu/mtk_iommu.c:368
> drivers/iommu/mtk_iommu.c:940)
>     lr : mtk_iommu_device_group (drivers/iommu/mtk_iommu.c:368
> drivers/iommu/mtk_iommu.c:940)
>     sp : ffffffc0809674a0
>     x29: ffffffc0809674a0 x28: ffffff80c1f0f078 x27: ffffff80c1f0f460
>     x26: ffffff80c1f0f458 x25: ffffffdb1b738788 x24: ffffffc0809676d0
>     x23: ffffff80c8fe4130 x22: ffffffffffffffed x21: 0000000000000000
>     x20: ffffff80c1f0f010 x19: ffffff80c8fd7d00 x18: 0000000000000000
>     x17: 000000040044ffff x16: 00500072b5593510 x15: 0000000000000000
>     x14: ffffff80c09fbb80 x13: ffffffa5bf03f000 x12: ffffffbb6399cdf1
>     x11: 1ffffffb6399cdf0 x10: ffffffbb6399cdf0 x9 : dfffffc000000000
>     x8 : 000000449c663210 x7 : ffffffdb1cce6f87 x6 : 0000000000000001
>     x5 : ffffffdb1cce6f80 x4 : ffffffbb6399cdf1 x3 : 0000000000000000
>     x2 : 0000000000000020 x1 : ffffff80c22ed940 x0 : 0000000000000001
>     Call trace:
>     mtk_iommu_device_group (drivers/iommu/mtk_iommu.c:368
> drivers/iommu/mtk_iommu.c:940) (P)
>     __iommu_probe_device (drivers/iommu/iommu.c:461
> drivers/iommu/iommu.c:563)
>     probe_iommu_group (drivers/iommu/iommu.c:1722)
>     bus_for_each_dev (drivers/base/bus.c:370)
>     iommu_device_register (drivers/iommu/iommu.c:1875
> drivers/iommu/iommu.c:276)
>     mtk_iommu_probe (drivers/iommu/mtk_iommu.c:1380)
>     platform_probe (drivers/base/platform.c:1404)
>     really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>     __driver_probe_device (drivers/base/dd.c:800)
>     driver_probe_device (drivers/base/dd.c:830)
>     __device_attach_driver (drivers/base/dd.c:959)
>     bus_for_each_drv (drivers/base/bus.c:462)
>     __device_attach (drivers/base/dd.c:1032)
>     device_initial_probe (drivers/base/dd.c:1080)
>     bus_probe_device (drivers/base/bus.c:537)
>     deferred_probe_work_func (drivers/base/dd.c:124)
>     process_one_work (./arch/arm64/include/asm/jump_label.h:36
> ./include/trace/events/workqueue.h:110 kernel/workqueue.c:3243)
>     worker_thread (kernel/workqueue.c:3313 (discriminator 2)
> kernel/workqueue.c:3400 (discriminator 2))
>     kthread (kernel/kthread.c:464)
>     ret_from_fork (arch/arm64/kernel/entry.S:863)
>     Code: 92800256 f940d2b5 aa1503e0 97e8dbd7 (f94002b5)
>     All code
>     ========
>        0:	92800256 	mov	x22,
> #0xffffffffffffffed    	// #-19
>        4:	f940d2b5 	ldr	x21, [x21, #416]
>        8:	aa1503e0 	mov	x0, x21
>        c:	97e8dbd7 	bl	0xffffffffffa36f68
>       10:*	f94002b5 	ldr	x21, [x21]		<--
> trapping instruction
> 
>     Code starting with the faulting instruction
>     ===========================================
>        0:	f94002b5 	ldr	x21, [x21]
>     ---[ end trace 0000000000000000 ]---
> 
> 
> ChenYu
> 
I've sent a patch [1] to fix this issue that I've also observed on
my Mediatek Genio boards (510-EVK and 1200-EVK).

[1]:https://lore.kernel.org/linux-iommu/20250327-fix-mtk-iommu-error-v1-1-df969158e752@collabora.com/

Regards,
Louis-Alexis

> > ---
> > 
> > v2:
> >  - Comment bus driver changes for clarity
> >  - Use dev->iommu as the now-robust replay condition
> >  - Drop the device_iommu_mapped() checks in the firmware paths as
> > they
> >    weren't doing much - we can't replace probe_device_lock just
> > yet...
> >  
> >  drivers/acpi/arm64/dma.c        |  5 +++++
> >  drivers/acpi/scan.c             |  7 -------
> >  drivers/amba/bus.c              |  3 ++-
> >  drivers/base/platform.c         |  3 ++-
> >  drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
> >  drivers/cdx/cdx.c               |  3 ++-
> >  drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
> >  drivers/iommu/of_iommu.c        |  7 ++++++-
> >  drivers/of/device.c             |  7 ++++++-
> >  drivers/pci/pci-driver.c        |  3 ++-
> >  10 files changed, 48 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
> > index 52b2abf88689..f30f138352b7 100644
> > --- a/drivers/acpi/arm64/dma.c
> > +++ b/drivers/acpi/arm64/dma.c
> > @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
> >  	else
> >  		end = (1ULL << 32) - 1;
> >  
> > +	if (dev->dma_range_map) {
> > +		dev_dbg(dev, "dma_range_map already set\n");
> > +		return;
> > +	}
> > +
> >  	ret = acpi_dma_get_range(dev, &map);
> >  	if (!ret && map) {
> >  		end = dma_range_map_max(map);
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 9f4efa8f75a6..fb1fe9f3b1a3 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct
> > device *dev, const u32 *id_in)
> >  		err = viot_iommu_configure(dev);
> >  	mutex_unlock(&iommu_probe_device_lock);
> >  
> > -	/*
> > -	 * If we have reason to believe the IOMMU driver missed
> > the initial
> > -	 * iommu_probe_device() call for dev, replay it to get
> > things in order.
> > -	 */
> > -	if (!err && dev->bus)
> > -		err = iommu_probe_device(dev);
> > -
> >  	return err;
> >  }
> >  
> > diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> > index 8ef259b4d037..71482d639a6d 100644
> > --- a/drivers/amba/bus.c
> > +++ b/drivers/amba/bus.c
> > @@ -364,7 +364,8 @@ static int amba_dma_configure(struct device
> > *dev)
> >  		ret = acpi_dma_configure(dev, attr);
> >  	}
> >  
> > -	if (!ret && !drv->driver_managed_dma) {
> > +	/* @drv may not be valid when we're called from the IOMMU
> > layer */
> > +	if (!ret && dev->driver && !drv->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index 6f2a33722c52..1813cfd0c4bd 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct
> > device *dev)
> >  		attr =
> > acpi_get_dma_attr(to_acpi_device_node(fwnode));
> >  		ret = acpi_dma_configure(dev, attr);
> >  	}
> > -	if (ret || drv->driver_managed_dma)
> > +	/* @drv may not be valid when we're called from the IOMMU
> > layer */
> > +	if (ret || !dev->driver || drv->driver_managed_dma)
> >  		return ret;
> >  
> >  	ret = iommu_device_use_default_domain(dev);
> > diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-
> > mc/fsl-mc-bus.c
> > index d1f3d327ddd1..a8be8cf246fb 100644
> > --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> > +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> > @@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device
> > *dev)
> >  	else
> >  		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT,
> > &input_id);
> >  
> > -	if (!ret && !mc_drv->driver_managed_dma) {
> > +	/* @mc_drv may not be valid when we're called from the
> > IOMMU layer */
> > +	if (!ret && dev->driver && !mc_drv->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> > diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
> > index c573ed2ee71a..780fb0c4adba 100644
> > --- a/drivers/cdx/cdx.c
> > +++ b/drivers/cdx/cdx.c
> > @@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device
> > *dev)
> >  		return ret;
> >  	}
> >  
> > -	if (!ret && !cdx_drv->driver_managed_dma) {
> > +	/* @cdx_drv may not be valid when we're called from the
> > IOMMU layer */
> > +	if (!ret && dev->driver && !cdx_drv->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index a3b45b84f42b..1cec7074367a 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -414,9 +414,21 @@ static int iommu_init_device(struct device
> > *dev)
> >  	if (!dev_iommu_get(dev))
> >  		return -ENOMEM;
> >  	/*
> > -	 * For FDT-based systems and ACPI IORT/VIOT, drivers
> > register IOMMU
> > -	 * instances with non-NULL fwnodes, and client devices
> > should have been
> > -	 * identified with a fwspec by this point. Otherwise, we
> > can currently
> > +	 * For FDT-based systems and ACPI IORT/VIOT, the common
> > firmware parsing
> > +	 * is buried in the bus dma_configure path. Properly
> > unpicking that is
> > +	 * still a big job, so for now just invoke the whole
> > thing. The device
> > +	 * already having a driver bound means dma_configure has
> > already run and
> > +	 * either found no IOMMU to wait for, or we're in its
> > replay call right
> > +	 * now, so either way there's no point calling it again.
> > +	 */
> > +	if (!dev->driver && dev->bus->dma_configure) {
> > +		mutex_unlock(&iommu_probe_device_lock);
> > +		dev->bus->dma_configure(dev);
> > +		mutex_lock(&iommu_probe_device_lock);
> > +	}
> > +	/*
> > +	 * At this point, relevant devices either now have a
> > fwspec which will
> > +	 * match ops registered with a non-NULL fwnode, or we can
> > reasonably
> >  	 * assume that only one of Intel, AMD, s390, PAMU or
> > legacy SMMUv2 can
> >  	 * be present, and that any of their registered instances
> > has suitable
> >  	 * ops for probing, and thus cheekily co-opt the same
> > mechanism.
> > @@ -426,6 +438,12 @@ static int iommu_init_device(struct device
> > *dev)
> >  		ret = -ENODEV;
> >  		goto err_free;
> >  	}
> > +	/*
> > +	 * And if we do now see any replay calls, they would
> > indicate someone
> > +	 * misusing the dma_configure path outside bus code.
> > +	 */
> > +	if (dev->driver)
> > +		dev_WARN(dev, "late IOMMU probe at driver bind,
> > something fishy here!\n");
> >  
> >  	if (!try_module_get(ops->owner)) {
> >  		ret = -EINVAL;
> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> > index e10a68b5ffde..6b989a62def2 100644
> > --- a/drivers/iommu/of_iommu.c
> > +++ b/drivers/iommu/of_iommu.c
> > @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev,
> > struct device_node *master_np,
> >  		dev_iommu_free(dev);
> >  	mutex_unlock(&iommu_probe_device_lock);
> >  
> > -	if (!err && dev->bus)
> > +	/*
> > +	 * If we're not on the iommu_probe_device() path (as
> > indicated by the
> > +	 * initial dev->iommu) then try to simulate it. This
> > should no longer
> > +	 * happen unless of_dma_configure() is being misused
> > outside bus code.
> > +	 */
> > +	if (!err && dev->bus && !dev_iommu_present)
> >  		err = iommu_probe_device(dev);
> >  
> >  	if (err && err != -EPROBE_DEFER)
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index edf3be197265..5053e5d532cc 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev,
> > struct device_node *np,
> >  	bool coherent, set_map = false;
> >  	int ret;
> >  
> > +	if (dev->dma_range_map) {
> > +		dev_dbg(dev, "dma_range_map already set\n");
> > +		goto skip_map;
> > +	}
> > +
> >  	if (np == dev->of_node)
> >  		bus_np = __of_get_dma_parent(np);
> >  	else
> > @@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev,
> > struct device_node *np,
> >  		end = dma_range_map_max(map);
> >  		set_map = true;
> >  	}
> > -
> > +skip_map:
> >  	/*
> >  	 * If @dev is expected to be DMA-capable then the bus code
> > that created
> >  	 * it should have initialised its dma_mask pointer by this
> > point. For
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index f57ea36d125d..4468b7327cab 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device
> > *dev)
> >  
> >  	pci_put_host_bridge_device(bridge);
> >  
> > -	if (!ret && !driver->driver_managed_dma) {
> > +	/* @driver may not be valid when we're called from the
> > IOMMU layer */
> > +	if (!ret && dev->driver && !driver->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> > -- 
> > 2.39.2.101.g768bb238c484.dirty
> > 
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-03-21 16:48                     ` Robin Murphy
@ 2025-04-01 20:34                       ` Marek Szyprowski
  0 siblings, 0 replies; 44+ messages in thread
From: Marek Szyprowski @ 2025-04-01 20:34 UTC (permalink / raw)
  To: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

On 21.03.2025 17:48, Robin Murphy wrote:
> On 21/03/2025 12:15 pm, Marek Szyprowski wrote:
>> On 17.03.2025 19:22, Robin Murphy wrote:
>>> On 17/03/2025 7:37 am, Marek Szyprowski wrote:
>>>> On 13.03.2025 15:12, Robin Murphy wrote:
>>>>> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>>>>>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>>>>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>>>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>>>>>> [...]
>>>>>>>>> This patch landed in yesterday's linux-next as commit 
>>>>>>>>> bcb81ac6ae3c
>>>>>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my
>>>>>>>>> tests I
>>>>>>>>> found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>>>>>> relevant kernel log:
>>>>>>>>
>>>>>>>> ...and the bug-flushing-out begins!
>>>>>>>>
>>>>>>>>> Unable to handle kernel NULL pointer dereference at virtual 
>>>>>>>>> address
>>>>>>>>> 00000000000003e8
>>>>>>>>> Mem abort info:
>>>>>>>>>        ESR = 0x0000000096000004
>>>>>>>>>        EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>>>        SET = 0, FnV = 0
>>>>>>>>>        EA = 0, S1PTW = 0
>>>>>>>>>        FSC = 0x04: level 0 translation fault
>>>>>>>>> Data abort info:
>>>>>>>>>        ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>>>>>        CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>>>>>        GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>>>>> Modules linked in:
>>>>>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ 
>>>>>>>>> #15533
>>>>>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>>>>>> ...
>>>>>>>>> Call trace:
>>>>>>>>>       devm_kmalloc+0x2c/0x114 (P)
>>>>>>>>>       rk_iommu_of_xlate+0x30/0x90
>>>>>>>>
>>>>>>>> Yeah, looks like this is doing something a bit questionable which
>>>>>>>> can't
>>>>>>>> work properly. TBH the whole dma_dev thing could probably be
>>>>>>>> cleaned up
>>>>>>>> now that we have proper instances, but for now does this work?
>>>>>>>
>>>>>>> Yes, this patch fixes the problem I've observed.
>>>>>>>
>>>>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>>
>>>>>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu 
>>>>>>> driver
>>>>>>> and
>>>>>>> I doubt it can be cleaned up.
>>>>>>
>>>>>> On the contrary I suspect they both can - it all dates back to when
>>>>>> we had the single global platform bus iommu_ops and the SoC drivers
>>>>>> were forced to bodge their own notion of multiple instances, but 
>>>>>> with
>>>>>> the modern core code, ops are always called via a valid IOMMU
>>>>>> instance or domain, so in principle it should always be possible to
>>>>>> get at an appropriate IOMMU device now. IIRC it was mostly about
>>>>>> allocating and DMA-mapping the pagetables in domain_alloc, where the
>>>>>> private notion of instances didn't have enough information, but
>>>>>> domain_alloc_paging solves that.
>>>>>
>>>>> Bah, in fact I think I am going to have to do that now, since 
>>>>> although
>>>>> it doesn't crash, rk_domain_alloc_paging() will also be failing for
>>>>> the same reason. Time to find a PSU for the RK3399 board, I guess...
>>>>>
>>>>> (Or maybe just move the dma_dev assignment earlier to match Exynos?)
>>>>
>>>> Well I just found that Exynos IOMMU is also broken on some on my test
>>>> boards. It looks that the runtime pm links are somehow not correctly
>>>> established. I will try to analyze this later in the afternoon.
>>>
>>> Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable
>>> to boot my original 6.14-rc3-based branch - even with the IOMMU driver
>>> disabled, it's consistently dying somewhere near (or just after) init
>>> with what looks like some catastrophic memory corruption issue - very
>>> occasionally it's managed to print the first line of various different
>>> panics.
>>>
>>> Before that point though, with the IOMMU driver enabled it does appear
>>> to show signs of working OK:
>>>
>>> [    0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
>>> [    0.654220] platform 14450000.mixer: Adding to iommu group 1
>>> ...
>>> [    2.680920] exynos-mixer 14450000.mixer:
>>> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42924000
>>> ...
>>> [    5.196674] exynos-mixer 14450000.mixer:
>>> exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable
>>> 0x42924000
>>> [    5.207091] exynos-mixer 14450000.mixer:
>>> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42884000
>>>
>>>
>>> The multi-instance stuff in probe/release does look a bit suspect,
>>> however - seems like the second instance probe would overwrite the
>>> first instance's links, and then there would be a double-del() if the
>>> device were ever actually released again? I may have made that much
>>> more likely to happen, but I suspect it was already possible with
>>> async driver probe...
>>
>> That is really strange. My Odroid XU3 boots fine from commit
>> bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"),
>> although the IOMMU seems not to be working correctly. I've tested this
>> with 14450000.mixer device (one need to attach HDMI cable to get it
>> activated) and it looks that the video data are not being read from
>> memory at all (the lack of VSYNC is reported, no IOMMU fault). However,
>> from time to time, everything initializes and works properly.
>
> Urgh, seems my mistake was assuming exynos_defconfig was the right 
> thing to begin from - bcb81ac6ae3c with that still dies in the same 
> way (this time I saw a hint of spin_bug() being hit...), however a 
> multi_v7_defconfig build does get to userspace OK again with no 
> obvious signs of distress:
>
> [root@alarm ~]# grep -Hr . /sys/kernel/iommu_groups/*/type
> /sys/kernel/iommu_groups/0/type:identity
> /sys/kernel/iommu_groups/1/type:identity
> /sys/kernel/iommu_groups/10/type:identity
> /sys/kernel/iommu_groups/2/type:identity
> /sys/kernel/iommu_groups/3/type:identity
> /sys/kernel/iommu_groups/4/type:identity
> /sys/kernel/iommu_groups/5/type:identity
> /sys/kernel/iommu_groups/6/type:identity
> /sys/kernel/iommu_groups/7/type:identity
> /sys/kernel/iommu_groups/8/type:identity
> /sys/kernel/iommu_groups/9/type:identity
>
> Annoyingly I do have an adapter for the fiddly micro-HDMI, but it's at 
> home :(
>
>> It looks that this is somehow related to the different IOMMU/DMA-mapping
>> glue code, as the other boards (ARM64 based) with exactly the same
>> Exynos IOMMU driver always work fine. I've tried to figure out what
>> actually happens, but so far I didn't get anything for sure. Disabling
>> the call to dev->bus->dma_configure(dev) from iommu_init_device() seems
>> to be fixing this, but this is almost equal to the revert of the
>> $subject patch. I don't get why calling it in iommu_init_device() causes
>> problems. It also doesn't look that this is anyhow related to the
>> multi-instance stuff, as the same happens if I only leave a single
>> exynos-sysmmu instance and its client (only 14450000.mixer device in the
>> system).
>
> On a hunch I stuck a print in exynos_iommu_probe_device(), and it 
> looks like in fact device_link_add() isn't getting called at all, and 
> indeed your symptoms do sound like they could be explained by the 
> IOMMU not being reliably resumed... lemme stare at 
> exynos_iommu_of_xlate() a bit longer...

Just to let everyone know. The $subject change is okay. This is a bug in 
exynos-iommu driver, fixed by the following patch:

https://lore.kernel.org/all/20250401202731.2810474-1-m.szyprowski@samsung.com/


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
                     ` (5 preceding siblings ...)
  2025-03-27  9:47   ` Chen-Yu Tsai
@ 2025-04-11  8:02   ` Johan Hovold
  2025-04-14 15:37     ` Robin Murphy
  2025-04-21 21:19   ` William McVicker
  2025-08-11 16:44   ` Eric Auger
  8 siblings, 1 reply; 44+ messages in thread
From: Johan Hovold @ 2025-04-11  8:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

Hi Robin,

On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.

> @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
>  		ret = -ENODEV;
>  		goto err_free;
>  	}
> +	/*
> +	 * And if we do now see any replay calls, they would indicate someone
> +	 * misusing the dma_configure path outside bus code.
> +	 */
> +	if (dev->driver)
> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
>  
>  	if (!try_module_get(ops->owner)) {
>  		ret = -EINVAL;
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e10a68b5ffde..6b989a62def2 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>  		dev_iommu_free(dev);
>  	mutex_unlock(&iommu_probe_device_lock);
>  
> -	if (!err && dev->bus)
> +	/*
> +	 * If we're not on the iommu_probe_device() path (as indicated by the
> +	 * initial dev->iommu) then try to simulate it. This should no longer
> +	 * happen unless of_dma_configure() is being misused outside bus code.
> +	 */

This assumption does not hold as there is nothing preventing iommu
driver probe from racing with a client driver probe.

> +	if (!err && dev->bus && !dev_iommu_present)
>  		err = iommu_probe_device(dev);
>  
>  	if (err && err != -EPROBE_DEFER)

I hit the (now moved) dev_WARN() on the ThinkPad T14s where the GPU SMMU
is probed late due to a clock dependency and can end up probing in
parallel with the GPU driver.

[    3.805282] arm-smmu 3da0000.iommu: probing hardware configuration...
[    3.806007] arm-smmu 3da0000.iommu: SMMUv2 with:
[    3.806843] arm-smmu 3da0000.iommu:  stage 1 translation
[    3.807562] arm-smmu 3da0000.iommu:  coherent table walk
[    3.808253] arm-smmu 3da0000.iommu:  stream matching with 24 register groups
[    3.808957] arm-smmu 3da0000.iommu:  22 context banks (0 stage-2 only)
[    3.809651] arm-smmu 3da0000.iommu:  Supported page sizes: 0x61311000
[    3.810339] arm-smmu 3da0000.iommu:  Stage-1: 48-bit VA -> 40-bit IPA
[    3.811130] arm-smmu 3da0000.iommu:  preserved 0 boot mappings

[    3.829042] platform 3d6a000.gmu: Adding to iommu group 8

[    3.992050] ------------[ cut here ]------------
[    3.993045] adreno 3d00000.gpu: late IOMMU probe at driver bind, something fishy here!
[    3.994058] WARNING: CPU: 9 PID: 343 at drivers/iommu/iommu.c:579 __iommu_probe_device+0x2b0/0x4ac

[    4.003272] CPU: 9 UID: 0 PID: 343 Comm: kworker/u50:2 Not tainted 6.15.0-rc1 #109 PREEMPT 
[    4.003276] Hardware name: LENOVO 21N2ZC5PUS/21N2ZC5PUS, BIOS N42ET83W (2.13 ) 10/04/2024

[    4.025943] Call trace:
[    4.025945]  __iommu_probe_device+0x2b0/0x4ac (P)
[    4.030453]  iommu_probe_device+0x38/0x7c
[    4.030455]  of_iommu_configure+0x188/0x26c
[    4.030457]  of_dma_configure_id+0xcc/0x300
[    4.030460]  platform_dma_configure+0x74/0xac
[    4.030462]  really_probe+0x74/0x38c
[    4.030464]  __driver_probe_device+0x7c/0x160
[    4.030465]  driver_probe_device+0x40/0x110
[    4.030467]  __device_attach_driver+0xbc/0x158
[    4.030468]  bus_for_each_drv+0x84/0xe0
[    4.030470]  __device_attach+0xa8/0x1d4
[    4.030472]  device_initial_probe+0x14/0x20
[    4.030473]  bus_probe_device+0xb0/0xb4
[    4.030476]  deferred_probe_work_func+0xa0/0xf4

[    4.030501] ---[ end trace 0000000000000000 ]---
[    4.031269] adreno 3d00000.gpu: Adding to iommu group 9

Johan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-11  8:02   ` Johan Hovold
@ 2025-04-14 15:37     ` Robin Murphy
  2025-04-15 15:08       ` Johan Hovold
  0 siblings, 1 reply; 44+ messages in thread
From: Robin Murphy @ 2025-04-14 15:37 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On 2025-04-11 9:02 am, Johan Hovold wrote:
> Hi Robin,
> 
> On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
>> In hindsight, there were some crucial subtleties overlooked when moving
>> {of,acpi}_dma_configure() to driver probe time to allow waiting for
>> IOMMU drivers with -EPROBE_DEFER, and these have become an
>> ever-increasing source of problems. The IOMMU API has some fundamental
>> assumptions that iommu_probe_device() is called for every device added
>> to the system, in the order in which they are added. Calling it in a
>> random order or not at all dependent on driver binding leads to
>> malformed groups, a potential lack of isolation for devices with no
>> driver, and all manner of unexpected concurrency and race conditions.
>> We've attempted to mitigate the latter with point-fix bodges like
>> iommu_probe_device_lock, but it's a losing battle and the time has come
>> to bite the bullet and address the true source of the problem instead.
> 
>> @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
>>   		ret = -ENODEV;
>>   		goto err_free;
>>   	}
>> +	/*
>> +	 * And if we do now see any replay calls, they would indicate someone
>> +	 * misusing the dma_configure path outside bus code.
>> +	 */
>> +	if (dev->driver)
>> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
>>   
>>   	if (!try_module_get(ops->owner)) {
>>   		ret = -EINVAL;
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index e10a68b5ffde..6b989a62def2 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>>   		dev_iommu_free(dev);
>>   	mutex_unlock(&iommu_probe_device_lock);
>>   
>> -	if (!err && dev->bus)
>> +	/*
>> +	 * If we're not on the iommu_probe_device() path (as indicated by the
>> +	 * initial dev->iommu) then try to simulate it. This should no longer
>> +	 * happen unless of_dma_configure() is being misused outside bus code.
>> +	 */
> 
> This assumption does not hold as there is nothing preventing iommu
> driver probe from racing with a client driver probe.

Not sure I follow - *this* assumption is that if we arrived here with 
dev->iommu already allocated then __iommu_probe_device() is already in 
progress for this device, either in the current callchain or on another 
thread, and so we can (and should) skip calling into it again. There's 
no ambiguity about that.

>> +	if (!err && dev->bus && !dev_iommu_present)
>>   		err = iommu_probe_device(dev);
>>   
>>   	if (err && err != -EPROBE_DEFER)
> 
> I hit the (now moved) dev_WARN() on the ThinkPad T14s where the GPU SMMU
> is probed late due to a clock dependency and can end up probing in
> parallel with the GPU driver.

And what *should* happen is that the GPU driver probe waits for the 
IOMMU driver probe to finish. Do you have fw_devlink enabled?

> [    3.805282] arm-smmu 3da0000.iommu: probing hardware configuration...
> [    3.806007] arm-smmu 3da0000.iommu: SMMUv2 with:
> [    3.806843] arm-smmu 3da0000.iommu:  stage 1 translation
> [    3.807562] arm-smmu 3da0000.iommu:  coherent table walk
> [    3.808253] arm-smmu 3da0000.iommu:  stream matching with 24 register groups
> [    3.808957] arm-smmu 3da0000.iommu:  22 context banks (0 stage-2 only)
> [    3.809651] arm-smmu 3da0000.iommu:  Supported page sizes: 0x61311000
> [    3.810339] arm-smmu 3da0000.iommu:  Stage-1: 48-bit VA -> 40-bit IPA
> [    3.811130] arm-smmu 3da0000.iommu:  preserved 0 boot mappings
> 
> [    3.829042] platform 3d6a000.gmu: Adding to iommu group 8
> 
> [    3.992050] ------------[ cut here ]------------
> [    3.993045] adreno 3d00000.gpu: late IOMMU probe at driver bind, something fishy here!
> [    3.994058] WARNING: CPU: 9 PID: 343 at drivers/iommu/iommu.c:579 __iommu_probe_device+0x2b0/0x4ac
> 
> [    4.003272] CPU: 9 UID: 0 PID: 343 Comm: kworker/u50:2 Not tainted 6.15.0-rc1 #109 PREEMPT
> [    4.003276] Hardware name: LENOVO 21N2ZC5PUS/21N2ZC5PUS, BIOS N42ET83W (2.13 ) 10/04/2024
> 
> [    4.025943] Call trace:
> [    4.025945]  __iommu_probe_device+0x2b0/0x4ac (P)
> [    4.030453]  iommu_probe_device+0x38/0x7c
> [    4.030455]  of_iommu_configure+0x188/0x26c
> [    4.030457]  of_dma_configure_id+0xcc/0x300
> [    4.030460]  platform_dma_configure+0x74/0xac
> [    4.030462]  really_probe+0x74/0x38c

Indeed this is exactly what is *not* supposed to be happening - does 
this patch help at all?

https://lore.kernel.org/linux-iommu/09d901ad11b3a410fbb6e27f7d04ad4609c3fe4a.1741706365.git.robin.murphy@arm.com/

If not then I guess I do need to do something to explicitly distinguish 
the "iommu_device_register() is still running" state after all...

Thanks,
Robin.

> [    4.030464]  __driver_probe_device+0x7c/0x160
> [    4.030465]  driver_probe_device+0x40/0x110
> [    4.030467]  __device_attach_driver+0xbc/0x158
> [    4.030468]  bus_for_each_drv+0x84/0xe0
> [    4.030470]  __device_attach+0xa8/0x1d4
> [    4.030472]  device_initial_probe+0x14/0x20
> [    4.030473]  bus_probe_device+0xb0/0xb4
> [    4.030476]  deferred_probe_work_func+0xa0/0xf4
> 
> [    4.030501] ---[ end trace 0000000000000000 ]---
> [    4.031269] adreno 3d00000.gpu: Adding to iommu group 9
> 
> Johan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-14 15:37     ` Robin Murphy
@ 2025-04-15 15:08       ` Johan Hovold
  2025-04-24 13:58         ` Robin Murphy
  0 siblings, 1 reply; 44+ messages in thread
From: Johan Hovold @ 2025-04-15 15:08 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On Mon, Apr 14, 2025 at 04:37:59PM +0100, Robin Murphy wrote:
> On 2025-04-11 9:02 am, Johan Hovold wrote:
> > On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:

> >> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
> >>   		dev_iommu_free(dev);
> >>   	mutex_unlock(&iommu_probe_device_lock);
> >>   
> >> -	if (!err && dev->bus)
> >> +	/*
> >> +	 * If we're not on the iommu_probe_device() path (as indicated by the
> >> +	 * initial dev->iommu) then try to simulate it. This should no longer
> >> +	 * happen unless of_dma_configure() is being misused outside bus code.
> >> +	 */
> > 
> > This assumption does not hold as there is nothing preventing iommu
> > driver probe from racing with a client driver probe.
> 
> Not sure I follow - *this* assumption is that if we arrived here with 
> dev->iommu already allocated then __iommu_probe_device() is already in 
> progress for this device, either in the current callchain or on another 
> thread, and so we can (and should) skip calling into it again. There's 
> no ambiguity about that.

I was referring to the this "should no longer happen unless
of_dma_configure() is being misused outside bus code" claim, which
appears to be false given the splat below.

> >> +	if (!err && dev->bus && !dev_iommu_present)
> >>   		err = iommu_probe_device(dev);
> >>   
> >>   	if (err && err != -EPROBE_DEFER)
> > 
> > I hit the (now moved) dev_WARN() on the ThinkPad T14s where the GPU SMMU
> > is probed late due to a clock dependency and can end up probing in
> > parallel with the GPU driver.
> 
> And what *should* happen is that the GPU driver probe waits for the 
> IOMMU driver probe to finish. Do you have fw_devlink enabled?

Yes, but you shouldn't rely on devlinks for correctness.

That said it does seem like something is not working as you think it is
here, and indeed the iommu supplier link is not created until SMMUv2
probe_device() (see arm_smmu_probe_device()).

So client devices can start to be probed (bus dma_configure() is called)
before their iommu is ready also with devlinks enabled (and I do see
this happen on every boot).

> > [    3.805282] arm-smmu 3da0000.iommu: probing hardware configuration...

> > [    3.829042] platform 3d6a000.gmu: Adding to iommu group 8
> > 
> > [    3.992050] ------------[ cut here ]------------
> > [    3.993045] adreno 3d00000.gpu: late IOMMU probe at driver bind, something fishy here!
> > [    3.994058] WARNING: CPU: 9 PID: 343 at drivers/iommu/iommu.c:579 __iommu_probe_device+0x2b0/0x4ac
> > 
> > [    4.003272] CPU: 9 UID: 0 PID: 343 Comm: kworker/u50:2 Not tainted 6.15.0-rc1 #109 PREEMPT
> > [    4.003276] Hardware name: LENOVO 21N2ZC5PUS/21N2ZC5PUS, BIOS N42ET83W (2.13 ) 10/04/2024
> > 
> > [    4.025943] Call trace:
> > [    4.025945]  __iommu_probe_device+0x2b0/0x4ac (P)
> > [    4.030453]  iommu_probe_device+0x38/0x7c
> > [    4.030455]  of_iommu_configure+0x188/0x26c
> > [    4.030457]  of_dma_configure_id+0xcc/0x300
> > [    4.030460]  platform_dma_configure+0x74/0xac
> > [    4.030462]  really_probe+0x74/0x38c
> 
> Indeed this is exactly what is *not* supposed to be happening - does 
> this patch help at all?
> 
> https://lore.kernel.org/linux-iommu/09d901ad11b3a410fbb6e27f7d04ad4609c3fe4a.1741706365.git.robin.murphy@arm.com/

I've only seen that splat once so far so I don't have a reliable
reproducer.

But AFAICS that patch won't help help here where we appear to have iommu
probe racing with bus dma_configure() called from really_probe() for the
client device.

Johan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
                     ` (6 preceding siblings ...)
  2025-04-11  8:02   ` Johan Hovold
@ 2025-04-21 21:19   ` William McVicker
  2025-04-22 19:00     ` Jason Gunthorpe
  2025-08-11 16:44   ` Eric Auger
  8 siblings, 1 reply; 44+ messages in thread
From: William McVicker @ 2025-04-21 21:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

Hi Robin,

On 02/28/2025, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
> 
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
> 
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
> 
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
> 
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> v2:
>  - Comment bus driver changes for clarity
>  - Use dev->iommu as the now-robust replay condition
>  - Drop the device_iommu_mapped() checks in the firmware paths as they
>    weren't doing much - we can't replace probe_device_lock just yet...
>  
>  drivers/acpi/arm64/dma.c        |  5 +++++
>  drivers/acpi/scan.c             |  7 -------
>  drivers/amba/bus.c              |  3 ++-
>  drivers/base/platform.c         |  3 ++-
>  drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
>  drivers/cdx/cdx.c               |  3 ++-
>  drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
>  drivers/iommu/of_iommu.c        |  7 ++++++-
>  drivers/of/device.c             |  7 ++++++-
>  drivers/pci/pci-driver.c        |  3 ++-
>  10 files changed, 48 insertions(+), 17 deletions(-)
> 

[...]

> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6f2a33722c52..1813cfd0c4bd 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
>  		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>  		ret = acpi_dma_configure(dev, attr);
>  	}
> -	if (ret || drv->driver_managed_dma)
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (ret || !dev->driver || drv->driver_managed_dma)
>  		return ret;
>  
>  	ret = iommu_device_use_default_domain(dev);

I wanted to report a regression here that was exposed by the new probing
behavior. On Pixel 6, we load our kernel modules in parallel which means
probing is done in parallel. This results in a race condition between the IOMMU
thread and the device probing thread. What I'm seeing is at the top of the
function `platform_dma_configure()` when we assign
`drv = to_platform_driver(dev->driver);`, `dev->driver` is NULL which results
in `drv = 0xf...ffd8`. In parallel, if the driver gets bound to the device
before we reach the above if-statement, then `dev->driver != NULL` and we will
de-reference `drv` --  resulting in a kernel panic.

To address this race condition and KP, we need to defer assigning `drv` until
after we check if the driver is bound. Here is what works for me:

----->8-----

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1813cfd0c4bd..6d124447545c 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1440,8 +1440,8 @@ static void platform_shutdown(struct device *_dev)
 
 static int platform_dma_configure(struct device *dev)
 {
-       struct platform_driver *drv = to_platform_driver(dev->driver);
        struct fwnode_handle *fwnode = dev_fwnode(dev);
+       struct platform_driver *drv;
        enum dev_dma_attr attr;
        int ret = 0;
 
@@ -1451,8 +1451,12 @@ static int platform_dma_configure(struct device *dev)
                attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
                ret = acpi_dma_configure(dev, attr);
        }
-       /* @drv may not be valid when we're called from the IOMMU layer */
-       if (ret || !dev->driver || drv->driver_managed_dma)
+       /* @dev->driver may not be valid when we're called from the IOMMU layer */
+       if (ret || !dev->driver)
+               return ret;
+
+       drv = to_platform_driver(dev->driver);
+       if (drv->driver_managed_dma)
                return ret;
 
        ret = iommu_device_use_default_domain(dev);
--

Please let me know what you think.

Thanks,
Will

[...]

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-21 21:19   ` William McVicker
@ 2025-04-22 19:00     ` Jason Gunthorpe
  2025-04-22 21:55       ` William McVicker
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2025-04-22 19:00 UTC (permalink / raw)
  To: William McVicker
  Cc: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas, linux-acpi, linux-arm-kernel,
	linux-kernel, iommu, devicetree, linux-pci, Charan Teja Kalla

On Mon, Apr 21, 2025 at 02:19:35PM -0700, William McVicker wrote:
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 1813cfd0c4bd..6d124447545c 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1440,8 +1440,8 @@ static void platform_shutdown(struct device *_dev)
>  
>  static int platform_dma_configure(struct device *dev)
>  {
> -       struct platform_driver *drv = to_platform_driver(dev->driver);
>         struct fwnode_handle *fwnode = dev_fwnode(dev);
> +       struct platform_driver *drv;
>         enum dev_dma_attr attr;
>         int ret = 0;
>  
> @@ -1451,8 +1451,12 @@ static int platform_dma_configure(struct device *dev)
>                 attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>                 ret = acpi_dma_configure(dev, attr);
>         }
> -       /* @drv may not be valid when we're called from the IOMMU layer */
> -       if (ret || !dev->driver || drv->driver_managed_dma)
> +       /* @dev->driver may not be valid when we're called from the IOMMU layer */
> +       if (ret || !dev->driver)
> +               return ret;
> +
> +       drv = to_platform_driver(dev->driver);
> +       if (drv->driver_managed_dma)
>                 return ret;
>  
>         ret = iommu_device_use_default_domain(dev);

The diagnosis looks right to me, but pedantically I think it should
have a READ_ONCE():

struct driver *drv = READ_ONCE(dev->driver);

And then never touch dev->driver again in the function.

Send a proper patch?

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-22 19:00     ` Jason Gunthorpe
@ 2025-04-22 21:55       ` William McVicker
  2025-04-22 23:41         ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: William McVicker @ 2025-04-22 21:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas, linux-acpi, linux-arm-kernel,
	linux-kernel, iommu, devicetree, linux-pci, Charan Teja Kalla

Hi Jason,

On 04/22/2025, Jason Gunthorpe wrote:
> On Mon, Apr 21, 2025 at 02:19:35PM -0700, William McVicker wrote:
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index 1813cfd0c4bd..6d124447545c 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -1440,8 +1440,8 @@ static void platform_shutdown(struct device *_dev)
> >  
> >  static int platform_dma_configure(struct device *dev)
> >  {
> > -       struct platform_driver *drv = to_platform_driver(dev->driver);
> >         struct fwnode_handle *fwnode = dev_fwnode(dev);
> > +       struct platform_driver *drv;
> >         enum dev_dma_attr attr;
> >         int ret = 0;
> >  
> > @@ -1451,8 +1451,12 @@ static int platform_dma_configure(struct device *dev)
> >                 attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
> >                 ret = acpi_dma_configure(dev, attr);
> >         }
> > -       /* @drv may not be valid when we're called from the IOMMU layer */
> > -       if (ret || !dev->driver || drv->driver_managed_dma)
> > +       /* @dev->driver may not be valid when we're called from the IOMMU layer */
> > +       if (ret || !dev->driver)
> > +               return ret;
> > +
> > +       drv = to_platform_driver(dev->driver);
> > +       if (drv->driver_managed_dma)
> >                 return ret;
> >  
> >         ret = iommu_device_use_default_domain(dev);
> 
> The diagnosis looks right to me, but pedantically I think it should
> have a READ_ONCE():
> 
> struct driver *drv = READ_ONCE(dev->driver);
> 
> And then never touch dev->driver again in the function.
> 
> Send a proper patch?
> 
> Jason

Thanks for the response! Yes, that would work as well. I'll send a v2 revision
once I get it tested.

On this note, I was looking through `of_dma_configure_id()` and am also
wondering if we may hit other race conditions if the device is still being
probed and the dma properties (like the coherent dma mask) haven't been fully
populated? Just checking if the driver is bound, doesn't seem like enough to
start configuring the DMA when async probing can happen.

Thanks,
Will

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-22 21:55       ` William McVicker
@ 2025-04-22 23:41         ` Jason Gunthorpe
  2025-04-23 17:31           ` William McVicker
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2025-04-22 23:41 UTC (permalink / raw)
  To: William McVicker
  Cc: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas, linux-acpi, linux-arm-kernel,
	linux-kernel, iommu, devicetree, linux-pci, Charan Teja Kalla

On Tue, Apr 22, 2025 at 02:55:28PM -0700, William McVicker wrote:

> On this note, I was looking through `of_dma_configure_id()` and am also
> wondering if we may hit other race conditions if the device is still being
> probed and the dma properties (like the coherent dma mask) haven't been fully
> populated? Just checking if the driver is bound, doesn't seem like enough to
> start configuring the DMA when async probing can happen.

I think the reasoning at work here is that the plugin path for a
struct device should synchronously setup the iommu.

There is enough locking there that the iommu code won't allow the
device plugin to continue until the iommu is fully setup under the
global lock.

The trick of using dev->driver is only a way to tell if this function
is being called from the driver plugin path just before starting the
driver, or from the iommu code just before configuring the iommu.

Given that explanation can you see issues with of_dma_configure_id() ?

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-22 23:41         ` Jason Gunthorpe
@ 2025-04-23 17:31           ` William McVicker
  2025-04-23 18:18             ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: William McVicker @ 2025-04-23 17:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas, linux-acpi, linux-arm-kernel,
	linux-kernel, iommu, devicetree, linux-pci, Charan Teja Kalla

On 04/22/2025, Jason Gunthorpe wrote:
> On Tue, Apr 22, 2025 at 02:55:28PM -0700, William McVicker wrote:
> 
> > On this note, I was looking through `of_dma_configure_id()` and am also
> > wondering if we may hit other race conditions if the device is still being
> > probed and the dma properties (like the coherent dma mask) haven't been fully
> > populated? Just checking if the driver is bound, doesn't seem like enough to
> > start configuring the DMA when async probing can happen.
> 
> I think the reasoning at work here is that the plugin path for a
> struct device should synchronously setup the iommu.
> 
> There is enough locking there that the iommu code won't allow the
> device plugin to continue until the iommu is fully setup under the
> global lock.
> 
> The trick of using dev->driver is only a way to tell if this function
> is being called from the driver plugin path just before starting the
> driver, or from the iommu code just before configuring the iommu.
> 
> Given that explanation can you see issues with of_dma_configure_id() ?
> 
> Jason

I think the only concern is when a driver calls dma_set_mask_and_coherent() in
it's probe function. If we can handle that case in an asynchrounous manner,
then I think we are good.

Thanks,
Will

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-23 17:31           ` William McVicker
@ 2025-04-23 18:18             ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2025-04-23 18:18 UTC (permalink / raw)
  To: William McVicker
  Cc: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas, linux-acpi, linux-arm-kernel,
	linux-kernel, iommu, devicetree, linux-pci, Charan Teja Kalla

On Wed, Apr 23, 2025 at 10:31:16AM -0700, William McVicker wrote:
> On 04/22/2025, Jason Gunthorpe wrote:
> > On Tue, Apr 22, 2025 at 02:55:28PM -0700, William McVicker wrote:
> > 
> > > On this note, I was looking through `of_dma_configure_id()` and am also
> > > wondering if we may hit other race conditions if the device is still being
> > > probed and the dma properties (like the coherent dma mask) haven't been fully
> > > populated? Just checking if the driver is bound, doesn't seem like enough to
> > > start configuring the DMA when async probing can happen.
> > 
> > I think the reasoning at work here is that the plugin path for a
> > struct device should synchronously setup the iommu.
> > 
> > There is enough locking there that the iommu code won't allow the
> > device plugin to continue until the iommu is fully setup under the
> > global lock.
> > 
> > The trick of using dev->driver is only a way to tell if this function
> > is being called from the driver plugin path just before starting the
> > driver, or from the iommu code just before configuring the iommu.
> > 
> > Given that explanation can you see issues with of_dma_configure_id() ?
> > 
> > Jason
> 
> I think the only concern is when a driver calls dma_set_mask_and_coherent() in
> it's probe function. If we can handle that case in an asynchrounous manner,
> then I think we are good.

You should never get to a driver probe function while iommu setup is
still concurrently running. That would be a major bug and break alot
of stuff.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-04-15 15:08       ` Johan Hovold
@ 2025-04-24 13:58         ` Robin Murphy
  0 siblings, 0 replies; 44+ messages in thread
From: Robin Murphy @ 2025-04-24 13:58 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Rafael J. Wysocki,
	Len Brown, Russell King, Greg Kroah-Hartman, Danilo Krummrich,
	Stuart Yoder, Laurentiu Tudor, Nipun Gupta, Nikhil Agarwal,
	Joerg Roedel, Will Deacon, Rob Herring, Saravana Kannan,
	Bjorn Helgaas, linux-acpi, linux-arm-kernel, linux-kernel, iommu,
	devicetree, linux-pci, Charan Teja Kalla

On 15/04/2025 4:08 pm, Johan Hovold wrote:
> On Mon, Apr 14, 2025 at 04:37:59PM +0100, Robin Murphy wrote:
>> On 2025-04-11 9:02 am, Johan Hovold wrote:
>>> On Fri, Feb 28, 2025 at 03:46:33PM +0000, Robin Murphy wrote:
> 
>>>> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>>>>    		dev_iommu_free(dev);
>>>>    	mutex_unlock(&iommu_probe_device_lock);
>>>>    
>>>> -	if (!err && dev->bus)
>>>> +	/*
>>>> +	 * If we're not on the iommu_probe_device() path (as indicated by the
>>>> +	 * initial dev->iommu) then try to simulate it. This should no longer
>>>> +	 * happen unless of_dma_configure() is being misused outside bus code.
>>>> +	 */
>>>
>>> This assumption does not hold as there is nothing preventing iommu
>>> driver probe from racing with a client driver probe.
>>
>> Not sure I follow - *this* assumption is that if we arrived here with
>> dev->iommu already allocated then __iommu_probe_device() is already in
>> progress for this device, either in the current callchain or on another
>> thread, and so we can (and should) skip calling into it again. There's
>> no ambiguity about that.
> 
> I was referring to the this "should no longer happen unless
> of_dma_configure() is being misused outside bus code" claim, which
> appears to be false given the splat below.

That's not an assumption so much as a statement of intent. And really 
it's the other way round anyway, as a reminder that this replay call is 
only still here (unlike in the ACPI path) because there *is* still 
plenty of sketchy usage of of_dma_configure() which I'm wary of breaking.
>>>> +	if (!err && dev->bus && !dev_iommu_present)
>>>>    		err = iommu_probe_device(dev);
>>>>    
>>>>    	if (err && err != -EPROBE_DEFER)
>>>
>>> I hit the (now moved) dev_WARN() on the ThinkPad T14s where the GPU SMMU
>>> is probed late due to a clock dependency and can end up probing in
>>> parallel with the GPU driver.
>>
>> And what *should* happen is that the GPU driver probe waits for the
>> IOMMU driver probe to finish. Do you have fw_devlink enabled?
> 
> Yes, but you shouldn't rely on devlinks for correctness.
> 
> That said it does seem like something is not working as you think it is
> here, and indeed the iommu supplier link is not created until SMMUv2
> probe_device() (see arm_smmu_probe_device()).
> 
> So client devices can start to be probed (bus dma_configure() is called)
> before their iommu is ready also with devlinks enabled (and I do see
> this happen on every boot).

I didn't mean the explicit power management links created by the SMMU 
driver itself, I meant the fwnode links created automatically by 
fw_devlink_link_device() at device_add() time, which infer a 
supplier-consumer relationship from the "iommus" DT property, wherein 
device_links_check_suppliers() would then defer the GPU driver probe 
until the SMMU driver probe has completed successfully probing and 
called device_links_driver_bound().

Except it turns out that "iommus" is one of the optional properties 
which are only linked that way under "fw_devlink=strict", so that 
explains that, fair enough.
>>> [    3.805282] arm-smmu 3da0000.iommu: probing hardware configuration...
> 
>>> [    3.829042] platform 3d6a000.gmu: Adding to iommu group 8
>>>
>>> [    3.992050] ------------[ cut here ]------------
>>> [    3.993045] adreno 3d00000.gpu: late IOMMU probe at driver bind, something fishy here!
>>> [    3.994058] WARNING: CPU: 9 PID: 343 at drivers/iommu/iommu.c:579 __iommu_probe_device+0x2b0/0x4ac
>>>
>>> [    4.003272] CPU: 9 UID: 0 PID: 343 Comm: kworker/u50:2 Not tainted 6.15.0-rc1 #109 PREEMPT
>>> [    4.003276] Hardware name: LENOVO 21N2ZC5PUS/21N2ZC5PUS, BIOS N42ET83W (2.13 ) 10/04/2024
>>>
>>> [    4.025943] Call trace:
>>> [    4.025945]  __iommu_probe_device+0x2b0/0x4ac (P)
>>> [    4.030453]  iommu_probe_device+0x38/0x7c
>>> [    4.030455]  of_iommu_configure+0x188/0x26c
>>> [    4.030457]  of_dma_configure_id+0xcc/0x300
>>> [    4.030460]  platform_dma_configure+0x74/0xac
>>> [    4.030462]  really_probe+0x74/0x38c
>>
>> Indeed this is exactly what is *not* supposed to be happening - does
>> this patch help at all?
>>
>> https://lore.kernel.org/linux-iommu/09d901ad11b3a410fbb6e27f7d04ad4609c3fe4a.1741706365.git.robin.murphy@arm.com/
> 
> I've only seen that splat once so far so I don't have a reliable
> reproducer.
> 
> But AFAICS that patch won't help help here where we appear to have iommu
> probe racing with bus dma_configure() called from really_probe() for the
> client device.

Well, tightening up __iommu_probe_device() would stand to slightly 
reduce the window in general while bus_set_iommu() is running. However 
you're right that this is a different race from the ones implicated 
there. I have now managed to provoke it on my Juno board with 
"driver_async_probe=*" (which does also require that patch for other 
reasons), and I think I've got a reasonable fix which I shall finish 
writing up and send shortly. Thanks for helping me nail this one down!

Cheers,
Robin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
                     ` (7 preceding siblings ...)
  2025-04-21 21:19   ` William McVicker
@ 2025-08-11 16:44   ` Eric Auger
  2025-08-11 17:01     ` Bjorn Helgaas
  8 siblings, 1 reply; 44+ messages in thread
From: Eric Auger @ 2025-08-11 16:44 UTC (permalink / raw)
  To: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas, Jerry Snitselaar,
	Jean-Philippe Brucker
  Cc: linux-acpi, linux-arm-kernel, linux-kernel, iommu, devicetree,
	linux-pci, Charan Teja Kalla

Hi Robin,

On 2/28/25 4:46 PM, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
> 
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
> 
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
> 
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
> 
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

This patch seems to break virtio-iommu along with qemu. After applying
it we cannot see any iommu group. I don't have any specific warning in
dmesg though. Reverting bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into
the proper probe path") fixes the issue for me. Added Jerry and
jean-Philippe in the loop.

Thanks

Eric
> ---
> 
> v2:
>  - Comment bus driver changes for clarity
>  - Use dev->iommu as the now-robust replay condition
>  - Drop the device_iommu_mapped() checks in the firmware paths as they
>    weren't doing much - we can't replace probe_device_lock just yet...
>  
>  drivers/acpi/arm64/dma.c        |  5 +++++
>  drivers/acpi/scan.c             |  7 -------
>  drivers/amba/bus.c              |  3 ++-
>  drivers/base/platform.c         |  3 ++-
>  drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
>  drivers/cdx/cdx.c               |  3 ++-
>  drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
>  drivers/iommu/of_iommu.c        |  7 ++++++-
>  drivers/of/device.c             |  7 ++++++-
>  drivers/pci/pci-driver.c        |  3 ++-
>  10 files changed, 48 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
> index 52b2abf88689..f30f138352b7 100644
> --- a/drivers/acpi/arm64/dma.c
> +++ b/drivers/acpi/arm64/dma.c
> @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
>  	else
>  		end = (1ULL << 32) - 1;
>  
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		return;
> +	}
> +
>  	ret = acpi_dma_get_range(dev, &map);
>  	if (!ret && map) {
>  		end = dma_range_map_max(map);
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 9f4efa8f75a6..fb1fe9f3b1a3 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
>  		err = viot_iommu_configure(dev);
>  	mutex_unlock(&iommu_probe_device_lock);
>  
> -	/*
> -	 * If we have reason to believe the IOMMU driver missed the initial
> -	 * iommu_probe_device() call for dev, replay it to get things in order.
> -	 */
> -	if (!err && dev->bus)
> -		err = iommu_probe_device(dev);
> -
>  	return err;
>  }
>  
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 8ef259b4d037..71482d639a6d 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -364,7 +364,8 @@ static int amba_dma_configure(struct device *dev)
>  		ret = acpi_dma_configure(dev, attr);
>  	}
>  
> -	if (!ret && !drv->driver_managed_dma) {
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6f2a33722c52..1813cfd0c4bd 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
>  		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>  		ret = acpi_dma_configure(dev, attr);
>  	}
> -	if (ret || drv->driver_managed_dma)
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (ret || !dev->driver || drv->driver_managed_dma)
>  		return ret;
>  
>  	ret = iommu_device_use_default_domain(dev);
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index d1f3d327ddd1..a8be8cf246fb 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device *dev)
>  	else
>  		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
>  
> -	if (!ret && !mc_drv->driver_managed_dma) {
> +	/* @mc_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !mc_drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
> index c573ed2ee71a..780fb0c4adba 100644
> --- a/drivers/cdx/cdx.c
> +++ b/drivers/cdx/cdx.c
> @@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device *dev)
>  		return ret;
>  	}
>  
> -	if (!ret && !cdx_drv->driver_managed_dma) {
> +	/* @cdx_drv may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !cdx_drv->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index a3b45b84f42b..1cec7074367a 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev)
>  	if (!dev_iommu_get(dev))
>  		return -ENOMEM;
>  	/*
> -	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
> -	 * instances with non-NULL fwnodes, and client devices should have been
> -	 * identified with a fwspec by this point. Otherwise, we can currently
> +	 * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
> +	 * is buried in the bus dma_configure path. Properly unpicking that is
> +	 * still a big job, so for now just invoke the whole thing. The device
> +	 * already having a driver bound means dma_configure has already run and
> +	 * either found no IOMMU to wait for, or we're in its replay call right
> +	 * now, so either way there's no point calling it again.
> +	 */
> +	if (!dev->driver && dev->bus->dma_configure) {
> +		mutex_unlock(&iommu_probe_device_lock);
> +		dev->bus->dma_configure(dev);
> +		mutex_lock(&iommu_probe_device_lock);
> +	}
> +	/*
> +	 * At this point, relevant devices either now have a fwspec which will
> +	 * match ops registered with a non-NULL fwnode, or we can reasonably
>  	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
>  	 * be present, and that any of their registered instances has suitable
>  	 * ops for probing, and thus cheekily co-opt the same mechanism.
> @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
>  		ret = -ENODEV;
>  		goto err_free;
>  	}
> +	/*
> +	 * And if we do now see any replay calls, they would indicate someone
> +	 * misusing the dma_configure path outside bus code.
> +	 */
> +	if (dev->driver)
> +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
>  
>  	if (!try_module_get(ops->owner)) {
>  		ret = -EINVAL;
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e10a68b5ffde..6b989a62def2 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>  		dev_iommu_free(dev);
>  	mutex_unlock(&iommu_probe_device_lock);
>  
> -	if (!err && dev->bus)
> +	/*
> +	 * If we're not on the iommu_probe_device() path (as indicated by the
> +	 * initial dev->iommu) then try to simulate it. This should no longer
> +	 * happen unless of_dma_configure() is being misused outside bus code.
> +	 */
> +	if (!err && dev->bus && !dev_iommu_present)
>  		err = iommu_probe_device(dev);
>  
>  	if (err && err != -EPROBE_DEFER)
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index edf3be197265..5053e5d532cc 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>  	bool coherent, set_map = false;
>  	int ret;
>  
> +	if (dev->dma_range_map) {
> +		dev_dbg(dev, "dma_range_map already set\n");
> +		goto skip_map;
> +	}
> +
>  	if (np == dev->of_node)
>  		bus_np = __of_get_dma_parent(np);
>  	else
> @@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
>  		end = dma_range_map_max(map);
>  		set_map = true;
>  	}
> -
> +skip_map:
>  	/*
>  	 * If @dev is expected to be DMA-capable then the bus code that created
>  	 * it should have initialised its dma_mask pointer by this point. For
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f57ea36d125d..4468b7327cab 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device *dev)
>  
>  	pci_put_host_bridge_device(bridge);
>  
> -	if (!ret && !driver->driver_managed_dma) {
> +	/* @driver may not be valid when we're called from the IOMMU layer */
> +	if (!ret && dev->driver && !driver->driver_managed_dma) {
>  		ret = iommu_device_use_default_domain(dev);
>  		if (ret)
>  			arch_teardown_dma_ops(dev);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
  2025-08-11 16:44   ` Eric Auger
@ 2025-08-11 17:01     ` Bjorn Helgaas
  0 siblings, 0 replies; 44+ messages in thread
From: Bjorn Helgaas @ 2025-08-11 17:01 UTC (permalink / raw)
  To: Eric Auger
  Cc: Robin Murphy, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Rafael J. Wysocki, Len Brown, Russell King, Greg Kroah-Hartman,
	Danilo Krummrich, Stuart Yoder, Laurentiu Tudor, Nipun Gupta,
	Nikhil Agarwal, Joerg Roedel, Will Deacon, Rob Herring,
	Saravana Kannan, Bjorn Helgaas, Jerry Snitselaar,
	Jean-Philippe Brucker, linux-acpi, linux-arm-kernel, linux-kernel,
	iommu, devicetree, linux-pci, Charan Teja Kalla, regressions

[+cc regressions]

On Mon, Aug 11, 2025 at 06:44:01PM +0200, Eric Auger wrote:
> On 2/28/25 4:46 PM, Robin Murphy wrote:
> > In hindsight, there were some crucial subtleties overlooked when moving
> > {of,acpi}_dma_configure() to driver probe time to allow waiting for
> > IOMMU drivers with -EPROBE_DEFER, and these have become an
> > ever-increasing source of problems. The IOMMU API has some fundamental
> > assumptions that iommu_probe_device() is called for every device added
> > to the system, in the order in which they are added. Calling it in a
> > random order or not at all dependent on driver binding leads to
> > malformed groups, a potential lack of isolation for devices with no
> > driver, and all manner of unexpected concurrency and race conditions.
> > We've attempted to mitigate the latter with point-fix bodges like
> > iommu_probe_device_lock, but it's a losing battle and the time has come
> > to bite the bullet and address the true source of the problem instead.
> > 
> > The crux of the matter is that the firmware parsing actually serves two
> > distinct purposes; one is identifying the IOMMU instance associated with
> > a device so we can check its availability, the second is actually
> > telling that instance about the relevant firmware-provided data for the
> > device. However the latter also depends on the former, and at the time
> > there was no good place to defer and retry that separately from the
> > availability check we also wanted for client driver probe.
> > 
> > Nowadays, though, we have a proper notion of multiple IOMMU instances in
> > the core API itself, and each one gets a chance to probe its own devices
> > upon registration, so we can finally make that work as intended for
> > DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> > be able to run the iommu_fwspec machinery currently buried deep in the
> > wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> > surprisingly straightforward to bootstrap this transformation by pretty
> > much just calling the same path twice. At client driver probe time,
> > dev->driver is obviously set; conversely at device_add(), or a
> > subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> > should *not* have a driver already, so we can use that as a condition to
> > disambiguate the two cases, and avoid recursing back into the IOMMU core
> > at the wrong times.
> > 
> > Obviously this isn't the nicest thing, but for now it gives us a
> > functional baseline to then unpick the layers in between without many
> > more awkward cross-subsystem patches. There are some minor side-effects
> > like dma_range_map potentially being created earlier, and some debug
> > prints being repeated, but these aren't significantly detrimental. Let's
> > make things work first, then deal with making them nice.
> > 
> > With the basic flow finally in the right order again, the next step is
> > probably turning the bus->dma_configure paths inside-out, since all we
> > really need from bus code is its notion of which device and input ID(s)
> > to parse the common firmware properties with...
> > 
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> > Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> This patch seems to break virtio-iommu along with qemu. After applying
> it we cannot see any iommu group. I don't have any specific warning in
> dmesg though. Reverting bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into
> the proper probe path") fixes the issue for me. Added Jerry and
> jean-Philippe in the loop.

#regzbot introduced: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")

> > v2:
> >  - Comment bus driver changes for clarity
> >  - Use dev->iommu as the now-robust replay condition
> >  - Drop the device_iommu_mapped() checks in the firmware paths as they
> >    weren't doing much - we can't replace probe_device_lock just yet...
> >  
> >  drivers/acpi/arm64/dma.c        |  5 +++++
> >  drivers/acpi/scan.c             |  7 -------
> >  drivers/amba/bus.c              |  3 ++-
> >  drivers/base/platform.c         |  3 ++-
> >  drivers/bus/fsl-mc/fsl-mc-bus.c |  3 ++-
> >  drivers/cdx/cdx.c               |  3 ++-
> >  drivers/iommu/iommu.c           | 24 +++++++++++++++++++++---
> >  drivers/iommu/of_iommu.c        |  7 ++++++-
> >  drivers/of/device.c             |  7 ++++++-
> >  drivers/pci/pci-driver.c        |  3 ++-
> >  10 files changed, 48 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
> > index 52b2abf88689..f30f138352b7 100644
> > --- a/drivers/acpi/arm64/dma.c
> > +++ b/drivers/acpi/arm64/dma.c
> > @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev)
> >  	else
> >  		end = (1ULL << 32) - 1;
> >  
> > +	if (dev->dma_range_map) {
> > +		dev_dbg(dev, "dma_range_map already set\n");
> > +		return;
> > +	}
> > +
> >  	ret = acpi_dma_get_range(dev, &map);
> >  	if (!ret && map) {
> >  		end = dma_range_map_max(map);
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 9f4efa8f75a6..fb1fe9f3b1a3 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
> >  		err = viot_iommu_configure(dev);
> >  	mutex_unlock(&iommu_probe_device_lock);
> >  
> > -	/*
> > -	 * If we have reason to believe the IOMMU driver missed the initial
> > -	 * iommu_probe_device() call for dev, replay it to get things in order.
> > -	 */
> > -	if (!err && dev->bus)
> > -		err = iommu_probe_device(dev);
> > -
> >  	return err;
> >  }
> >  
> > diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> > index 8ef259b4d037..71482d639a6d 100644
> > --- a/drivers/amba/bus.c
> > +++ b/drivers/amba/bus.c
> > @@ -364,7 +364,8 @@ static int amba_dma_configure(struct device *dev)
> >  		ret = acpi_dma_configure(dev, attr);
> >  	}
> >  
> > -	if (!ret && !drv->driver_managed_dma) {
> > +	/* @drv may not be valid when we're called from the IOMMU layer */
> > +	if (!ret && dev->driver && !drv->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index 6f2a33722c52..1813cfd0c4bd 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
> >  		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
> >  		ret = acpi_dma_configure(dev, attr);
> >  	}
> > -	if (ret || drv->driver_managed_dma)
> > +	/* @drv may not be valid when we're called from the IOMMU layer */
> > +	if (ret || !dev->driver || drv->driver_managed_dma)
> >  		return ret;
> >  
> >  	ret = iommu_device_use_default_domain(dev);
> > diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> > index d1f3d327ddd1..a8be8cf246fb 100644
> > --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> > +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> > @@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device *dev)
> >  	else
> >  		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
> >  
> > -	if (!ret && !mc_drv->driver_managed_dma) {
> > +	/* @mc_drv may not be valid when we're called from the IOMMU layer */
> > +	if (!ret && dev->driver && !mc_drv->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> > diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
> > index c573ed2ee71a..780fb0c4adba 100644
> > --- a/drivers/cdx/cdx.c
> > +++ b/drivers/cdx/cdx.c
> > @@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device *dev)
> >  		return ret;
> >  	}
> >  
> > -	if (!ret && !cdx_drv->driver_managed_dma) {
> > +	/* @cdx_drv may not be valid when we're called from the IOMMU layer */
> > +	if (!ret && dev->driver && !cdx_drv->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index a3b45b84f42b..1cec7074367a 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev)
> >  	if (!dev_iommu_get(dev))
> >  		return -ENOMEM;
> >  	/*
> > -	 * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
> > -	 * instances with non-NULL fwnodes, and client devices should have been
> > -	 * identified with a fwspec by this point. Otherwise, we can currently
> > +	 * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
> > +	 * is buried in the bus dma_configure path. Properly unpicking that is
> > +	 * still a big job, so for now just invoke the whole thing. The device
> > +	 * already having a driver bound means dma_configure has already run and
> > +	 * either found no IOMMU to wait for, or we're in its replay call right
> > +	 * now, so either way there's no point calling it again.
> > +	 */
> > +	if (!dev->driver && dev->bus->dma_configure) {
> > +		mutex_unlock(&iommu_probe_device_lock);
> > +		dev->bus->dma_configure(dev);
> > +		mutex_lock(&iommu_probe_device_lock);
> > +	}
> > +	/*
> > +	 * At this point, relevant devices either now have a fwspec which will
> > +	 * match ops registered with a non-NULL fwnode, or we can reasonably
> >  	 * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
> >  	 * be present, and that any of their registered instances has suitable
> >  	 * ops for probing, and thus cheekily co-opt the same mechanism.
> > @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev)
> >  		ret = -ENODEV;
> >  		goto err_free;
> >  	}
> > +	/*
> > +	 * And if we do now see any replay calls, they would indicate someone
> > +	 * misusing the dma_configure path outside bus code.
> > +	 */
> > +	if (dev->driver)
> > +		dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
> >  
> >  	if (!try_module_get(ops->owner)) {
> >  		ret = -EINVAL;
> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> > index e10a68b5ffde..6b989a62def2 100644
> > --- a/drivers/iommu/of_iommu.c
> > +++ b/drivers/iommu/of_iommu.c
> > @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
> >  		dev_iommu_free(dev);
> >  	mutex_unlock(&iommu_probe_device_lock);
> >  
> > -	if (!err && dev->bus)
> > +	/*
> > +	 * If we're not on the iommu_probe_device() path (as indicated by the
> > +	 * initial dev->iommu) then try to simulate it. This should no longer
> > +	 * happen unless of_dma_configure() is being misused outside bus code.
> > +	 */
> > +	if (!err && dev->bus && !dev_iommu_present)
> >  		err = iommu_probe_device(dev);
> >  
> >  	if (err && err != -EPROBE_DEFER)
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index edf3be197265..5053e5d532cc 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> >  	bool coherent, set_map = false;
> >  	int ret;
> >  
> > +	if (dev->dma_range_map) {
> > +		dev_dbg(dev, "dma_range_map already set\n");
> > +		goto skip_map;
> > +	}
> > +
> >  	if (np == dev->of_node)
> >  		bus_np = __of_get_dma_parent(np);
> >  	else
> > @@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> >  		end = dma_range_map_max(map);
> >  		set_map = true;
> >  	}
> > -
> > +skip_map:
> >  	/*
> >  	 * If @dev is expected to be DMA-capable then the bus code that created
> >  	 * it should have initialised its dma_mask pointer by this point. For
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index f57ea36d125d..4468b7327cab 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device *dev)
> >  
> >  	pci_put_host_bridge_device(bridge);
> >  
> > -	if (!ret && !driver->driver_managed_dma) {
> > +	/* @driver may not be valid when we're called from the IOMMU layer */
> > +	if (!ret && dev->driver && !driver->driver_managed_dma) {
> >  		ret = iommu_device_use_default_domain(dev);
> >  		if (ret)
> >  			arch_teardown_dma_ops(dev);
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2025-08-11 17:01 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-28 15:46 [PATCH v2 0/4] iommu: Fix the longstanding probe issues Robin Murphy
2025-02-28 15:46 ` [PATCH v2 1/4] iommu: Handle race with default domain setup Robin Murphy
2025-02-28 15:46 ` [PATCH v2 2/4] iommu: Resolve ops in iommu_init_device() Robin Murphy
2025-03-05 17:55   ` Jason Gunthorpe
2025-02-28 15:46 ` [PATCH v2 3/4] iommu: Keep dev->iommu state consistent Robin Murphy
2025-03-05 18:14   ` Jason Gunthorpe
2025-02-28 15:46 ` [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path Robin Murphy
2025-03-05 18:28   ` Jason Gunthorpe
2025-03-07 14:24   ` Lorenzo Pieralisi
2025-03-07 20:20     ` Robin Murphy
2025-03-11 18:42   ` Joerg Roedel
2025-03-12  7:07     ` Baolu Lu
2025-03-12 10:10     ` Robin Murphy
2025-03-12 14:34       ` Baolu Lu
2025-03-12 15:21       ` Joerg Roedel
     [not found]   ` <CGME20250313095633eucas1p29cb55f2504b4bcf67c16b3bd3fa9b8cd@eucas1p2.samsung.com>
2025-03-13  9:56     ` Marek Szyprowski
2025-03-13 11:01       ` Robin Murphy
2025-03-13 12:23         ` Marek Szyprowski
2025-03-13 13:06           ` Robin Murphy
2025-03-13 14:12             ` Robin Murphy
2025-03-17  7:37               ` Marek Szyprowski
2025-03-17 18:22                 ` Robin Murphy
2025-03-21 12:15                   ` Marek Szyprowski
2025-03-21 16:48                     ` Robin Murphy
2025-04-01 20:34                       ` Marek Szyprowski
2025-03-13 16:30         ` Anders Roxell
2025-03-18 16:37   ` Geert Uytterhoeven
2025-03-18 17:24     ` Robin Murphy
2025-03-25 15:32       ` Geert Uytterhoeven
2025-03-27  9:47   ` Chen-Yu Tsai
2025-03-27 11:00     ` Louis-Alexis Eyraud
2025-04-11  8:02   ` Johan Hovold
2025-04-14 15:37     ` Robin Murphy
2025-04-15 15:08       ` Johan Hovold
2025-04-24 13:58         ` Robin Murphy
2025-04-21 21:19   ` William McVicker
2025-04-22 19:00     ` Jason Gunthorpe
2025-04-22 21:55       ` William McVicker
2025-04-22 23:41         ` Jason Gunthorpe
2025-04-23 17:31           ` William McVicker
2025-04-23 18:18             ` Jason Gunthorpe
2025-08-11 16:44   ` Eric Auger
2025-08-11 17:01     ` Bjorn Helgaas
2025-03-10  8:29 ` [PATCH v2 0/4] iommu: Fix the longstanding probe issues Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).