From: William McVicker <willmcvicker@google.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>,
Hanjun Guo <guohanjun@huawei.com>,
Sudeep Holla <sudeep.holla@arm.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>, Russell King <linux@armlinux.org.uk>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Danilo Krummrich <dakr@kernel.org>,
Stuart Yoder <stuyoder@gmail.com>,
Laurentiu Tudor <laurentiu.tudor@nxp.com>,
Nipun Gupta <nipun.gupta@amd.com>,
Nikhil Agarwal <nikhil.agarwal@amd.com>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Rob Herring <robh@kernel.org>,
Saravana Kannan <saravanak@google.com>,
Bjorn Helgaas <bhelgaas@google.com>,
linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
devicetree@vger.kernel.org, linux-pci@vger.kernel.org,
Charan Teja Kalla <quic_charante@quicinc.com>
Subject: Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
Date: Mon, 21 Apr 2025 14:19:35 -0700
Message-ID: <aAa2Zx86yUfayPSG@google.com>
In-Reply-To: <e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com>

Hi Robin,

On 02/28/2025, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
>
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
>
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
>
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
>
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> v2:
> - Comment bus driver changes for clarity
> - Use dev->iommu as the now-robust replay condition
> - Drop the device_iommu_mapped() checks in the firmware paths as they
> weren't doing much - we can't replace probe_device_lock just yet...
>
> drivers/acpi/arm64/dma.c | 5 +++++
> drivers/acpi/scan.c | 7 -------
> drivers/amba/bus.c | 3 ++-
> drivers/base/platform.c | 3 ++-
> drivers/bus/fsl-mc/fsl-mc-bus.c | 3 ++-
> drivers/cdx/cdx.c | 3 ++-
> drivers/iommu/iommu.c | 24 +++++++++++++++++++++---
> drivers/iommu/of_iommu.c | 7 ++++++-
> drivers/of/device.c | 7 ++++++-
> drivers/pci/pci-driver.c | 3 ++-
> 10 files changed, 48 insertions(+), 17 deletions(-)
>
[...]
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6f2a33722c52..1813cfd0c4bd 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
>  		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>  		ret = acpi_dma_configure(dev, attr);
>  	}
> -	if (ret || drv->driver_managed_dma)
> +	/* @drv may not be valid when we're called from the IOMMU layer */
> +	if (ret || !dev->driver || drv->driver_managed_dma)
>  		return ret;
>  
>  	ret = iommu_device_use_default_domain(dev);

I wanted to report a regression that is exposed by the new probing behavior.
On Pixel 6, we load our kernel modules in parallel, which means probing is also
done in parallel. This results in a race condition between the IOMMU thread and
the device probing thread. At the top of `platform_dma_configure()`, where we
assign `drv = to_platform_driver(dev->driver);`, `dev->driver` is NULL, which
results in `drv = 0xf...ffd8`. If the driver then gets bound to the device in
parallel, before we reach the above if-statement, `dev->driver != NULL` and we
dereference the bogus `drv`, resulting in a kernel panic.
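
To illustrate where a value like that comes from, here is a small user-space
sketch (not kernel code). It assumes a hypothetical `struct
fake_platform_driver` whose embedded `driver` member sits at a non-zero offset,
and it computes the address that a container_of()-style conversion would
produce. The struct names, the layout, and `converted_address()` are made up
for illustration; integer arithmetic is used so the NULL case stays well
defined in standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the offset of 'driver' matters here. */
struct fake_driver {
	const char *name;
};

struct fake_platform_driver {
	void (*callbacks[5])(void);	/* assumed: members placed before 'driver' */
	struct fake_driver driver;
	int driver_managed_dma;
};

/*
 * Address that a container_of()-style conversion would produce for a
 * given 'driver' member address: member address minus member offset.
 */
static uintptr_t converted_address(uintptr_t driver_addr)
{
	return driver_addr - offsetof(struct fake_platform_driver, driver);
}
```

With this assumed layout and 8-byte pointers, the member offset is 0x28, so
`converted_address(0)` wraps around to an address ending in ...ffd8. The
result is non-NULL, so a later NULL check on the converted pointer would not
catch it.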

To address this race condition and the resulting kernel panic, we need to defer
assigning `drv` until after we have checked that a driver is bound. Here is
what works for me:

----->8-----
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1813cfd0c4bd..6d124447545c 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1440,8 +1440,8 @@ static void platform_shutdown(struct device *_dev)
 
 static int platform_dma_configure(struct device *dev)
 {
-	struct platform_driver *drv = to_platform_driver(dev->driver);
 	struct fwnode_handle *fwnode = dev_fwnode(dev);
+	struct platform_driver *drv;
 	enum dev_dma_attr attr;
 	int ret = 0;
 
@@ -1451,8 +1451,12 @@ static int platform_dma_configure(struct device *dev)
 		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
 		ret = acpi_dma_configure(dev, attr);
 	}
-	/* @drv may not be valid when we're called from the IOMMU layer */
-	if (ret || !dev->driver || drv->driver_managed_dma)
+	/* @dev->driver may not be valid when we're called from the IOMMU layer */
+	if (ret || !dev->driver)
+		return ret;
+
+	drv = to_platform_driver(dev->driver);
+	if (drv->driver_managed_dma)
 		return ret;
 
 	ret = iommu_device_use_default_domain(dev);
--
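
For anyone who wants to poke at the ordering outside the kernel, the same
"check first, convert second" flow can be sketched in plain user-space C. All
of the `fake_*` types and `wants_default_domain()` are hypothetical stand-ins,
not kernel APIs, and the sketch is single-threaded, so it only models the
order of the checks and not the concurrency.

```c
#include <stddef.h>

struct fake_driver {
	const char *name;
};

struct fake_platform_driver {
	struct fake_driver driver;
	int driver_managed_dma;
};

struct fake_device {
	struct fake_driver *driver;	/* NULL while no driver is bound */
};

/* Convert a non-NULL 'driver' member pointer back to its container. */
static struct fake_platform_driver *
to_fake_platform_driver(struct fake_driver *drv)
{
	return (struct fake_platform_driver *)
		((char *)drv - offsetof(struct fake_platform_driver, driver));
}

/*
 * Returns 1 when the default-domain step should run, 0 when it is skipped.
 * The conversion happens only after dev->driver is known to be non-NULL.
 */
static int wants_default_domain(const struct fake_device *dev, int ret)
{
	struct fake_platform_driver *pdrv;

	if (ret || !dev->driver)
		return 0;

	pdrv = to_fake_platform_driver(dev->driver);
	return !pdrv->driver_managed_dma;
}
```

An unbound device (`driver == NULL`) and a driver with `driver_managed_dma`
set both skip the step; only a bound driver without that flag proceeds.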
Please let me know what you think.
Thanks,
Will
[...]