* [PATCH v3 6.6] iommu: Handle race with default domain setup
@ 2025-04-29 10:47 Robin Murphy
2025-04-29 13:00 ` Greg KH
0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2025-04-29 10:47 UTC (permalink / raw)
To: stable; +Cc: Charan Teja Kalla
[ Upstream commit b46064a18810bad3aea089a79993ca5ea7a3d2b2 ]
It turns out that deferred default domain creation leaves a subtle
race window during iommu_device_register() wherein a client driver may
asynchronously probe in parallel and get as far as performing DMA API
operations with dma-direct, only to be switched to iommu-dma underfoot
once the default domain attachment finally happens, with obviously
disastrous consequences. Even the wonky of_iommu_configure() path is at
risk, since iommu_fwspec_init() will no longer defer client probe as the
instance ops are (necessarily) already registered, and the "replay"
iommu_probe_device() call can see dev->iommu_group already set and so
think there's nothing to do either.
Fortunately we already have the right tool in the right place in the
form of iommu_device_use_default_domain(), which just needs to ensure
that said default domain is actually ready to *be* used. Deferring the
client probe shouldn't have too much impact, given that this only
happens while the IOMMU driver is probing, and thus due to kick the
deferred probe list again once it finishes.
[ Backport: The above is true for mainline, but here we still have
arch_setup_dma_ops() to worry about, which is not replayed if the
default domain happens to be allocated *between* that call and
subsequently reaching iommu_device_use_default_domain(), so we need an
additional earlier check to cover that case. Also we're now back before
the nominal commit 98ac73f99bc4 so we need to tweak the logic to depend
on IOMMU_DMA as well, to avoid falsely deferring on architectures not
using default domains. This then serves us back as far as f188056352bc,
where this specific form of the problem first arises. ]
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
Fixes: f188056352bc ("iommu: Avoid locking/unlocking for iommu_probe_device()")
Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
Resending as a new version with a new Message-Id so as not to confuse
the tools... 6.12.y should simply have a straight cherry-pick of the
mainline commit - 98ac73f99bc4 was in 6.7 so I'm not sure why autosel
hasn't picked that already?
Thanks,
Robin.
drivers/iommu/iommu.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f1029c0825e..713f2be48075 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -566,6 +566,17 @@ int iommu_probe_device(struct device *dev)
mutex_lock(&iommu_probe_device_lock);
ret = __iommu_probe_device(dev, NULL);
mutex_unlock(&iommu_probe_device_lock);
+
+ /*
+ * The dma_configure replay paths need bus_iommu_probe() to
+ * finish before they can call arch_setup_dma_ops()
+ */
+ if (IS_ENABLED(CONFIG_IOMMU_DMA) && !ret && dev->iommu_group) {
+ mutex_lock(&dev->iommu_group->mutex);
+ if (!dev->iommu_group->default_domain)
+ ret = -EPROBE_DEFER;
+ mutex_unlock(&dev->iommu_group->mutex);
+ }
if (ret)
return ret;
@@ -3149,6 +3160,11 @@ int iommu_device_use_default_domain(struct device *dev)
return 0;
mutex_lock(&group->mutex);
+ /* We may race against bus_iommu_probe() finalising groups here */
+ if (IS_ENABLED(CONFIG_IOMMU_DMA) && !group->default_domain) {
+ ret = -EPROBE_DEFER;
+ goto unlock_out;
+ }
if (group->owner_cnt) {
if (group->owner || !iommu_is_default_domain(group) ||
!xa_empty(&group->pasid_array)) {
--
2.39.2.101.g768bb238c484.dirty
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 6.6] iommu: Handle race with default domain setup
2025-04-29 10:47 [PATCH v3 6.6] iommu: Handle race with default domain setup Robin Murphy
@ 2025-04-29 13:00 ` Greg KH
2025-04-29 13:07 ` Robin Murphy
0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2025-04-29 13:00 UTC (permalink / raw)
To: Robin Murphy; +Cc: stable, Charan Teja Kalla
On Tue, Apr 29, 2025 at 11:47:40AM +0100, Robin Murphy wrote:
> [ Upstream commit b46064a18810bad3aea089a79993ca5ea7a3d2b2 ]
>
> It turns out that deferred default domain creation leaves a subtle
> race window during iommu_device_register() wherein a client driver may
> asynchronously probe in parallel and get as far as performing DMA API
> operations with dma-direct, only to be switched to iommu-dma underfoot
> once the default domain attachment finally happens, with obviously
> disastrous consequences. Even the wonky of_iommu_configure() path is at
> risk, since iommu_fwspec_init() will no longer defer client probe as the
> instance ops are (necessarily) already registered, and the "replay"
> iommu_probe_device() call can see dev->iommu_group already set and so
> think there's nothing to do either.
>
> Fortunately we already have the right tool in the right place in the
> form of iommu_device_use_default_domain(), which just needs to ensure
> that said default domain is actually ready to *be* used. Deferring the
> client probe shouldn't have too much impact, given that this only
> happens while the IOMMU driver is probing, and thus due to kick the
> deferred probe list again once it finishes.
>
> [ Backport: The above is true for mainline, but here we still have
> arch_setup_dma_ops() to worry about, which is not replayed if the
> default domain happens to be allocated *between* that call and
> subsequently reaching iommu_device_use_default_domain(), so we need an
> additional earlier check to cover that case. Also we're now back before
> the nominal commit 98ac73f99bc4 so we need to tweak the logic to depend
> on IOMMU_DMA as well, to avoid falsely deferring on architectures not
> using default domains. This then serves us back as far as f188056352bc,
> where this specific form of the problem first arises. ]
>
> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
> Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
> Fixes: f188056352bc ("iommu: Avoid locking/unlocking for iommu_probe_device()")
> Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> Resending as a new version with a new Message-Id so as not to confuse
> the tools... 6.12.y should simply have a straight cherry-pick of the
> mainline commit - 98ac73f99bc4 was in 6.7 so I'm not sure why autosel
> hasn't picked that already?
autosel is "maybe we get it", NEVER rely on it for an actual backport to
happen.
If you want this in 6.12.y, and it applies cleanly, just ask! But I
can't take this 6.6.y patch before that happens for obvious reasons.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 6.6] iommu: Handle race with default domain setup
2025-04-29 13:00 ` Greg KH
@ 2025-04-29 13:07 ` Robin Murphy
2025-04-29 14:22 ` Greg KH
0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2025-04-29 13:07 UTC (permalink / raw)
To: Greg KH; +Cc: stable, Charan Teja Kalla
On 29/04/2025 2:00 pm, Greg KH wrote:
> On Tue, Apr 29, 2025 at 11:47:40AM +0100, Robin Murphy wrote:
>> [ Upstream commit b46064a18810bad3aea089a79993ca5ea7a3d2b2 ]
>>
>> It turns out that deferred default domain creation leaves a subtle
>> race window during iommu_device_register() wherein a client driver may
>> asynchronously probe in parallel and get as far as performing DMA API
>> operations with dma-direct, only to be switched to iommu-dma underfoot
>> once the default domain attachment finally happens, with obviously
>> disastrous consequences. Even the wonky of_iommu_configure() path is at
>> risk, since iommu_fwspec_init() will no longer defer client probe as the
>> instance ops are (necessarily) already registered, and the "replay"
>> iommu_probe_device() call can see dev->iommu_group already set and so
>> think there's nothing to do either.
>>
>> Fortunately we already have the right tool in the right place in the
>> form of iommu_device_use_default_domain(), which just needs to ensure
>> that said default domain is actually ready to *be* used. Deferring the
>> client probe shouldn't have too much impact, given that this only
>> happens while the IOMMU driver is probing, and thus due to kick the
>> deferred probe list again once it finishes.
>>
>> [ Backport: The above is true for mainline, but here we still have
>> arch_setup_dma_ops() to worry about, which is not replayed if the
>> default domain happens to be allocated *between* that call and
>> subsequently reaching iommu_device_use_default_domain(), so we need an
>> additional earlier check to cover that case. Also we're now back before
>> the nominal commit 98ac73f99bc4 so we need to tweak the logic to depend
>> on IOMMU_DMA as well, to avoid falsely deferring on architectures not
>> using default domains. This then serves us back as far as f188056352bc,
>> where this specific form of the problem first arises. ]
>>
>> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
>> Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
>> Fixes: f188056352bc ("iommu: Avoid locking/unlocking for iommu_probe_device()")
>> Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> Resending as a new version with a new Message-Id so as not to confuse
>> the tools... 6.12.y should simply have a straight cherry-pick of the
>> mainline commit - 98ac73f99bc4 was in 6.7 so I'm not sure why autosel
>> hasn't picked that already?
>
> autosel is "maybe we get it", NEVER rely on it for an actual backport to
> happen.
>
> If you want this in 6.12.y, and it applies cleanly, just ask! But I
> can't take this 6.6.y patch before that happens for obvious reasons.
Understood; I shall try harder to remember to include explicit stable
tags in future.
Could you please pick b46064a18810 for 6.12? I checked and there are
indeed no conflicts :)
Thanks,
Robin.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 6.6] iommu: Handle race with default domain setup
2025-04-29 13:07 ` Robin Murphy
@ 2025-04-29 14:22 ` Greg KH
2025-05-01 7:46 ` Greg KH
0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2025-04-29 14:22 UTC (permalink / raw)
To: Robin Murphy; +Cc: stable, Charan Teja Kalla
On Tue, Apr 29, 2025 at 02:07:19PM +0100, Robin Murphy wrote:
> On 29/04/2025 2:00 pm, Greg KH wrote:
> > On Tue, Apr 29, 2025 at 11:47:40AM +0100, Robin Murphy wrote:
> > > [ Upstream commit b46064a18810bad3aea089a79993ca5ea7a3d2b2 ]
> > >
> > > It turns out that deferred default domain creation leaves a subtle
> > > race window during iommu_device_register() wherein a client driver may
> > > asynchronously probe in parallel and get as far as performing DMA API
> > > operations with dma-direct, only to be switched to iommu-dma underfoot
> > > once the default domain attachment finally happens, with obviously
> > > disastrous consequences. Even the wonky of_iommu_configure() path is at
> > > risk, since iommu_fwspec_init() will no longer defer client probe as the
> > > instance ops are (necessarily) already registered, and the "replay"
> > > iommu_probe_device() call can see dev->iommu_group already set and so
> > > think there's nothing to do either.
> > >
> > > Fortunately we already have the right tool in the right place in the
> > > form of iommu_device_use_default_domain(), which just needs to ensure
> > > that said default domain is actually ready to *be* used. Deferring the
> > > client probe shouldn't have too much impact, given that this only
> > > happens while the IOMMU driver is probing, and thus due to kick the
> > > deferred probe list again once it finishes.
> > >
> > > [ Backport: The above is true for mainline, but here we still have
> > > arch_setup_dma_ops() to worry about, which is not replayed if the
> > > default domain happens to be allocated *between* that call and
> > > subsequently reaching iommu_device_use_default_domain(), so we need an
> > > additional earlier check to cover that case. Also we're now back before
> > > the nominal commit 98ac73f99bc4 so we need to tweak the logic to depend
> > > on IOMMU_DMA as well, to avoid falsely deferring on architectures not
> > > using default domains. This then serves us back as far as f188056352bc,
> > > where this specific form of the problem first arises. ]
> > >
> > > Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
> > > Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
> > > Fixes: f188056352bc ("iommu: Avoid locking/unlocking for iommu_probe_device()")
> > > Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >
> > > Resending as a new version with a new Message-Id so as not to confuse
> > > the tools... 6.12.y should simply have a straight cherry-pick of the
> > > mainline commit - 98ac73f99bc4 was in 6.7 so I'm not sure why autosel
> > > hasn't picked that already?
> >
> > autosel is "maybe we get it", NEVER rely on it for an actual backport to
> > happen.
> >
> > If you want this in 6.12.y, and it applies cleanly, just ask! But I
> > can't take this 6.6.y patch before that happens for obvious reasons.
>
> Understood; I shall try harder to remember to include explicit stable tags
> in future.
>
> Could you please pick b46064a18810 for 6.12? I checked and there are indeed
> no conflicts :)
Great, all now queued up.
greg k-h
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 6.6] iommu: Handle race with default domain setup
2025-04-29 14:22 ` Greg KH
@ 2025-05-01 7:46 ` Greg KH
2025-05-01 10:34 ` Robin Murphy
0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2025-05-01 7:46 UTC (permalink / raw)
To: Robin Murphy; +Cc: stable, Charan Teja Kalla
On Tue, Apr 29, 2025 at 04:22:27PM +0200, Greg KH wrote:
> On Tue, Apr 29, 2025 at 02:07:19PM +0100, Robin Murphy wrote:
> > On 29/04/2025 2:00 pm, Greg KH wrote:
> > > On Tue, Apr 29, 2025 at 11:47:40AM +0100, Robin Murphy wrote:
> > > > [ Upstream commit b46064a18810bad3aea089a79993ca5ea7a3d2b2 ]
> > > >
> > > > It turns out that deferred default domain creation leaves a subtle
> > > > race window during iommu_device_register() wherein a client driver may
> > > > asynchronously probe in parallel and get as far as performing DMA API
> > > > operations with dma-direct, only to be switched to iommu-dma underfoot
> > > > once the default domain attachment finally happens, with obviously
> > > > disastrous consequences. Even the wonky of_iommu_configure() path is at
> > > > risk, since iommu_fwspec_init() will no longer defer client probe as the
> > > > instance ops are (necessarily) already registered, and the "replay"
> > > > iommu_probe_device() call can see dev->iommu_group already set and so
> > > > think there's nothing to do either.
> > > >
> > > > Fortunately we already have the right tool in the right place in the
> > > > form of iommu_device_use_default_domain(), which just needs to ensure
> > > > that said default domain is actually ready to *be* used. Deferring the
> > > > client probe shouldn't have too much impact, given that this only
> > > > happens while the IOMMU driver is probing, and thus due to kick the
> > > > deferred probe list again once it finishes.
> > > >
> > > > [ Backport: The above is true for mainline, but here we still have
> > > > arch_setup_dma_ops() to worry about, which is not replayed if the
> > > > default domain happens to be allocated *between* that call and
> > > > subsequently reaching iommu_device_use_default_domain(), so we need an
> > > > additional earlier check to cover that case. Also we're now back before
> > > > the nominal commit 98ac73f99bc4 so we need to tweak the logic to depend
> > > > on IOMMU_DMA as well, to avoid falsely deferring on architectures not
> > > > using default domains. This then serves us back as far as f188056352bc,
> > > > where this specific form of the problem first arises. ]
> > > >
> > > > Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
> > > > Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
> > > > Fixes: f188056352bc ("iommu: Avoid locking/unlocking for iommu_probe_device()")
> > > > Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com
> > > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > > ---
> > > >
> > > > Resending as a new version with a new Message-Id so as not to confuse
> > > > the tools... 6.12.y should simply have a straight cherry-pick of the
> > > > mainline commit - 98ac73f99bc4 was in 6.7 so I'm not sure why autosel
> > > > hasn't picked that already?
> > >
> > > autosel is "maybe we get it", NEVER rely on it for an actual backport to
> > > happen.
> > >
> > > If you want this in 6.12.y, and it applies cleanly, just ask! But I
> > > can't take this 6.6.y patch before that happens for obvious reasons.
> >
> > Understood; I shall try harder to remember to include explicit stable tags
> > in future.
> >
> > Could you please pick b46064a18810 for 6.12? I checked and there are indeed
> > no conflicts :)
>
> Great, all now queued up.
And now there's reports it is causing problems in the 6.6-stable queue:
https://lore.kernel.org/r/1903a129-6e06-47d9-862b-ab23f72a9fea@nvidia.com
so I'm going to drop it from the 6.6.y tree now, sorry.
Can you work with Jon to to get this figured out?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 6.6] iommu: Handle race with default domain setup
2025-05-01 7:46 ` Greg KH
@ 2025-05-01 10:34 ` Robin Murphy
0 siblings, 0 replies; 6+ messages in thread
From: Robin Murphy @ 2025-05-01 10:34 UTC (permalink / raw)
To: Greg KH, Jon Hunter; +Cc: stable, Charan Teja Kalla
On 2025-05-01 8:46 am, Greg KH wrote:
[...]
>> Great, all now queued up.
>
> And now there's reports it is causing problems in the 6.6-stable queue:
> https://lore.kernel.org/r/1903a129-6e06-47d9-862b-ab23f72a9fea@nvidia.com
> so I'm going to drop it from the 6.6.y tree now, sorry.
>
> Can you work with Jon to to get this figured out?
Oh fiddle... Jon, are you able to test if adding c8cc2655cc6c
("iommu/tegra-smmu: Implement an IDENTITY domain") makes it work OK? It
cherry-picks cleanly, and I reckon it's just about small enough to be
reasonable to take as a dependency. Indeed I had forgotten the subtlety
of Tegra being the one driver which is used with CONFIG_IOMMU_DMA but
still didn't support default domains at that point...
Thanks
Robin,
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-05-01 10:34 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-29 10:47 [PATCH v3 6.6] iommu: Handle race with default domain setup Robin Murphy
2025-04-29 13:00 ` Greg KH
2025-04-29 13:07 ` Robin Murphy
2025-04-29 14:22 ` Greg KH
2025-05-01 7:46 ` Greg KH
2025-05-01 10:34 ` Robin Murphy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox