* 6.1-rc1 regression: bisected to 57365a04c921 iommu: Move bus setup to IOMMU device registration
@ 2022-10-19 23:25 Brian Norris
2022-10-20 10:02 ` Robin Murphy
0 siblings, 1 reply; 3+ messages in thread
From: Brian Norris @ 2022-10-19 23:25 UTC (permalink / raw)
To: Joerg Roedel, Will Deacon, Robin Murphy
Cc: Marek Szyprowski, Krishna Reddy, Kevin Tian, Matthew Rosato,
Niklas Schnelle, Robin Murphy, Joerg Roedel, iommu, linux-kernel
Hi all,
I'm testing out asynchronous probe (that's, kernel cmdline
'driver_async_probe=*' or similar), and I've identified a regression in
v6.1-rc1 due to this:
commit 57365a04c92126525a58bf7a1599ddfa832415e9
Author: Robin Murphy <robin.murphy@arm.com>
Date: Mon Aug 15 17:20:06 2022 +0100
iommu: Move bus setup to IOMMU device registration
In particular, I'm testing a Rockchip RK3399 system with
'driver_async_probe=rk_iommu', and finding I crash like this:
[ 0.180480] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
...
[ 0.180583] CPU: 2 PID: 49 Comm: kworker/u12:1 Not tainted 6.1.0-rc1 #57
[ 0.180593] Hardware name: Google Scarlet (DT)
[ 0.180602] Workqueue: events_unbound async_run_entry_fn
[ 0.180622] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.180632] pc : dev_iommu_free+0x24/0x54
[ 0.180644] lr : __iommu_probe_device+0x110/0x180
...
[ 0.180785] Call trace:
[ 0.180791] dev_iommu_free+0x24/0x54
[ 0.180800] __iommu_probe_device+0x110/0x180
[ 0.180807] probe_iommu_group+0x40/0x58
[ 0.180816] bus_for_each_dev+0x8c/0xd8
[ 0.180829] bus_iommu_probe+0x5c/0x2d0
[ 0.180840] iommu_device_register+0xbc/0x104
[ 0.180851] rk_iommu_probe+0x260/0x354
[ 0.180861] platform_probe+0xb4/0xd4
[ 0.180872] really_probe+0xfc/0x284
[ 0.180884] __driver_probe_device+0xc0/0xec
[ 0.180894] driver_probe_device+0x4c/0xd4
[ 0.180905] __driver_attach_async_helper+0x3c/0x60
[ 0.180915] async_run_entry_fn+0x34/0xd4
[ 0.180926] process_one_work+0x1e0/0x3b4
[ 0.180936] worker_thread+0x120/0x404
[ 0.180942] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[ 0.180944] kthread+0xf4/0x14c
[ 0.180953] ret_from_fork+0x10/0x20
[ 0.180968] Code: f9000bf3 910003fd f9416c13 f9016c1f (f9401a68)
[ 0.180981] ---[ end trace 0000000000000000 ]---
I find if I revert the above commit (and 29e932295bfa ("iommu: Clean up
bus_set_iommu()"), to keep the reverts clean), things start working again.
I haven't worked out exactly what's going wrong, but the patch looks like it
isn't async safe at all, due to the way each device is poking (without locking)
at the global iommu_buses[].
Is this a regression worth tracking (e.g., potentially reverting commit
57365a04c921)? Or should we just accept that iommu drivers cannot probe
asynchronously (and thus set PROBE_FORCE_SYNCHRONOUS for all iommu drivers)?
In the meantime, I'll see if I can figure out a reasonable non-revert solution
that is async safe, but I wanted to report this now, in case I don't
have much luck or enough time.
Thanks,
Brian
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: 6.1-rc1 regression: bisected to 57365a04c921 iommu: Move bus setup to IOMMU device registration
2022-10-19 23:25 6.1-rc1 regression: bisected to 57365a04c921 iommu: Move bus setup to IOMMU device registration Brian Norris
@ 2022-10-20 10:02 ` Robin Murphy
2022-10-20 16:21 ` Brian Norris
0 siblings, 1 reply; 3+ messages in thread
From: Robin Murphy @ 2022-10-20 10:02 UTC (permalink / raw)
To: Brian Norris, Joerg Roedel, Will Deacon
Cc: Marek Szyprowski, Krishna Reddy, Kevin Tian, Matthew Rosato,
Niklas Schnelle, Joerg Roedel, iommu, linux-kernel
On 2022-10-20 00:25, Brian Norris wrote:
> Hi all,
>
> I'm testing out asynchronous probe (that's, kernel cmdline
> 'driver_async_probe=*' or similar), and I've identified a regression in
> v6.1-rc1 due to this:
>
> commit 57365a04c92126525a58bf7a1599ddfa832415e9
> Author: Robin Murphy <robin.murphy@arm.com>
> Date: Mon Aug 15 17:20:06 2022 +0100
>
> iommu: Move bus setup to IOMMU device registration
>
> In particular, I'm testing a Rockchip RK3399 system with
> 'driver_async_probe=rk_iommu', and finding I crash like this:
>
> [ 0.180480] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> ...
> [ 0.180583] CPU: 2 PID: 49 Comm: kworker/u12:1 Not tainted 6.1.0-rc1 #57
> [ 0.180593] Hardware name: Google Scarlet (DT)
> [ 0.180602] Workqueue: events_unbound async_run_entry_fn
> [ 0.180622] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 0.180632] pc : dev_iommu_free+0x24/0x54
> [ 0.180644] lr : __iommu_probe_device+0x110/0x180
> ...
> [ 0.180785] Call trace:
> [ 0.180791] dev_iommu_free+0x24/0x54
> [ 0.180800] __iommu_probe_device+0x110/0x180
> [ 0.180807] probe_iommu_group+0x40/0x58
> [ 0.180816] bus_for_each_dev+0x8c/0xd8
> [ 0.180829] bus_iommu_probe+0x5c/0x2d0
> [ 0.180840] iommu_device_register+0xbc/0x104
> [ 0.180851] rk_iommu_probe+0x260/0x354
> [ 0.180861] platform_probe+0xb4/0xd4
> [ 0.180872] really_probe+0xfc/0x284
> [ 0.180884] __driver_probe_device+0xc0/0xec
> [ 0.180894] driver_probe_device+0x4c/0xd4
> [ 0.180905] __driver_attach_async_helper+0x3c/0x60
> [ 0.180915] async_run_entry_fn+0x34/0xd4
> [ 0.180926] process_one_work+0x1e0/0x3b4
> [ 0.180936] worker_thread+0x120/0x404
> [ 0.180942] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
> [ 0.180944] kthread+0xf4/0x14c
> [ 0.180953] ret_from_fork+0x10/0x20
> [ 0.180968] Code: f9000bf3 910003fd f9416c13 f9016c1f (f9401a68)
> [ 0.180981] ---[ end trace 0000000000000000 ]---
Hmm, on the face of it that callstack doesn't even make much sense -
dev_iommu_free+0x24 presumably has to be the dereference of
param->fwspec, but __iommu_probe_device can only hit that failure path
after a successful dev_iommu_get(), so dev->iommu should not be NULL.
> I find if I revert the above commit (and 29e932295bfa ("iommu: Clean up
> bus_set_iommu()"), to keep the reverts clean), things start working again.
>
> I haven't worked out exactly what's going wrong, but the patch looks like it
> isn't async safe at all, due to the way each device is poking (without locking)
> at the global iommu_buses[].
We shouldn't really need locking for iommu_buses itself, but I guess to
support async probe we would need per-device locking in
iommu_probe_device() to prevent multiple threads trying to probe the
same device at once, which must be what's happening in your case to
cause a double-dev_iommu_free(). I'll see what I can do ASAP, since I
think that's worthwhile. In the meantime, as to why you're hitting the
failure path at all, I think that's another subtle oversight on my part,
does something like the diff below help?
Thanks,
Robin.
----->8-----
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 4f2039789897..00fc57c86ea4 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1163,6 +1163,14 @@ static struct iommu_group *rk_iommu_device_group(struct device *dev)
iommu = rk_iommu_from_dev(dev);
+ /*
+ * Use the first registered IOMMU device for domain to use with DMA
+ * API, since a domain might not physically correspond to a single
+ * IOMMU device..
+ */
+ if (!dma_dev)
+ dma_dev = iommu->dev;
+
return iommu_group_ref_get(iommu->group);
}
@@ -1293,14 +1301,6 @@ static int rk_iommu_probe(struct platform_device *pdev)
if (err)
goto err_remove_sysfs;
- /*
- * Use the first registered IOMMU device for domain to use with DMA
- * API, since a domain might not physically correspond to a single
- * IOMMU device..
- */
- if (!dma_dev)
- dma_dev = &pdev->dev;
-
pm_runtime_enable(dev);
for (i = 0; i < iommu->num_irq; i++) {
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: 6.1-rc1 regression: bisected to 57365a04c921 iommu: Move bus setup to IOMMU device registration
2022-10-20 10:02 ` Robin Murphy
@ 2022-10-20 16:21 ` Brian Norris
0 siblings, 0 replies; 3+ messages in thread
From: Brian Norris @ 2022-10-20 16:21 UTC (permalink / raw)
To: Robin Murphy
Cc: Joerg Roedel, Will Deacon, Marek Szyprowski, Krishna Reddy,
Kevin Tian, Matthew Rosato, Niklas Schnelle, Joerg Roedel, iommu,
linux-kernel, linux-rockchip
On Thu, Oct 20, 2022 at 11:02:23AM +0100, Robin Murphy wrote:
> We shouldn't really need locking for iommu_buses itself, but I guess to
> support async probe we would need per-device locking in
> iommu_probe_device() to prevent multiple threads trying to probe the
> same device at once, which must be what's happening in your case to
> cause a double-dev_iommu_free().
Perhaps. I still haven't spent enough time trying to learn the expected
behavior here.
> I'll see what I can do ASAP, since I
> think that's worthwhile.
Awesome!
> In the meantime, as to why you're hitting the
> failure path at all, I think that's another subtle oversight on my part,
Ah, I didn't even pay attention to the fact we were hitting an error
path here.
The comments you move ("Use the first registered IOMMU device") might
suggest that some of the problems could be more fundamental to rk_iommu,
since with async probe, there is no such "first registerered device".
(Well, they get serialized into some lists eventually, but for the most
part we have to consider them unordered.)
> does something like the diff below help?
I get a different crash at least!
[ 0.183048] Unable to handle kernel NULL pointer dereference at virtual address 00000000000002a0
...
[ 0.183133] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1+ #66
[ 0.183143] Hardware name: Google Scarlet (DT)
...
[ 0.183314] Call trace:
[ 0.183319] _raw_spin_lock_irqsave+0x44/0x9c
[ 0.183328] devres_add+0x34/0x64
[ 0.183336] devm_kmalloc+0xac/0xcc
[ 0.183345] rk_iommu_of_xlate+0x3c/0x80
[ 0.183358] of_iommu_xlate+0x8c/0xd0
[ 0.183367] of_iommu_configure+0x120/0x1d0
[ 0.183377] of_dma_configure_id+0x190/0x244
[ 0.183387] platform_dma_configure+0x40/0x88
[ 0.183397] really_probe+0xac/0x284
[ 0.183409] __driver_probe_device+0xc0/0xec
[ 0.183419] driver_probe_device+0x4c/0xd4
[ 0.183429] __driver_attach+0xb8/0x10c
[ 0.183439] bus_for_each_dev+0x8c/0xd8
[ 0.183448] driver_attach+0x30/0x3c
[ 0.183458] bus_add_driver+0x118/0x1f8
[ 0.183467] driver_register+0x70/0x10c
[ 0.183475] __platform_register_drivers+0x5c/0xcc
[ 0.183484] rockchip_drm_init+0x74/0xc4
[ 0.183495] do_one_initcall+0x154/0x2e0
[ 0.183505] do_initcall_level+0x134/0x160
[ 0.183515] do_initcalls+0x60/0xa0
[ 0.183524] do_basic_setup+0x28/0x34
[ 0.183532] kernel_init_freeable+0xf8/0x150
[ 0.183541] kernel_init+0x2c/0x12c
[ 0.183551] ret_from_fork+0x10/0x20
[ 0.183567] Code: 5280002b 1100054a b900092a f9800011 (885ffc01)
[ 0.183574] ---[ end trace 0000000000000000 ]---
Brian
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-10-20 16:21 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-19 23:25 6.1-rc1 regression: bisected to 57365a04c921 iommu: Move bus setup to IOMMU device registration Brian Norris
2022-10-20 10:02 ` Robin Murphy
2022-10-20 16:21 ` Brian Norris
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox