* [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064
@ 2025-12-29 22:26 Christian Schrefl
2026-01-05 18:02 ` Jason Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Christian Schrefl @ 2025-12-29 22:26 UTC (permalink / raw)
To: Konrad Dybcio, Dmitry Baryshkov, Rob Clark, Matti Vaittinen,
Robin Murphy, iommu, linux-arm-msm
Cc: Rudraksha Gupta
Hi everyone,
I've found a panic on boot with v6.19-rc3 on the asus-nexus7-flo tablet with a APQ8064 CPU.
I've bisected it down to commit bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper
probe path"). Reverting the drivers/iommu/iommu.c changes (removing the added if block)
fixes the crash, but that presumably exists for a reason.
The diff for the fix:
```
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2ca990dfbb88..9f32d70b207d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -453,14 +453,6 @@ static int iommu_init_device(struct device *dev)
* already having a driver bound means dma_configure has already run and
* found no IOMMU to wait for, so there's no point calling it again.
*/
- if (!dev->iommu->fwspec && !dev->driver && dev->bus->dma_configure) {
- mutex_unlock(&iommu_probe_device_lock);
- dev->bus->dma_configure(dev);
- mutex_lock(&iommu_probe_device_lock);
- /* If another instance finished the job for us, skip it */
- if (!dev->iommu || dev->iommu_group)
- return -ENODEV;
- }
/*
* At this point, relevant devices either now have a fwspec which will
* match ops registered with a non-NULL fwnode, or we can reasonably
```
The panic message is (without the diff applied):
```
[ 5.458266] msm_iommu: device mapped at (ptrval), irq 33 with 2 ctx banks
[ 5.460667] 8<--- cut here ---
[ 5.464071] Unable to handle kernel NULL pointer dereference at virtual address 00000088 when read
[ 5.467033] [00000088] *pgd=00000000
[ 5.475971] Internal error: Oops: 5 [#1] SMP ARM
[ 5.479703] Modules linked in:
[ 5.484300] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-rc3-dirty #41 PREEMPT
[ 5.487168] Hardware name: Generic DT based system
[ 5.495410] PC is at qcom_iommu_of_xlate+0x84/0x174
[ 5.500187] LR is at qcom_iommu_of_xlate+0x1c/0x174
[ 5.504956] pc : [<c0810840>] lr : [<c08107d8>] psr: 80000093
[ 5.509824] sp : f081dc20 ip : 00000002 fp : c1268ae8
[ 5.516068] r10: c1268ad8 r9 : 00000000 r8 : f081dc64
[ 5.521278] r7 : 20000013 r6 : c208f428 r5 : f081dc64 r4 : c2198c10
[ 5.526488] r3 : c208f400 r2 : c208f50c r1 : 00000000 r0 : eeff2024
[ 5.533089] Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[ 5.539601] Control: 10c5787d Table: 8000406a DAC: 00000051
[ 5.546803] Register r0 information: non-slab/vmalloc memory
[ 5.552616] Register r1 information: NULL pointer
[ 5.558344] Register r2 information: slab kmalloc-192 start c208f480 pointer offset 140 size 192
[ 5.562955] Register r3 information: slab kmalloc-192 start c208f3c0 pointer offset 64 size 192
[ 5.571804] Register r4 information: slab kmalloc-1k start c2198c00 pointer offset 16 size 1024
[ 5.580229] Register r5 information: 2-page vmalloc region starting at 0xf081c000 allocated at copy_process+0x154/0x11a0
[ 5.588911] Register r6 information: slab kmalloc-192 start c208f3c0 pointer offset 104 size 192
[ 5.600019] Register r7 information: non-paged memory
[ 5.608778] Register r8 information: 2-page vmalloc region starting at 0xf081c000 allocated at copy_process+0x154/0x11a0
[ 5.613737] Register r9 information: NULL pointer
[ 5.624664] Register r10 information: non-slab/vmalloc memory
[ 5.629265] Register r11 information: non-slab/vmalloc memory
[ 5.634994] Register r12 information: non-paged memory
[ 5.640727] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[ 5.645760] Stack: (0xf081dc20 to 0xf081e000)
[ 5.651937] dc20: c1268ae8 c0809060 c208f440 f081dc64 c2198c10 c1a4dca0 c214dfc0 c080f558
[ 5.656195] dc40: ffffffed eeff39c8 c2198c10 c080fb1c 00000000 f081dc64 c2198d80 c0ae6cc0
[ 5.664357] dc60: eeff39c8 eeff2024 00000001 00000000 00000000 00000000 00000000 00000000
[ 5.672514] dc80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 5.680675] dca0: 00000000 00000000 00000000 c6f787fa 00000000 c2198c10 c1ab3900 eeff39c8
[ 5.688837] dcc0: ffffffff 00000000 00000000 00000020 00000000 c0ae03b8 00000000 00000000
[ 5.696995] dce0: c22b8171 00000007 00000000 c6f787fa 00001402 c2198c10 00000000 c2198c10
[ 5.705154] dd00: c080aa54 f081ddb8 c0edd194 c080aa54 c1464a64 c088a720 c080aa54 f081ddb8
[ 5.713315] dd20: c0edd194 00000000 c2198c10 c08099f4 00000000 c6f787fa c22b8000 c2198c10
[ 5.721477] dd40: c205c900 c2198c10 c080aa54 c080a880 c214dfc0 00000000 00000000 c230b400
[ 5.729634] dd60: c2198c10 c205c900 f081ddb8 c080aa54 c208f5c0 c080aa78 c205c900 f081ddb8
[ 5.737795] dd80: 00000000 c0885280 c21e5c00 c205c958 c2162734 c6f787fa c2197410 c1a4dca0
[ 5.745955] dda0: c2197410 f081ddb8 c2197410 c080a514 c1268d70 c2197410 f081ddb8 f081ddb8
[ 5.754114] ddc0: c12f2e98 c6f787fa f0c01004 c208f580 00000000 c1a4dc30 c2197410 f081de04
[ 5.762272] dde0: c12f2e98 c1447854 c1464a64 c0810548 f081de04 c1268d20 c208f580 c6f787fa
[ 5.770431] de00: c2197410 07600000 00000002 c6f787fa c1a4dc4c c2197410 c1a4dc4c c1aaa6a8
[ 5.778590] de20: c1a4dc4c 00000000 c12f2e98 c088a798 c2197410 00000000 c1aaa6a8 c088765c
[ 5.786749] de40: c2197510 c2197410 c2197410 c1a4dc4c c1aaa6a8 00000000 c236bcb8 c0887a34
[ 5.794909] de60: 60000013 c12f2e98 c1af34fc c1a4dc4c c2197410 c236bcb8 c12f2e98 c0887c58
[ 5.803070] de80: c2197410 c1a4dc4c c1a4dc4c c0887e60 c236bcb8 c0887f54 c1447854 c2197454
[ 5.811230] dea0: 00000000 c205c900 c1a4dc4c c0885280 00000000 c205c958 c2162434 c6f787fa
[ 5.819389] dec0: c205c900 c236bc80 00000000 c1a4dc4c c205c900 c08866f4 c1268f80 c1425a74
[ 5.827550] dee0: c1a4dc4c cf7e8000 00000006 c1425a74 00000000 c0889060 00000126 cf7e8000
[ 5.835713] df00: c1a95c80 c0119b18 00000126 c0162cec 00000126 00000000 00000000 00000000
[ 5.843871] df20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 5.852028] df40: 00000000 00000000 00000000 00000000 00000000 c6f787fa 00000126 c205cd00
[ 5.860189] df60: 00000006 c1447834 c1a95c80 c1401488 00000006 00000006 00000000 c14004cc
[ 5.868348] df80: 00000000 00000000 c0d0b5bc 00000000 00000000 00000000 00000000 00000000
[ 5.876509] dfa0: 00000000 c0d0b5cc 00000000 c010014c 00000000 00000000 00000000 00000000
[ 5.884665] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 5.892824] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 5.900971] Call trace:
[ 5.900999] qcom_iommu_of_xlate from of_iommu_xlate+0x7c/0x9c
[ 5.911734] of_iommu_xlate from of_iommu_configure+0x238/0x290
[ 5.917377] of_iommu_configure from of_dma_configure_id+0xe8/0x400
[ 5.923203] of_dma_configure_id from platform_dma_configure+0xb0/0xcc
[ 5.929446] platform_dma_configure from iommu_init_device+0x1f4/0x2ec
[ 5.936044] iommu_init_device from __iommu_probe_device+0x30/0x204
[ 5.942551] __iommu_probe_device from probe_iommu_group+0x24/0x48
[ 5.948714] probe_iommu_group from bus_for_each_dev+0x7c/0xcc
[ 5.954963] bus_for_each_dev from iommu_device_register+0xb4/0x20c
[ 5.960778] iommu_device_register from msm_iommu_probe+0x248/0x344
[ 5.966944] msm_iommu_probe from platform_probe+0x5c/0x90
[ 5.973191] platform_probe from really_probe+0xe0/0x41c
[ 5.978748] really_probe from __driver_probe_device+0x9c/0x1f4
[ 5.984215] __driver_probe_device from driver_probe_device+0x34/0xd0
[ 5.989861] driver_probe_device from __driver_attach+0xf4/0x228
[ 5.996456] __driver_attach from bus_for_each_dev+0x7c/0xcc
[ 6.002531] bus_for_each_dev from bus_add_driver+0xe0/0x230
[ 6.008176] bus_add_driver from driver_register+0x84/0x130
[ 6.013817] driver_register from do_one_initcall+0x58/0x260
[ 6.019116] do_one_initcall from kernel_init_freeable+0x1cc/0x238
[ 6.025019] kernel_init_freeable from kernel_init+0x10/0x130
[ 6.031007] kernel_init from ret_from_fork+0x14/0x28
[ 6.036817] Exception stack(0xf081dfb0 to 0xf081dff8)
[ 6.041861] dfa0: 00000000 00000000 00000000 00000000
[ 6.046904] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 6.055059] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 6.063225] Code: e2836028 15911020 e1560002 0a000019 (e591c088)
[ 6.069635] ---[ end trace 0000000000000000 ]---
[ 6.075883] note: swapper/0[1] exited with irqs disabled
[ 6.080766] note: swapper/0[1] exited with preempt_count 1
[ 6.085969] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 6.091452] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
```
To build the kernel and bootable image I've used the rootfs CPIO and makefile from
https://dflund.se/~triad/krad/nexus7-flo/
The makefile was adapted slightly to work on a modern kernel
it can be found at: https://pastebin.com/cC4knWDf
In order to build and run it I'm using the following commands:
```
make -f nexus7.mak config
make -f nexus7.mak build
fastboot --base 0x80200000 --cmdline "xxxxxxxxxxxxxxxxxxxxxxxxxxconsole=ttyMSM0,115200,n8 debug earlycon" boot ~/zImage
```
The full output on the serial console:
https://pastebin.com/mZDCtTg2
It is only dirty because of the nexus7.mak file.
The apq8064-pinctrl probe failure is unrelated, it starts occurring from somewhere between 6.17.0 and 6.18,
I'll need to investigate that later.
The full resulting .config: https://pastebin.com/GsiHrxhq
Cheers,
Christian
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064
2025-12-29 22:26 [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064 Christian Schrefl
@ 2026-01-05 18:02 ` Jason Gunthorpe
2026-01-06 18:50 ` Robin Murphy
0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2026-01-05 18:02 UTC (permalink / raw)
To: Christian Schrefl
Cc: Konrad Dybcio, Dmitry Baryshkov, Rob Clark, Matti Vaittinen,
Robin Murphy, iommu, linux-arm-msm, Rudraksha Gupta
On Mon, Dec 29, 2025 at 11:26:42PM +0100, Christian Schrefl wrote:
> Hi everyone,
>
> I've found a panic on boot with v6.19-rc3 on the asus-nexus7-flo tablet with a APQ8064 CPU.
>
> I've bisected it down to commit bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper
> probe path"). Reverting the drivers/iommu/iommu.c changes (removing the added if block)
> fixes the crash, but that presumably exists for a reason.
> The diff for the fix:
> ```
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2ca990dfbb88..9f32d70b207d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -453,14 +453,6 @@ static int iommu_init_device(struct device *dev)
> * already having a driver bound means dma_configure has already run and
> * found no IOMMU to wait for, so there's no point calling it again.
> */
> - if (!dev->iommu->fwspec && !dev->driver && dev->bus->dma_configure) {
> - mutex_unlock(&iommu_probe_device_lock);
> - dev->bus->dma_configure(dev);
> - mutex_lock(&iommu_probe_device_lock);
> - /* If another instance finished the job for us, skip it */
> - if (!dev->iommu || dev->iommu_group)
> - return -ENODEV;
> - }
> /*
> * At this point, relevant devices either now have a fwspec which will
> * match ops registered with a non-NULL fwnode, or we can reasonably
> ```
> [ 5.900971] Call trace:
> [ 5.900999] qcom_iommu_of_xlate from of_iommu_xlate+0x7c/0x9c
Did you look at what line in qcom_iommu_of_xlate() is crashing with
NULL pointer?
It wasn't so obvious to me what it could be..
Though this looks really weird:
struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
int sid;
if (list_empty(&(*iommu)->ctx_list)) {
master = kzalloc(sizeof(*master), GFP_ATOMIC);
So maybe master is NULL and !list_empty()? It is really confused, it
looks like it only adds exactly one entry to the ctx_list? Why have
the list?
Jason
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064
2026-01-05 18:02 ` Jason Gunthorpe
@ 2026-01-06 18:50 ` Robin Murphy
2026-01-15 19:02 ` Christian Schrefl
0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2026-01-06 18:50 UTC (permalink / raw)
To: Jason Gunthorpe, Christian Schrefl
Cc: Konrad Dybcio, Dmitry Baryshkov, Rob Clark, Matti Vaittinen,
iommu, linux-arm-msm, Rudraksha Gupta
On 2026-01-05 6:02 pm, Jason Gunthorpe wrote:
> On Mon, Dec 29, 2025 at 11:26:42PM +0100, Christian Schrefl wrote:
>> Hi everyone,
>>
>> I've found a panic on boot with v6.19-rc3 on the asus-nexus7-flo tablet with a APQ8064 CPU.
>>
>> I've bisected it down to commit bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper
>> probe path"). Reverting the drivers/iommu/iommu.c changes (removing the added if block)
>> fixes the crash, but that presumably exists for a reason.
>> The diff for the fix:
>> ```
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 2ca990dfbb88..9f32d70b207d 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -453,14 +453,6 @@ static int iommu_init_device(struct device *dev)
>> * already having a driver bound means dma_configure has already run and
>> * found no IOMMU to wait for, so there's no point calling it again.
>> */
>> - if (!dev->iommu->fwspec && !dev->driver && dev->bus->dma_configure) {
>> - mutex_unlock(&iommu_probe_device_lock);
>> - dev->bus->dma_configure(dev);
>> - mutex_lock(&iommu_probe_device_lock);
>> - /* If another instance finished the job for us, skip it */
>> - if (!dev->iommu || dev->iommu_group)
>> - return -ENODEV;
>> - }
>> /*
>> * At this point, relevant devices either now have a fwspec which will
>> * match ops registered with a non-NULL fwnode, or we can reasonably
>> ```
>
>> [ 5.900971] Call trace:
>> [ 5.900999] qcom_iommu_of_xlate from of_iommu_xlate+0x7c/0x9c
>
> Did you look at what line in qcom_iommu_of_xlate() is crashing with
> NULL pointer?
>
> It wasn't so obvious to me what it could be..
Comparing the "Code:" line from the dump against a local build, this
looks to be the dereference of master->num_mids at the start of the
loop.
> Though this looks really weird:
>
> struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
> int sid;
>
> if (list_empty(&(*iommu)->ctx_list)) {
> master = kzalloc(sizeof(*master), GFP_ATOMIC);
>
> So maybe master is NULL and !list_empty()?
AFAICS that could happen if of_xlate has run once, and then for whatever
reason dev->iommu is torn down and the whole process started from
scratch a second time - although I'm struggling to see any obvious cause
for that in the absence of any other visible errors or async probe
races (and assuming that the IOMMUs on this platform ever actually
worked at all, of course...)
However it certainly stands out as a bit wrong that of_xlate is touching
the IOMMU instance itself - that should never have been expected to work
well. Does the diff below help?
Thanks,
Robin.
----->8-----
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 819add75a665..e57780fc3287 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -360,14 +360,11 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
/* Must be called under msm_iommu_lock */
static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
{
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct msm_iommu_dev *iommu, *ret = NULL;
- struct msm_iommu_ctx_dev *master;
list_for_each_entry(iommu, &qcom_iommu_devices, dev_node) {
- master = list_first_entry(&iommu->ctx_list,
- struct msm_iommu_ctx_dev,
- list);
- if (master->of_node == dev->of_node) {
+ if (iommu->dev->fwnode == fwspec->iommu_fwnode) {
ret = iommu;
break;
}
@@ -378,6 +375,7 @@ static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
static struct iommu_device *msm_iommu_probe_device(struct device *dev)
{
+ struct msm_iommu_ctx_dev *master;
struct msm_iommu_dev *iommu;
unsigned long flags;
@@ -388,6 +386,8 @@ static struct iommu_device *msm_iommu_probe_device(struct device *dev)
if (!iommu)
return ERR_PTR(-ENODEV);
+ master = dev_iommu_priv_get(dev);
+ list_add(&master->list, &iommu->ctx_list);
return &iommu->iommu;
}
@@ -604,14 +604,13 @@ static int insert_iommu_master(struct device *dev,
struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
int sid;
- if (list_empty(&(*iommu)->ctx_list)) {
+ if (!master) {
master = kzalloc(sizeof(*master), GFP_ATOMIC);
if (!master) {
dev_err(dev, "Failed to allocate iommu_master\n");
return -ENOMEM;
}
master->of_node = dev->of_node;
- list_add(&master->list, &(*iommu)->ctx_list);
dev_iommu_priv_set(dev, master);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064
2026-01-06 18:50 ` Robin Murphy
@ 2026-01-15 19:02 ` Christian Schrefl
2026-01-23 18:02 ` Robin Murphy
0 siblings, 1 reply; 6+ messages in thread
From: Christian Schrefl @ 2026-01-15 19:02 UTC (permalink / raw)
To: Robin Murphy, Jason Gunthorpe
Cc: Konrad Dybcio, Dmitry Baryshkov, Rob Clark, Matti Vaittinen,
iommu, linux-arm-msm, Rudraksha Gupta
Hi Robin,
On 1/6/26 7:50 PM, Robin Murphy wrote:
> On 2026-01-05 6:02 pm, Jason Gunthorpe wrote:
>> Though this looks really weird:
>>
>> struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
>> int sid;
>>
>> if (list_empty(&(*iommu)->ctx_list)) {
>> master = kzalloc(sizeof(*master), GFP_ATOMIC);
>>
>> So maybe master is NULL and !list_empty()?
>
> AFAICS that could happen if of_xlate has run once, and then for whatever
> reason dev->iommu is torn down and the whole process started from
> scratch a second time - although I'm struggling to see any obvious cause
> for that in the absence of any other visible errors or async probe
> races (and assuming that the IOMMUs on this platform ever actually
> worked at all, of course...)
>
> However it certainly stands out as a bit wrong that of_xlate is touching
> the IOMMU instance itself - that should never have been expected to work
> well. Does the diff below help?
>
> Thanks,
> Robin.
>
> ----->8-----
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index 819add75a665..e57780fc3287 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -360,14 +360,11 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
> /* Must be called under msm_iommu_lock */
> static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
> {
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> struct msm_iommu_dev *iommu, *ret = NULL;
> - struct msm_iommu_ctx_dev *master;
>
> list_for_each_entry(iommu, &qcom_iommu_devices, dev_node) {
> - master = list_first_entry(&iommu->ctx_list,
> - struct msm_iommu_ctx_dev,
> - list);
> - if (master->of_node == dev->of_node) {
> + if (iommu->dev->fwnode == fwspec->iommu_fwnode) {
> ret = iommu;
> break;
> }
> @@ -378,6 +375,7 @@ static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
>
> static struct iommu_device *msm_iommu_probe_device(struct device *dev)
> {
> + struct msm_iommu_ctx_dev *master;
> struct msm_iommu_dev *iommu;
> unsigned long flags;
>
> @@ -388,6 +386,8 @@ static struct iommu_device *msm_iommu_probe_device(struct device *dev)
> if (!iommu)
> return ERR_PTR(-ENODEV);
>
> + master = dev_iommu_priv_get(dev);
> + list_add(&master->list, &iommu->ctx_list);
> return &iommu->iommu;
> }
>
> @@ -604,14 +604,13 @@ static int insert_iommu_master(struct device *dev,
> struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
> int sid;
>
> - if (list_empty(&(*iommu)->ctx_list)) {
> + if (!master) {
> master = kzalloc(sizeof(*master), GFP_ATOMIC);
> if (!master) {
> dev_err(dev, "Failed to allocate iommu_master\n");
> return -ENOMEM;
> }
> master->of_node = dev->of_node;
> - list_add(&master->list, &(*iommu)->ctx_list);
> dev_iommu_priv_set(dev, master);
> }
>
Sorry for taking so long until I got to trying it out.
The patch fixes the crash!
Cheers,
Christian
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064
2026-01-15 19:02 ` Christian Schrefl
@ 2026-01-23 18:02 ` Robin Murphy
2026-01-23 18:10 ` Christian Schrefl
0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2026-01-23 18:02 UTC (permalink / raw)
To: Christian Schrefl, Jason Gunthorpe
Cc: Konrad Dybcio, Dmitry Baryshkov, Rob Clark, Matti Vaittinen,
iommu, linux-arm-msm, Rudraksha Gupta
On 2026-01-15 7:02 pm, Christian Schrefl wrote:
> Hi Robin,
>
> On 1/6/26 7:50 PM, Robin Murphy wrote:
>> On 2026-01-05 6:02 pm, Jason Gunthorpe wrote:
>>> Though this looks really weird:
>>>
>>> struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
>>> int sid;
>>>
>>> if (list_empty(&(*iommu)->ctx_list)) {
>>> master = kzalloc(sizeof(*master), GFP_ATOMIC);
>>>
>>> So maybe master is NULL and !list_empty()?
>>
>> AFAICS that could happen if of_xlate has run once, and then for whatever
>> reason dev->iommu is torn down and the whole process started from
>> scratch a second time - although I'm struggling to see any obvious cause
>> for that in the absence of any other visible errors or async probe
>> races (and assuming that the IOMMUs on this platform ever actually
>> worked at all, of course...)
>>
>> However it certainly stands out as a bit wrong that of_xlate is touching
>> the IOMMU instance itself - that should never have been expected to work
>> well. Does the diff below help?
>>
>> Thanks,
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>> index 819add75a665..e57780fc3287 100644
>> --- a/drivers/iommu/msm_iommu.c
>> +++ b/drivers/iommu/msm_iommu.c
>> @@ -360,14 +360,11 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
>> /* Must be called under msm_iommu_lock */
>> static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
>> {
>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> struct msm_iommu_dev *iommu, *ret = NULL;
>> - struct msm_iommu_ctx_dev *master;
>>
>> list_for_each_entry(iommu, &qcom_iommu_devices, dev_node) {
>> - master = list_first_entry(&iommu->ctx_list,
>> - struct msm_iommu_ctx_dev,
>> - list);
>> - if (master->of_node == dev->of_node) {
>> + if (iommu->dev->fwnode == fwspec->iommu_fwnode) {
>> ret = iommu;
>> break;
>> }
>> @@ -378,6 +375,7 @@ static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
>>
>> static struct iommu_device *msm_iommu_probe_device(struct device *dev)
>> {
>> + struct msm_iommu_ctx_dev *master;
>> struct msm_iommu_dev *iommu;
>> unsigned long flags;
>>
>> @@ -388,6 +386,8 @@ static struct iommu_device *msm_iommu_probe_device(struct device *dev)
>> if (!iommu)
>> return ERR_PTR(-ENODEV);
>>
>> + master = dev_iommu_priv_get(dev);
>> + list_add(&master->list, &iommu->ctx_list);
>> return &iommu->iommu;
>> }
>>
>> @@ -604,14 +604,13 @@ static int insert_iommu_master(struct device *dev,
>> struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
>> int sid;
>>
>> - if (list_empty(&(*iommu)->ctx_list)) {
>> + if (!master) {
>> master = kzalloc(sizeof(*master), GFP_ATOMIC);
>> if (!master) {
>> dev_err(dev, "Failed to allocate iommu_master\n");
>> return -ENOMEM;
>> }
>> master->of_node = dev->of_node;
>> - list_add(&master->list, &(*iommu)->ctx_list);
>> dev_iommu_priv_set(dev, master);
>> }
>>
>
> Sorry for taking so long until I got to trying it out.
> The patch fixes the crash!
Bah, in the process of writing this up properly I've realised that
although it fixes the crash, I think it breaks the multi-IOMMU
functionality, as .add_device will only be called for the first IOMMU
instance, whereas the current code is cheekily abusing .of_xlate to link
the device to both instances. And in fact that implies the existing code
is yet more buggy, as it seems it will leak and reallocate
dev_iommu_priv the first time it looks up the second instance (whose
ctx_list will be empty), thus it's only working at all because in all
cases the DT happens to list all the IDs for one instance before all for
the other, and both use an identical set of ID values. Oh dear...
Are you using the GPU and/or display device(s) enough to be able to tell
whether their IOMMUs are working as expected?
Thanks,
Robin.
>
> Cheers,
> Christian
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064
2026-01-23 18:02 ` Robin Murphy
@ 2026-01-23 18:10 ` Christian Schrefl
0 siblings, 0 replies; 6+ messages in thread
From: Christian Schrefl @ 2026-01-23 18:10 UTC (permalink / raw)
To: Robin Murphy, Jason Gunthorpe
Cc: Konrad Dybcio, Dmitry Baryshkov, Rob Clark, Matti Vaittinen,
iommu, linux-arm-msm, Rudraksha Gupta
Hi Robin
On 1/23/26 7:02 PM, Robin Murphy wrote:
> On 2026-01-15 7:02 pm, Christian Schrefl wrote:
>> Hi Robin,
>>
>> Sorry for taking so long until I got to trying it out.
>> The patch fixes the crash!
>
> Bah, in the process of writing this up properly I've realised that although it fixes the crash, I think it breaks the multi-IOMMU functionality, as .add_device will only be called for the first IOMMU instance, whereas the current code is cheekily abusing .of_xlate to link the device to both instances. And in fact that implies the existing code is yet more buggy, as it seems it will leak and reallocate dev_iommu_priv the first time it looks up the second instance (whose ctx_list will be empty), thus it's only working at all because in all cases the DT happens to list all the IDs for one instance before all for the other, and both use an identical set of ID values. Oh dear...
Thanks for looking into this! I'm not familiar with the IOMMU subsystem so it would probably take me a long
time to figure out anything useful about this.
> Are you using the GPU and/or display device(s) enough to be able to tell whether their IOMMUs are working as expected?
No I currently only have a serial console, though at least the display used to work on earlier versions,
but maybe that's just a configuration issue. Its on my list to investigate when I've got the time for it.
Cheers,
Christian
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-01-23 18:11 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-29 22:26 [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064 Christian Schrefl
2026-01-05 18:02 ` Jason Gunthorpe
2026-01-06 18:50 ` Robin Murphy
2026-01-15 19:02 ` Christian Schrefl
2026-01-23 18:02 ` Robin Murphy
2026-01-23 18:10 ` Christian Schrefl
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox