* NULL pointer exception in drivers/memory/mtk-smi.c
@ 2025-06-17 15:18 Uwe Kleine-König
2025-06-23 10:00 ` AngeloGioacchino Del Regno
2025-07-03 16:41 ` Uwe Kleine-König
0 siblings, 2 replies; 7+ messages in thread
From: Uwe Kleine-König @ 2025-06-17 15:18 UTC (permalink / raw)
To: Yong Wu
Cc: Krzysztof Kozlowski, Matthias Brugger, AngeloGioacchino Del Regno,
linux-mediatek, linux-arm-kernel
[-- Attachment #1: Type: text/plain, Size: 4634 bytes --]
Hello,
on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit the
following during boot:
[ 6.304796] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 6.305069] platform 1400d000.rdma: Adding to iommu group 0
[ 6.305924] Mem abort info:
[ 6.307867] ESR = 0x0000000096000004
[ 6.309032] EC = 0x25: DABT (current EL), IL = 32 bits
[ 6.309731] SET = 0, FnV = 0
[ 6.310126] EA = 0, S1PTW = 0
[ 6.310189] platform 14016000.rdma: Adding to iommu group 0
[ 6.310532] FSC = 0x04: level 0 translation fault
[ 6.312001] Data abort info:
[ 6.312144] mtk-iommu 10205000.iommu: bound 14003000.larb (ops mtk_smi_larb_component_ops)
[ 6.312381] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 6.313433] mtk-iommu 10205000.iommu: bound 17010000.larb (ops mtk_smi_larb_component_ops)
[ 6.314112] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 6.315156] mtk-iommu 10205000.iommu: bound 15001000.larb (ops mtk_smi_larb_component_ops)
[ 6.315812] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 6.315822] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046fde000
[ 6.315829] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 6.315846] Internal error: Oops: 0000000096000004 [#1] SMP
[ 6.315852] Modules linked in: mediatek_drm(+) drm_dma_helper drm_display_helper drm_client_lib drm_kms_helper mt8365_mt6357
[ 6.316909] mtk-iommu 10205000.iommu: bound 16010000.larb (ops mtk_smi_larb_component_ops)
[ 6.317563] mtk_mmsys mtk_mutex
[ 6.318471] probe of 10205000.iommu returned 0 after 19235 usecs
[ 6.319221] mtk_cmdq_helper snd_soc_mt8365_pcm snd_soc_mtk_common snd_soc_mt6357 pwm_mediatek
[ 6.324638] CPU: 1 UID: 0 PID: 112 Comm: (udev-worker) Not tainted 6.16.0-rc2-00015-g44a5ab7a7958-dirty #27 PREEMPT
[ 6.325964] Hardware name: MediaTek MT8365 Open Platform EVK (DT)
[ 6.326732] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6.327563] probe of 11201000.usb:connector returned 517 after 6627 usecs
[ 6.327609] pc : mtk_smi_larb_config_port_gen2_general+0x9c/0x1e8
[ 6.329228] lr : mtk_smi_larb_resume+0x90/0xd8
[ 6.329792] sp : ffff80008219b650
[ 6.330210] x29: ffff80008219b660 x28: ffff80008219bc20 x27: ffff000002fd6938
[ 6.331112] x26: 0000000000000000 x25: 0000000000000000 x24: ffff800080f9b8d8
[ 6.332015] x23: ffff000005aa8e90 x22: 0000000000000003 x21: ffff000002fd6810
[ 6.332918] x20: ffff000002fd6810 x19: ffff000005aa8e80 x18: 00000000ffffffff
[ 6.333821] x17: 74706164612d6c76 x16: 6f2d707369642d6b x15: 6574616964656d2f
[ 6.334723] x14: ffff800081b8b0c0 x13: 6c766f2d70736964 x12: 2d6b657461696465
[ 6.335626] x11: 766972643d4d4554 x10: ffff800081c8bc7c x9 : 0000000000000004
[ 6.336528] x8 : ffff80008219b578 x7 : ffff80008219b630 x6 : 0000000000000004
[ 6.337430] x5 : ffff80008219b5b8 x4 : ffff800080f9b8d8 x3 : 0000000000000000
[ 6.338333] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 6.339236] Call trace:
[ 6.339546] mtk_smi_larb_config_port_gen2_general+0x9c/0x1e8 (P)
[ 6.340318] mtk_smi_larb_resume+0x90/0xd8
[ 6.340838] pm_generic_runtime_resume+0x2c/0x44
[ 6.341426] __genpd_runtime_resume+0x30/0x7c
[ 6.341979] genpd_runtime_resume+0xd8/0x288
[ 6.342522] __rpm_callback+0x48/0x1dc
[ 6.343001] rpm_callback+0x74/0x80
[ 6.343445] rpm_resume+0x468/0x63c
[ 6.343889] __pm_runtime_resume+0x4c/0x90
[ 6.344410] pm_runtime_get_suppliers+0x60/0x8c
[ 6.344985] __driver_probe_device+0x48/0x12c
[ 6.345539] driver_probe_device+0xd8/0x15c
[ 6.346070] __driver_attach+0x94/0x19c
[ 6.346558] bus_for_each_dev+0x78/0xd4
[ 6.347046] driver_attach+0x24/0x30
[ 6.347501] bus_add_driver+0xe4/0x208
[ 6.347977] driver_register+0x60/0x128
[ 6.348465] __platform_register_drivers+0x60/0xe8
[ 6.349071] mtk_drm_init+0x24/0x1000 [mediatek_drm]
[ 6.349736] do_one_initcall+0x58/0x268
[ 6.350226] do_init_module+0x58/0x238
[ 6.350705] load_module+0x1db8/0x1e84
[ 6.351181] init_module_from_file+0x84/0xc4
[ 6.351723] __arm64_sys_finit_module+0x144/0x328
[ 6.352321] invoke_syscall.constprop.0+0x50/0xe4
[ 6.352919] do_el0_svc+0x40/0xc4
[ 6.353342] el0_svc+0x48/0x1a0
[ 6.353744] el0t_64_sync_handler+0x10c/0x138
[ 6.354297] el0t_64_sync+0x198/0x19c
[ 6.354765] Code: 39400025 35ffff05 f9404a60 371805a6 (b9400000)
[ 6.355533] ---[ end trace 0000000000000000 ]---
I think this is larb->mmu being NULL in line 277 of
drivers/memory/mtk-smi.c.
Does this ring a bell?
Best regards
Uwe
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: NULL pointer exception in drivers/memory/mtk-smi.c
2025-06-17 15:18 NULL pointer exception in drivers/memory/mtk-smi.c Uwe Kleine-König
@ 2025-06-23 10:00 ` AngeloGioacchino Del Regno
2025-07-03 16:41 ` Uwe Kleine-König
1 sibling, 0 replies; 7+ messages in thread
From: AngeloGioacchino Del Regno @ 2025-06-23 10:00 UTC (permalink / raw)
To: Uwe Kleine-König, Louis-Alexis Eyraud
Cc: Krzysztof Kozlowski, Matthias Brugger, linux-mediatek,
linux-arm-kernel, Nícolas F. R. A. Prado, Yong Wu
Il 17/06/25 17:18, Uwe Kleine-König ha scritto:
> Hello,
>
> on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit the
> following during boot:
>
> [ 6.304796] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 6.305069] platform 1400d000.rdma: Adding to iommu group 0
> [ 6.305924] Mem abort info:
> [ 6.307867] ESR = 0x0000000096000004
> [ 6.309032] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 6.309731] SET = 0, FnV = 0
> [ 6.310126] EA = 0, S1PTW = 0
> [ 6.310189] platform 14016000.rdma: Adding to iommu group 0
> [ 6.310532] FSC = 0x04: level 0 translation fault
> [ 6.312001] Data abort info:
> [ 6.312144] mtk-iommu 10205000.iommu: bound 14003000.larb (ops mtk_smi_larb_component_ops)
> [ 6.312381] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [ 6.313433] mtk-iommu 10205000.iommu: bound 17010000.larb (ops mtk_smi_larb_component_ops)
> [ 6.314112] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 6.315156] mtk-iommu 10205000.iommu: bound 15001000.larb (ops mtk_smi_larb_component_ops)
> [ 6.315812] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 6.315822] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046fde000
> [ 6.315829] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> [ 6.315846] Internal error: Oops: 0000000096000004 [#1] SMP
> [ 6.315852] Modules linked in: mediatek_drm(+) drm_dma_helper drm_display_helper drm_client_lib drm_kms_helper mt8365_mt6357
> [ 6.316909] mtk-iommu 10205000.iommu: bound 16010000.larb (ops mtk_smi_larb_component_ops)
> [ 6.317563] mtk_mmsys mtk_mutex
> [ 6.318471] probe of 10205000.iommu returned 0 after 19235 usecs
> [ 6.319221] mtk_cmdq_helper snd_soc_mt8365_pcm snd_soc_mtk_common snd_soc_mt6357 pwm_mediatek
> [ 6.324638] CPU: 1 UID: 0 PID: 112 Comm: (udev-worker) Not tainted 6.16.0-rc2-00015-g44a5ab7a7958-dirty #27 PREEMPT
> [ 6.325964] Hardware name: MediaTek MT8365 Open Platform EVK (DT)
> [ 6.326732] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 6.327563] probe of 11201000.usb:connector returned 517 after 6627 usecs
> [ 6.327609] pc : mtk_smi_larb_config_port_gen2_general+0x9c/0x1e8
> [ 6.329228] lr : mtk_smi_larb_resume+0x90/0xd8
> [ 6.329792] sp : ffff80008219b650
> [ 6.330210] x29: ffff80008219b660 x28: ffff80008219bc20 x27: ffff000002fd6938
> [ 6.331112] x26: 0000000000000000 x25: 0000000000000000 x24: ffff800080f9b8d8
> [ 6.332015] x23: ffff000005aa8e90 x22: 0000000000000003 x21: ffff000002fd6810
> [ 6.332918] x20: ffff000002fd6810 x19: ffff000005aa8e80 x18: 00000000ffffffff
> [ 6.333821] x17: 74706164612d6c76 x16: 6f2d707369642d6b x15: 6574616964656d2f
> [ 6.334723] x14: ffff800081b8b0c0 x13: 6c766f2d70736964 x12: 2d6b657461696465
> [ 6.335626] x11: 766972643d4d4554 x10: ffff800081c8bc7c x9 : 0000000000000004
> [ 6.336528] x8 : ffff80008219b578 x7 : ffff80008219b630 x6 : 0000000000000004
> [ 6.337430] x5 : ffff80008219b5b8 x4 : ffff800080f9b8d8 x3 : 0000000000000000
> [ 6.338333] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> [ 6.339236] Call trace:
> [ 6.339546] mtk_smi_larb_config_port_gen2_general+0x9c/0x1e8 (P)
> [ 6.340318] mtk_smi_larb_resume+0x90/0xd8
> [ 6.340838] pm_generic_runtime_resume+0x2c/0x44
> [ 6.341426] __genpd_runtime_resume+0x30/0x7c
> [ 6.341979] genpd_runtime_resume+0xd8/0x288
> [ 6.342522] __rpm_callback+0x48/0x1dc
> [ 6.343001] rpm_callback+0x74/0x80
> [ 6.343445] rpm_resume+0x468/0x63c
> [ 6.343889] __pm_runtime_resume+0x4c/0x90
> [ 6.344410] pm_runtime_get_suppliers+0x60/0x8c
> [ 6.344985] __driver_probe_device+0x48/0x12c
> [ 6.345539] driver_probe_device+0xd8/0x15c
> [ 6.346070] __driver_attach+0x94/0x19c
> [ 6.346558] bus_for_each_dev+0x78/0xd4
> [ 6.347046] driver_attach+0x24/0x30
> [ 6.347501] bus_add_driver+0xe4/0x208
> [ 6.347977] driver_register+0x60/0x128
> [ 6.348465] __platform_register_drivers+0x60/0xe8
> [ 6.349071] mtk_drm_init+0x24/0x1000 [mediatek_drm]
> [ 6.349736] do_one_initcall+0x58/0x268
> [ 6.350226] do_init_module+0x58/0x238
> [ 6.350705] load_module+0x1db8/0x1e84
> [ 6.351181] init_module_from_file+0x84/0xc4
> [ 6.351723] __arm64_sys_finit_module+0x144/0x328
> [ 6.352321] invoke_syscall.constprop.0+0x50/0xe4
> [ 6.352919] do_el0_svc+0x40/0xc4
> [ 6.353342] el0_svc+0x48/0x1a0
> [ 6.353744] el0t_64_sync_handler+0x10c/0x138
> [ 6.354297] el0t_64_sync+0x198/0x19c
> [ 6.354765] Code: 39400025 35ffff05 f9404a60 371805a6 (b9400000)
> [ 6.355533] ---[ end trace 0000000000000000 ]---
>
> I think this is larb->mmu being NULL in line 277 of
> drivers/memory/mtk-smi.c.
>
> Does this ring a bell?
>
No, not really... but I can connect you with a colleague of mine that has a MT8365
EVK that works just fine.
Louis, can you please follow up?
Thanks,
Angelo
> Best regards
> Uwe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: NULL pointer exception in drivers/memory/mtk-smi.c
2025-06-17 15:18 NULL pointer exception in drivers/memory/mtk-smi.c Uwe Kleine-König
2025-06-23 10:00 ` AngeloGioacchino Del Regno
@ 2025-07-03 16:41 ` Uwe Kleine-König
2025-07-07 7:24 ` Yong Wu (吴勇)
1 sibling, 1 reply; 7+ messages in thread
From: Uwe Kleine-König @ 2025-07-03 16:41 UTC (permalink / raw)
To: Yong Wu
Cc: Krzysztof Kozlowski, Matthias Brugger, AngeloGioacchino Del Regno,
linux-mediatek, linux-arm-kernel, Louis-Alexis Eyraud, iommu
[-- Attachment #1: Type: text/plain, Size: 1952 bytes --]
Hello,
[expanding the audience a bit according to the drivers that are involved
now that the problem is better understood]
On Tue, Jun 17, 2025 at 05:18:30PM +0200, Uwe Kleine-König wrote:
> on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit the
> following during boot:
I invested some time now to understand the issue. So here comes what I
understood, maybe that helps someone to spot the fix to the described
problem.
With a configuration that has all drivers built-in but
CONFIG_DRM_MEDIATEK=m
CONFIG_MTK_IOMMU=m
and
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index cb95fecf6016..d4320db7cd2d 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1387,14 +1387,15 @@ static int mtk_iommu_probe(struct platform_device *pdev)
goto out_list_del;
ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
if (ret)
goto out_sysfs_remove;
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
+ msleep(10000);
ret = component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
if (ret)
goto out_device_unregister;
}
return ret;
out_device_unregister:
I can reliably trigger the race.
With that sleep in place iommu_device_register() completes quickly which
enables probing of the devices with drivers contained in the
drm_mediatek module (because the modules are loaded in parallel on
a different CPU).
Then generic driver code calls resume on all suppliers for devices to
bind, among them the four larb devices. However due to
component_master_add_with_match() not being called yet, the larb devices
are not yet bound to the iommu device and so larb->mmu is still NULL.
The latter is a problem in mtk_smi_larb_config_port_gen2_general() which
is called from mtk_smi_larb_resume().
I don't know what the right fix here is, but maybe someone else does?
Best regards
Uwe
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: NULL pointer exception in drivers/memory/mtk-smi.c
2025-07-03 16:41 ` Uwe Kleine-König
@ 2025-07-07 7:24 ` Yong Wu (吴勇)
2025-07-07 8:47 ` AngeloGioacchino Del Regno
0 siblings, 1 reply; 7+ messages in thread
From: Yong Wu (吴勇) @ 2025-07-07 7:24 UTC (permalink / raw)
To: u.kleine-koenig@baylibre.com, AngeloGioacchino Del Regno
Cc: linux-arm-kernel@lists.infradead.org, matthias.bgg@gmail.com,
linux-mediatek@lists.infradead.org, Louis-Alexis Eyraud,
krzk@kernel.org, iommu@lists.linux.dev
On Thu, 2025-07-03 at 18:41 +0200, Uwe Kleine-König wrote:
> Hello,
>
> [expanding the audience a bit according to the drivers that are
> involved
> now that the problem is better understood]
>
> On Tue, Jun 17, 2025 at 05:18:30PM +0200, Uwe Kleine-König wrote:
> > on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit
> > the
> > following during boot:
>
> I invested some time now to understand the issue. So here comes what
> I
> understood, maybe that helps someone to spot the fix to the described
> problem.
>
> With a configuration that has all drivers built-in but
>
> CONFIG_DRM_MEDIATEK=m
> CONFIG_MTK_IOMMU=m
>
> and
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index cb95fecf6016..d4320db7cd2d 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -1387,14 +1387,15 @@ static int mtk_iommu_probe(struct
> platform_device *pdev)
> goto out_list_del;
>
> ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
> if (ret)
> goto out_sysfs_remove;
>
> if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
> + msleep(10000);
> ret = component_master_add_with_match(dev,
> &mtk_iommu_com_ops, match);
> if (ret)
> goto out_device_unregister;
> }
> return ret;
>
> out_device_unregister:
>
> I can reliably trigger the race.
>
> With that sleep in place iommu_device_register() completes quickly
> which
> enables probing of the devices with drivers contained in the
> drm_mediatek module (because the modules are loaded in parallel on
> a different CPU).
>
> Then generic driver code calls resume on all suppliers for devices to
> bind, among them the four larb devices. However due to
> component_master_add_with_match() not being called yet, the larb
> devices
> are not yet bound to the iommu device and so larb->mmu is still NULL.
> The latter is a problem in mtk_smi_larb_config_port_gen2_general()
> which
> is called from mtk_smi_larb_resume().
Hi Uwe,
Thanks for your help. In this case, it looks like the disp probe occurs
before the smi_larb_bind operation, we need to let disp wait for the
bind to complete.
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -666,6 +666,10 @@ static int __maybe_unused
mtk_smi_larb_resume(struct device *dev)
if (MTK_SMI_CAPS(larb->larb_gen->flags_general,
MTK_SMI_FLAG_SLEEP_CTL))
mtk_smi_larb_sleep_ctrl_disable(larb);
+ /* The larb_bind operation may be later than the master probe. */
+ if (!larb->mmu)
+ return -EPROBE_DEFER;
+
/* Configure the basic setting for this larb */
Hi Angelo,
Do you have any suggestion?
Thanks.
>
> I don't know what the right fix here is, but maybe someone else does?
>
> Best regards
> Uwe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: NULL pointer exception in drivers/memory/mtk-smi.c
2025-07-07 7:24 ` Yong Wu (吴勇)
@ 2025-07-07 8:47 ` AngeloGioacchino Del Regno
2025-07-07 13:41 ` u.kleine-koenig
0 siblings, 1 reply; 7+ messages in thread
From: AngeloGioacchino Del Regno @ 2025-07-07 8:47 UTC (permalink / raw)
To: Yong Wu (吴勇), u.kleine-koenig@baylibre.com
Cc: linux-arm-kernel@lists.infradead.org, matthias.bgg@gmail.com,
linux-mediatek@lists.infradead.org, Louis-Alexis Eyraud,
krzk@kernel.org, iommu@lists.linux.dev
Il 07/07/25 09:24, Yong Wu (吴勇) ha scritto:
> On Thu, 2025-07-03 at 18:41 +0200, Uwe Kleine-König wrote:
>> Hello,
>>
>> [expanding the audience a bit according to the drivers that are
>> involved
>> now that the problem is better understood]
>>
>> On Tue, Jun 17, 2025 at 05:18:30PM +0200, Uwe Kleine-König wrote:
>>> on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit
>>> the
>>> following during boot:
>>
>> I invested some time now to understand the issue. So here comes what
>> I
>> understood, maybe that helps someone to spot the fix to the described
>> problem.
>>
>> With a configuration that has all drivers built-in but
>>
>> CONFIG_DRM_MEDIATEK=m
>> CONFIG_MTK_IOMMU=m
>>
>> and
>>
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index cb95fecf6016..d4320db7cd2d 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -1387,14 +1387,15 @@ static int mtk_iommu_probe(struct
>> platform_device *pdev)
>> goto out_list_del;
>>
>> ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
>> if (ret)
>> goto out_sysfs_remove;
>>
>> if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
>> + msleep(10000);
>> ret = component_master_add_with_match(dev,
>> &mtk_iommu_com_ops, match);
>> if (ret)
>> goto out_device_unregister;
>> }
>> return ret;
>>
>> out_device_unregister:
>>
>> I can reliably trigger the race.
>>
>> With that sleep in place iommu_device_register() completes quickly
>> which
>> enables probing of the devices with drivers contained in the
>> drm_mediatek module (because the modules are loaded in parallel on
>> a different CPU).
>>
>> Then generic driver code calls resume on all suppliers for devices to
>> bind, among them the four larb devices. However due to
>> component_master_add_with_match() not being called yet, the larb
>> devices
>> are not yet bound to the iommu device and so larb->mmu is still NULL.
>> The latter is a problem in mtk_smi_larb_config_port_gen2_general()
>> which
>> is called from mtk_smi_larb_resume().
>
> Hi Uwe,
>
> Thanks for your help. In this case, it looks like the disp probe occurs
> before the smi_larb_bind operation, we need to let disp wait for the
> bind to complete.
>
> --- a/drivers/memory/mtk-smi.c
> +++ b/drivers/memory/mtk-smi.c
> @@ -666,6 +666,10 @@ static int __maybe_unused
> mtk_smi_larb_resume(struct device *dev)
> if (MTK_SMI_CAPS(larb->larb_gen->flags_general,
> MTK_SMI_FLAG_SLEEP_CTL))
> mtk_smi_larb_sleep_ctrl_disable(larb);
>
> + /* The larb_bind operation may be later than the master probe. */
> + if (!larb->mmu)
> + return -EPROBE_DEFER;
> +
> /* Configure the basic setting for this larb */
>
>
> Hi Angelo,
> Do you have any suggestion?
>
I don't have the necessary bandwidth to really broadly research about this, but
there's two ideas:
1. What if we disable PM OPS until MMU is bound? That would avoid calls to
larb_resume().....
2. Instead of component, we could register a device - which carries its own suspend
and resume ops; the device would be registered only after all local arbiters
are bound, so there would be no suspend/resume ops until everything is in place.
That's more or less how it's done in drivers/usb/core/port.c
Probably N.1 would work best - we could...
probe:
pm_runtime_set_active(larb->smi.dev);
pm_runtime_forbid(larb->smi.dev);
bind:
larb->mmu = ...
larb->bank = ...
pm_runtime_set_suspended(larb->smi.dev);
pm_runtime_allow(larb->smi.dev);
...the next usage/wakeup should then call resume() but at that point we're sure
that larb->mmu is actually initialized, because we allowed PM resume only after
binding.
I want to repeat - I didn't make any extensive research, just a fast read and
threw some ideas on the table without any testing or anything else :-)
Cheers,
Angelo
>
> Thanks.
>
>>
>> I don't know what the right fix here is, but maybe someone else does?
>>
>> Best regards
>> Uwe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: NULL pointer exception in drivers/memory/mtk-smi.c
2025-07-07 8:47 ` AngeloGioacchino Del Regno
@ 2025-07-07 13:41 ` u.kleine-koenig
2025-07-07 14:34 ` AngeloGioacchino Del Regno
0 siblings, 1 reply; 7+ messages in thread
From: u.kleine-koenig @ 2025-07-07 13:41 UTC (permalink / raw)
To: AngeloGioacchino Del Regno
Cc: Yong Wu (吴勇), linux-arm-kernel@lists.infradead.org,
matthias.bgg@gmail.com, linux-mediatek@lists.infradead.org,
Louis-Alexis Eyraud, krzk@kernel.org, iommu@lists.linux.dev
[-- Attachment #1: Type: text/plain, Size: 4641 bytes --]
On Mon, Jul 07, 2025 at 10:47:53AM +0200, AngeloGioacchino Del Regno wrote:
> Il 07/07/25 09:24, Yong Wu (吴勇) ha scritto:
> > On Thu, 2025-07-03 at 18:41 +0200, Uwe Kleine-König wrote:
> > > Hello,
> > >
> > > [expanding the audience a bit according to the drivers that are
> > > involved
> > > now that the problem is better understood]
> > >
> > > On Tue, Jun 17, 2025 at 05:18:30PM +0200, Uwe Kleine-König wrote:
> > > > on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit
> > > > the
> > > > following during boot:
> > >
> > > I invested some time now to understand the issue. So here comes what
> > > I
> > > understood, maybe that helps someone to spot the fix to the described
> > > problem.
> > >
> > > With a configuration that has all drivers built-in but
> > >
> > > CONFIG_DRM_MEDIATEK=m
> > > CONFIG_MTK_IOMMU=m
> > >
> > > and
> > >
> > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > > index cb95fecf6016..d4320db7cd2d 100644
> > > --- a/drivers/iommu/mtk_iommu.c
> > > +++ b/drivers/iommu/mtk_iommu.c
> > > @@ -1387,14 +1387,15 @@ static int mtk_iommu_probe(struct
> > > platform_device *pdev)
> > > goto out_list_del;
> > > ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
> > > if (ret)
> > > goto out_sysfs_remove;
> > > if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
> > > + msleep(10000);
> > > ret = component_master_add_with_match(dev,
> > > &mtk_iommu_com_ops, match);
> > > if (ret)
> > > goto out_device_unregister;
> > > }
> > > return ret;
> > > out_device_unregister:
> > >
> > > I can reliably trigger the race.
> > >
> > > With that sleep in place iommu_device_register() completes quickly
> > > which
> > > enables probing of the devices with drivers contained in the
> > > drm_mediatek module (because the modules are loaded in parallel on
> > > a different CPU).
> > >
> > > Then generic driver code calls resume on all suppliers for devices to
> > > bind, among them the four larb devices. However due to
> > > component_master_add_with_match() not being called yet, the larb
> > > devices
> > > are not yet bound to the iommu device and so larb->mmu is still NULL.
> > > The latter is a problem in mtk_smi_larb_config_port_gen2_general()
> > > which
> > > is called from mtk_smi_larb_resume().
> >
> > Hi Uwe,
> >
> > Thanks for your help. In this case, it looks like the disp probe occurs
> > before the smi_larb_bind operation, we need to let disp wait for the
> > bind to complete.
> >
> > --- a/drivers/memory/mtk-smi.c
> > +++ b/drivers/memory/mtk-smi.c
> > @@ -666,6 +666,10 @@ static int __maybe_unused
> > mtk_smi_larb_resume(struct device *dev)
> > if (MTK_SMI_CAPS(larb->larb_gen->flags_general,
> > MTK_SMI_FLAG_SLEEP_CTL))
> > mtk_smi_larb_sleep_ctrl_disable(larb);
> >
> > + /* The larb_bind operation may be later than the master probe. */
> > + if (!larb->mmu)
> > + return -EPROBE_DEFER;
> > +
A .resume callback isn't supposed to return -EPROBE_DEFER.
> > /* Configure the basic setting for this larb */
> >
> >
> > Hi Angelo,
> > Do you have any suggestion?
> >
>
> I don't have the necessary bandwidth to really broadly research about this, but
> there's two ideas:
>
> 1. What if we disable PM OPS until MMU is bound? That would avoid calls to
> larb_resume().....
I think the key question here is: Is the iommu functional without the
larbs bound? If not I'd claim that iommu_device_register() should only
happen if all larbs are present and ready.
> 2. Instead of component, we could register a device - which carries its own suspend
> and resume ops; the device would be registered only after all local arbiters
> are bound, so there would be no suspend/resume ops until everything is in place.
> That's more or less how it's done in drivers/usb/core/port.c
>
> Probably N.1 would work best - we could...
>
> probe:
> pm_runtime_set_active(larb->smi.dev);
> pm_runtime_forbid(larb->smi.dev);
>
> bind:
> larb->mmu = ...
> larb->bank = ...
>
> pm_runtime_set_suspended(larb->smi.dev);
> pm_runtime_allow(larb->smi.dev);
>
>
> ...the next usage/wakeup should then call resume() but at that point we're sure
> that larb->mmu is actually initialized, because we allowed PM resume only after
> binding.
Not knowing the details of the hardware that feels wrong. IMHO a device
that is operational should be able to do PM ops.
Best regards
Uwe
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: NULL pointer exception in drivers/memory/mtk-smi.c
2025-07-07 13:41 ` u.kleine-koenig
@ 2025-07-07 14:34 ` AngeloGioacchino Del Regno
0 siblings, 0 replies; 7+ messages in thread
From: AngeloGioacchino Del Regno @ 2025-07-07 14:34 UTC (permalink / raw)
To: u.kleine-koenig@baylibre.com
Cc: Yong Wu (吴勇), linux-arm-kernel@lists.infradead.org,
matthias.bgg@gmail.com, linux-mediatek@lists.infradead.org,
Louis-Alexis Eyraud, krzk@kernel.org, iommu@lists.linux.dev
Il 07/07/25 15:41, u.kleine-koenig@baylibre.com ha scritto:
> On Mon, Jul 07, 2025 at 10:47:53AM +0200, AngeloGioacchino Del Regno wrote:
>> Il 07/07/25 09:24, Yong Wu (吴勇) ha scritto:
>>> On Thu, 2025-07-03 at 18:41 +0200, Uwe Kleine-König wrote:
>>>> Hello,
>>>>
>>>> [expanding the audience a bit according to the drivers that are
>>>> involved
>>>> now that the problem is better understood]
>>>>
>>>> On Tue, Jun 17, 2025 at 05:18:30PM +0200, Uwe Kleine-König wrote:
>>>>> on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit
>>>>> the
>>>>> following during boot:
>>>>
>>>> I invested some time now to understand the issue. So here comes what
>>>> I
>>>> understood, maybe that helps someone to spot the fix to the described
>>>> problem.
>>>>
>>>> With a configuration that has all drivers built-in but
>>>>
>>>> CONFIG_DRM_MEDIATEK=m
>>>> CONFIG_MTK_IOMMU=m
>>>>
>>>> and
>>>>
>>>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>>>> index cb95fecf6016..d4320db7cd2d 100644
>>>> --- a/drivers/iommu/mtk_iommu.c
>>>> +++ b/drivers/iommu/mtk_iommu.c
>>>> @@ -1387,14 +1387,15 @@ static int mtk_iommu_probe(struct
>>>> platform_device *pdev)
>>>> goto out_list_del;
>>>> ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
>>>> if (ret)
>>>> goto out_sysfs_remove;
>>>> if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
>>>> + msleep(10000);
>>>> ret = component_master_add_with_match(dev,
>>>> &mtk_iommu_com_ops, match);
>>>> if (ret)
>>>> goto out_device_unregister;
>>>> }
>>>> return ret;
>>>> out_device_unregister:
>>>>
>>>> I can reliably trigger the race.
>>>>
>>>> With that sleep in place iommu_device_register() completes quickly
>>>> which
>>>> enables probing of the devices with drivers contained in the
>>>> drm_mediatek module (because the modules are loaded in parallel on
>>>> a different CPU).
>>>>
>>>> Then generic driver code calls resume on all suppliers for devices to
>>>> bind, among them the four larb devices. However due to
>>>> component_master_add_with_match() not being called yet, the larb
>>>> devices
>>>> are not yet bound to the iommu device and so larb->mmu is still NULL.
>>>> The latter is a problem in mtk_smi_larb_config_port_gen2_general()
>>>> which
>>>> is called from mtk_smi_larb_resume().
>>>
>>> Hi Uwe,
>>>
>>> Thanks for your help. In this case, it looks like the disp probe occurs
>>> before the smi_larb_bind operation, we need to let disp wait for the
>>> bind to complete.
>>>
>>> --- a/drivers/memory/mtk-smi.c
>>> +++ b/drivers/memory/mtk-smi.c
>>> @@ -666,6 +666,10 @@ static int __maybe_unused
>>> mtk_smi_larb_resume(struct device *dev)
>>> if (MTK_SMI_CAPS(larb->larb_gen->flags_general,
>>> MTK_SMI_FLAG_SLEEP_CTL))
>>> mtk_smi_larb_sleep_ctrl_disable(larb);
>>>
>>> + /* The larb_bind operation may be later than the master probe. */
>>> + if (!larb->mmu)
>>> + return -EPROBE_DEFER;
>>> +
>
> A .resume callback isn't supposed to return -EPROBE_DEFER.
>
Obviously :-)
>>> /* Configure the basic setting for this larb */
>>>
>>>
>>> Hi Angelo,
>>> Do you have any suggestion?
>>>
>>
>> I don't have the necessary bandwidth to really broadly research about this, but
>> there's two ideas:
>>
>> 1. What if we disable PM OPS until MMU is bound? That would avoid calls to
>> larb_resume().....
>
> I think the key question here is: Is the iommu functional without the
> larbs bound? If not I'd claim that iommu_device_register() should only
> happen if all larbs are present and ready.
>
Not all IOMMUs need Local Arbiters... some of them are very specific and do not
need to be arbitered, mostly because they're not shared.
Some.. like the infra contexts for the PCI-Express controller... because then
generally all of the multimedia related contexts should need the arbitering
because they should be all shared between the multimedia IPs (but it may also
be possible to use some without arbitering, depending on the application, though
upstream I think there's no multimedia context with no arbiter - I didn't really
carefully check in devicetrees anyway).
>> 2. Instead of component, we could register a device - which carries its own suspend
>> and resume ops; the device would be registered only after all local arbiters
>> are bound, so there would be no suspend/resume ops until everything is in place.
>> That's more or less how it's done in drivers/usb/core/port.c
>>
>> Probably N.1 would work best - we could...
>>
>> probe:
>> pm_runtime_set_active(larb->smi.dev);
>> pm_runtime_forbid(larb->smi.dev);
>>
>> bind:
>> larb->mmu = ...
>> larb->bank = ...
>>
>> pm_runtime_set_suspended(larb->smi.dev);
>> pm_runtime_allow(larb->smi.dev);
>>
>>
>> ...the next usage/wakeup should then call resume() but at that point we're sure
>> that larb->mmu is actually initialized, because we allowed PM resume only after
>> binding.
>
> Not knowing the details of the hardware that feels wrong. IMHO a device
> that is operational should be able to do PM ops.
>
That's true and I completely agree with you, in general - though, this is a "Local
Arbiter" and... if there's nothing to arbiter, it's not functional.
This is arbitering iommu-bus/memory(dma)-bus access to provide some kind of basic
bus QoS: of course, if there's no iommu, there can't be any access, so again there
would be nothing to perform bandwidth arbitration with.
Count that we will (hopefully...) (and hopefully soon...) see some interconnect
driver that takes the defaults in SMI (in a way or another) and changes them to
provide a finer grain QoS (because right now it's static, there's no usage analysis
in place), so the amount of users of mtk-smi should also get higher.. maybe.
Cheers,
Angelo
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-07-07 14:43 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-17 15:18 NULL pointer exception in drivers/memory/mtk-smi.c Uwe Kleine-König
2025-06-23 10:00 ` AngeloGioacchino Del Regno
2025-07-03 16:41 ` Uwe Kleine-König
2025-07-07 7:24 ` Yong Wu (吴勇)
2025-07-07 8:47 ` AngeloGioacchino Del Regno
2025-07-07 13:41 ` u.kleine-koenig
2025-07-07 14:34 ` AngeloGioacchino Del Regno
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox