dri-devel.lists.freedesktop.org archive mirror
* Help: Samsung Exynos 7870 DECON SYSMMU panic
@ 2025-06-18 14:02 Kaustabh Chakraborty
  2025-06-18 14:06 ` Kaustabh Chakraborty
  2025-06-24 17:12 ` Robin Murphy
  0 siblings, 2 replies; 9+ messages in thread
From: Kaustabh Chakraborty @ 2025-06-18 14:02 UTC (permalink / raw)
  To: Marek Szyprowski, Joerg Roedel, Will Deacon, Robin Murphy,
	Inki Dae, Seung-Woo Kim, Kyungmin Park
  Cc: iommu, dri-devel

Since bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"),
the Samsung Exynos 7870 DECON device (with patches [1], [2], and [3]) no
longer works. On closer inspection, the boot ends in a SYSMMU page fault
followed by a kernel panic:

[    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ] PAGE FAULT occurred at 0x6715b3e0
[    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base: 0x0000000044a14000
[    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops decon_component_ops)
[    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
[    2.922877] Kernel panic - not syncing: Unrecoverable System MMU Fault!
[    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-rc2-exynos7870 #722 PREEMPT 
[    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
[    3.000509] Call trace:
[    3.002938]  show_stack+0x18/0x24 (C)
[    3.006582]  dump_stack_lvl+0x60/0x80
[    3.010224]  dump_stack+0x18/0x24
[    3.013521]  panic+0x168/0x360
[    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
[         ...]
[    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable System MMU Fault! ]---
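
To make the fault report easier to read, the faulting address can be split into its page-table indices. This is only a sketch: the constants assume the two-level layout with 1 MiB sections and 4 KiB small pages that the exynos-iommu driver's lv1ent_offset/lv2ent_offset helpers suggest, and the function names here are illustrative, not the driver's own.

```c
#include <stdint.h>

/* Assumed layout: 1 MiB sections at level 1, 4 KiB small pages at level 2. */
#define SECT_ORDER  20
#define SPAGE_ORDER 12
#define SECT_MASK   (~((1u << SECT_ORDER) - 1))
#define SPAGE_MASK  (~((1u << SPAGE_ORDER) - 1))

/* Index of the level-1 entry that covers this I/O virtual address. */
static uint32_t lv1_index(uint32_t iova)
{
	return iova >> SECT_ORDER;
}

/* Index of the level-2 entry inside the 1 MiB section. */
static uint32_t lv2_index(uint32_t iova)
{
	return (iova & ~SECT_MASK) >> SPAGE_ORDER;
}

/* Byte offset inside the 4 KiB page. */
static uint32_t page_offset(uint32_t iova)
{
	return iova & ~SPAGE_MASK;
}
```

Under these assumptions the address 0x6715b3e0 from the log falls in level-1 slot 0x671, level-2 slot 0x5b, at byte offset 0x3e0.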

The commit was introduced in mainline v6.15-rc1. I've also tested v6.15,
v6.15.2, and v6.16-rc2, and the behavior is the same on all of them.

Reverting the commit makes the display work again, but from the commit
message I understand that a revert is not a proper solution. I tried to
reduce the revert to the minimum that still helps, and this is what I got.
If I change line [4] as follows:

-		dev->bus->dma_configure(dev);
+		if (!strcmp(dev_name(dev), "14830000.decon"))
+			dev->bus->dma_configure(dev);

For some reason the device really doesn't like dma_configure being called
here. With this change the system boots, but with a warning:

[    2.779291] exynos-decon 14830000.decon: late IOMMU probe at driver bind, something fishy here!

I believe the IOMMU hardware doesn't like the fact that it is being
initialized/resumed too early. I formed this conclusion by comparing the
logs:

mainline:
[    1.274575] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_resume enter
[    2.859656] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable enter
[    2.864192] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_clocks enter
[    2.869299] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 0x00000007
[    2.875062] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_init_config enter
[    2.880006] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000004 val 0x01100784
[    2.885819] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_set_ptbase enter
[    2.890677] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x0000000c val 0x00044a14
[    2.896490] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_tlb_invalidate enter
[    2.901695] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000010 val 0x00000001
[    2.907507] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_vid enter
[    2.912366] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 0x00000005
[    2.912371] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_irq enter
[    2.912836] [drm] Exynos DRM: using 14830000.decon device for DMA mapping operations
[    2.918175] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_v5_get_fault_info enter
[    2.918182] I_HAVE_ADDED_THESE: SYSMMU: show_fault_information enter
[    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ] PAGE FAULT occurred at 0x6715b3e0
[    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base: 0x0000000044a14000
[    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops decon_component_ops)
[    2.922859] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
[    2.922864] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
[    2.922877] Kernel panic - not syncing: Unrecoverable System MMU Fault!

and with the patch above applied:
[    3.018478] [drm] Exynos DRM: using 14830000.decon device for DMA mapping operations
[    3.025794] exynos-drm exynos-drm: bound 14830000.decon (ops decon_component_ops)
[    3.058655] exynos-dsi 14800000.dsi: [drm:samsung_dsim_host_attach] Attached td4300-panel device (lanes:4 bpp:24 mode-flags:0x23)
[    3.070189] exynos-drm exynos-drm: bound 14800000.dsi (ops exynos_dsi_component_ops)
[    3.078506] [drm] Initialized exynos 1.1.0 for exynos-drm on minor 1
[    3.090747] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_map enter
[    3.090845] I_HAVE_ADDED_THESE: SYSMMU: to_exynos_domain enter
[    3.093325] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
[    3.097662] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    3.102000] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    3.106337] I_HAVE_ADDED_THESE: SYSMMU: lv1set_section enter
[    3.110762] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte enter
[    3.115726] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_map enter
[    3.120317] I_HAVE_ADDED_THESE: SYSMMU: to_exynos_domain enter
[    3.124914] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
[    3.129240] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    3.133578] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    3.137916] I_HAVE_ADDED_THESE: SYSMMU: lv1set_section enter
[    3.142340] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte enter
[         ...] (a lot of repetitions later)
[    4.322904] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
[    4.327230] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    4.331567] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    4.335905] I_HAVE_ADDED_THESE: SYSMMU: alloc_lv2entry enter
[    4.340329] I_HAVE_ADDED_THESE: SYSMMU: page_entry enter
[    4.344407] I_HAVE_ADDED_THESE: SYSMMU: lv2ent_offset enter
[    4.348744] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
[    4.353082] I_HAVE_ADDED_THESE: SYSMMU: lv2set_page enter
[    4.357246] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte enter
[    4.362751] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_resume enter
[    4.362767] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable enter
[    4.362771] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_clocks enter
[    4.362777] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 0x00000007
[    4.362782] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_init_config enter
[    4.362786] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000004 val 0x01100784
[    4.362791] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_set_ptbase enter
[    4.362795] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x0000000c val 0x00042a64
[    4.362799] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_tlb_invalidate enter
[    4.362803] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000010 val 0x00000001
[    4.362808] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_vid enter
[    4.362811] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 0x00000005

Then it continues booting as usual.
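
The register writes in both traces can be read with a small decoder. This is a sketch under stated assumptions: the register names are inferred from which function issued each write in the trace above (they may not match the driver's current identifiers), the 0x7/0x5 control values are assumed to mean "block" and "enable", and the page table base is assumed to be written as a 4 KiB page frame number.

```c
#include <stdint.h>
#include <string.h>

/* Offsets as seen in the trace; names are assumptions, not confirmed. */
#define REG_MMU_CTRL         0x000 /* written by __sysmmu_enable */
#define REG_MMU_CFG          0x004 /* written by __sysmmu_init_config */
#define REG_V5_PT_BASE_PFN   0x00c /* written by __sysmmu_set_ptbase */
#define REG_V5_MMU_FLUSH_ALL 0x010 /* written by __sysmmu_tlb_invalidate */

#define CTRL_ENABLE 0x5 /* assumed meaning of the final write to reg 0x0 */
#define CTRL_BLOCK  0x7 /* assumed meaning of the first write to reg 0x0 */

#define SPAGE_ORDER 12

/* Map a register offset from the trace to a readable name. */
static const char *sysmmu_reg_name(uint32_t offset)
{
	switch (offset) {
	case REG_MMU_CTRL:         return "MMU_CTRL";
	case REG_MMU_CFG:          return "MMU_CFG";
	case REG_V5_PT_BASE_PFN:   return "PT_BASE_PFN";
	case REG_V5_MMU_FLUSH_ALL: return "FLUSH_ALL";
	default:                   return "unknown";
	}
}

/* Convert the page frame number written to reg 0xc into a physical base. */
static uint64_t pt_base_from_pfn(uint32_t pfn)
{
	return (uint64_t)pfn << SPAGE_ORDER;
}
```

With this reading, the value 0x00044a14 in the mainline trace corresponds to the "Page table base: 0x0000000044a14000" line in the fault report. The programming sequence is the same in both traces; only the timing relative to the DRM bind and the page table base differ.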

My limited understanding of the IOMMU and DRM subsystems is really
limiting my ability to triage this, so I would appreciate any guidance
or assistance.

Thank you.

[1] https://lore.kernel.org/r/20250612-exynosdrm-decon-v2-0-d6c1d21c8057@disroot.org
[2] https://lore.kernel.org/all/20250612-exynos7870-drm-dts-v1-2-88c0779af6cb@disroot.org
[3] https://lore.kernel.org/all/20250612-exynos7870-drm-dts-v1-3-88c0779af6cb@disroot.org
[4] https://elixir.bootlin.com/linux/v6.16-rc2/source/drivers/iommu/iommu.c#L431


* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-18 14:02 Help: Samsung Exynos 7870 DECON SYSMMU panic Kaustabh Chakraborty
@ 2025-06-18 14:06 ` Kaustabh Chakraborty
  2025-06-24 17:12 ` Robin Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Kaustabh Chakraborty @ 2025-06-18 14:06 UTC (permalink / raw)
  To: Marek Szyprowski, Joerg Roedel, Will Deacon, Robin Murphy,
	Inki Dae, Seung-Woo Kim, Kyungmin Park
  Cc: iommu, dri-devel

On 2025-06-18 14:02, Kaustabh Chakraborty wrote:
> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper probe 
> path),
> The Samsung Exynos 7870 DECON device (with patches [1], [2], and [3]) 
> seems
> to not work anymore. Upon closer inspection, I observe that there is an
> IOMMU crash.
> 
> [...]
> 
> -		dev->bus->dma_configure(dev);
> +		if (!strcmp(dev_name(dev), "14830000.decon"))
s/!//
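
The effect of this correction can be checked in isolation. strcmp() returns 0 when the strings are equal, so the condition as originally posted is true only for the DECON device, while the corrected one is true for every other device. The two helper names below are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Condition as originally posted: true only for the DECON device. */
static bool configure_as_posted(const char *name)
{
	return !strcmp(name, "14830000.decon");
}

/* Condition after s/!//: true for every device except DECON. */
static bool configure_as_corrected(const char *name)
{
	return strcmp(name, "14830000.decon") != 0;
}
```

So the intended change skips dma_configure only for 14830000.decon and leaves all other devices untouched.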


* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-18 14:02 Help: Samsung Exynos 7870 DECON SYSMMU panic Kaustabh Chakraborty
  2025-06-18 14:06 ` Kaustabh Chakraborty
@ 2025-06-24 17:12 ` Robin Murphy
  2025-06-25  7:39   ` Kaustabh Chakraborty
  1 sibling, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2025-06-24 17:12 UTC (permalink / raw)
  To: Kaustabh Chakraborty, Marek Szyprowski, Joerg Roedel, Will Deacon,
	Inki Dae, Seung-Woo Kim, Kyungmin Park
  Cc: iommu, dri-devel

On 2025-06-18 3:02 pm, Kaustabh Chakraborty wrote:
> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper probe path),
> The Samsung Exynos 7870 DECON device (with patches [1], [2], and [3]) seems
> to not work anymore. Upon closer inspection, I observe that there is an
> IOMMU crash.
> 
> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ] PAGE FAULT occurred at 0x6715b3e0
> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base: 0x0000000044a14000
> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops decon_component_ops)
> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU Fault!
> [    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-rc2-exynos7870 #722 PREEMPT
> [    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
> [    3.000509] Call trace:
> [    3.002938]  show_stack+0x18/0x24 (C)
> [    3.006582]  dump_stack_lvl+0x60/0x80
> [    3.010224]  dump_stack+0x18/0x24
> [    3.013521]  panic+0x168/0x360
> [    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
> [         ...]
> [    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable System MMU Fault! ]---

For starters, what if you just remove this panic() from the IOMMU 
driver? Frankly it seems a bit excessive anyway...

From the logs below, it looks like there is unexpected traffic already
going through the IOMMU when it wakes up. Is this the DRM drivers doing
something sketchy, or has the bootloader left the display running for a
splash screen? In the latter case, however, I don't see an obvious reason
why delaying the IOMMU probe should make much difference, given that the
decon driver should still be waiting for it either way.

Thanks,
Robin.



* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-24 17:12 ` Robin Murphy
@ 2025-06-25  7:39   ` Kaustabh Chakraborty
  2025-06-25  8:42     ` Marek Szyprowski
  0 siblings, 1 reply; 9+ messages in thread
From: Kaustabh Chakraborty @ 2025-06-25  7:39 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Marek Szyprowski, Joerg Roedel, Will Deacon, Inki Dae,
	Seung-Woo Kim, Kyungmin Park, iommu, dri-devel

On 2025-06-24 17:12, Robin Murphy wrote:
> On 2025-06-18 3:02 pm, Kaustabh Chakraborty wrote:
>> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper probe path),
>> The Samsung Exynos 7870 DECON device (with patches [1], [2], and [3]) seems
>> to not work anymore. Upon closer inspection, I observe that there is an
>> IOMMU crash.
>> 
>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ] PAGE FAULT occurred at 0x6715b3e0
>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base: 0x0000000044a14000
>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops decon_component_ops)
>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU Fault!
>> [    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-rc2-exynos7870 #722 PREEMPT
>> [    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
>> [    3.000509] Call trace:
>> [    3.002938]  show_stack+0x18/0x24 (C)
>> [    3.006582]  dump_stack_lvl+0x60/0x80
>> [    3.010224]  dump_stack+0x18/0x24
>> [    3.013521]  panic+0x168/0x360
>> [    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
>> [         ...]
>> [    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable System MMU Fault! ]---
> 
> For starters, what if you just remove this panic() from the IOMMU 
> driver? Frankly it seems a bit excessive anyway...

I've tried that. The sysmmu then keeps raising interrupts indefinitely
(yes, even after clearing the interrupt bit).

> 
> From the logs below it seems there is apparently unexpected traffic 
> already going through the IOMMU when it wakes up. Is this the DRM 
> drivers doing something sketchy, or has the bootloader left the display 
> running for a splash screen? However in the latter case I don't 
> obviously see why delaying the IOMMU probe should make much difference, 
> given that the decon driver should still be waiting for it either way.

Yes, the display is initialized by the bootloader for the splash screen,
but I reckon it doesn't use the IOMMU, as it's accessible from a
framebuffer region.

> 
> Thanks,
> Robin.


* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-25  7:39   ` Kaustabh Chakraborty
@ 2025-06-25  8:42     ` Marek Szyprowski
  2025-06-25 10:12       ` Kaustabh Chakraborty
  2025-06-25 11:34       ` Robin Murphy
  0 siblings, 2 replies; 9+ messages in thread
From: Marek Szyprowski @ 2025-06-25  8:42 UTC (permalink / raw)
  To: Kaustabh Chakraborty, Robin Murphy
  Cc: Joerg Roedel, Will Deacon, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	iommu, dri-devel

On 25.06.2025 09:39, Kaustabh Chakraborty wrote:
> On 2025-06-24 17:12, Robin Murphy wrote:
>> On 2025-06-18 3:02 pm, Kaustabh Chakraborty wrote:
>>> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper probe path),
>>> The Samsung Exynos 7870 DECON device (with patches [1], [2], and [3]) seems
>>> to not work anymore. Upon closer inspection, I observe that there is an
>>> IOMMU crash.
>>>
>>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ] PAGE FAULT occurred at 0x6715b3e0
>>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base: 0x0000000044a14000
>>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops decon_component_ops)
>>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
>>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU Fault!
>>> [    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-rc2-exynos7870 #722 PREEMPT
>>> [    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
>>> [    3.000509] Call trace:
>>> [    3.002938]  show_stack+0x18/0x24 (C)
>>> [    3.006582]  dump_stack_lvl+0x60/0x80
>>> [    3.010224]  dump_stack+0x18/0x24
>>> [    3.013521]  panic+0x168/0x360
>>> [    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
>>> [         ...]
>>> [    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable System MMU Fault! ]---
>>
>> For starters, what if you just remove this panic() from the IOMMU 
>> driver? Frankly it seems a bit excessive anyway...
>
> I've tried that, sysmmu repeatedly keeps issuing interrupts (yes, even
> after clearing the interrupt bit) indefinitely.
>
Right, this is because the decon device is still reading system memory in
a loop to display the splash screen. That panic is indeed a bit excessive,
but what else can the IOMMU driver do if no page fault handler is
registered?


>>
>> From the logs below it seems there is apparently unexpected traffic 
>> already going through the IOMMU when it wakes up. Is this the DRM 
>> drivers doing something sketchy, or has the bootloader left the 
>> display running for a splash screen? However in the latter case I 
>> don't obviously see why delaying the IOMMU probe should make much 
>> difference, given that the decon driver should still be waiting for 
>> it either way.
>
> The display is initialized by the bootloader for splash yes, but I reckon
> it doesn't use the IOMMU as it's accessible from a framebuffer region.

Right, the bootloader configured the decon device to display the splash
screen, which means that the decon device is constantly reading splash
screen pixel data from system memory. There is no such thing as a
'framebuffer region'; it is just system memory, which the Exynos sysmmu
protects once it is enabled. So far this issue of a splash screen left
running by the bootloader has not been solved in mainline. On other
supported Exynos-based boards this works only because the power domain
drivers are also enabled, and they are instantiated before the
display-related device and its sysmmu device. The power domain driver
shuts down power, effectively disabling the display before the sysmmu
gets probed.

A long time ago I pointed out this issue and proposed a simple solution,
such as a special initial identity mapping for the memory range used for
the splash screen, but that proposal is no longer applicable to the
current code.
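
The idea of an initial identity mapping can be illustrated with a toy model. This is not the earlier proposal and not kernel code: the structure, the function names, and the splash range used in the usage note are all hypothetical, and the model only assumes 1 MiB level-1 sections. It shows why a scan-out read faults against an empty table and succeeds once the range is mapped one-to-one.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECT_ORDER 20
#define SECT_SIZE  (1u << SECT_ORDER)
#define NUM_LV1ENT 4096 /* 4 GiB of I/O virtual space in 1 MiB sections */

/* Toy level-1 table: one slot per 1 MiB section (hypothetical layout). */
struct toy_pgtable {
	uint32_t sect_base[NUM_LV1ENT];
	bool valid[NUM_LV1ENT];
};

/*
 * Map [base, base + size) so that each I/O virtual address equals its
 * physical address. Assumes size > 0 and no wrap past 4 GiB.
 */
static void toy_identity_map(struct toy_pgtable *pt, uint32_t base,
			     uint32_t size)
{
	uint32_t first = base >> SECT_ORDER;
	uint32_t last = (base + size - 1) >> SECT_ORDER;

	for (uint32_t i = first; i <= last; i++) {
		pt->sect_base[i] = i << SECT_ORDER;
		pt->valid[i] = true;
	}
}

/* Returns false on a (simulated) page fault, true with *pa set otherwise. */
static bool toy_translate(const struct toy_pgtable *pt, uint32_t iova,
			  uint32_t *pa)
{
	uint32_t idx = iova >> SECT_ORDER;

	if (!pt->valid[idx])
		return false;
	*pa = pt->sect_base[idx] | (iova & (SECT_SIZE - 1));
	return true;
}
```

For example, with a zero-initialized table, translating the faulting address 0x6715b3e0 from the log fails. After calling toy_identity_map() with an assumed splash range of base 0x67000000 and size 8 MiB (made-up values for illustration), the same address translates to itself.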

As a workaround, I would suggest shutting down the display in the decon
device before starting the kernel (i.e. from the 'kernel loading
mid-stage bootloader', if you have one).


>
>>
>> Thanks,
>> Robin.
>>
>>> The commit has been introduced in mainline v6.15-rc1. I've also 
>>> tested in
>>> v6.15, v6.15.2, and v6.16-rc2, and there have been no apparent changes.
>>>
>>> I've tried to revert the commit, and it does work that way. But on 
>>> reading
>>> the commit message I understand that I need to find a proper 
>>> solution here.
>>> I've tried to skim down the revert, and this is what I get - if i 
>>> change
>>> line [4] as follows:
>>>
>>> -        dev->bus->dma_configure(dev);
>>> +        if (!strcmp(dev_name(dev), "14830000.decon"))
>>> +            dev->bus->dma_configure(dev);
>>>
>>> It really doesn't like dma_configure for some reason. It works, but:
>>>
>>> [    2.779291] exynos-decon 14830000.decon: late IOMMU probe at 
>>> driver bind, something fishy here!
>>>
>>> I believe the IOMMU hardware doesn't like the fact that it is being
>>> initialized/resumed too early. I formed this conclusion by comparing 
>>> the
>>> logs:
>>>
>>> mainline:
>>> [    1.274575] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_resume enter
>>> [    2.859656] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable enter
>>> [    2.864192] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_clocks enter
>>> [    2.869299] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 
>>> 0x00000007
>>> [    2.875062] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_init_config enter
>>> [    2.880006] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000004 val 
>>> 0x01100784
>>> [    2.885819] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_set_ptbase enter
>>> [    2.890677] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x0000000c val 
>>> 0x00044a14
>>> [    2.896490] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_tlb_invalidate 
>>> enter
>>> [    2.901695] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000010 val 
>>> 0x00000001
>>> [    2.907507] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_vid enter
>>> [    2.912366] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 
>>> 0x00000005
>>> [    2.912371] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_irq enter
>>> [    2.912836] [drm] Exynos DRM: using 14830000.decon device for DMA 
>>> mapping operations
>>> [    2.918175] I_HAVE_ADDED_THESE: SYSMMU: 
>>> exynos_sysmmu_v5_get_fault_info enter
>>> [    2.918182] I_HAVE_ADDED_THESE: SYSMMU: show_fault_information enter
>>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ] 
>>> PAGE FAULT occurred at 0x6715b3e0
>>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base: 
>>> 0x0000000044a14000
>>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops 
>>> decon_component_ops)
>>> [    2.922859] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>> [    2.922864] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
>>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU 
>>> Fault!
>>>
>>> and with the patch above applied:
>>> [    3.018478] [drm] Exynos DRM: using 14830000.decon device for DMA 
>>> mapping operations
>>> [    3.025794] exynos-drm exynos-drm: bound 14830000.decon (ops 
>>> decon_component_ops)
>>> [    3.058655] exynos-dsi 14800000.dsi: 
>>> [drm:samsung_dsim_host_attach] Attached td4300-panel device (lanes:4 
>>> bpp:24 mode-flags:0x23)
>>> [    3.070189] exynos-drm exynos-drm: bound 14800000.dsi (ops 
>>> exynos_dsi_component_ops)
>>> [    3.078506] [drm] Initialized exynos 1.1.0 for exynos-drm on minor 1
>>> [    3.090747] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_map enter
>>> [    3.090845] I_HAVE_ADDED_THESE: SYSMMU: to_exynos_domain enter
>>> [    3.093325] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>> [    3.097662] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    3.102000] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    3.106337] I_HAVE_ADDED_THESE: SYSMMU: lv1set_section enter
>>> [    3.110762] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte enter
>>> [    3.115726] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_map enter
>>> [    3.120317] I_HAVE_ADDED_THESE: SYSMMU: to_exynos_domain enter
>>> [    3.124914] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>> [    3.129240] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    3.133578] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    3.137916] I_HAVE_ADDED_THESE: SYSMMU: lv1set_section enter
>>> [    3.142340] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte enter
>>> [         ...] (a lot of repetitions later)
>>> [    4.322904] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>> [    4.327230] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    4.331567] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    4.335905] I_HAVE_ADDED_THESE: SYSMMU: alloc_lv2entry enter
>>> [    4.340329] I_HAVE_ADDED_THESE: SYSMMU: page_entry enter
>>> [    4.344407] I_HAVE_ADDED_THESE: SYSMMU: lv2ent_offset enter
>>> [    4.348744] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>> [    4.353082] I_HAVE_ADDED_THESE: SYSMMU: lv2set_page enter
>>> [    4.357246] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte enter
>>> [    4.362751] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_resume enter
>>> [    4.362767] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable enter
>>> [    4.362771] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_clocks enter
>>> [    4.362777] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 
>>> 0x00000007
>>> [    4.362782] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_init_config enter
>>> [    4.362786] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000004 val 
>>> 0x01100784
>>> [    4.362791] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_set_ptbase enter
>>> [    4.362795] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x0000000c val 
>>> 0x00042a64
>>> [    4.362799] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_tlb_invalidate 
>>> enter
>>> [    4.362803] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000010 val 
>>> 0x00000001
>>> [    4.362808] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_vid enter
>>> [    4.362811] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val 
>>> 0x00000005
>>>
>>> Then it continues booting as usual.
>>>
>>> My lack of understanding of the IOMMU and DRM subsystems are really
>>> limiting my triaging capabilities here, therefore I ask for any form
>>> guidance or assistance with this.
>>>
>>> Thank you.
>>>
>>> [1] 
>>> https://lore.kernel.org/r/20250612-exynosdrm-decon-v2-0-d6c1d21c8057@disroot.org
>>> [2] 
>>> https://lore.kernel.org/all/20250612-exynos7870-drm-dts-v1-2-88c0779af6cb@disroot.org
>>> [3] 
>>> https://lore.kernel.org/all/20250612-exynos7870-drm-dts-v1-3-88c0779af6cb@disroot.org
>>> [4] 
>>> https://elixir.bootlin.com/linux/v6.16-rc2/source/drivers/iommu/iommu.c#L431
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-25  8:42     ` Marek Szyprowski
@ 2025-06-25 10:12       ` Kaustabh Chakraborty
  2025-06-25 11:34       ` Robin Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Kaustabh Chakraborty @ 2025-06-25 10:12 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Robin Murphy, Joerg Roedel, Will Deacon, Inki Dae, Seung-Woo Kim,
	Kyungmin Park, iommu, dri-devel

On 2025-06-25 08:42, Marek Szyprowski wrote:
> On 25.06.2025 09:39, Kaustabh Chakraborty wrote:
>> On 2025-06-24 17:12, Robin Murphy wrote:
>>> On 2025-06-18 3:02 pm, Kaustabh Chakraborty wrote:
>>>> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper probe
>>>> path),
>>>> The Samsung Exynos 7870 DECON device (with patches [1], [2], and
>>>> [3]) seems
>>>> to not work anymore. Upon closer inspection, I observe that there is 
>>>> an
>>>> IOMMU crash.
>>>> 
>>>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ]
>>>> PAGE FAULT occurred at 0x6715b3e0
>>>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base:
>>>> 0x0000000044a14000
>>>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops
>>>> decon_component_ops)
>>>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
>>>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU
>>>> Fault!
>>>> [    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
>>>> 6.16.0-rc2-exynos7870 #722 PREEMPT
>>>> [    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
>>>> [    3.000509] Call trace:
>>>> [    3.002938]  show_stack+0x18/0x24 (C)
>>>> [    3.006582]  dump_stack_lvl+0x60/0x80
>>>> [    3.010224]  dump_stack+0x18/0x24
>>>> [    3.013521]  panic+0x168/0x360
>>>> [    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
>>>> [         ...]
>>>> [    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable
>>>> System MMU Fault! ]---
>>> 
>>> For starters, what if you just remove this panic() from the IOMMU
>>> driver? Frankly it seems a bit excessive anyway...
>> 
>> I've tried that, sysmmu repeatedly keeps issuing interrupts (yes, even
>> after clearing the interrupt bit) indefinitely.
>> 
> Right, this is because decon device is still accessing system memory in
> a loop trying to display the splash screen. That panic is indeed a bit
> excessive, but what else IOMMU driver can do if no page fault handle is
> registered?
> 
> 
>>> 
>>> From the logs below it seems there is apparently unexpected traffic
>>> already going through the IOMMU when it wakes up. Is this the DRM
>>> drivers doing something sketchy, or has the bootloader left the
>>> display running for a splash screen? However in the latter case I
>>> don't obviously see why delaying the IOMMU probe should make much
>>> difference, given that the decon driver should still be waiting for
>>> it either way.
>> 
>> The display is initialized by the bootloader for splash yes, but I 
>> reckon
>> it doesn't use the IOMMU as it's accessible from a framebuffer region.
> 
> Right, bootloader configured decon device to display splash screen, 
> what
> means that decon device is constantly reading splash screen pixel data
> from system memory. There is no such thing as a 'framebuffer region', 
> it
> is just a system memory, which exynos sysmmu protects when enabled. So
> far this issue of splash screen from bootloader has not yet been solved
> in mainline. On other Exynos based supported boards this works only
> because there are also power domain drivers enabled, which are
> instantiated before the display related device and respective sysmmu
> device. That power domain driver shuts down power effectively disabling
> the display before the sysmmu gets probbed.
> 
> Long time ago I've pointed this issue and proposed some simple solution
> like a special initial identity mapping for the memory range used for
> splash screen, but that proposal is no longer applicable for the 
> current
> code.
> 
> As a workaround I would suggest to shutdown display in the decon device
> before starting the kernel (i.e. from the 'kernel loading mid-stage
> bootloader' if you have such).

Hi, thanks a lot for the explanation! That makes sense, I'll try
implementing your suggestions and will report back :)

> 
>> 
>>> 
>>> Thanks,
>>> Robin.
>>> 
>>>> The commit has been introduced in mainline v6.15-rc1. I've also
>>>> tested in
>>>> v6.15, v6.15.2, and v6.16-rc2, and there have been no apparent 
>>>> changes.
>>>> 
>>>> I've tried to revert the commit, and it does work that way. But on
>>>> reading
>>>> the commit message I understand that I need to find a proper
>>>> solution here.
>>>> I've tried to skim down the revert, and this is what I get - if i
>>>> change
>>>> line [4] as follows:
>>>> 
>>>> -        dev->bus->dma_configure(dev);
>>>> +        if (!strcmp(dev_name(dev), "14830000.decon"))
>>>> +            dev->bus->dma_configure(dev);
>>>> 
>>>> It really doesn't like dma_configure for some reason. It works, but:
>>>> 
>>>> [    2.779291] exynos-decon 14830000.decon: late IOMMU probe at
>>>> driver bind, something fishy here!
>>>> 
>>>> I believe the IOMMU hardware doesn't like the fact that it is being
>>>> initialized/resumed too early. I formed this conclusion by comparing
>>>> the
>>>> logs:
>>>> 
>>>> mainline:
>>>> [    1.274575] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_resume 
>>>> enter
>>>> [    2.859656] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable enter
>>>> [    2.864192] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_clocks 
>>>> enter
>>>> [    2.869299] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val
>>>> 0x00000007
>>>> [    2.875062] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_init_config 
>>>> enter
>>>> [    2.880006] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000004 val
>>>> 0x01100784
>>>> [    2.885819] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_set_ptbase enter
>>>> [    2.890677] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x0000000c val
>>>> 0x00044a14
>>>> [    2.896490] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_tlb_invalidate
>>>> enter
>>>> [    2.901695] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000010 val
>>>> 0x00000001
>>>> [    2.907507] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_vid enter
>>>> [    2.912366] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val
>>>> 0x00000005
>>>> [    2.912371] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_irq enter
>>>> [    2.912836] [drm] Exynos DRM: using 14830000.decon device for DMA
>>>> mapping operations
>>>> [    2.918175] I_HAVE_ADDED_THESE: SYSMMU:
>>>> exynos_sysmmu_v5_get_fault_info enter
>>>> [    2.918182] I_HAVE_ADDED_THESE: SYSMMU: show_fault_information 
>>>> enter
>>>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ]
>>>> PAGE FAULT occurred at 0x6715b3e0
>>>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base:
>>>> 0x0000000044a14000
>>>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops
>>>> decon_component_ops)
>>>> [    2.922859] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>>> [    2.922864] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
>>>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU
>>>> Fault!
>>>> 
>>>> and with the patch above applied:
>>>> [    3.018478] [drm] Exynos DRM: using 14830000.decon device for DMA
>>>> mapping operations
>>>> [    3.025794] exynos-drm exynos-drm: bound 14830000.decon (ops
>>>> decon_component_ops)
>>>> [    3.058655] exynos-dsi 14800000.dsi:
>>>> [drm:samsung_dsim_host_attach] Attached td4300-panel device (lanes:4
>>>> bpp:24 mode-flags:0x23)
>>>> [    3.070189] exynos-drm exynos-drm: bound 14800000.dsi (ops
>>>> exynos_dsi_component_ops)
>>>> [    3.078506] [drm] Initialized exynos 1.1.0 for exynos-drm on 
>>>> minor 1
>>>> [    3.090747] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_map enter
>>>> [    3.090845] I_HAVE_ADDED_THESE: SYSMMU: to_exynos_domain enter
>>>> [    3.093325] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>>> [    3.097662] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    3.102000] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    3.106337] I_HAVE_ADDED_THESE: SYSMMU: lv1set_section enter
>>>> [    3.110762] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte 
>>>> enter
>>>> [    3.115726] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_map enter
>>>> [    3.120317] I_HAVE_ADDED_THESE: SYSMMU: to_exynos_domain enter
>>>> [    3.124914] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>>> [    3.129240] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    3.133578] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    3.137916] I_HAVE_ADDED_THESE: SYSMMU: lv1set_section enter
>>>> [    3.142340] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte 
>>>> enter
>>>> [         ...] (a lot of repetitions later)
>>>> [    4.322904] I_HAVE_ADDED_THESE: SYSMMU: section_entry enter
>>>> [    4.327230] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    4.331567] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    4.335905] I_HAVE_ADDED_THESE: SYSMMU: alloc_lv2entry enter
>>>> [    4.340329] I_HAVE_ADDED_THESE: SYSMMU: page_entry enter
>>>> [    4.344407] I_HAVE_ADDED_THESE: SYSMMU: lv2ent_offset enter
>>>> [    4.348744] I_HAVE_ADDED_THESE: SYSMMU: lv1ent_offset enter
>>>> [    4.353082] I_HAVE_ADDED_THESE: SYSMMU: lv2set_page enter
>>>> [    4.357246] I_HAVE_ADDED_THESE: SYSMMU: exynos_iommu_set_pte 
>>>> enter
>>>> [    4.362751] I_HAVE_ADDED_THESE: SYSMMU: exynos_sysmmu_resume 
>>>> enter
>>>> [    4.362767] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable enter
>>>> [    4.362771] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_clocks 
>>>> enter
>>>> [    4.362777] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val
>>>> 0x00000007
>>>> [    4.362782] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_init_config 
>>>> enter
>>>> [    4.362786] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000004 val
>>>> 0x01100784
>>>> [    4.362791] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_set_ptbase enter
>>>> [    4.362795] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x0000000c val
>>>> 0x00042a64
>>>> [    4.362799] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_tlb_invalidate
>>>> enter
>>>> [    4.362803] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000010 val
>>>> 0x00000001
>>>> [    4.362808] I_HAVE_ADDED_THESE: SYSMMU: __sysmmu_enable_vid enter
>>>> [    4.362811] I_HAVE_ADDED_THESE: SYSMMU: write: reg 0x00000000 val
>>>> 0x00000005
>>>> 
>>>> Then it continues booting as usual.
>>>> 
>>>> My lack of understanding of the IOMMU and DRM subsystems are really
>>>> limiting my triaging capabilities here, therefore I ask for any form
>>>> guidance or assistance with this.
>>>> 
>>>> Thank you.
>>>> 
>>>> [1]
>>>> https://lore.kernel.org/r/20250612-exynosdrm-decon-v2-0-d6c1d21c8057@disroot.org
>>>> [2]
>>>> https://lore.kernel.org/all/20250612-exynos7870-drm-dts-v1-2-88c0779af6cb@disroot.org
>>>> [3]
>>>> https://lore.kernel.org/all/20250612-exynos7870-drm-dts-v1-3-88c0779af6cb@disroot.org
>>>> [4]
>>>> https://elixir.bootlin.com/linux/v6.16-rc2/source/drivers/iommu/iommu.c#L431
>> 
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-25  8:42     ` Marek Szyprowski
  2025-06-25 10:12       ` Kaustabh Chakraborty
@ 2025-06-25 11:34       ` Robin Murphy
  2025-06-28 15:39         ` Kaustabh Chakraborty
  1 sibling, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2025-06-25 11:34 UTC (permalink / raw)
  To: Marek Szyprowski, Kaustabh Chakraborty
  Cc: Joerg Roedel, Will Deacon, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	iommu, dri-devel

On 2025-06-25 9:42 am, Marek Szyprowski wrote:
> On 25.06.2025 09:39, Kaustabh Chakraborty wrote:
>> On 2025-06-24 17:12, Robin Murphy wrote:
>>> On 2025-06-18 3:02 pm, Kaustabh Chakraborty wrote:
>>>> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper probe
>>>> path),
>>>> The Samsung Exynos 7870 DECON device (with patches [1], [2], and
>>>> [3]) seems
>>>> to not work anymore. Upon closer inspection, I observe that there is an
>>>> IOMMU crash.
>>>>
>>>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ]
>>>> PAGE FAULT occurred at 0x6715b3e0
>>>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base:
>>>> 0x0000000044a14000
>>>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops
>>>> decon_component_ops)
>>>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
>>>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU
>>>> Fault!
>>>> [    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
>>>> 6.16.0-rc2-exynos7870 #722 PREEMPT
>>>> [    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
>>>> [    3.000509] Call trace:
>>>> [    3.002938]  show_stack+0x18/0x24 (C)
>>>> [    3.006582]  dump_stack_lvl+0x60/0x80
>>>> [    3.010224]  dump_stack+0x18/0x24
>>>> [    3.013521]  panic+0x168/0x360
>>>> [    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
>>>> [         ...]
>>>> [    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable
>>>> System MMU Fault! ]---
>>>
>>> For starters, what if you just remove this panic() from the IOMMU
>>> driver? Frankly it seems a bit excessive anyway...
>>
>> I've tried that, sysmmu repeatedly keeps issuing interrupts (yes, even
>> after clearing the interrupt bit) indefinitely.
>>
> Right, this is because decon device is still accessing system memory in
> a loop trying to display the splash screen. That panic is indeed a bit
> excessive, but what else IOMMU driver can do if no page fault handle is
> registered?

Report the unhandled fault and continue, like most drivers already do. 
If there's another fault, then that can get reported as well. It's kind 
of the point that if a misbehaving device has been prevented from 
accessing memory then it has *not* adversely affected the rest of the 
system.

I suppose if one wanted to be really clever then a driver could 
implement its own backoff mechanism where if it detects a sustained high 
rate of unhandled faults then it disables its interrupt for a bit, to 
mitigate the physical interrupt storm as well as avoid flooding the 
kernel log more than is useful.
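
To make the backoff idea concrete, here is a minimal userspace sketch of
the bookkeeping only. It is not taken from exynos-iommu or any other
driver: the struct, the function name and the three tunables are all
hypothetical, and time is passed in as plain milliseconds so the logic
can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables, chosen only for illustration. */
#define FAULT_WINDOW_MS		100	/* window over which faults are counted */
#define FAULT_BURST_MAX		8	/* faults reported per window */
#define FAULT_BACKOFF_MS	1000	/* how long to stay quiet after a burst */

/* Hypothetical per-IOMMU state; zero-initialise before first use. */
struct fault_backoff {
	uint64_t window_start_ms;
	unsigned int count;
	uint64_t masked_until_ms;
};

/*
 * Account for one unhandled fault seen at now_ms.
 *
 * Returns true if the fault should be reported as usual. Returns false
 * if the fault rate is too high and the caller should keep its
 * interrupt masked until fb->masked_until_ms.
 */
static bool fault_backoff_record(struct fault_backoff *fb, uint64_t now_ms)
{
	assert(fb != NULL);

	/* Still inside a backoff period: stay quiet. */
	if (now_ms < fb->masked_until_ms)
		return false;

	/* Start a fresh counting window once the old one has expired. */
	if (now_ms - fb->window_start_ms >= FAULT_WINDOW_MS) {
		fb->window_start_ms = now_ms;
		fb->count = 0;
	}

	/* Too many faults in this window: begin a backoff period. */
	if (++fb->count > FAULT_BURST_MAX) {
		fb->masked_until_ms = now_ms + FAULT_BACKOFF_MS;
		fb->count = 0;
		return false;
	}

	return true;
}
```

In a real driver the false return is where the interrupt would be masked
(for instance with disable_irq_nosync()), with a timer or delayed work
re-enabling it later; that wiring is left out here.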

>>>  From the logs below it seems there is apparently unexpected traffic
>>> already going through the IOMMU when it wakes up. Is this the DRM
>>> drivers doing something sketchy, or has the bootloader left the
>>> display running for a splash screen? However in the latter case I
>>> don't obviously see why delaying the IOMMU probe should make much
>>> difference, given that the decon driver should still be waiting for
>>> it either way.
>>
>> The display is initialized by the bootloader for splash yes, but I reckon
>> it doesn't use the IOMMU as it's accessible from a framebuffer region.
> 
> Right, bootloader configured decon device to display splash screen, what
> means that decon device is constantly reading splash screen pixel data
> from system memory. There is no such thing as a 'framebuffer region', it
> is just a system memory, which exynos sysmmu protects when enabled. So
> far this issue of splash screen from bootloader has not yet been solved
> in mainline. On other Exynos based supported boards this works only
> because there are also power domain drivers enabled, which are
> instantiated before the display related device and respective sysmmu
> device. That power domain driver shuts down power effectively disabling
> the display before the sysmmu gets probbed.

And presumably the sysmmu device itself doesn't need to depend on that 
power domain? OK, that at least makes sense.

> Long time ago I've pointed this issue and proposed some simple solution
> like a special initial identity mapping for the memory range used for
> splash screen, but that proposal is no longer applicable for the current
> code.
> 
> As a workaround I would suggest to shutdown display in the decon device
> before starting the kernel (i.e. from the 'kernel loading mid-stage
> bootloader' if you have such).

We do now have the tools to handle this properly - if the bootloader can 
be updated to add the appropriate "iommu-addresses" property[1] to the 
framebuffer reservation, then it's a case of hooking up support in 
exynos-iommu via of_iommu_get_resv_regions().

For a short-term kernel-side hack you could probably implement 
.def_domain_type to force IDENTITY for decon devices, as long as you can 
then convince the DRM driver to pick another device to grab a DMA ops 
domain from.
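
A rough sketch of the decision such a .def_domain_type callback would
make is below. It is a userspace stand-in, not kernel code: struct
device, dev_name() and the two SKETCH_* constants are local placeholders
for the real definitions in <linux/device.h> and <linux/iommu.h> (where
the callback returns IOMMU_DOMAIN_IDENTITY, or 0 for "no preference").
Matching on the device name is an assumption made purely for
illustration; a real driver would more likely check the compatible
string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder for the kernel's struct device; only the name is modelled. */
struct device {
	const char *name;
};

/* Placeholder values; the kernel's constants are different numbers. */
enum {
	SKETCH_DOMAIN_NO_PREFERENCE = 0,
	SKETCH_DOMAIN_IDENTITY = 1,
};

/* Placeholder for the kernel's dev_name(). */
static const char *dev_name(const struct device *dev)
{
	return dev->name;
}

/*
 * Hypothetical callback body: ask for an identity domain for anything
 * whose name ends in ".decon" (e.g. "14830000.decon"), and express no
 * preference for every other device.
 */
static int sketch_def_domain_type(struct device *dev)
{
	const char *suffix = strrchr(dev_name(dev), '.');

	if (suffix && !strcmp(suffix, ".decon"))
		return SKETCH_DOMAIN_IDENTITY;

	return SKETCH_DOMAIN_NO_PREFERENCE;
}
```

With an identity domain the decon's splash-screen reads would pass
through untranslated, which is why the DRM driver would then need a
different device to take its DMA ops domain from.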

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-25 11:34       ` Robin Murphy
@ 2025-06-28 15:39         ` Kaustabh Chakraborty
  2025-06-30 12:09           ` Robin Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Kaustabh Chakraborty @ 2025-06-28 15:39 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Marek Szyprowski, Joerg Roedel, Will Deacon, Inki Dae,
	Seung-Woo Kim, Kyungmin Park, iommu, dri-devel

On 2025-06-25 11:34, Robin Murphy wrote:
> On 2025-06-25 9:42 am, Marek Szyprowski wrote:
>> On 25.06.2025 09:39, Kaustabh Chakraborty wrote:
>>> On 2025-06-24 17:12, Robin Murphy wrote:
>>>> On 2025-06-18 3:02 pm, Kaustabh Chakraborty wrote:
>>>>> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper 
>>>>> probe
>>>>> path),
>>>>> The Samsung Exynos 7870 DECON device (with patches [1], [2], and
>>>>> [3]) seems
>>>>> to not work anymore. Upon closer inspection, I observe that there 
>>>>> is an
>>>>> IOMMU crash.
>>>>> 
>>>>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: 
>>>>> [READ]
>>>>> PAGE FAULT occurred at 0x6715b3e0
>>>>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base:
>>>>> 0x0000000044a14000
>>>>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops
>>>>> decon_component_ops)
>>>>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 
>>>>> 0x4205001
>>>>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU
>>>>> Fault!
>>>>> [    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
>>>>> 6.16.0-rc2-exynos7870 #722 PREEMPT
>>>>> [    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
>>>>> [    3.000509] Call trace:
>>>>> [    3.002938]  show_stack+0x18/0x24 (C)
>>>>> [    3.006582]  dump_stack_lvl+0x60/0x80
>>>>> [    3.010224]  dump_stack+0x18/0x24
>>>>> [    3.013521]  panic+0x168/0x360
>>>>> [    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
>>>>> [         ...]
>>>>> [    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable
>>>>> System MMU Fault! ]---
>>>> 
>>>> For starters, what if you just remove this panic() from the IOMMU
>>>> driver? Frankly it seems a bit excessive anyway...
>>> 
>>> I've tried that, sysmmu repeatedly keeps issuing interrupts (yes, 
>>> even
>>> after clearing the interrupt bit) indefinitely.
>>> 
>> Right, this is because decon device is still accessing system memory 
>> in
>> a loop trying to display the splash screen. That panic is indeed a bit
>> excessive, but what else IOMMU driver can do if no page fault handle 
>> is
>> registered?
> 
> Report the unhandled fault and continue, like most drivers already do. 
> If there's another fault, then that can get reported as well. It's kind 
> of the point that if a misbehaving device has been prevented from 
> accessing memory then it has *not* adversely affected the rest of the 
> system.
> 
> I suppose if one wanted to be really clever then a driver could 
> implement its own backoff mechanism where if it detects a sustained 
> high rate of unhandled faults then it disables its interrupt for a bit, 
> to mitigate the physical interrupt storm as well as avoid flooding the 
> kernel log more than is useful.
> 
>>>>  From the logs below it seems there is apparently unexpected traffic
>>>> already going through the IOMMU when it wakes up. Is this the DRM
>>>> drivers doing something sketchy, or has the bootloader left the
>>>> display running for a splash screen? However in the latter case I
>>>> don't obviously see why delaying the IOMMU probe should make much
>>>> difference, given that the decon driver should still be waiting for
>>>> it either way.
>>> 
>>> The display is initialized by the bootloader for splash yes, but I 
>>> reckon
>>> it doesn't use the IOMMU as it's accessible from a framebuffer 
>>> region.
>> 
>> Right, bootloader configured decon device to display splash screen, 
>> what
>> means that decon device is constantly reading splash screen pixel data
>> from system memory. There is no such thing as a 'framebuffer region', 
>> it
>> is just a system memory, which exynos sysmmu protects when enabled. So
>> far this issue of splash screen from bootloader has not yet been 
>> solved
>> in mainline. On other Exynos based supported boards this works only
>> because there are also power domain drivers enabled, which are
>> instantiated before the display related device and respective sysmmu
>> device. That power domain driver shuts down power effectively 
>> disabling
>> the display before the sysmmu gets probbed.
> 
> And presumably the sysmmu device itself doesn't need to depend on that 
> power domain? OK, that at least makes sense.
> 
>> Long time ago I've pointed this issue and proposed some simple 
>> solution
>> like a special initial identity mapping for the memory range used for
>> splash screen, but that proposal is no longer applicable for the 
>> current
>> code.
>> 
>> As a workaround I would suggest to shutdown display in the decon 
>> device
>> before starting the kernel (i.e. from the 'kernel loading mid-stage
>> bootloader' if you have such).
> 
> We do now have the tools to handle this properly - if the bootloader 
> can be updated to add the appropriate "iommu-addresses" property[1] to 
> the framebuffer reservation, then it's a case of hooking up support in 
> exynos-iommu via of_iommu_get_resv_regions().

Hey, thanks a lot! I got it to work. [1] [2]

The Exynos IOMMU driver doesn't implement get_resv_regions in its
iommu_ops, so I looked for existing drivers that do, to use as
examples. [3] [4] [5]

All of them allocate a reserved region of their own before calling
iommu_dma_get_resv_regions(). I don't know what those regions are for,
and I don't know what base and length should be used for sysmmu. Either
way, I gave it a shot with [6]:

#define EXYNOS_DEV_ADDR_START	0x20000000
#define EXYNOS_DEV_ADDR_SIZE	0x40000000

static void exynos_iommu_get_resv_regions(struct device *dev,
					  struct list_head *head)
{
	struct iommu_resv_region *region;
	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

	region = iommu_alloc_resv_region(EXYNOS_DEV_ADDR_START,
					 EXYNOS_DEV_ADDR_SIZE, prot,
					 IOMMU_RESV_SW_MSI, GFP_KERNEL);
	if (!region)
		return;

	list_add_tail(&region->list, head);

	iommu_dma_get_resv_regions(dev, head);
}

...and it worked. It works even without that first allocation, so I'm
not sure why, or whether, it's needed.

I need some input to make it upstreamable (mainly whether the base and
length are correct or not); then I'll send a patch.

[1] 
https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/of_iommu.c#L206
[2] 
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml
[3] 
https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/apple-dart.c#L966
[4] 
https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c#L3573
[5] 
https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/arm/arm-smmu/arm-smmu.c#L1594
[6] 
https://elixir.bootlin.com/linux/v6.15.3/source/drivers/gpu/drm/exynos/exynos_drm_dma.c#L30

> 
> For a short-term kernel-side hack you could probably implement 
> .def_domain_type to force IDENTITY for decon devices, as long as you 
> can then convince the DRM driver to pick another device to grab a DMA 
> ops domain from.
> 
> Thanks,
> Robin.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Help: Samsung Exynos 7870 DECON SYSMMU panic
  2025-06-28 15:39         ` Kaustabh Chakraborty
@ 2025-06-30 12:09           ` Robin Murphy
  0 siblings, 0 replies; 9+ messages in thread
From: Robin Murphy @ 2025-06-30 12:09 UTC (permalink / raw)
  To: Kaustabh Chakraborty
  Cc: Marek Szyprowski, Joerg Roedel, Will Deacon, Inki Dae,
	Seung-Woo Kim, Kyungmin Park, iommu, dri-devel

On 28/06/2025 4:39 pm, Kaustabh Chakraborty wrote:
> On 2025-06-25 11:34, Robin Murphy wrote:
>> On 2025-06-25 9:42 am, Marek Szyprowski wrote:
>>> On 25.06.2025 09:39, Kaustabh Chakraborty wrote:
>>>> On 2025-06-24 17:12, Robin Murphy wrote:
>>>>> On 2025-06-18 3:02 pm, Kaustabh Chakraborty wrote:
>>>>>> Since bcb81ac6ae3c (iommu: Get DT/ACPI parsing into the proper probe
>>>>>> path),
>>>>>> The Samsung Exynos 7870 DECON device (with patches [1], [2], and
>>>>>> [3]) seems
>>>>>> to not work anymore. Upon closer inspection, I observe that there 
>>>>>> is an
>>>>>> IOMMU crash.
>>>>>>
>>>>>> [    2.918189] exynos-sysmmu 14860000.sysmmu: 14830000.decon: [READ]
>>>>>> PAGE FAULT occurred at 0x6715b3e0
>>>>>> [    2.918199] exynos-sysmmu 14860000.sysmmu: Page table base:
>>>>>> 0x0000000044a14000
>>>>>> [    2.918243] exynos-drm exynos-drm: bound 14830000.decon (ops
>>>>>> decon_component_ops)
>>>>>> [    2.922868] exynos-sysmmu 14860000.sysmmu:   Lv1 entry: 0x4205001
>>>>>> [    2.922877] Kernel panic - not syncing: Unrecoverable System MMU
>>>>>> Fault!
>>>>>> [    2.922885] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
>>>>>> 6.16.0-rc2-exynos7870 #722 PREEMPT
>>>>>> [    2.995312] Hardware name: Samsung Galaxy J7 Prime (DT)
>>>>>> [    3.000509] Call trace:
>>>>>> [    3.002938]  show_stack+0x18/0x24 (C)
>>>>>> [    3.006582]  dump_stack_lvl+0x60/0x80
>>>>>> [    3.010224]  dump_stack+0x18/0x24
>>>>>> [    3.013521]  panic+0x168/0x360
>>>>>> [    3.016558]  exynos_sysmmu_irq+0x224/0x2ac
>>>>>> [         ...]
>>>>>> [    3.108786] ---[ end Kernel panic - not syncing: Unrecoverable
>>>>>> System MMU Fault! ]---
>>>>>
>>>>> For starters, what if you just remove this panic() from the IOMMU
>>>>> driver? Frankly it seems a bit excessive anyway...
>>>>
>>>> I've tried that, sysmmu repeatedly keeps issuing interrupts (yes, even
>>>> after clearing the interrupt bit) indefinitely.
>>>>
>>> Right, this is because the decon device is still accessing system memory
>>> in a loop, trying to display the splash screen. That panic is indeed a bit
>>> excessive, but what else can the IOMMU driver do if no page fault handler
>>> is registered?
>>
>> Report the unhandled fault and continue, like most drivers already do. 
>> If there's another fault, then that can get reported as well. It's 
>> kind of the point that if a misbehaving device has been prevented from 
>> accessing memory then it has *not* adversely affected the rest of the 
>> system.
>>
>> I suppose if one wanted to be really clever then a driver could 
>> implement its own backoff mechanism where if it detects a sustained 
>> high rate of unhandled faults then it disables its interrupt for a 
>> bit, to mitigate the physical interrupt storm as well as avoid 
>> flooding the kernel log more than is useful.
>>
>>>>>  From the logs below it seems there is apparently unexpected traffic
>>>>> already going through the IOMMU when it wakes up. Is this the DRM
>>>>> drivers doing something sketchy, or has the bootloader left the
>>>>> display running for a splash screen? However in the latter case I
>>>>> don't obviously see why delaying the IOMMU probe should make much
>>>>> difference, given that the decon driver should still be waiting for
>>>>> it either way.
>>>>
>>>> The display is initialized by the bootloader for splash yes, but I reckon
>>>> it doesn't use the IOMMU as it's accessible from a framebuffer region.
>>>
>>> Right, the bootloader configured the decon device to display the splash
>>> screen, which means that the decon device is constantly reading splash
>>> screen pixel data from system memory. There is no such thing as a
>>> 'framebuffer region'; it is just system memory, which the exynos sysmmu
>>> protects when enabled. So far this issue of the bootloader's splash screen
>>> has not been solved in mainline. On other supported Exynos-based boards
>>> this works only because power domain drivers are also enabled, and they
>>> are instantiated before the display-related device and its respective
>>> sysmmu device. The power domain driver shuts down power, effectively
>>> disabling the display before the sysmmu gets probed.
>>
>> And presumably the sysmmu device itself doesn't need to depend on that 
>> power domain? OK, that at least makes sense.
>>
>>> A long time ago I pointed out this issue and proposed a simple solution,
>>> a special initial identity mapping for the memory range used for the
>>> splash screen, but that proposal is no longer applicable to the current
>>> code.
>>>
>>> As a workaround I would suggest shutting down the display in the decon
>>> device before starting the kernel (i.e. from the 'kernel loading mid-stage
>>> bootloader', if you have one).
>>
>> We do now have the tools to handle this properly - if the bootloader 
>> can be updated to add the appropriate "iommu-addresses" property[1] to 
>> the framebuffer reservation, then it's a case of hooking up support in 
>> exynos-iommu via of_iommu_get_resv_regions().
> 
> Hey, thanks a lot! I got it to work. [1] [2]

Hurrah!

> The Exynos IOMMU driver doesn't have support for get_resv_regions in
> iommu_ops. So I looked for existing drivers which have it implemented,
> to use as examples. [3] [4] [5]
> 
> All of them have some reserved region allocation before calling
> iommu_dma_get_resv_regions(). I don't know what they are, and I don't know
> what base and length should be used for sysmmu. Either way I gave it a shot
> with [6]:
> 
> #define EXYNOS_DEV_ADDR_START    0x20000000
> #define EXYNOS_DEV_ADDR_SIZE    0x40000000
> 
> static void exynos_iommu_get_resv_regions(struct device *dev,
>                        struct list_head *head)
> {
>      struct iommu_resv_region *region;
>      int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> 
>      region = iommu_alloc_resv_region(EXYNOS_DEV_ADDR_START,
>                       EXYNOS_DEV_ADDR_SIZE, prot,
>                       IOMMU_RESV_SW_MSI, GFP_KERNEL);
>      if (!region)
>          return;
> 
>      list_add_tail(&region->list, head);
> 
>      iommu_dma_get_resv_regions(dev, head);
> }
> 
> ...and it worked. It works even without that first allocation, so I'm
> not sure why, or whether it's needed.

No, unless you have client devices that use MSIs *and* you want to 
support VFIO for them, you don't need to make up a SW_MSI region - feel 
free to hook up ".get_resv_regions = iommu_dma_get_resv_regions" directly.
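
As a rough illustration of what "directly" means here, the sketch below
is a standalone userspace model, not the actual exynos-iommu code.
Every mock_* name is invented for the example and only stands in for
the corresponding kernel type or helper. It shows the ops table
pointing straight at the generic helper, with no driver wrapper and no
made-up SW_MSI region in front of it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mock stand-ins for kernel types; none of these are the real definitions. */
struct mock_resv_region {
	unsigned long start;
	size_t length;
	struct mock_resv_region *next;
};

struct mock_resv_list {
	struct mock_resv_region *head;
	size_t count;
};

/* Hypothetical device: carries one firmware-described range, if any. */
struct mock_device {
	unsigned long fw_start;
	size_t fw_length;	/* 0 means firmware described nothing */
};

/* Stand-in for the generic helper that reports firmware-described regions. */
static void mock_dma_get_resv_regions(struct mock_device *dev,
				      struct mock_resv_list *list)
{
	struct mock_resv_region *region;

	if (!dev->fw_length)
		return;

	region = malloc(sizeof(*region));
	if (!region)
		return;

	region->start = dev->fw_start;
	region->length = dev->fw_length;
	region->next = list->head;
	list->head = region;
	list->count++;
}

struct mock_iommu_ops {
	void (*get_resv_regions)(struct mock_device *dev,
				 struct mock_resv_list *list);
};

/* Direct hookup: the callback is the generic helper itself. */
static const struct mock_iommu_ops mock_exynos_ops = {
	.get_resv_regions = mock_dma_get_resv_regions,
};

/* Core-side caller: collect the regions, count them, free the list. */
size_t mock_count_resv_regions(struct mock_device *dev)
{
	struct mock_resv_list list = { NULL, 0 };
	size_t count;

	assert(dev != NULL);

	if (mock_exynos_ops.get_resv_regions)
		mock_exynos_ops.get_resv_regions(dev, &list);

	count = list.count;
	while (list.head) {
		struct mock_resv_region *next = list.head->next;

		free(list.head);
		list.head = next;
	}
	return count;
}
```

In the real driver the equivalent change should just be the one
initializer line in the existing ops structure.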

The only other thing to be wary of in general here is the window between 
initialising the IOMMU device itself and attaching the client(s) (e.g. 
SMMUv3 has to take special care there). However in this particular case 
I guess you're OK, since exynos_sysmmu_probe() doesn't really touch the 
hardware anyway, and you don't have that intermediate "globally enabled 
without client-specific config" state which can disrupt traffic on the 
bigger more complicated IOMMUs.

Thanks,
Robin.

> I need some input for making it upstreamable (mainly whether the base and
> length are correct); then I'll send a patch.
> 
> [1] https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/of_iommu.c#L206
> [2] https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml
> [3] https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/apple-dart.c#L966
> [4] https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c#L3573
> [5] https://elixir.bootlin.com/linux/v6.15.3/source/drivers/iommu/arm/arm-smmu/arm-smmu.c#L1594
> [6] https://elixir.bootlin.com/linux/v6.15.3/source/drivers/gpu/drm/exynos/exynos_drm_dma.c#L30
> 
>>
>> For a short-term kernel-side hack you could probably implement 
>> .def_domain_type to force IDENTITY for decon devices, as long as you 
>> can then convince the DRM driver to pick another device to grab a DMA 
>> ops domain from.
>>
>> Thanks,
>> Robin.


end of thread, other threads:[~2025-06-30 12:09 UTC | newest]

Thread overview: 9+ messages
2025-06-18 14:02 Help: Samsung Exynos 7870 DECON SYSMMU panic Kaustabh Chakraborty
2025-06-18 14:06 ` Kaustabh Chakraborty
2025-06-24 17:12 ` Robin Murphy
2025-06-25  7:39   ` Kaustabh Chakraborty
2025-06-25  8:42     ` Marek Szyprowski
2025-06-25 10:12       ` Kaustabh Chakraborty
2025-06-25 11:34       ` Robin Murphy
2025-06-28 15:39         ` Kaustabh Chakraborty
2025-06-30 12:09           ` Robin Murphy
