Linux Tegra architecture development
* [REGRESSION] EMEM address decode error when using Tegra210 media engines
@ 2025-06-03 15:06 Diogo Ivo
  2025-06-03 15:32 ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Diogo Ivo @ 2025-06-03 15:06 UTC (permalink / raw)
  To: thierry.reding, vdumpa, joro, will, robin.murphy, jonathanh, jgg,
	baolu.lu, jsnitsel, jroedel
  Cc: regressions, linux-tegra, iommu, diogo.ivo

Hello,

Commit 50568f87d1e233e introduced a regression when trying to use the media
accelerators present on the Tegra X1 SoC.

I came across this regression when testing the branch [1] that leverages
the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
50568f87d1e233e we see the following error messages after submitting a job
through the TEGRA_CHANNEL_SUBMIT IOCTL:

[   17.993237] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbe00: EMEM address decode error (SMMU translation error [--S])
[   18.003625] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbe00: Page fault (SMMU translation error [--S])
[   18.015088] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbf00: EMEM address decode error (SMMU translation error [--S])
[   18.027626] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbf00: Page fault (SMMU translation error [--S])
[   28.131228] ---- mlocks ----
[   28.131816] 0: unlocked
[   28.134238] 1: unlocked
[   28.136680] 2: unlocked
[   28.139091] 3: unlocked
[   28.141527] 4: unlocked
[   28.143950] 5: unlocked
[   28.146371] 6: unlocked
[   28.148803] 7: unlocked
[   28.151229] 8: unlocked
[   28.153649] 9: unlocked
[   28.156089] 10: unlocked
[   28.158589] 11: unlocked
[   28.161110] 12: unlocked
[   28.163621] 13: unlocked
[   28.166128] 14: unlocked
[   28.168650] 15: unlocked
[   28.171154]
[   28.172633] ---- syncpts ----
[   28.175588] id 0 (0-reserved-nop) min 0 max 0 (0 waiters)
[   28.180964] id 1 (1-54200000.dc) min 0 max 0 (0 waiters)
[   28.186246] id 2 (2-54240000.dc) min 0 max 0 (0 waiters)
[   28.191531] id 3 (3-54340000.vic) min 0 max 0 (0 waiters)
[   28.196907] id 4 (4-ffmpeg) min 0 max 1 (1 waiters)
[   28.201988]
[   28.203234] ---- channels ----
[   28.206259] 0: fifo:
[   28.208431] FIFOSTAT 80100840
[   28.211375] [empty]
[   28.213454] 0-54340000.vic:
[   28.213457] inactive
[   28.213457]
[   28.219956] 1: fifo:
[   28.222116] FIFOSTAT 80100840
[   28.225070] [empty]
[   28.227146] 1-54340000.vic:
[   28.227150] inactive
[   28.227150]
[   28.233650] 2: fifo:
[   28.235816] FIFOSTAT 80100840
[   28.238754] [empty]
[   28.240846] 2-54380000.nvjpg:
[   28.240851] active class c0, offset 0000, val 00000104
[   28.248990] DMASTART 0x00000000ffffd000, DMAEND 0x0000000000000ffc
[   28.255141] DMAPUT 00000018 DMAGET 00000018 DMACTL 00000000
[   28.260689] CBREAD 00000104 CBSTAT 00c00000
[   28.264852] JOB, syncpt 4: 1 timeout: 10000 num_slots: 3 num_handles: 1
[   28.271440]     0x00000000ffffd000: 00080041: SETCL(class=001, offset=008, mask=01, [04000000])
[   28.280106]     0x00000000ffffd008: 00003000: SETCL(class=0c0)
[   28.285910]     0x00000000ffffd00c: 20000000: NONINCR(offset=000, [])
[   28.292333]     0x00000000ffffd010: 6000001d: GATHER(offset=000, insert=0, type=0, count=001d, addr=[ffffc000])
[   28.302380]   GATHER at 0x00000000ffffc000+0x0, 29 words
[   28.307673]     0x00000000ffffc000: 10100002: INCR(offset=010, [00000080, 00000001])
[   28.315380]     0x00000000ffffc00c: 10100002: INCR(offset=010, [000001c0, 00000000])
[   28.323091]     0x00000000ffffc018: 10100002: INCR(offset=010, [000001c1, 00000000])
[   28.330804]     0x00000000ffffc024: 10100002: INCR(offset=010, [000001c2, 00014cc0])
[   28.338517]     0x00000000ffffc030: 10100002: INCR(offset=010, [000001c4, 00011710])
[   28.346232]     0x00000000ffffc03c: 10100002: INCR(offset=010, [000001c5, 00000000])
[   28.353945]     0x00000000ffffc048: 10100002: INCR(offset=010, [000001c6, 0000b910])
[   28.361658]     0x00000000ffffc054: 10100002: INCR(offset=010, [000001c7, 00000000])
[   28.369372]     0x00000000ffffc060: 10100002: INCR(offset=010, [000000c0, 00000100])
[   28.377085]     0x00000000ffffc06c: 20000001: NONINCR(offset=000, [00000104])
[   28.384190]
[   28.385660] tegra-host1x 50000000.host1x: cdma_timeout_handler: timeout: 4 (4-ffmpeg), HW thresh 0, done 1

Please let me know if you need more information on my side and I'll be
happy to provide it.

Best regards,
Diogo

#regzbot introduced: 50568f87d1e233e

[1]: https://gitlab.freedesktop.org/d.ivo/mesa/-/tree/diogo/vaapi_remove_gpu


* Re: [REGRESSION] EMEM address decode error when using Tegra210 media engines
  2025-06-03 15:06 [REGRESSION] EMEM address decode error when using Tegra210 media engines Diogo Ivo
@ 2025-06-03 15:32 ` Jason Gunthorpe
  2025-06-03 16:52   ` Diogo Ivo
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2025-06-03 15:32 UTC (permalink / raw)
  To: Diogo Ivo
  Cc: thierry.reding, vdumpa, joro, will, robin.murphy, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu

On Tue, Jun 03, 2025 at 04:06:47PM +0100, Diogo Ivo wrote:
> Hello,
> 
> Commit 50568f87d1e233e introduced a regression when trying to use the media
> accelerators present on the Tegra X1 SoC.
> 
> I came across this regression when testing the branch [1] that leverages
> the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
> 50568f87d1e233e we see the following error messages after submitting a job
> through the TEGRA_CHANNEL_SUBMIT IOCTL:

Maybe this?

@@ -567,7 +567,7 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
 
        /* The flush the page directory entry from caches */
        dma_sync_single_range_for_device(smmu->dev, as->pd_dma, offset,
-                                        sizeof(*pd), DMA_TO_DEVICE);
+                                        sizeof(pd->val[0]), DMA_TO_DEVICE);
 
        /* And flush the iommu */
        smmu_flush_ptc(smmu, as->pd_dma, offset);

It is the only mistake I was able to notice.

But I'd be puzzled: I'd expect a bigger sizeof to make it slower, not
broken. Though your crash sure looks like either missing cache
coherency or a bad PTE construction.

Jason


* Re: [REGRESSION] EMEM address decode error when using Tegra210 media engines
  2025-06-03 15:32 ` Jason Gunthorpe
@ 2025-06-03 16:52   ` Diogo Ivo
  2025-06-03 17:43     ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Diogo Ivo @ 2025-06-03 16:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: thierry.reding, vdumpa, joro, will, robin.murphy, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu


On 6/3/25 4:32 PM, Jason Gunthorpe wrote:
> On Tue, Jun 03, 2025 at 04:06:47PM +0100, Diogo Ivo wrote:
>> Hello,
>>
>> Commit 50568f87d1e233e introduced a regression when trying to use the media
>> accelerators present on the Tegra X1 SoC.
>>
>> I came across this regression when testing the branch [1] that leverages
>> the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
>> 50568f87d1e233e we see the following error messages after submitting a job
>> through the TEGRA_CHANNEL_SUBMIT IOCTL:
> 
> Maybe this?
> 
> @@ -567,7 +567,7 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
>   
>          /* The flush the page directory entry from caches */
>          dma_sync_single_range_for_device(smmu->dev, as->pd_dma, offset,
> -                                        sizeof(*pd), DMA_TO_DEVICE);
> +                                        sizeof(pd->val[0]), DMA_TO_DEVICE);
>   
>          /* And flush the iommu */
>          smmu_flush_ptc(smmu, as->pd_dma, offset);
> 
> It is the only mistake I was able to notice.
> 
> But I'd be puzzled - I'd expect bigger sizeof would make it slower not
> broken.. Though your crash sure looks like either missing cache
> coherency or a bad PTE construction.
> 

With this change there is still an error:

[   21.794016] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbe00: EMEM address decode error (SMMU translation error [--S])
[   21.804409] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbe00: Page fault (SMMU translation error [--S])

The only difference from the previous log is that the fault is now
reported for a single address (0xffffbe00) rather than two.

Diogo


* Re: [REGRESSION] EMEM address decode error when using Tegra210 media engines
  2025-06-03 16:52   ` Diogo Ivo
@ 2025-06-03 17:43     ` Robin Murphy
  2025-06-03 18:14       ` Diogo Ivo
  2025-06-03 19:04       ` Jason Gunthorpe
  0 siblings, 2 replies; 6+ messages in thread
From: Robin Murphy @ 2025-06-03 17:43 UTC (permalink / raw)
  To: Diogo Ivo, Jason Gunthorpe
  Cc: thierry.reding, vdumpa, joro, will, jonathanh, baolu.lu, jsnitsel,
	jroedel, regressions, linux-tegra, iommu

On 2025-06-03 5:52 pm, Diogo Ivo wrote:
> 
> On 6/3/25 4:32 PM, Jason Gunthorpe wrote:
>> On Tue, Jun 03, 2025 at 04:06:47PM +0100, Diogo Ivo wrote:
>>> Hello,
>>>
>>> Commit 50568f87d1e233e introduced a regression when trying to use the 
>>> media
>>> accelerators present on the Tegra X1 SoC.
>>>
>>> I came across this regression when testing the branch [1] that leverages
>>> the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
>>> 50568f87d1e233e we see the following error messages after submitting 
>>> a job
>>> through the TEGRA_CHANNEL_SUBMIT IOCTL:
>>
>> Maybe this?
>>
>> @@ -567,7 +567,7 @@ static void tegra_smmu_set_pde(struct 
>> tegra_smmu_as *as, unsigned long iova,
>>          /* The flush the page directory entry from caches */
>>          dma_sync_single_range_for_device(smmu->dev, as->pd_dma, offset,
>> -                                        sizeof(*pd), DMA_TO_DEVICE);
>> +                                        sizeof(pd->val[0]), 
>> DMA_TO_DEVICE);
>>          /* And flush the iommu */
>>          smmu_flush_ptc(smmu, as->pd_dma, offset);
>>
>> It is the only mistake I was able to notice.
>>
>> But I'd be puzzled - I'd expect bigger sizeof would make it slower not
>> broken.. Though your crash sure looks like either missing cache
>> coherency or a bad PTE construction.

I reckon the "unsigned long offset = pd_index * sizeof(*pd);" a few 
lines above is probably more impactful ;)

Robin.
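
To make the arithmetic behind that remark concrete, here is a minimal
user-space sketch. It assumes struct tegra_pd wraps an array of 1024
32-bit entries and that the directory index is taken from IOVA bits
31:22; neither detail is quoted in this thread, so treat the constants
and the helper names pd_offset_buggy()/pd_offset_fixed() as
illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 1024 32-bit directory entries, i.e. one 4 KiB table. */
#define SMMU_NUM_PDE   1024u
#define SMMU_PDE_SHIFT 22u

struct tegra_pd {
	uint32_t val[SMMU_NUM_PDE];
};

/* Directory slot covering an IOVA (each slot spans a 4 MiB window). */
static unsigned int iova_pd_index(uint32_t iova)
{
	return (iova >> SMMU_PDE_SHIFT) & (SMMU_NUM_PDE - 1);
}

/* Byte offset scaled by the size of the whole table (the reported bug). */
static size_t pd_offset_buggy(uint32_t iova)
{
	return (size_t)iova_pd_index(iova) * sizeof(struct tegra_pd);
}

/* Byte offset scaled by one entry, equivalent to sizeof(pd->val[0]). */
static size_t pd_offset_fixed(uint32_t iova)
{
	size_t offset = (size_t)iova_pd_index(iova) * sizeof(uint32_t);

	/* A valid entry offset always lies inside the table. */
	assert(offset < sizeof(struct tegra_pd));
	return offset;
}
```

Under these assumptions the faulting address 0xffffbe00 from the log
falls in the last directory slot, so the entry-sized stride keeps the
offset inside the table while the table-sized stride lands far beyond
it.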

>>
> 
> With this change there is still an error:
> 
> [   21.794016] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbe00: EMEM address decode error (SMMU translation error [--S])
> [   21.804409] tegra-mc 70019000.memory-controller: nvjpgsrd: read @0x00000000ffffbe00: Page fault (SMMU translation error [--S])
> 
> the difference being that we only get it for one address compared to the
> previous log.
> 
> Diogo



* Re: [REGRESSION] EMEM address decode error when using Tegra210 media engines
  2025-06-03 17:43     ` Robin Murphy
@ 2025-06-03 18:14       ` Diogo Ivo
  2025-06-03 19:04       ` Jason Gunthorpe
  1 sibling, 0 replies; 6+ messages in thread
From: Diogo Ivo @ 2025-06-03 18:14 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe
  Cc: thierry.reding, vdumpa, joro, will, jonathanh, baolu.lu, jsnitsel,
	jroedel, regressions, linux-tegra, iommu



On 6/3/25 6:43 PM, Robin Murphy wrote:
> On 2025-06-03 5:52 pm, Diogo Ivo wrote:
>>
>> On 6/3/25 4:32 PM, Jason Gunthorpe wrote:
>>> On Tue, Jun 03, 2025 at 04:06:47PM +0100, Diogo Ivo wrote:
>>>> Hello,
>>>>
>>>> Commit 50568f87d1e233e introduced a regression when trying to use 
>>>> the media
>>>> accelerators present on the Tegra X1 SoC.
>>>>
>>>> I came across this regression when testing the branch [1] that 
>>>> leverages
>>>> the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
>>>> 50568f87d1e233e we see the following error messages after submitting 
>>>> a job
>>>> through the TEGRA_CHANNEL_SUBMIT IOCTL:
>>>
>>> Maybe this?
>>>
>>> @@ -567,7 +567,7 @@ static void tegra_smmu_set_pde(struct 
>>> tegra_smmu_as *as, unsigned long iova,
>>>          /* The flush the page directory entry from caches */
>>>          dma_sync_single_range_for_device(smmu->dev, as->pd_dma, offset,
>>> -                                        sizeof(*pd), DMA_TO_DEVICE);
>>> +                                        sizeof(pd->val[0]), 
>>> DMA_TO_DEVICE);
>>>          /* And flush the iommu */
>>>          smmu_flush_ptc(smmu, as->pd_dma, offset);
>>>
>>> It is the only mistake I was able to notice.
>>>
>>> But I'd be puzzled - I'd expect bigger sizeof would make it slower not
>>> broken.. Though your crash sure looks like either missing cache
>>> coherency or a bad PTE construction.
> 
> I reckon the "unsigned long offset = pd_index * sizeof(*pd);" a few 
> lines above is probably more impactful ;)

Yes, that fixes it :)

Diogo


* Re: [REGRESSION] EMEM address decode error when using Tegra210 media engines
  2025-06-03 17:43     ` Robin Murphy
  2025-06-03 18:14       ` Diogo Ivo
@ 2025-06-03 19:04       ` Jason Gunthorpe
  1 sibling, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2025-06-03 19:04 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Diogo Ivo, thierry.reding, vdumpa, joro, will, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu

On Tue, Jun 03, 2025 at 06:43:49PM +0100, Robin Murphy wrote:
> On 2025-06-03 5:52 pm, Diogo Ivo wrote:
> > 
> > On 6/3/25 4:32 PM, Jason Gunthorpe wrote:
> > > On Tue, Jun 03, 2025 at 04:06:47PM +0100, Diogo Ivo wrote:
> > > > Hello,
> > > > 
> > > > Commit 50568f87d1e233e introduced a regression when trying to
> > > > use the media
> > > > accelerators present on the Tegra X1 SoC.
> > > > 
> > > > I came across this regression when testing the branch [1] that leverages
> > > > the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
> > > > 50568f87d1e233e we see the following error messages after
> > > > submitting a job
> > > > through the TEGRA_CHANNEL_SUBMIT IOCTL:
> > > 
> > > Maybe this?
> > > 
> > > @@ -567,7 +567,7 @@ static void tegra_smmu_set_pde(struct
> > > tegra_smmu_as *as, unsigned long iova,
> > >          /* The flush the page directory entry from caches */
> > >          dma_sync_single_range_for_device(smmu->dev, as->pd_dma, offset,
> > > -                                        sizeof(*pd), DMA_TO_DEVICE);
> > > +                                        sizeof(pd->val[0]),
> > > DMA_TO_DEVICE);
> > >          /* And flush the iommu */
> > >          smmu_flush_ptc(smmu, as->pd_dma, offset);
> > > 
> > > It is the only mistake I was able to notice.
> > > 
> > > But I'd be puzzled - I'd expect bigger sizeof would make it slower not
> > > broken.. Though your crash sure looks like either missing cache
> > > coherency or a bad PTE construction.
> 
> I reckon the "unsigned long offset = pd_index * sizeof(*pd);" a few lines
> above is probably more impactful ;)

Oh yes, almost certainly. Very good of you to notice it!

Diogo how about this:

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 61897d50162dd7..72a400f7ae0c20 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -560,14 +560,14 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
        unsigned int pd_index = iova_pd_index(iova);
        struct tegra_smmu *smmu = as->smmu;
        struct tegra_pd *pd = as->pd;
-       unsigned long offset = pd_index * sizeof(*pd);
+       unsigned long offset = pd_index * sizeof(pd->val[0]);
 
        /* Set the page directory entry first */
        pd->val[pd_index] = value;
 
        /* The flush the page directory entry from caches */
        dma_sync_single_range_for_device(smmu->dev, as->pd_dma, offset,
-                                        sizeof(*pd), DMA_TO_DEVICE);
+                                        sizeof(pd->val[0]), DMA_TO_DEVICE);
 
        /* And flush the iommu */
        smmu_flush_ptc(smmu, as->pd_dma, offset);
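
As a rough cross-check of the patch above, the sketch below counts how
many directory slots end up with a sync range that stays inside the
page directory for a given stride and length. It is a user-space
model, not driver code: the 1024-entry, 4-byte-per-entry layout and
the helper names sync_range_in_pd()/count_valid_slots() are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 1024 entries of 4 bytes, so a 4096-byte directory. */
#define SMMU_NUM_PDE 1024u
#define PD_BYTES     (SMMU_NUM_PDE * sizeof(uint32_t))

/* True when [offset, offset + len) lies entirely inside the directory. */
static bool sync_range_in_pd(size_t offset, size_t len)
{
	return offset < PD_BYTES && len <= PD_BYTES - offset;
}

/* Count the slots whose sync range is in bounds for a stride and length. */
static unsigned int count_valid_slots(size_t stride, size_t len)
{
	unsigned int i, ok = 0;

	assert(len > 0);
	for (i = 0; i < SMMU_NUM_PDE; i++)
		if (sync_range_in_pd(i * stride, len))
			ok++;
	return ok;
}
```

In this model, count_valid_slots(4096, 4096) corresponds to the code
after the regression, count_valid_slots(4096, 4) to changing only the
sync length, and count_valid_slots(4, 4) to the patch above. Only the
last keeps every slot in bounds, which would be consistent with the
length-only change not resolving the fault.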
