* etnaviv: mmu issue after end of address space reached?
@ 2016-12-10 11:00 Wladimir J. van der Laan
From: Wladimir J. van der Laan @ 2016-12-10 11:00 UTC
  To: etnaviv, dri-devel


I'm having an issue where a long-running test eventually runs into an MMU
fault. What this test does is basically:

- while [ 1 ]; do start a program that:
    - Allocate bo A, B, C and D
    - Map bo C, update it
    - Loop
        - Map bo A, B and C, update them
        - Build command buffer
        - Submit command buffer
        - etna_cmd_stream_finish
        - Map bo A, check output
    - Delete bo A, B, C and D
    - Exit program
(code is here: https://github.com/etnaviv/etnaviv_gpu_tests/blob/master/src/etnaviv_verifyops.c#L735)

The curious thing is that after the fault happens once, it keeps running into
the same fault almost immediately, even after a GPU reset. This made me suspect
that it has to do with kernel driver state, not GPU state.

I added some debug output to etnaviv_iommu_find_iova in the kernel driver:

<4>[  549.776209] Found iova: 00000000 eff82000
<4>[  549.780712] Found iova: 00000000 eff93000
<4>[  549.785173] Found iova: 00000000 effa4000
<4>[  549.789706] Found iova: 00000000 effb5000
<4>[  549.794167] Found iova: 00000000 effc6000
<4>[  549.798686] Found iova: 00000000 effd7000
<4>[  549.803171] Found iova: 00000000 effe8000
<4>[  549.803171] Found iova: 00000000 effe8000
<4>[  549.807680] last_iova <- end of range
<4>[  549.809966] Found iova: 00000000 e8783000
<3>[  549.814025] etnaviv-gpu 130000.gpu: MMU fault status 0x00000002 <- happens almost immediately
<3>[  549.819960] etnaviv-gpu 130000.gpu: MMU 0 fault addr 0xe8783040
<3>[  549.825889] etnaviv-gpu 130000.gpu: MMU 1 fault addr 0x00000000
<3>[  549.831817] etnaviv-gpu 130000.gpu: MMU 2 fault addr 0x00000000
<3>[  549.837744] etnaviv-gpu 130000.gpu: MMU 3 fault addr 0x00000000

Apparently it is running out of address space.
(I changed the end of the range to 0xf0000000 instead of 0xffffffff to rule out
the GPU disliking certain addresses.)

In principle this shouldn't be an issue: once last_iova reaches the end of the
range, the search starts over with a flushed MMU. I verified that this flush is
actually being queued in etnaviv_buffer_queue.

However, for some reason that logic doesn't seem to work, and I have not yet found
out what is wrong. I have not verified whether the MMU flush is actually flushing,
or whether this is a problem with updating the page tables.

What I find curious, though, is that after the search presumably starts over at
0 it returns 0xe8783000 instead of an earlier address. For this reason
last_iova is stuck near the end of the address space and the problem keeps
repeating once it's been hit.
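For illustration, the wraparound logic described above (allocate upward from last_iova; once the end of the range is reached, restart from the bottom and mark the MMU as needing a flush) can be modeled in a few lines of C. This is a simplified, hypothetical sketch: `struct iova_space`, `iova_alloc()` and `iova_free()` are made-up names, and it does a linear lowest-address scan, whereas the real driver hands hole selection to the kernel's drm_mm range allocator. The model therefore does not reproduce the 0xe8783000 result seen above; it only shows where the flush requirement comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64

/* One allocated IOVA range [start, end). */
struct range {
    uint32_t start, end;
};

/* Hypothetical, simplified model of a GPU address space. */
struct iova_space {
    uint32_t base, limit;         /* usable range [base, limit) */
    struct range used[MAX_NODES]; /* kept sorted by start address */
    int n;
    uint32_t last_iova;           /* next-fit cursor */
    bool need_flush;              /* set when the search wraps around */
};

/* Find the first hole of at least `size` bytes at or after `from`. */
static bool find_hole(const struct iova_space *s, uint32_t from,
                      uint32_t size, uint32_t *out)
{
    uint32_t cur = from;

    for (int i = 0; i < s->n; i++) {
        if (s->used[i].end <= cur)
            continue;                 /* range lies below the cursor */
        if (s->used[i].start >= cur && s->used[i].start - cur >= size) {
            *out = cur;               /* gap before this range fits */
            return true;
        }
        cur = s->used[i].end;         /* skip past this range */
    }
    if (cur <= s->limit && s->limit - cur >= size) {
        *out = cur;
        return true;
    }
    return false;
}

static bool iova_alloc(struct iova_space *s, uint32_t size, uint32_t *iova)
{
    if (s->n == MAX_NODES)
        return false;
    if (!find_hole(s, s->last_iova, size, iova)) {
        if (s->last_iova == s->base)
            return false;             /* already searched everything */
        /* End of range reached: restart at the bottom. Addresses below the
         * cursor may have been unmapped and are now reused, so the GPU TLB
         * can hold stale entries for them; request an MMU flush. */
        s->last_iova = s->base;
        s->need_flush = true;
        if (!find_hole(s, s->base, size, iova))
            return false;
    }
    int pos = s->n;                   /* insertion sort keeps used[] ordered */
    while (pos > 0 && s->used[pos - 1].start > *iova) {
        s->used[pos] = s->used[pos - 1];
        pos--;
    }
    s->used[pos] = (struct range){ *iova, *iova + size };
    s->n++;
    s->last_iova = *iova + size;
    return true;
}

static void iova_free(struct iova_space *s, uint32_t iova)
{
    for (int i = 0; i < s->n; i++) {
        if (s->used[i].start != iova)
            continue;
        for (int j = i; j + 1 < s->n; j++)
            s->used[j] = s->used[j + 1];
        s->n--;
        return;
    }
}
```

Repeatedly allocating and freeing 0x4000-byte buffers in a range [0x1000, 0x10000) yields 0x1000, 0x5000 and 0x9000, then wraps back to 0x1000 with need_flush set, much like the test loop mapping and unmapping its bos.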

It's certainly possible that I'm doing something dumb here and am somehow filling
up the address space :)

Wladimir

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: etnaviv: mmu issue after end of address space reached?
@ 2016-12-10 11:47 ` Wladimir J. van der Laan
From: Wladimir J. van der Laan @ 2016-12-10 11:47 UTC
  To: etnaviv, dri-devel


> <3>[  549.814025] etnaviv-gpu 130000.gpu: MMU fault status 0x00000002 <- happens almost immediately
> <3>[  549.819960] etnaviv-gpu 130000.gpu: MMU 0 fault addr 0xe8783040
> <3>[  549.825889] etnaviv-gpu 130000.gpu: MMU 1 fault addr 0x00000000
> <3>[  549.831817] etnaviv-gpu 130000.gpu: MMU 2 fault addr 0x00000000
> <3>[  549.837744] etnaviv-gpu 130000.gpu: MMU 3 fault addr 0x00000000

Okay, I just tried to reproduce this while rendering with Mesa, and it doesn't happen.

It reaches the end of the address space, sets last_iova back to 0, and just continues.

So the MMU fault is somehow specific to what I'm doing. Interesting.

> What I find curious, though, is that after the search presumably starts over at
> 0 it returns 0xe8783000 instead of an earlier address. For this reason
> last_iova is stuck near the end of the address space and the problem keeps
> repeating once it's been hit.

This does happen when rendering: it keeps dealing out iovas near the end of the
address space. That seems harmless, though it may cause more MMU flushes than
necessary.

Wladimir

* Re: etnaviv: mmu issue after end of address space reached?
@ 2016-12-10 17:05   ` Wladimir J. van der Laan
From: Wladimir J. van der Laan @ 2016-12-10 17:05 UTC
  To: etnaviv, dri-devel

> So the MMU fault is somehow specific to what I'm doing. Interesting.

I think I found the issue: the MMU "flush and sync" is not good enough in some
cases.

What the Vivante kernel driver does for MMUv2, after mapping some kinds of
buffer objects (apparently those tagged INDEX and VERTEX, which includes shader
code and CL buffers), is:

- Send an MMU flush command (like we do)
- Add a notify event "resume" (they hardwire event 29 for this)
- Add an END command to the command buffer so that the FE stops
- Remember where to continue

Then in the interrupt handler:

- If the "resume" notify event comes in
    - Wait for FE to be idle
    - Restart the FE to the remembered position

This is implemented in "pause" here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/gc_hal_kernel_command.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n395
gcvPAGE_TABLE_DIRTY_BIT_FE is set here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/gc_hal_kernel_mmu.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n2176
endAfterFlushMmuCache is set here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/arch/gc_hal_kernel_hardware.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n1259
The interrupt notification is handled here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/gc_hal_kernel_event.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n2224
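The stop-and-resume sequence described above can be modeled with a toy front end (FE). This is a hypothetical simulation, not driver code: the command opcodes, `struct ring`, `struct fe` and all function names are invented, and real Vivante commands are binary-encoded words rather than these symbolic entries. Only the ordering (MMU flush, notify event 29, END, then a restart at the remembered position from the interrupt handler) follows the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Symbolic commands; NOT the real Vivante FE command encoding. */
enum cmd_op { CMD_DRAW, CMD_FLUSH_MMU, CMD_EVENT, CMD_END };

struct cmd {
    enum cmd_op op;
    int arg;
};

#define RESUME_EVENT 29 /* the Vivante driver hardwires event 29 */
#define MAX_CMDS 32

struct ring {
    struct cmd buf[MAX_CMDS];
    size_t len;
    size_t resume_pos; /* where the FE continues after the pause */
};

/* Toy front-end state. */
struct fe {
    size_t pc;
    bool idle;
    int flushes, draws;
};

static void emit(struct ring *r, enum cmd_op op, int arg)
{
    assert(r->len < MAX_CMDS);
    r->buf[r->len++] = (struct cmd){ op, arg };
}

/* "Hard" MMU flush: flush, raise the resume event, stop the FE with END,
 * and remember where the stream continues. */
static void emit_hard_mmu_flush(struct ring *r)
{
    emit(r, CMD_FLUSH_MMU, 0);
    emit(r, CMD_EVENT, RESUME_EVENT);
    emit(r, CMD_END, 0);
    r->resume_pos = r->len;
}

/* Execute from fe->pc until END or the end of the buffer.
 * Returns the last event number raised, or -1 if none. */
static int fe_run(struct fe *fe, const struct ring *r)
{
    int event = -1;

    fe->idle = false;
    while (fe->pc < r->len) {
        struct cmd c = r->buf[fe->pc++];

        if (c.op == CMD_DRAW)
            fe->draws++;
        else if (c.op == CMD_FLUSH_MMU)
            fe->flushes++;
        else if (c.op == CMD_EVENT)
            event = c.arg;
        else /* CMD_END: the FE stops here */
            break;
    }
    fe->idle = true;
    return event;
}

/* Interrupt handler: on the resume event, wait for the FE to go idle and
 * restart it at the remembered position. Returns the event raised by the
 * resumed stream, or -1. */
static int handle_irq(struct fe *fe, const struct ring *r, int event)
{
    if (event != RESUME_EVENT)
        return -1;
    /* The toy FE runs synchronously, so it is already idle here; a real
     * driver would poll the hardware idle state instead. */
    assert(fe->idle);
    fe->pc = r->resume_pos;
    return fe_run(fe, r);
}
```

Without the interrupt-driven restart, the FE would remain stopped at the END command and anything queued after the hard flush would never execute.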

I hacked this into the DRM driver and have been running my test for quite some time,
bumping against the tail end of the address range many times, without any MMU faults.

My proposal is to add a bo flag for buffers that need this kind of "hard" MMU
reset (not all buffers do; textures, for example, don't), and if their iova mapping
requires an MMU flush, do the above stop-and-start ritual (in case of MMUv2).
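A minimal sketch of the proposed policy, assuming a per-bo flag. `BO_FLAG_NEEDS_FE_RESUME`, `pick_flush()` and `pick_submit_flush()` are hypothetical names chosen for illustration, and escalating a whole submit to a hard flush when any of its bos carries the flag is one possible design choice, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bo flag: this buffer needs the stop-and-restart flush. */
#define BO_FLAG_NEEDS_FE_RESUME (1u << 0)

enum mmu_version { MMU_V1 = 1, MMU_V2 = 2 };

/* Ordered from cheapest to most expensive. */
enum flush_kind {
    FLUSH_NONE, /* mapping did not require a flush */
    FLUSH_SOFT, /* MMU flush command queued in the stream (current behavior) */
    FLUSH_HARD, /* flush + event + END, FE restarted from the IRQ handler */
};

/* Decide how to flush for one bo, given whether mapping required a flush. */
static enum flush_kind pick_flush(uint32_t bo_flags, bool mapping_needs_flush,
                                  enum mmu_version mmu)
{
    if (!mapping_needs_flush)
        return FLUSH_NONE;
    if (mmu == MMU_V2 && (bo_flags & BO_FLAG_NEEDS_FE_RESUME))
        return FLUSH_HARD;
    return FLUSH_SOFT;
}

/* For a whole submit, use the most expensive flush any of its bos needs. */
static enum flush_kind pick_submit_flush(const uint32_t *bo_flags, int nbos,
                                         bool mapping_needs_flush,
                                         enum mmu_version mmu)
{
    enum flush_kind kind = FLUSH_NONE;

    for (int i = 0; i < nbos; i++) {
        enum flush_kind k = pick_flush(bo_flags[i], mapping_needs_flush, mmu);
        if (k > kind)
            kind = k;
    }
    return kind;
}
```

Textures would simply not carry the flag, so a wrap that only affects texture mappings keeps using the cheaper in-stream flush.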

Wladimir

* Re: etnaviv: mmu issue after end of address space reached?
@ 2016-12-12  9:25     ` Lucas Stach
From: Lucas Stach @ 2016-12-12  9:25 UTC
  To: Wladimir J. van der Laan; +Cc: etnaviv, dri-devel

Hi Wladimir,

On Saturday, 10.12.2016, 18:05 +0100, Wladimir J. van der Laan wrote:
> > So the MMU fault is somehow specific to what I'm doing. Interesting.
> 
> I think I found the issue: the MMU "flush and sync" is not good enough in some
> cases.
> 
> What the Vivante kernel driver does for MMUv2, after mapping some kinds of
> buffer objects (apparently those tagged INDEX and VERTEX, which includes shader
> code and CL buffers), is:
> 
> - Send an MMU flush command (like we do)
> - Add a notify event "resume" (they hardwire event 29 for this)
> - Add an END command to the command buffer so that the FE stops
> - Remember where to continue
> 
> Then in the interrupt handler:
> 
> - If the "resume" notify event comes in
>     - Wait for FE to be idle
>     - Restart the FE to the remembered position
> 
> This is implemented in "pause" here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/gc_hal_kernel_command.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n395
> gcvPAGE_TABLE_DIRTY_BIT_FE is set here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/gc_hal_kernel_mmu.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n2176
> endAfterFlushMmuCache is set here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/arch/gc_hal_kernel_hardware.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n1259
> The interrupt notification is handled here: http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/gpu-viv/hal/kernel/gc_hal_kernel_event.c?id=77f61547834c4f127b44b13e43c59133a35880dc#n2224
> 
> I hacked this into the DRM driver and have been running my test for quite some time,
> bumping against the tail end of the address range many times, without any MMU faults.
> 
> My proposal is to add a bo flag for buffers that need this kind of "hard" MMU
> reset (not all buffers do; textures, for example, don't), and if their iova mapping
> requires an MMU flush, do the above stop-and-start ritual (in case of MMUv2).

I'm aware of what the Vivante driver does. Unfortunately we would
basically need to flush the MMU before each user command stream, as we
continuously map new command buffers into the IOVA, which would be
crippling for performance. Vivante gets around this by setting up a 1:1
virt:phys mapping by default.

The current etnaviv code gets around this stop->irq->start dance by
spacing out the command streams, which seems to be enough to get around
the FE MMU flush failure. This may not work correctly at the end of the
address range. I'll take a look at this.

Blindly implementing the Vivante way does not seem like the correct
approach to me.

Regards,
Lucas


* Re: etnaviv: mmu issue after end of address space reached?
@ 2016-12-12  9:57       ` Wladimir J. van der Laan
From: Wladimir J. van der Laan @ 2016-12-12  9:57 UTC
  To: Lucas Stach; +Cc: etnaviv, dri-devel


> The current etnaviv code gets around this stop->irq->start dance by
> spacing out the command streams, which seems to be enough to get around
> the FE MMU flush failure. This may not work correctly at the end of the
> address range. I'll take a look at this.

In my case it does not seem to be a command buffer that this happens for, but
another bo used by the command buffer.

> Blindly implementing the Vivante way does not seem like the correct
> approach to me.

I'm not suggesting that it is a good solution! I just needed to do that to
narrow down the issue, and to get rid of it for now.

Regards,
Wladimir