* drm/exynos: g2d userptr memory corruption
@ 2015-08-16 12:48 Tobias Jakobi
  2015-08-17 10:26 ` Lucas Stach
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Tobias Jakobi @ 2015-08-16 12:48 UTC (permalink / raw)
  To: linux-samsung-soc, dri-devel, Inki Dae, Marek Szyprowski,
	Joonyoung Shim

Hello,

some time ago I checked whether I could use the userptr functionality to
do zero-copy from userspace allocated buffers via the G2D. This didn't
work out so well, so I put this at the bottom of my TODO list.

Now that IOMMU support has landed and Jan Kara has rewritten page pinning
using frame vectors (see [1]) I gave userptr another try.

The results are much better. I'm not experiencing any kernel lockups or
sysmmu pagefaults anymore. However the image now suffers from visual
artifacts. These images show the nature of the artifacts:
http://i.imgur.com/nzT6g3Y.jpg
http://i.imgur.com/wkuYI6X.jpg

The corruption always manifests itself in these pixel lines of fixed
size and wrong color.

I have written a testcase as part of libdrm for this issue:
https://github.com/tobiasjakobi/libdrm/commit/db8bf6844436598251f67a71fc334b929bfb2b71

It allocates N (N an even number) buffers which are aligned to the
system pagesize. Then it does this each iteration:
1) Fill the first N/2 buffers with random data
2) Copy the first half to the second half of the buffers
3) memcmp() first and second half (verification pass)
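
The three steps above can be sketched as a small C loop. This is only an
illustration under stated assumptions, not the libdrm testcase itself: the
names run_copy_test() and copy_buffer() are hypothetical, 32 bpp is assumed
for the 512x512 buffers, and memcpy() stands in for the G2D copy so the
sketch runs without the hardware.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NUM_BUFFERS 4u                   /* N, must be even */
#define BUF_BYTES   (512u * 512u * 4u)   /* 512x512 pixels, 32 bpp assumed */

/* Hypothetical stand-in for the G2D blit. The real test submits a G2D
 * copy on userptr buffers; memcpy() keeps this sketch runnable anywhere. */
static void copy_buffer(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Runs the fill / copy / verify loop. Returns the number of buffer pairs
 * that failed verification, or -1 if setup failed. */
static int run_copy_test(unsigned int iterations)
{
    void *bufs[NUM_BUFFERS] = { NULL };
    const size_t half = NUM_BUFFERS / 2;
    const long pagesize = sysconf(_SC_PAGESIZE);
    int failures = 0;

    assert(NUM_BUFFERS % 2 == 0);
    if (pagesize <= 0)
        return -1;

    /* Allocate N buffers aligned to the system page size. */
    for (size_t i = 0; i < NUM_BUFFERS; i++) {
        void *p = NULL;

        if (posix_memalign(&p, (size_t)pagesize, BUF_BYTES) != 0) {
            failures = -1;
            goto out;
        }
        bufs[i] = p;
    }

    for (unsigned int it = 0; it < iterations; it++) {
        /* 1) Fill the first N/2 buffers with random data. */
        for (size_t i = 0; i < half; i++) {
            unsigned char *p = bufs[i];

            for (size_t b = 0; b < BUF_BYTES; b++)
                p[b] = (unsigned char)rand();
        }
        /* 2) Copy the first half to the second half. */
        for (size_t i = 0; i < half; i++)
            copy_buffer(bufs[half + i], bufs[i], BUF_BYTES);
        /* 3) Verification pass. */
        for (size_t i = 0; i < half; i++)
            if (memcmp(bufs[i], bufs[half + i], BUF_BYTES) != 0)
                failures++;
    }

out:
    for (size_t i = 0; i < NUM_BUFFERS; i++)
        free(bufs[i]);
    return failures;
}
```

With the memcpy() stand-in, run_copy_test(2) reports no failures; the
corruption described here only shows up once the copy is done by a device
that does not see the CPU caches.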

Usually this verification already fails on the first iteration. An
interesting observation is that increasing (!) the buffer size (so the
amount of pixels that have to be copied per buffer grows) makes this issue
less likely to happen.

With the default 512x512 buffers however it happens, like I said above,
almost immediately.

I first suspected that the clock rate of the G2D was too high (I
overclock the engine from 200MHz to 400MHz here), but even with the
default clock there is no change to the behaviour.

While looking at the issue I remembered this discussion [2] from a while ago.

Adding Marek to Cc since I guess that this could be related to the IOMMU
as well (some missing flushing?).


With best wishes,
Tobias


[1] http://www.spinics.net/lists/linux-samsung-soc/msg45931.html
[2] http://lists.freedesktop.org/archives/dri-devel/2014-July/062675.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: drm/exynos: g2d userptr memory corruption
  2015-08-16 12:48 drm/exynos: g2d userptr memory corruption Tobias Jakobi
  2015-08-17 10:26 ` Lucas Stach
@ 2015-08-17 10:26 ` Lucas Stach
  2015-08-17 15:41   ` Tobias Jakobi
                     ` (2 more replies)
  2015-08-17 10:26 ` Lucas Stach
  2 siblings, 3 replies; 14+ messages in thread
From: Lucas Stach @ 2015-08-17 10:26 UTC (permalink / raw)
  To: Tobias Jakobi
  Cc: linux-samsung-soc, dri-devel, Inki Dae, Marek Szyprowski,
	Joonyoung Shim

Hi Tobias,

Am Sonntag, den 16.08.2015, 14:48 +0200 schrieb Tobias Jakobi:
> Hello,
> 
> some time ago I checked whether I could use the userptr functionality to
> do zero-copy from userspace allocated buffers via the G2D. This didn't
> work out so well, so I put this at the bottom of my TODO list.
> 
> Now that IOMMU support has landed and Jan Kara has rewritten page pinning
> using frame vectors (see [1]) I gave userptr another try.
> 
> The results are much better. I'm not experiencing any kernel lockups or
> sysmmu pagefaults anymore. However the image now suffers from visual
> artifacts. These images show the nature of the artifacts:
> http://i.imgur.com/nzT6g3Y.jpg
> http://i.imgur.com/wkuYI6X.jpg
> 
> The corruption always manifests itself in these pixel lines of fixed
> size and wrong color.
> 
> I have written a testcase as part of libdrm for this issue:
> https://github.com/tobiasjakobi/libdrm/commit/db8bf6844436598251f67a71fc334b929bfb2b71
> 
> It allocates N (N an even number) buffers which are aligned to the
> system pagesize. Then it does this each iteration:
> 1) Fill the first N/2 buffers with random data
> 2) Copy the first half to the second half of the buffers
> 3) memcmp() first and second half (verification pass)
> 
> Usually this verification already fails on the first iteration. An
> interesting observation is that increasing (!) the buffer size (so the
> amount of pixels that have to be copied per buffer grows) makes this issue
> less likely to happen.
> 
> With the default 512x512 buffers however it happens, like I said above,
> almost immediately.
> 
This is obviously a cache flush missing. The memory you get from
userspace is normal cached memory, so to make it visible to the GPU you
need to flush parts of the cache out to main memory.

The corruption you are seeing is just unflushed cachelines. This also
explains why increasing the buffer size helps: the more memory the CPU
touches the more cachelines will be flushed out to be replaced with new
data.

So you need to go and have a look at dma_map() and dma_sync_*_for_*()
and friends.
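
The effect described above can be reproduced with a toy write-back cache
model. This is a sketch under loud assumptions: a direct-mapped cache with
8 lines of 32 bytes (not the real Exynos cache geometry), hypothetical
names throughout, and a "device" that reads RAM directly. It is not kernel
code; in a driver the flush step would correspond to the DMA mapping and
dma_sync_*_for_device() calls mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u     /* bytes per cache line (assumed) */
#define NUM_LINES 8u      /* toy cache: 8 lines, direct-mapped (assumed) */
#define MEM_SIZE  1024u   /* size of the simulated RAM */

struct toy_cache {
    uint8_t ram[MEM_SIZE];               /* what a non-coherent device sees */
    uint8_t data[NUM_LINES][LINE_SIZE];  /* cached copies of RAM lines */
    size_t  tag[NUM_LINES];              /* line-aligned address per slot */
    int     valid[NUM_LINES];
    int     dirty[NUM_LINES];
};

/* Write one dirty line back to RAM. */
static void writeback(struct toy_cache *c, size_t slot)
{
    if (c->valid[slot] && c->dirty[slot]) {
        memcpy(&c->ram[c->tag[slot]], c->data[slot], LINE_SIZE);
        c->dirty[slot] = 0;
    }
}

/* CPU store of one byte through the write-back cache. */
static void cpu_write(struct toy_cache *c, size_t addr, uint8_t val)
{
    size_t base = addr - (addr % LINE_SIZE);
    size_t slot = (addr / LINE_SIZE) % NUM_LINES;

    assert(addr < MEM_SIZE);
    if (!c->valid[slot] || c->tag[slot] != base) {
        writeback(c, slot);                               /* evict old line */
        memcpy(c->data[slot], &c->ram[base], LINE_SIZE);  /* fill new line */
        c->tag[slot] = base;
        c->valid[slot] = 1;
    }
    c->data[slot][addr % LINE_SIZE] = val;
    c->dirty[slot] = 1;
}

/* Explicit flush: write back every dirty line. */
static void cache_flush(struct toy_cache *c)
{
    for (size_t s = 0; s < NUM_LINES; s++)
        writeback(c, s);
}

/* The CPU fills len bytes with a pattern, optionally flushes, and the
 * function returns how many bytes the device would still read as stale. */
static size_t fill_and_count(size_t len, int flush)
{
    struct toy_cache c;
    size_t stale = 0;

    assert(len <= MEM_SIZE);
    memset(&c, 0, sizeof(c));
    for (size_t i = 0; i < len; i++)
        cpu_write(&c, i, 0xAB);
    if (flush)
        cache_flush(&c);
    for (size_t i = 0; i < len; i++)
        if (c.ram[i] != 0xAB)
            stale++;
    return stale;
}
```

In this model fill_and_count(256, 0) leaves all 256 bytes stale, while
fill_and_count(1024, 0) still leaves 256 stale bytes, which is now only a
quarter of the buffer because later writes evicted the earlier lines.
fill_and_count(1024, 1) returns 0. Stale data always comes in whole
cache lines, which would fit the fixed-size runs of wrong pixels.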

Regards,
Lucas
-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: drm/exynos: g2d userptr memory corruption
  2015-08-17 10:26 ` Lucas Stach
@ 2015-08-17 15:41   ` Tobias Jakobi
  2015-08-19 13:53     ` Tobias Jakobi
  2015-08-17 15:41   ` Tobias Jakobi
  2015-08-17 15:41   ` Tobias Jakobi
  2 siblings, 1 reply; 14+ messages in thread
From: Tobias Jakobi @ 2015-08-17 15:41 UTC (permalink / raw)
  To: Lucas Stach, Tobias Jakobi
  Cc: linux-samsung-soc, dri-devel, Inki Dae, Marek Szyprowski,
	Joonyoung Shim

Thanks Lucas for the explanation!


Lucas Stach wrote:
> Hi Tobias,
> 
> Am Sonntag, den 16.08.2015, 14:48 +0200 schrieb Tobias Jakobi:
>> Hello,
>>
>> some time ago I checked whether I could use the userptr functionality to
>> do zero-copy from userspace allocated buffers via the G2D. This didn't
>> work out so well, so I put this at the bottom of my TODO list.
>>
>> Now that IOMMU support has landed and Jan Kara has rewritten page pinning
>> using frame vectors (see [1]) I gave userptr another try.
>>
>> The results are much better. I'm not experiencing any kernel lockups or
>> sysmmu pagefaults anymore. However the image now suffers from visual
>> artifacts. These images show the nature of the artifacts:
>> http://i.imgur.com/nzT6g3Y.jpg
>> http://i.imgur.com/wkuYI6X.jpg
>>
>> The corruption always manifests itself in these pixel lines of fixed
>> size and wrong color.
>>
>> I have written a testcase as part of libdrm for this issue:
>> https://github.com/tobiasjakobi/libdrm/commit/db8bf6844436598251f67a71fc334b929bfb2b71
>>
>> It allocates N (N an even number) buffers which are aligned to the
>> system pagesize. Then it does this each iteration:
>> 1) Fill the first N/2 buffers with random data
>> 2) Copy the first half to the second half of the buffers
>> 3) memcmp() first and second half (verification pass)
>>
>> Usually this verification already fails on the first iteration. An
>> interesting observation is that increasing (!) the buffer size (so the
>> amount of pixels that have to be copied per buffer grows) makes this issue
>> less likely to happen.
>>
>> With the default 512x512 buffers however it happens, like I said above,
>> almost immediately.
>>
> This is obviously a cache flush missing. The memory you get from
> userspace is normal cached memory, so to make it visible to the GPU you
> need to flush parts of the cache out to main memory.
> 
> The corruption you are seeing is just unflushed cachelines. This also
> explains why increasing the buffer size helps: the more memory the CPU
> touches the more cachelines will be flushed out to be replaced with new
> data.
I should point out that the snapshots I uploaded were done with a
different setup. There only the source memory of the G2D operation is a
userspace allocated buffer. The destination is a GEM buffer allocated
through libdrm, which is then used as framebuffer. So the issue already
appears when just the source is userspace allocated.

What works, however, is a GEM-to-GEM operation, though this
might be related to the default allocation flags libdrm uses.



> So you need to go and have a look at dma_map() and dma_sync_*_for_*()
> and friends.
> 
> Regards,
> Lucas
> 


With best wishes,
Tobias

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: drm/exynos: g2d userptr memory corruption
  2015-08-17 15:41   ` Tobias Jakobi
@ 2015-08-19 13:53     ` Tobias Jakobi
  2015-08-19 14:08       ` Jerome Glisse
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Jakobi @ 2015-08-19 13:53 UTC (permalink / raw)
  To: Tobias Jakobi, Lucas Stach
  Cc: linux-samsung-soc, Inki Dae, Marek Szyprowski, Joonyoung Shim,
	ML dri-devel, Jerome Glisse

Adding Jérôme to Cc. I think he looked at the userptr code before, so maybe
he has some idea what is going wrong here.

I also had a look at the code, but my knowledge about the DMA API is
almost nonexistent. However I can see that before doing any DMA via the
G2D on the buffer the code calls dma_map() on it, and also unmaps it
when the commandlist is finished.


With best wishes,
Tobias


Tobias Jakobi wrote:
> Thanks Lucas for the explanation!
> 
> 
> Lucas Stach wrote:
>> Hi Tobias,
>>
>> Am Sonntag, den 16.08.2015, 14:48 +0200 schrieb Tobias Jakobi:
>>> Hello,
>>>
>>> some time ago I checked whether I could use the userptr functionality to
>>> do zero-copy from userspace allocated buffers via the G2D. This didn't
>>> work out so well, so I put this at the bottom of my TODO list.
>>>
>>> Now that IOMMU support has landed and Jan Kara has rewritten page pinning
>>> using frame vectors (see [1]) I gave userptr another try.
>>>
>>> The results are much better. I'm not experiencing any kernel lockups or
>>> sysmmu pagefaults anymore. However the image now suffers from visual
>>> artifacts. These images show the nature of the artifacts:
>>> http://i.imgur.com/nzT6g3Y.jpg
>>> http://i.imgur.com/wkuYI6X.jpg
>>>
>>> The corruption always manifests itself in these pixel lines of fixed
>>> size and wrong color.
>>>
>>> I have written a testcase as part of libdrm for this issue:
>>> https://github.com/tobiasjakobi/libdrm/commit/db8bf6844436598251f67a71fc334b929bfb2b71
>>>
>>> It allocates N (N an even number) buffers which are aligned to the
>>> system pagesize. Then it does this each iteration:
>>> 1) Fill the first N/2 buffers with random data
>>> 2) Copy the first half to the second half of the buffers
>>> 3) memcmp() first and second half (verification pass)
>>>
>>> Usually this verification already fails on the first iteration. An
>>> interesting observation is that increasing (!) the buffer size (so the
>>> amount of pixels that have to be copied per buffer grows) makes this issue
>>> less likely to happen.
>>>
>>> With the default 512x512 buffers however it happens, like I said above,
>>> almost immediately.
>>>
>> This is obviously a cache flush missing. The memory you get from
>> userspace is normal cached memory, so to make it visible to the GPU you
>> need to flush parts of the cache out to main memory.
>>
>> The corruption you are seeing is just unflushed cachelines. This also
>> explains why increasing the buffer size helps: the more memory the CPU
>> touches the more cachelines will be flushed out to be replaced with new
>> data.
> I should point out that the snapshots I uploaded were done with a
> different setup. There only the source memory of the G2D operation is a
> userspace allocated buffer. The destination is a GEM buffer allocated
> through libdrm, which is then used as framebuffer. So the issue already
> appears when just the source is userspace allocated.
> 
> What works, however, is a GEM-to-GEM operation, though this
> might be related to the default allocation flags libdrm uses.
> 
> 
> 
>> So you need to go and have a look at dma_map() and dma_sync_*_for_*()
>> and friends.
>>
>> Regards,
>> Lucas
>>
> 
> 
> With best wishes,
> Tobias
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: drm/exynos: g2d userptr memory corruption
  2015-08-19 13:53     ` Tobias Jakobi
@ 2015-08-19 14:08       ` Jerome Glisse
  2015-08-19 14:41         ` Rob Clark
  0 siblings, 1 reply; 14+ messages in thread
From: Jerome Glisse @ 2015-08-19 14:08 UTC (permalink / raw)
  To: Tobias Jakobi
  Cc: Tobias Jakobi, Lucas Stach, linux-samsung-soc, Inki Dae,
	Marek Szyprowski, Joonyoung Shim, ML dri-devel

On Wed, Aug 19, 2015 at 03:53:44PM +0200, Tobias Jakobi wrote:
> Adding Jérôme to Cc. I think he looked at the userptr code before, so maybe
> he has some idea what is going wrong here.
> 
> I also had a look at the code, but my knowledge about the DMA API is
> almost nonexistent. However I can see that before doing any DMA via the
> G2D on the buffer the code calls dma_map() on it, and also unmaps it
> when the commandlist is finished.
> 
> 
> With best wishes,
> Tobias
> 
> 
> Tobias Jakobi wrote:
> > Thanks Lucas for the explanation!
> > 
> > 
> > Lucas Stach wrote:
> >> Hi Tobias,
> >>
> >> Am Sonntag, den 16.08.2015, 14:48 +0200 schrieb Tobias Jakobi:
> >>> Hello,
> >>>
> >>> some time ago I checked whether I could use the userptr functionality to
> >>> do zero-copy from userspace allocated buffers via the G2D. This didn't
> >>> work out so well, so I put this at the bottom of my TODO list.
> >>>
> >>> Now that IOMMU support has landed and Jan Kara has rewritten page pinning
> >>> using frame vectors (see [1]) I gave userptr another try.
> >>>
> >>> The results are much better. I'm not experiencing any kernel lockups or
> >>> sysmmu pagefaults anymore. However the image now suffers from visual
> >>> artifacts. These images show the nature of the artifacts:
> >>> http://i.imgur.com/nzT6g3Y.jpg
> >>> http://i.imgur.com/wkuYI6X.jpg
> >>>
> >>> The corruption always manifests itself in these pixel lines of fixed
> >>> size and wrong color.
> >>>
> >>> I have written a testcase as part of libdrm for this issue:
> >>> https://github.com/tobiasjakobi/libdrm/commit/db8bf6844436598251f67a71fc334b929bfb2b71
> >>>
> >>> It allocates N (N an even number) buffers which are aligned to the
> >>> system pagesize. Then it does this each iteration:
> >>> 1) Fill the first N/2 buffers with random data
> >>> 2) Copy the first half to the second half of the buffers
> >>> 3) memcmp() first and second half (verification pass)
> >>>
> >>> Usually this verification already fails on the first iteration. An
> >>> interesting observation is that increasing (!) the buffer size (so the
> >>> amount of pixels that have to be copied per buffer grows) makes this issue
> >>> less likely to happen.
> >>>
> >>> With the default 512x512 buffers however it happens, like I said above,
> >>> almost immediately.
> >>>
> >> This is obviously a cache flush missing. The memory you get from
> >> userspace is normal cached memory, so to make it visible to the GPU you
> >> need to flush parts of the cache out to main memory.
> >>
> >> The corruption you are seeing is just unflushed cachelines. This also
> >> explains why increasing the buffer size helps: the more memory the CPU
> >> touches the more cachelines will be flushed out to be replaced with new
> >> data.
> > I should point out that the snapshots I uploaded were done with a
> > different setup. There only the source memory of the G2D operation is a
> > userspace allocated buffer. The destination is a GEM buffer allocated
> > through libdrm, which is then used as framebuffer. So the issue already
> > appears when just the source is userspace allocated.
> > 

This is still consistent with a cacheline issue. Is your GPU & IOMMU cache
coherent with the CPU? If not, then you need to flush the cache for the
buffer before you use it with the GPU. The DMA API provides a few helpers
for that.

Cheers,
Jérôme
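
The stale-cacheline explanation above can be illustrated with a small
userspace model. This is only a sketch under simplifying assumptions: the
names toy_system, cpu_write, cache_clean, device_read and count_stale_bytes
are hypothetical and are not kernel or Exynos APIs, the "cache" tracks
individual bytes instead of real cachelines, and nothing is ever evicted
on its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 8

/* Toy system: main memory plus a write-back CPU cache that may hold
 * newer ("dirty") copies of individual bytes. */
struct toy_system {
    uint8_t memory[BUF_SIZE]; /* what a non-coherent device sees */
    uint8_t cache[BUF_SIZE];  /* what the CPU wrote most recently */
    bool dirty[BUF_SIZE];     /* true: cache[i] is newer than memory[i] */
};

/* CPU store: lands in the cache only, memory keeps the old value. */
static void cpu_write(struct toy_system *s, size_t i, uint8_t v)
{
    s->cache[i] = v;
    s->dirty[i] = true;
}

/* Clean ("flush") a range: write dirty bytes back to main memory. */
static void cache_clean(struct toy_system *s, size_t start, size_t len)
{
    for (size_t i = start; i < start + len && i < BUF_SIZE; i++) {
        if (s->dirty[i]) {
            s->memory[i] = s->cache[i];
            s->dirty[i] = false;
        }
    }
}

/* Non-coherent device read: bypasses the CPU cache entirely. */
static uint8_t device_read(const struct toy_system *s, size_t i)
{
    return s->memory[i];
}

/* The CPU fills the buffer with 0xAB, then the device reads it back.
 * Returns how many bytes the device sees with the old (stale) value. */
static size_t count_stale_bytes(bool clean_before_dma)
{
    struct toy_system s;
    size_t stale = 0;

    memset(&s, 0, sizeof(s));
    for (size_t i = 0; i < BUF_SIZE; i++)
        cpu_write(&s, i, 0xAB);
    if (clean_before_dma)
        cache_clean(&s, 0, BUF_SIZE);
    for (size_t i = 0; i < BUF_SIZE; i++)
        if (device_read(&s, i) != 0xAB)
            stale++;
    return stale;
}
```

In this model count_stale_bytes(false) reports every byte as stale, while
count_stale_bytes(true) reports none. A real cache also writes lines back
when they are evicted, which is the effect Lucas describes for larger
buffers; the model leaves that out.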

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: drm/exynos: g2d userptr memory corruption
  2015-08-19 14:08       ` Jerome Glisse
@ 2015-08-19 14:41         ` Rob Clark
  2015-08-27 15:10           ` Tobias Jakobi
  0 siblings, 1 reply; 14+ messages in thread
From: Rob Clark @ 2015-08-19 14:41 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Marek Szyprowski, Tobias Jakobi, linux-samsung-soc, ML dri-devel,
	Tobias Jakobi

On Wed, Aug 19, 2015 at 10:08 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> This is still consistent with cachelines issue. Is your GPU & IOMMU cache
> coherent with the CPU ? If not then it means you need to cache flush the
> buffer before you use it with the GPU. The dma API provide few helpers for
> that.

Although I suspect the DMA API is probably not aware of any device caches
(and I suspect it is a bit weak when it comes to devices that support a
mix of coherent and non-coherent mappings).

BR,
-R

> Cheers,
> Jérôme
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: drm/exynos: g2d userptr memory corruption
  2015-08-19 14:41         ` Rob Clark
@ 2015-08-27 15:10           ` Tobias Jakobi
  2015-08-27 17:08             ` Tobias Jakobi
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Jakobi @ 2015-08-27 15:10 UTC (permalink / raw)
  To: Rob Clark
  Cc: Tobias Jakobi, Jerome Glisse, linux-samsung-soc, ML dri-devel,
	Marek Szyprowski

Hello,

I did some further investigation into this issue.


I extended my test application to also cover GEM/GEM and
GEM/userptr transfers, and added GEM allocation flags
to the mix.

You can find the current version here:
https://github.com/tobiasjakobi/libdrm/blob/exynos/tests/exynos/exynos_fimg2d_verify_copy.c

What is interesting is that all GEM/GEM transfer
variants work, regardless of the allocation flags.

In particular I can allocate a cacheable GEM buffer
and the verification still works. This confuses me,
since everything points in the direction of this
being a cache flush / cache coherency issue. So why
doesn't this happen for GEM buffers as well if we
allow caching to happen there?



Next I hacked up a custom ioctl to trigger
flush_cache_all(), outer_flush_all() and
outer_sync(), issuing this ioctl after filling the
buffers and also after the G2D operations. This
helps somewhat, in the sense that the test application
runs longer. But it also makes the system prone to
lockups. It looks like calling outer_flush_all() a lot
triggers these. I don't know why this happens, but
IIRC one shouldn't call these low-level functions
from driver code anyway.



Next I checked the DMA-API and looked into dma_sync_sg_for_cpu()
and dma_sync_sg_for_device(). But the driver code already does this:
1) dma_map before starting the engine
2) dma_unmap after the engine finishes

The DMA-API HOWTO explicitly says that the sync functions are
only needed if the CPU touches the buffer between map
and unmap, which is not the case here.

In particular I traced what dma_{map,unmap} do depending
on the DMA direction, and deep down the call stack some
cache flush/invalidate routines are called. And the GEM cases
show that this does work there.
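
The map/unmap ownership rule described above can be sketched as a
userspace model. Everything here is an assumption for illustration:
toy_buf, toy_map, toy_unmap and toy_copy_verifies are hypothetical names,
the cache is tracked per byte, and the model is not the ARM dma-mapping
implementation. It only shows why a sync step matters when the CPU
touches a buffer while it is mapped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 4

enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE };

/* One buffer as seen from both sides of a non-coherent bus. */
struct toy_buf {
    uint8_t memory[BUF_SIZE]; /* what the device reads and writes */
    uint8_t cache[BUF_SIZE];  /* CPU cache contents */
    bool cached[BUF_SIZE];    /* cache[i] holds a copy of byte i */
    bool dirty[BUF_SIZE];     /* that copy is newer than memory[i] */
};

static void cpu_write(struct toy_buf *b, size_t i, uint8_t v)
{
    b->cache[i] = v;
    b->cached[i] = true;
    b->dirty[i] = true;
}

static uint8_t cpu_read(struct toy_buf *b, size_t i)
{
    if (!b->cached[i]) { /* miss: fill from memory */
        b->cache[i] = b->memory[i];
        b->cached[i] = true;
    }
    return b->cache[i];
}

/* Write dirty bytes back to memory. */
static void toy_clean(struct toy_buf *b)
{
    for (size_t i = 0; i < BUF_SIZE; i++) {
        if (b->dirty[i]) {
            b->memory[i] = b->cache[i];
            b->dirty[i] = false;
        }
    }
}

/* Drop all cached copies, so later CPU reads go to memory. */
static void toy_invalidate(struct toy_buf *b)
{
    memset(b->cached, 0, sizeof(b->cached));
    memset(b->dirty, 0, sizeof(b->dirty));
}

/* Hand the buffer to the device. */
static void toy_map(struct toy_buf *b, enum toy_dir dir)
{
    if (dir == TOY_TO_DEVICE)
        toy_clean(b);
    else
        toy_invalidate(b);
}

/* Hand the buffer back to the CPU. */
static void toy_unmap(struct toy_buf *b, enum toy_dir dir)
{
    if (dir == TOY_FROM_DEVICE)
        toy_invalidate(b);
}

/* Returns true if the CPU sees dst == src after a device copy. */
static bool toy_copy_verifies(bool cpu_write_while_mapped, bool sync_for_device)
{
    struct toy_buf src, dst;
    bool ok = true;

    memset(&src, 0, sizeof(src));
    memset(&dst, 0, sizeof(dst));
    for (size_t i = 0; i < BUF_SIZE; i++)
        cpu_write(&src, i, 0x11);

    toy_map(&src, TOY_TO_DEVICE);
    toy_map(&dst, TOY_FROM_DEVICE);
    if (cpu_write_while_mapped)
        cpu_write(&src, 0, 0x22); /* CPU touches a mapped buffer */
    if (sync_for_device)
        toy_clean(&src); /* stand-in for a sync-for-device step */
    memcpy(dst.memory, src.memory, BUF_SIZE); /* the "engine" copies memory */
    toy_unmap(&src, TOY_TO_DEVICE);
    toy_unmap(&dst, TOY_FROM_DEVICE);

    for (size_t i = 0; i < BUF_SIZE; i++)
        if (cpu_read(&dst, i) != cpu_read(&src, i))
            ok = false;
    return ok;
}
```

In this model a plain map, copy, unmap sequence verifies, and the copy
only fails when the CPU writes to the mapped source without a sync step.
That matches the expectation stated above that map/unmap alone should be
enough for the test, which is why the userptr failures remain puzzling.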



Next I looked into Jerome's question about whether the G2D is
cache coherent with the CPU. I looked into old Android code and
found FIMG2D_AXI_MODE_REG, a register that currently isn't
touched in the DRM code.
It seems to control signals on the AXI master interface.

The register looks like this:
[0:3] ARCACHE
[4:7] AWCACHE
[8:15] ARUSERS
[16:23] AWUSERS
[24:25] MaxBurstLength

If I understand ARM's AXI specifications correctly then you
can set cache coherent operation by setting AxUSER[0]=1 and
AxCACHE[1]=1.

The initialization values for AxCACHE and AxUSERS are all
zero, so the default seems to be non-coherent operation.
But still after setting the bits in FIMG2D_AXI_MODE_REG I
don't see any change in behaviour -- the issue stays.



Any idea what I'm missing here?

With best wishes,
Tobias



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: drm/exynos: g2d userptr memory corruption
  2015-08-27 15:10           ` Tobias Jakobi
@ 2015-08-27 17:08             ` Tobias Jakobi
  0 siblings, 0 replies; 14+ messages in thread
From: Tobias Jakobi @ 2015-08-27 17:08 UTC (permalink / raw)
  To: Tobias Jakobi, Rob Clark
  Cc: Jerome Glisse, linux-samsung-soc, ML dri-devel, Marek Szyprowski

Tobias Jakobi wrote:
> Next I looked into Jerome's question about whether the G2D is
> cache coherent with the CPU. I looked into old Android code and
> found FIMG2D_AXI_MODE_REG, a register that currently isn't
> touched in the DRM code.
> It seems to manipulate signals to the AXI Master interface.
> 
> The register looks like this:
> [0:3] ARCACHE
> [4:7] AWCACHE
> [8:15] ARUSERS
> [16:23] AWUSERS
> [24:25] MaxBurstLength
Correction, it looks like this:
[0:3] ARCACHE
[4:7] AWCACHE
[8:12] ARUSERS
[16:20] AWUSERS
[24:25] MaxBurstLength
(the rest of the bits are reserved)
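
As a sketch of how a value for this register could be assembled from the
corrected layout: the macro and function names below are hypothetical and
do not come from the Android or DRM sources, and the register offset is
not given in this thread, so the code only computes the value and does
not write it anywhere. The "coherent candidate" follows the reading of
the AXI specification given earlier (AxCACHE[1]=1, AxUSER[0]=1), which
did not change the observed behaviour.

```c
#include <stdint.h>

/* Field positions taken from the corrected layout in this message. */
#define AXI_MODE_ARCACHE_SHIFT   0   /* bits [0:3]   */
#define AXI_MODE_ARCACHE_MASK    0xFu
#define AXI_MODE_AWCACHE_SHIFT   4   /* bits [4:7]   */
#define AXI_MODE_AWCACHE_MASK    0xFu
#define AXI_MODE_ARUSERS_SHIFT   8   /* bits [8:12]  */
#define AXI_MODE_ARUSERS_MASK    0x1Fu
#define AXI_MODE_AWUSERS_SHIFT   16  /* bits [16:20] */
#define AXI_MODE_AWUSERS_MASK    0x1Fu
#define AXI_MODE_MAX_BURST_SHIFT 24  /* bits [24:25] */
#define AXI_MODE_MAX_BURST_MASK  0x3u

/* Pack the five fields into one 32-bit value; out-of-range input bits
 * are masked off and reserved bits stay zero. */
static uint32_t axi_mode_pack(uint32_t arcache, uint32_t awcache,
                              uint32_t arusers, uint32_t awusers,
                              uint32_t max_burst)
{
    return ((arcache & AXI_MODE_ARCACHE_MASK) << AXI_MODE_ARCACHE_SHIFT) |
           ((awcache & AXI_MODE_AWCACHE_MASK) << AXI_MODE_AWCACHE_SHIFT) |
           ((arusers & AXI_MODE_ARUSERS_MASK) << AXI_MODE_ARUSERS_SHIFT) |
           ((awusers & AXI_MODE_AWUSERS_MASK) << AXI_MODE_AWUSERS_SHIFT) |
           ((max_burst & AXI_MODE_MAX_BURST_MASK) << AXI_MODE_MAX_BURST_SHIFT);
}

/* AxCACHE[1]=1 and AxUSER[0]=1 for both the read and the write channel,
 * with MaxBurstLength left at zero. */
static uint32_t axi_mode_coherent_candidate(void)
{
    return axi_mode_pack(1u << 1, 1u << 1, 1u << 0, 1u << 0, 0);
}
```

With this layout axi_mode_coherent_candidate() evaluates to 0x00010122.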

With best wishes,
Tobias


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-08-27 17:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-16 12:48 drm/exynos: g2d userptr memory corruption Tobias Jakobi
2015-08-17 10:26 ` Lucas Stach
2015-08-17 10:26 ` Lucas Stach
2015-08-17 15:41   ` Tobias Jakobi
2015-08-19 13:53     ` Tobias Jakobi
2015-08-19 14:08       ` Jerome Glisse
2015-08-19 14:41         ` Rob Clark
2015-08-27 15:10           ` Tobias Jakobi
2015-08-27 17:08             ` Tobias Jakobi
2015-08-17 15:41   ` Tobias Jakobi
2015-08-17 15:41   ` Tobias Jakobi
2015-08-17 10:26 ` Lucas Stach
  -- strict thread matches above, loose matches on Subject: below --
2015-08-16 12:48 Tobias Jakobi
2015-08-16 12:48 Tobias Jakobi
