* Update: UVD status on loongson 3a platform
From: Chen Jie @ 2013-09-05 14:14 UTC (permalink / raw)
To: dri-devel, mesa-dev
Cc: 陈华才, 王锐,
丁汨江
Hi all,
This thread is a follow-up to
http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
We recently found something interesting about UVD-based playback on
the Loongson 3A platform, and also found a way to fix the problem.
First, we found that the memcpy in
[mesa]src/gallium/drivers/radeon/radeon_uvd.c caused the problem:
* If memcpy is implemented with 16B or 8B load/store instructions,
it usually causes mosaic artifacts in the video. When a memcmp is
inserted after the copying code in memcpy, it reports that src and
dest are not equal.
* If memcpy uses 1B load/store instructions only, the memcmp after the
copying code reports that they are equal.
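The comparison above can be sketched as a small C harness. This is a minimal illustration rather than the actual test code: the names copy_bytes, copy_wide and readback_differs are hypothetical, and the destination is assumed to be 8-byte aligned. The volatile qualifiers are there so the compiler keeps the chosen store width on the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes using only 1-byte loads and stores. */
static void copy_bytes(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Copy n bytes using 8-byte stores where possible and 1-byte stores
 * for the tail. Assumes dst is 8-byte aligned. */
static void copy_wide(volatile void *dst, const void *src, size_t n)
{
    volatile uint64_t *d = (volatile uint64_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    size_t words = n / 8;

    assert(((uintptr_t)dst & 7) == 0);

    for (size_t i = 0; i < words; i++) {
        uint64_t v;
        memcpy(&v, s + i * 8, sizeof v); /* alignment-safe read of the source */
        d[i] = v;                        /* one 8-byte store on a 64-bit target */
    }
    copy_bytes((volatile uint8_t *)dst + words * 8, s + words * 8, n % 8);
}

/* Read dst back byte by byte; returns nonzero if it differs from src. */
static int readback_differs(const volatile void *dst, const void *src, size_t n)
{
    const volatile uint8_t *d = (const volatile uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    for (size_t i = 0; i < n; i++)
        if (d[i] != s[i])
            return 1;
    return 0;
}
```

On ordinary cached system memory both copy variants are expected to verify. The interesting case is when dst points at a CPU mapping of a VRAM buffer, where the behaviour described above would show up as readback_differs() returning nonzero after copy_wide() but not after copy_bytes().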
Then we found that the following changeset fixes our problem:
diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
index 2f98de2..f9599b6 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
 			  unsigned size)
 {
 	buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
-					     RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
+					     RADEON_DOMAIN_GTT);
 	if (!buffer->buf)
 		return false;
VRAM is mapped to an uncached area on our platform, so my question
is: what could go wrong when using >4B load/store instructions in the
UVD workflow? Any ideas?
-- Regards,
Chen Jie
* Re: Update: UVD status on loongson 3a platform
From: Jerome Glisse @ 2013-09-05 19:29 UTC (permalink / raw)
To: Chen Jie
Cc: mesa-dev, 丁汨江, 王锐, dri-devel,
陈华才
On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
> VRAM is mapped to an uncached area on our platform, so my question
> is: what could go wrong when using >4B load/store instructions in the
> UVD workflow? Any ideas?
>
How do you map VRAM into the user process address space? I.e., do you
have something like Intel PAT, or something like MTRR, or something else?
In other words, can you map a region of I/O memory (GPU VRAM in this
case) into the process address space and mark it as uncached, so that
no access to it goes through the CPU cache?
Cheers,
Jerome
* Re: Update: UVD status on loongson 3a platform
From: Jerome Glisse @ 2013-09-05 19:50 UTC (permalink / raw)
To: Chen Jie
Cc: mesa-dev, 丁汨江, 王锐, dri-devel,
陈华才
On Thu, Sep 05, 2013 at 03:29:52PM -0400, Jerome Glisse wrote:
> How do you map VRAM into the user process address space? I.e., do you
> have something like Intel PAT, or something like MTRR, or something else?
>
> In other words, can you map a region of I/O memory (GPU VRAM in this
> case) into the process address space and mark it as uncached, so that
> no access to it goes through the CPU cache?
>
> Cheers,
> Jerome
Also, it might be that you can't do write combining on your platform,
which would be a major drawback, as it's assumed by radeon userspace.
I would need to check the PCIe specification, but write combining is
probably not mandatory, meaning that your architecture might not have
it. This would explain why only memcpy with byte-sized copies works.
I don't think there is any easy way to work around that.
Cheers,
Jerome
* Re: Update: UVD status on loongson 3a platform
From: cee1 @ 2013-09-06 2:52 UTC (permalink / raw)
To: Jerome Glisse
Cc: 陈华才, mesa-dev, dri-devel, 王锐,
丁汨江
2013/9/6 Jerome Glisse <j.glisse@gmail.com>:
> Also, it might be that you can't do write combining on your platform,
> which would be a major drawback, as it's assumed by radeon userspace.
> I would need to check the PCIe specification, but write combining is
> probably not mandatory, meaning that your architecture might not have
> it. This would explain why only memcpy with byte-sized copies works.
>
> I don't think there is any easy way to work around that.
The original mesa code allows the buffer to be allocated in either the
GTT or the VRAM domain. We changed it so that all buffers are allocated
in the GTT domain, which seems to fix our problem.
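The effect of the change on the domain mask can be sketched as follows. This is only an illustration: DOMAIN_GTT and DOMAIN_VRAM are hypothetical stand-ins for the RADEON_DOMAIN_* flags with assumed bit values, and may_place_in is a made-up helper, not a winsys function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RADEON_DOMAIN_GTT / RADEON_DOMAIN_VRAM.
 * The bit values are assumed for illustration only. */
enum { DOMAIN_GTT = 1 << 1, DOMAIN_VRAM = 1 << 2 };

/* True if a buffer created with the mask `allowed` may be placed in
 * the single domain `domain`. */
static bool may_place_in(unsigned allowed, unsigned domain)
{
    assert(domain == DOMAIN_GTT || domain == DOMAIN_VRAM);
    return (allowed & domain) != 0;
}

/* Mask passed by the original code: either domain is acceptable. */
static const unsigned domains_before = DOMAIN_GTT | DOMAIN_VRAM;

/* Mask passed after the change: GTT only, so never VRAM. */
static const unsigned domains_after = DOMAIN_GTT;
```

With domains_after, may_place_in(domains_after, DOMAIN_VRAM) is false, so the memcpy in radeon_uvd.c never targets a CPU mapping of VRAM.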
--
Regards,
- cee1
* Re: Update: UVD status on loongson 3a platform
From: cee1 @ 2013-09-06 3:19 UTC (permalink / raw)
To: Jerome Glisse
Cc: 陈华才, mesa-dev, dri-devel, 王锐,
丁汨江
2013/9/6 Jerome Glisse <j.glisse@gmail.com>:
> How do you map VRAM into the user process address space? I.e., do you
> have something like Intel PAT, or something like MTRR, or something else?
>
> In other words, can you map a region of I/O memory (GPU VRAM in this
> case) into the process address space and mark it as uncached, so that
> no access to it goes through the CPU cache?
Yes, of course.
On MIPS, there's a specific range of the address space that is used to
access I/O memory directly, and the addresses of VRAM BOs fall within
this range.
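As a rough illustration of such a range: assuming the window meant here is the architectural unmapped, uncached segment (kseg1 on 32-bit MIPS, xkphys with the uncached cache attribute on 64-bit MIPS), a physical address translates into it as sketched below. The helper names are hypothetical, and how the kernel exposes VRAM BOs to a user process is not covered by this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG1_BASE   0xA0000000u            /* MIPS32 unmapped, uncached segment */
#define KSEG1_SIZE   0x20000000u            /* it covers the low 512 MB of physical space */
#define XKPHYS_BASE  0x8000000000000000ull  /* MIPS64 unmapped physical window */
#define CCA_UNCACHED 2ull                   /* cache coherency attribute: uncached */

/* 32-bit view: physical address -> uncached kseg1 virtual address. */
static uint32_t phys_to_kseg1(uint32_t pa)
{
    assert(pa < KSEG1_SIZE);
    return KSEG1_BASE | pa;
}

/* 64-bit view: physical address -> uncached xkphys virtual address.
 * The cache attribute lives in bits 61:59 of the virtual address. */
static uint64_t phys_to_xkphys_uncached(uint64_t pa)
{
    assert((pa >> 59) == 0);
    return XKPHYS_BASE | (CCA_UNCACHED << 59) | pa;
}
```

Loads and stores through such addresses bypass the CPU cache, which is the property asked about above; it says nothing yet about whether wide stores are merged or split on the way to the device.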
--
Regards,
- cee1
* Re: Update: UVD status on loongson 3a platform
From: Christian König @ 2013-09-06 8:56 UTC (permalink / raw)
To: cee1
Cc: 王锐, dri-devel, 丁汨江, mesa-dev,
陈华才
On 06.09.2013 04:52, cee1 wrote:
> The original mesa code allows the buffer to be allocated in either the
> GTT or the VRAM domain. We changed it so that all buffers are allocated
> in the GTT domain, which seems to fix our problem.
Actually, it's not a fix but rather a quite ugly hack.
Depending on the UVD generation, some buffers *must* be allocated in
VRAM. Only starting with NI+ can most buffers be in GTT space instead,
and I'm not even 100% sure that this feature is validated and works
reliably.
Anyway, not having reliable CPU access to VRAM is a quite critical
platform bug that should be fixed before even thinking about UVD support.
Christian.