* Radeon R700 multi-ring bug
@ 2014-04-18 23:48 Marek Olšák
  2014-04-19  9:54 ` Christian König
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Olšák @ 2014-04-18 23:48 UTC (permalink / raw)
  To: dri-devel

Hi,

If you submit a lot of graphics and DMA IBs interleaved, the graphics
CS checker sometimes fails with this message:

[ 3846.435661] Forbidden register 0x0014 in cs at 9
[ 3846.435664] [drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !

This error is only raised for type-0 packets, but we don't use those
packets on R700 at all. Somehow, the graphics CS checker received
either the DMA IB or random garbage. My guess is that memory
corruption happens during IB uploading and/or IB checking in the
kernel. Also, if you are unlucky, the GPU hangs instead.
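
To illustrate why stray data can produce this particular message, here is
a minimal sketch in C. It assumes the usual PM4 header layout, in which the
top two bits of a header dword select the packet type and a type-0 header
carries a dword register index in its low 16 bits. The macro names, the
struct, the width of the register field, and the sample dword are all
assumptions made for illustration; none of them is copied from the kernel
source or from the failing IB.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers; the bit layout is an assumption, not kernel code. */
#define PKT_TYPE(h)   (((h) >> 30) & 0x3u)     /* bits 31:30: packet type */
#define PKT0_COUNT(h) (((h) >> 16) & 0x3FFFu)  /* bits 29:16: dword count field */
#define PKT0_REG(h)   (((h) & 0xFFFFu) << 2)   /* dword index -> byte offset */

struct pkt_info {
    unsigned type;   /* 0..3 */
    unsigned count;  /* only meaningful for type-0 in this sketch */
    uint32_t reg;    /* register byte offset, only meaningful for type-0 */
};

/* Split a header dword into the fields a CS checker would look at. */
struct pkt_info decode_header(uint32_t h)
{
    struct pkt_info p = { PKT_TYPE(h), PKT0_COUNT(h), PKT0_REG(h) };
    return p;
}

/* Print a one-line description of a header dword. */
void describe_header(uint32_t h)
{
    struct pkt_info p = decode_header(h);

    if (p.type == 0)
        printf("0x%08x: type-0 write to register 0x%04x\n",
               (unsigned)h, (unsigned)p.reg);
    else
        printf("0x%08x: type-%u packet\n", (unsigned)h, p.type);
}
```

Under these assumptions, any dword whose top two bits are zero is parsed as
a type-0 packet. The hypothetical value 0x00000005, for example, decodes as
a type-0 write to register 0x0014, which would be consistent with the
checker being handed data that was never a graphics IB.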

CS thread offloading was disabled in Mesa, so user space was
single-threaded.

There are two ways to work around this:
- disable async DMA in Mesa
- call usleep(1) after the RADEON_CS ioctl returns
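
The second option above can be sketched roughly as follows. This is only an
illustration of where the delay goes, not the Mesa winsys code: the callback
type cs_submit_fn, the wrapper submit_cs_with_delay, and the stub
fake_submit are hypothetical names, and the real RADEON_CS ioctl call is
assumed to live inside whatever function is passed in.

```c
#define _DEFAULT_SOURCE  /* make usleep() visible under strict -std= modes */
#include <unistd.h>

/* Hypothetical callback standing in for the code that issues RADEON_CS. */
typedef int (*cs_submit_fn)(void *ctx);

/* Submit one command stream, then pause briefly before returning. */
int submit_cs_with_delay(cs_submit_fn submit, void *ctx)
{
    int ret = submit(ctx);  /* the RADEON_CS ioctl would be issued in here */

    usleep(1);              /* workaround: short sleep after the ioctl returns */
    return ret;
}

/* Stand-in submit function for trying the wrapper without a GPU:
 * it only counts how many times it was called. */
int fake_submit(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

The sleep is placed after the submit call returns, so the submit's return
value is still passed back to the caller unchanged.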

This is just a heads-up. In the worst case, we can disable async DMA
for R700 in Mesa.

Marek


* Re: Radeon R700 multi-ring bug
  2014-04-18 23:48 Radeon R700 multi-ring bug Marek Olšák
@ 2014-04-19  9:54 ` Christian König
  2014-04-19 14:43   ` Alex Deucher
  2014-04-19 15:07   ` Marek Olšák
  0 siblings, 2 replies; 6+ messages in thread
From: Christian König @ 2014-04-19  9:54 UTC (permalink / raw)
  To: Marek Olšák, dri-devel

Hi Marek,

I've noticed this before as well, and I agree that it looks like
memory corruption. I'm not sure whether the async DMA on the GPU or the
CPU is overwriting something, perhaps because of a race condition.

Anyway, can you come up with a simple test case to reproduce the issue? 
For me it occurred only randomly while working on UVD support for R7xx. 
If you have something more reliable I could dig into it with my RV710.

Christian.



* Re: Radeon R700 multi-ring bug
  2014-04-19  9:54 ` Christian König
@ 2014-04-19 14:43   ` Alex Deucher
  2014-04-19 15:07   ` Marek Olšák
  1 sibling, 0 replies; 6+ messages in thread
From: Alex Deucher @ 2014-04-19 14:43 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Sat, Apr 19, 2014 at 5:54 AM, Christian König
<deathsimple@vodafone.de> wrote:
> Hi Marek,
>
> I've noticed this before as well, and I agree that it looks like a memory
> corruption. Not sure if the async DMA on the GPU or the CPU is overwriting
> something because of a race condition or something like this.
>
> Anyway, can you come up with a simple test case to reproduce the issue? For
> me it occurred only randomly while working on UVD support for R7xx. If you
> have something more reliable I could dig into it with my RV710.
>

Double-check the code in the kernel that copies the IB from the user
copy to the kernel copy.  We had a bug when we first merged DMA
support where we weren't properly copying the IBs across.  There
may still be a case where this is an issue:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=de0babd60d8d43b58fd06a7803151d32cb589af0

Alex

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: Radeon R700 multi-ring bug
  2014-04-19  9:54 ` Christian König
  2014-04-19 14:43   ` Alex Deucher
@ 2014-04-19 15:07   ` Marek Olšák
  2014-04-19 15:19     ` Alex Deucher
  1 sibling, 1 reply; 6+ messages in thread
From: Marek Olšák @ 2014-04-19 15:07 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

This test always reproduces the issue for me:

piglit/bin/arb_vertex_buffer_object-vbo-subdata-many drawarrays -fbo -auto

IBs get rejected, and sometimes it hangs.

It started to fail with this commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=6d434252e239bc872549e59c64eb3d0e5dab0655

which is probably unrelated to the issue, but it makes the graphics IB
a little bit bigger.

Also, I think R700 is generally in bad shape. I haven't been able to
run piglit with concurrency without hangs, even though I have already
disabled async DMA, geometry shaders, and pipelined buffer uploads.

Marek



* Re: Radeon R700 multi-ring bug
  2014-04-19 15:07   ` Marek Olšák
@ 2014-04-19 15:19     ` Alex Deucher
  2014-04-19 16:09       ` Marek Olšák
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Deucher @ 2014-04-19 15:19 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

On Sat, Apr 19, 2014 at 11:07 AM, Marek Olšák <maraeo@gmail.com> wrote:
> This test always reproduces the issue for me:
>
> piglit/bin/arb_vertex_buffer_object-vbo-subdata-many drawarrays -fbo -auto
>
> There are rejected IBs and it hangs sometimes.
>
> It started to fail with this commit:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=6d434252e239bc872549e59c64eb3d0e5dab0655
>
> which is probably unrelated to the issue, but it makes the graphics IB
> a little bit bigger.
>
> Also, I think R700 is generally in a bad shape. I haven't been able to
> run piglit with concurrency and without hangs, and I have already
> disabled async DMA, geometry shaders, and pipelined buffer uploads.

See if disabling dpm helps.

Alex



* Re: Radeon R700 multi-ring bug
  2014-04-19 15:19     ` Alex Deucher
@ 2014-04-19 16:09       ` Marek Olšák
  0 siblings, 0 replies; 6+ messages in thread
From: Marek Olšák @ 2014-04-19 16:09 UTC (permalink / raw)
  To: Alex Deucher; +Cc: dri-devel

Disabling DPM makes no difference.

Marek

