* Radeon R700 multi-ring bug
From: Marek Olšák @ 2014-04-18 23:48 UTC
To: dri-devel

Hi,

If you submit a lot of graphics and DMA IBs interleaved, the graphics
CS checker sometimes fails with this message:

[ 3846.435661] Forbidden register 0x0014 in cs at 9
[ 3846.435664] [drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !

This error is only used for type-0 packets, but we don't use these
packets on R700 at all. Somehow, the graphics CS checker received
either the DMA IB or random garbage. My guess is there is memory
corruption happening during IB uploading and/or IB checking in the
kernel. Also, if you are unlucky, the GPU hangs instead.

The CS thread offloading was disabled in Mesa, so the user space was
single-threaded.

There are 2 ways to fix this:
- disable async DMA in Mesa
- call usleep(1) after the RADEON_CS ioctl returns

This is just a heads-up. In the worst case, we can disable async DMA
for R700 in Mesa.

Marek
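Editorial sketch: the report above hinges on the checker treating a dword as a type-0 packet header. The snippet below illustrates why almost any stray word can end up on that path. The bit layout (type in bits 31:30, count in bits 29:16, register dword index in bits 15:0) is an assumption based on the PM4 header macros as I recall them from the radeon driver, and the function names are hypothetical; check the kernel headers before relying on it.

```python
# Assumed PM4 header layout (not taken from the thread):
#   bits 31:30 packet type, bits 29:16 count, bits 15:0 register dword index.

def packet_type(header: int) -> int:
    """Return the packet type encoded in the top two bits."""
    return (header >> 30) & 0x3


def packet_count(header: int) -> int:
    """Return the 14-bit count field."""
    return (header >> 16) & 0x3FFF


def packet0_register(header: int) -> int:
    """Return the byte offset of the register a type-0 header addresses."""
    return (header & 0xFFFF) << 2


def describe(header: int) -> str:
    """Format a header the way a checker might report it."""
    ptype = packet_type(header)
    if ptype == 0:
        return "type-0, register 0x%04x, count %d" % (
            packet0_register(header), packet_count(header))
    return "type-%d" % ptype


# Any word whose top two bits are clear decodes as type-0. A low half of
# 0x0005 then maps to register offset 0x0014, matching the logged value.
print(describe(0x00000005))
print(describe(0x30000005))  # hypothetical non-PM4 word, still "type-0"
print(describe(0xC0001000))  # top bits set: type-3, the usual packet type
```

Under these assumptions, a word that was never meant as a PM4 header (part of another ring's IB, or garbage) only needs its top two bits clear to be checked as a register write, which is consistent with the guess that the checker saw the wrong data.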
* Re: Radeon R700 multi-ring bug
From: Christian König @ 2014-04-19 9:54 UTC
To: Marek Olšák, dri-devel

Hi Marek,

I've noticed this before as well, and I agree that it looks like
memory corruption. I'm not sure whether the async DMA on the GPU or
the CPU is overwriting something because of a race condition or
something like that.

Anyway, can you come up with a simple test case to reproduce the
issue? For me it occurred only randomly while working on UVD support
for R7xx. If you have something more reliable, I could dig into it
with my RV710.

Christian.
* Re: Radeon R700 multi-ring bug
From: Alex Deucher @ 2014-04-19 14:43 UTC
To: Christian König; +Cc: dri-devel

Double check the code in the kernel that copies the IB from the user
copy to the kernel copy. We had a bug when we first merged DMA support
where we weren't properly copying the IBs across. There may still be a
case where this is an issue:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=de0babd60d8d43b58fd06a7803151d32cb589af0

Alex
* Re: Radeon R700 multi-ring bug
From: Marek Olšák @ 2014-04-19 15:07 UTC
To: Christian König; +Cc: dri-devel

This test always reproduces the issue for me:

piglit/bin/arb_vertex_buffer_object-vbo-subdata-many drawarrays -fbo -auto

There are rejected IBs and it hangs sometimes.

It started to fail with this commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=6d434252e239bc872549e59c64eb3d0e5dab0655

which is probably unrelated to the issue, but it makes the graphics IB
a little bit bigger.

Also, I think R700 is generally in bad shape. I haven't been able to
run piglit with concurrency and without hangs, and I have already
disabled async DMA, geometry shaders, and pipelined buffer uploads.

Marek
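Editorial sketch: when re-running the piglit test above, it can help to count rejected IBs in the kernel log automatically. The helper below is a minimal, hypothetical example; the regular expression is derived only from the two log lines quoted in this thread, and the use of `dmesg` (which may need elevated privileges) is an assumption about the test machine. The function names are made up for illustration.

```python
import re
import subprocess

# Pattern built from the log lines quoted in the thread.
FORBIDDEN_RE = re.compile(r"Forbidden register (0x[0-9a-fA-F]+) in cs at (\d+)")
INVALID_CS = "Invalid command stream"

# Command line exactly as given in the thread.
PIGLIT_CMD = [
    "piglit/bin/arb_vertex_buffer_object-vbo-subdata-many",
    "drawarrays", "-fbo", "-auto",
]


def scan_kernel_log(text):
    """Return ([(register, position), ...], invalid_cs_count) for a log dump."""
    forbidden = [(int(m.group(1), 16), int(m.group(2)))
                 for m in FORBIDDEN_RE.finditer(text)]
    invalid = sum(1 for line in text.splitlines() if INVALID_CS in line)
    return forbidden, invalid


def run_once(cmd=PIGLIT_CMD):
    """Run the test once, then scan the current kernel log (not called here)."""
    subprocess.run(cmd, check=False)
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    return scan_kernel_log(log)
```

Calling `run_once()` in a loop and comparing the counts between iterations gives a rough failure rate. Since `dmesg` returns the whole ring buffer, counts are cumulative unless the log is cleared between runs.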
* Re: Radeon R700 multi-ring bug
From: Alex Deucher @ 2014-04-19 15:19 UTC
To: Marek Olšák; +Cc: dri-devel

See if disabling DPM helps.

Alex
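Editorial sketch: the thread does not say how DPM was disabled. One common way, to my knowledge, is booting with the radeon module parameter `radeon.dpm=0`. The helper below is a hypothetical example that checks a kernel command line string (for instance the contents of `/proc/cmdline`) for that setting, so a test log can record which configuration was actually in effect.

```python
def radeon_dpm_setting(cmdline):
    """Return the radeon.dpm value found on a kernel command line, or None.

    If the option appears more than once, the last occurrence is kept;
    treating the last one as effective is an assumption of this sketch.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("radeon.dpm="):
            value = token.split("=", 1)[1]
    return value


# Example with a made-up command line:
print(radeon_dpm_setting("root=/dev/sda1 radeon.dpm=0 quiet"))
```

A return value of `None` only means the option was not given on the command line; the driver's default then applies.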
* Re: Radeon R700 multi-ring bug
From: Marek Olšák @ 2014-04-19 16:09 UTC
To: Alex Deucher; +Cc: dri-devel

Disabling DPM makes no difference.

Marek