* [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
@ 2018-10-25 20:16 Andrey Grodzovsky
From: Andrey Grodzovsky @ 2018-10-25 20:16 UTC
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Alexander.Deucher-5C7GfCeVMHo, Andrey Grodzovsky,
Hawking.Zhang-5C7GfCeVMHo

Problem: After GPU reset on dGPUs with gfx8, compute ring
1.0.0 fails to pass the ring test. Ring register inspection
shows that it's active and no hang is observed (rptr == wptr).
No significant diffs were observed between CP_HQD* registers
for the ring in good and bad shape.

Fix: No clear reason why, but reversing the order of ring tests
fixes the problem.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b2e1376..02f8ca5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4811,8 +4811,10 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
 	if (r)
 		goto done;
 
-	/* Test KCQs */
-	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+	/* Test KCQs - reversing the order of rings seems to fix ring test failure
+	 * after GPU reset
+	 */
+	for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
 		ring = &adev->gfx.compute_ring[i];
 		r = amdgpu_ring_test_helper(ring);
 	}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
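
A minimal userspace sketch of the loop change in the patch above, for readers
who want to see the iteration order in isolation. Everything named here
(fake_ring, fake_ring_test, test_rings_reversed, visit_order, MAX_RINGS) is a
hypothetical stand-in and not amdgpu code; the real loop calls
amdgpu_ring_test_helper() on adev->gfx.compute_ring[i]. It only illustrates two
properties visible in the hunk: the index has to be a signed type for the
"i >= 0" condition to terminate, and r is reassigned on every pass, so only the
result of the last ring tested (index 0 after the reversal) is left in r.

```c
#include <stddef.h>

#define MAX_RINGS 8 /* hypothetical upper bound, for illustration only */

/* Hypothetical stand-in for a compute ring; not the real struct amdgpu_ring. */
struct fake_ring {
	int idx;
};

/* Log of the order in which rings were tested. */
static int visit_order[MAX_RINGS];
static int visits;

/* Hypothetical stand-in for amdgpu_ring_test_helper(): logs the ring, reports success. */
static int fake_ring_test(const struct fake_ring *ring)
{
	if (visits < MAX_RINGS)
		visit_order[visits++] = ring->idx;
	return 0;
}

/* Test rings from the highest index down to 0, mirroring the patched loop. */
static int test_rings_reversed(const struct fake_ring *rings, int num_rings)
{
	int r = 0;
	int i; /* signed on purpose: with an unsigned index, i >= 0 never becomes false */

	for (i = num_rings - 1; i >= 0; i--)
		r = fake_ring_test(&rings[i]); /* r is overwritten on each pass */

	return r;
}
```

Called on an array of four rings with idx 0..3, test_rings_reversed() visits
them in the order 3, 2, 1, 0. This says nothing about why the reversed order
avoids the failure on hardware, which the thread leaves open.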

* Re: [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
@ 2018-10-26 8:05 Christian König
From: Christian König @ 2018-10-26 8:05 UTC
To: Andrey Grodzovsky, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Alexander.Deucher-5C7GfCeVMHo, Hawking.Zhang-5C7GfCeVMHo

On 25.10.18 at 22:16, Andrey Grodzovsky wrote:
> Problem: After GPU reset on dGPUs with gfx8 compute ring
> 1.0.0 fails to pass the ring test. Ring registers inspection
> shows that it's active and no hang is observed (rptr == wptr)
> No significant diffs were observed between CP_HQD* registers
> for the ring in good and bad shape.
>
> Fix: No clear reason why but reversing the order of ring tests
> fixes the problem.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Mhm, maybe try adding a delay before the ring test?

Could be that the rings are started in reverse order as well and for
some reason the first one is tested too quickly after a reset.

Anyway patch is Acked-by: Christian König <christian.koenig@amd.com>

Thanks,
Christian.

* Re: [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
@ 2018-10-26 15:00 Grodzovsky, Andrey
From: Grodzovsky, Andrey @ 2018-10-26 15:00 UTC
To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Cc: Deucher, Alexander, Zhang, Hawking

On 10/26/2018 04:05 AM, Christian König wrote:
> On 25.10.18 at 22:16, Andrey Grodzovsky wrote:
>> Problem: After GPU reset on dGPUs with gfx8 compute ring
>> 1.0.0 fails to pass the ring test. Ring registers inspection
>> shows that it's active and no hang is observed (rptr == wptr)
>> No significant diffs were observed between CP_HQD* registers
>> for the ring in good and bad shape.
>>
>> Fix: No clear reason why but reversing the order of ring tests
>> fixes the problem.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>
> Mhm, maybe try adding a delay before the ring test?

First thing I tried, didn't help.

>
> Could be that the rings are started in reverse order as well and for
> some reason the first one is tested too quickly after a reset.

No, the KCQ queue mapping just before the test goes in 0..max order.

Andrey

>
> Anyway patch is Acked-by: Christian König <christian.koenig@amd.com>
>
> Thanks,
> Christian.