* [Bug 203111] Unrecoverable GPU crash with DiRT 4
2019-03-30 9:29 [Bug 203111] New: Unrecoverable GPU crash with DiRT 4 bugzilla-daemon
From: bugzilla-daemon @ 2019-04-01 16:02 UTC
To: dri-devel
https://bugzilla.kernel.org/show_bug.cgi?id=203111
Alex Deucher (alexdeucher@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |alexdeucher@gmail.com
--- Comment #1 from Alex Deucher (alexdeucher@gmail.com) ---
This is probably a mesa bug. I'd suggest trying a new version of mesa or
filing a mesa bug.
--
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
From: bugzilla-daemon @ 2019-04-02 7:38 UTC
To: dri-devel
Thomas (v10lator@myway.de) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |INVALID
--- Comment #2 from Thomas (v10lator@myway.de) ---
(In reply to Alex Deucher from comment #1)
> This is probably a mesa bug. I'd suggest trying a new version of mesa
That helped, thank you.
From: bugzilla-daemon @ 2019-04-05 18:44 UTC
To: dri-devel
Thomas (v10lator@myway.de) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|INVALID |---
--- Comment #3 from Thomas (v10lator@myway.de) ---
(In reply to Alex Deucher from comment #1)
> I'd suggest trying a new version of mesa
I was too quick to close this: it crashes with newer mesa too, just
(subjectively) less frequently.
Here's a log from mesa 19.0.1:
> [178793.032358] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=12332054, emitted seq=12332056
> [178793.032362] [drm:amdgpu_job_timedout] *ERROR* Process information: process Dirt4 pid 31348 thread WebViewRenderer pid 31422
> [178793.032365] amdgpu 0000:01:00.0: GPU reset begin!
> [178803.262008] [drm:amdgpu_dm_atomic_check] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
And from git (26e161b1e9):
> [ 7819.095648] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=2652771, emitted seq=2652773
> [ 7819.095652] [drm:amdgpu_job_timedout] *ERROR* Process information: process Dirt4 pid 3075 thread WebViewRenderer pid 3152
> [ 7819.095655] amdgpu 0000:01:00.0: GPU reset begin!
> [ 7829.315220] [drm:amdgpu_dm_atomic_check] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Not sure if the log is shorter because of the new mesa or the new kernel
(updated from 5.0.4 to 5.0.5).
Are you sure this could be a mesa bug? Just asking because for me a hanging
kernel sounds like a kernel bug.
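In both logs the `amdgpu_job_timedout` message reports an emitted sequence number two ahead of the signaled one. Reading `signaled` as the last fence the GPU completed and `emitted` as the last fence queued on the ring (an interpretation assumed here, not stated in this thread), that gap is the number of gfx submissions still outstanding when the timeout fired. A minimal sketch that pulls the gap out of dmesg lines; `pending_fences` and the regex are hypothetical helpers, and the wraparound handling assumes the sequence numbers are unsigned 32-bit counters.

```python
import re

# Matches the ring-timeout line as it appears in the logs above.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\w+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def pending_fences(line):
    """Return (ring, emitted - signaled) for a timeout line, else None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    # Fences queued but not yet signaled; modulo 2**32 assumes the
    # counters are 32-bit and may have wrapped around.
    return m.group("ring"), (emitted - signaled) % 2**32


logs = [
    "[178793.032358] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, "
    "signaled seq=12332054, emitted seq=12332056",
    "[ 7819.095648] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, "
    "signaled seq=2652771, emitted seq=2652773",
]

for line in logs:
    print(pending_fences(line))  # ('gfx', 2)
```

The gap alone does not say which submission hung, only that the gfx ring stopped making progress with work still queued.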
From: bugzilla-daemon @ 2019-04-05 20:31 UTC
To: dri-devel
--- Comment #4 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Thomas from comment #3)
>
> Are you sure this could be a mesa bug? Just asking because for me a hanging
> kernel sounds like a kernel bug.
Likely a mesa bug. Mesa submits gfx/video/compute jobs to the kernel driver.
If there are subtle bugs in those jobs, the GPU can hang. The kernel driver
can reset the GPU, but the display server needs to catch the reset and properly
re-initialize its context and buffers. At the moment, none of the display
servers do this so you need to restart them after a GPU reset.
The
[drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
error is because userspace tried to submit more work to the kernel after a
reset without re-initializing its context, so the kernel rejects it.
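The recovery contract described above can be modeled in a few lines. This is a toy simulation, not amdgpu or Mesa code: `Context`, `GpuModel`, `ClientModel` and the per-context "epoch" are hypothetical, and the real driver's bookkeeping is more involved. It shows both halves: after a reset the kernel side rejects work from contexts created before it (on Linux, 125 is `ECANCELED`, which matches the -125 in the parser error), and a well-behaved client treats that error as the signal to rebuild its context instead of continuing to submit.

```python
import errno
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical per-client GPU context; remembers its reset epoch."""
    epoch: int


class GpuModel:
    """Toy model of a kernel driver that counts GPU resets (hypothetical)."""

    def __init__(self) -> None:
        self.reset_counter = 0

    def reset(self) -> None:
        # A hang was detected and the GPU was reset; every context created
        # before this point is now stale.
        self.reset_counter += 1

    def create_context(self) -> Context:
        return Context(epoch=self.reset_counter)

    def submit(self, ctx: Context, job: str) -> int:
        if ctx.epoch != self.reset_counter:
            # Stale context: refuse the work (on Linux, -ECANCELED == -125).
            return -errno.ECANCELED
        return 0  # accepted


class ClientModel:
    """Toy display-server-like client that recovers after a reset."""

    def __init__(self, gpu: GpuModel) -> None:
        self.gpu = gpu
        self.ctx = gpu.create_context()
        self.recreated = 0

    def draw(self, job: str) -> int:
        ret = self.gpu.submit(self.ctx, job)
        if ret == -errno.ECANCELED:
            # Reset noticed: drop the old context, create a new one (a real
            # server would also rebuild its buffers), then retry once.
            self.ctx = self.gpu.create_context()
            self.recreated += 1
            ret = self.gpu.submit(self.ctx, job)
        return ret


gpu = GpuModel()
client = ClientModel(gpu)
print(client.draw("frame 1"))  # 0
gpu.reset()
print(client.draw("frame 2"))  # 0, after recreating the context
print(client.recreated)        # 1
```

A client that skips the recreate step keeps getting -ECANCELED on every submission, which corresponds to the display servers that currently have to be restarted after a GPU reset.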
From: bugzilla-daemon @ 2019-04-05 21:15 UTC
To: dri-devel
--- Comment #5 from Thomas (v10lator@myway.de) ---
Thanks a lot for the detailed answer. I'm still not sure if I understand
everything correctly (shouldn't the kernel driver validate the command stream
from userspace/mesa and stop bad things before they hit hardware / hang the
GPU?) but I'll close this now and check for or open a new mesa bug report
tomorrow (I really need sleep now).
Damn, if this weren't the wrong place I would ask for more details about
your last reply (the thing about the display servers not catching up with the
GPU reset - aren't there drivers which perform GPU resets just fine under X11
already? What about Wayland?). It's so freaking nice, I bet I would learn a lot
if we continued the discussion... Anyway, thanks again for explaining and
sorry for going a bit off topic in this reply.
One last thing... It's extremely off topic but I already derailed this reply and
it has to be said: Thank you Alex for being the guy you are. I bet AMD doesn't
pay you to explain technical details to stupid end users like me but that's
very appreciated. You're a hero, keep on rockin'!
From: bugzilla-daemon @ 2019-04-05 21:15 UTC
To: dri-devel
Thomas (v10lator@myway.de) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution|--- |INVALID
From: bugzilla-daemon @ 2019-04-09 1:39 UTC
To: dri-devel
--- Comment #6 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Thomas from comment #5)
> Thanks a lot for the detailed answer. I'm still not sure if I understand
> everything correctly (shouldn't the kernel driver validate the command
> stream from userspace/mesa and stop bad things before they hit hardware /
> hang the GPU?)
It's not really feasible. For one, it adds a lot of CPU overhead. There is
also so much state in the 3D pipeline it's nearly impossible to validate all of
the possible cases that could cause a hang. In some cases, you may not even
know that a particular combination is bad until it gets hit.
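A back-of-the-envelope sketch of why exhaustive validation does not scale: independent state fields multiply. The field names and value counts below are invented purely for illustration; the point is the product, not the particular numbers, and a real 3D pipeline carries much more state than this.

```python
from math import prod

# Made-up, illustrative numbers: a few pipeline state fields and how many
# distinct values each might take.
hypothetical_state = {
    "blend_mode": 16,
    "depth_func": 8,
    "stencil_op": 8,
    "cull_mode": 3,
    "color_format": 64,
    "depth_format": 8,
    "sample_count": 5,
    "shader_variant": 1000,
}

# Every combination is a distinct configuration a validator would have to
# reason about.
combinations = prod(hypothetical_state.values())
print(f"{combinations:,} combinations")  # 7,864,320,000 combinations
```

Even this small, invented set yields billions of configurations, before accounting for the combinations nobody yet knows are bad.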
>
> Damn, if this weren't the wrong place I would ask for more details about
> your last reply (the thing about the display servers not catching up with
> the GPU reset - aren't there drivers which perform GPU resets just fine
> under X11 already? What about Wayland?). It's so freaking nice, I bet I
> would learn a lot if we continued the discussion... Anyway, thanks again
> for explaining and sorry for going a bit off topic in this reply.
I'm not sure if other drivers silently reset the GPU when they encounter a
hang. It's generally easier to deal with on integrated GPUs since they operate
on system memory. On dGPUs, the contents of vram might be lost after a GPU
reset as the memory controller is reset. If vram is lost, the application that
is running needs to reload its vram state. Also for reliability, applications
should really be made aware of a GPU reset so they can validate their data.
E.g., you don't want a scientific application to silently get bad data because
the GPU was reset silently in the background.
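A toy model of the dGPU vs. integrated GPU difference and of why applications should be told about resets. Everything here (`Buffer`, `Device`, `Application`, `on_gpu_reset`) is hypothetical; real applications learn about resets through API mechanisms such as OpenGL's robustness extensions (`glGetGraphicsResetStatus`) or Vulkan's `VK_ERROR_DEVICE_LOST`. The model assumes the worst case on a dGPU (all vram contents lost) and no loss on an integrated GPU, mirroring the "generally easier" case above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Buffer:
    """Hypothetical GPU buffer with a backup copy in system memory."""
    name: str
    host_copy: bytes
    gpu_data: Optional[bytes] = None  # contents the GPU currently sees


@dataclass
class Device:
    """Toy device: on a dGPU a reset may wipe vram (hypothetical model)."""
    discrete: bool
    buffers: List[Buffer] = field(default_factory=list)

    def upload(self, buf: Buffer) -> None:
        buf.gpu_data = buf.host_copy
        self.buffers.append(buf)

    def reset(self) -> bool:
        """Simulate a GPU reset; return True if vram contents were lost."""
        if self.discrete:
            # Memory controller was reset: treat vram contents as gone.
            for buf in self.buffers:
                buf.gpu_data = None
            return True
        # Integrated GPU: the buffers live in system memory and survive.
        return False


class Application:
    """Hypothetical app that is told about resets instead of missing them."""

    def __init__(self, device: Device) -> None:
        self.device = device
        self.results_suspect = False

    def on_gpu_reset(self, vram_lost: bool) -> None:
        # Anything computed around the hang may be wrong; flag it so it can
        # be recomputed or checked rather than silently trusted.
        self.results_suspect = True
        if vram_lost:
            # Reload vram state from the system-memory copies.
            for buf in self.device.buffers:
                buf.gpu_data = buf.host_copy


dgpu = Device(discrete=True)
tex = Buffer("texture", b"\x01\x02\x03")
dgpu.upload(tex)
app = Application(dgpu)
app.on_gpu_reset(dgpu.reset())
print(tex.gpu_data == tex.host_copy, app.results_suspect)  # True True
```

Keeping a system-memory copy is only one way to make vram state reloadable; the essential part is that `on_gpu_reset` runs at all, so the application knows to distrust results from around the hang.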
>
>
> One last thing... It's extremely off topic but I already derailed this reply
> and it has to be said: Thank you Alex for being the guy you are. I bet AMD
> doesn't pay you to explain technical details to stupid end users like me but
> that's very appreciated. You're a hero, keep on rockin'!
Thanks! Glad to help.