* "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-03 20:46 UTC
To: Daniel Vetter; +Cc: linux-kernel, dri-devel

I got a crash after a few minutes of running 3.8.0-rc2, was able to
switch to a vt and look at dmesg:

[ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.

Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
problem.

dmesg, config, and i915_error_state available from:

	http://fieldses.org/~bfields/3.8-hang/

--b.
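For readers triaging a similar log: the dmesg excerpt above shows two hangcheck events about two seconds apart, followed by the driver declaring the GPU wedged. A minimal Python sketch for pulling the timestamps out of lines in this format and measuring the gap between hangs might look like the following. The function names (parse_dmesg, hang_intervals) and the regular expression are illustrative assumptions, not part of any kernel or i915 tooling, and the sample lines are copied from the report.

```python
import re

# Assumed line shape, taken from the report above:
#   [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
# The "*ERROR*" marker is optional (the "[drm] capturing error event" line lacks it).
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[(?P<tag>[^\]]+)\]\s+(?:\*ERROR\*\s+)?(?P<msg>.*)$"
)


def parse_dmesg(lines):
    """Return (timestamp_seconds, tag, message) for every line that matches."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append((float(m.group("ts")), m.group("tag"), m.group("msg")))
    return events


def hang_intervals(events):
    """Seconds between consecutive 'GPU hung' messages."""
    hangs = [ts for ts, _tag, msg in events if "GPU hung" in msg]
    return [round(later - earlier, 6) for earlier, later in zip(hangs, hangs[1:])]


# Sample lines copied from the report.
SAMPLE = [
    "[ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung",
    "[ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state",
    "[ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung",
    "[ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!",
    "[ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.",
]

events = parse_dmesg(SAMPLE)
intervals = hang_intervals(events)
```

On the sample, hang_intervals reports a single gap of just under two seconds, which matches the "GPU hanging too fast" message that follows the second hang.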
* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: Josh Boyer @ 2013-01-03 21:16 UTC
To: J. Bruce Fields; +Cc: Daniel Vetter, linux-kernel, dri-devel

On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> I got a crash after a few minutes of running 3.8.0-rc2, was able to
> switch to a vt and look at dmesg:
>
> [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
>
> Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
> problem.

I'm not questioning that you haven't seen that error in F17, but we have
had quite a few bug reports with similar error messages for a while now.
Apparently there are lots of ways GPUs can get hung, so they might be
different from what you're seeing. Just wanted to point out that it
might not be a new 3.8 change that caused it.

josh
* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-03 23:11 UTC
To: Josh Boyer; +Cc: Daniel Vetter, linux-kernel, dri-devel

On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
> On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > I got a crash after a few minutes of running 3.8.0-rc2, was able to
> > switch to a vt and look at dmesg:
> >
> > [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
> >
> > Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
> > problem.
>
> I'm not questioning that you haven't seen that error in F17, but we have
> had quite a few bug reports with similar error messages for a while now.
> Apparently there are lots of ways GPUs can get hung, so they might be
> different from what you're seeing. Just wanted to point out that it
> might not be a new 3.8 change that caused it.

OK, sure. It reproduced very quickly after the upgrade, so I assumed it
was a regression.

I'm running 3.7.0 now which hasn't shown any problem.

I'll try a newer kernel again to see if it's really that easy for me to
reproduce.

--b.
* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: Daniel Vetter @ 2013-01-06 18:06 UTC
To: J. Bruce Fields; +Cc: Josh Boyer, Daniel Vetter, linux-kernel, dri-devel

On Thu, Jan 03, 2013 at 06:11:23PM -0500, J. Bruce Fields wrote:
> On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
> > On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > I got a crash after a few minutes of running 3.8.0-rc2, was able to
> > > switch to a vt and look at dmesg:
> > >
> > > [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > > [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > > [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
> > >
> > > Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
> > > problem.
> >
> > I'm not questioning that you haven't seen that error in F17, but we have
> > had quite a few bug reports with similar error messages for a while now.
> > Apparently there are lots of ways GPUs can get hung, so they might be
> > different from what you're seeing. Just wanted to point out that it
> > might not be a new 3.8 change that caused it.
>
> OK, sure. It reproduced very quickly after the upgrade, so I assumed it
> was a regression.
>
> I'm running 3.7.0 now which hasn't shown any problem.
>
> I'll try a newer kernel again to see if it's really that easy for me to
> reproduce.

If you hit this again (even better if you have a way to reproduce)
please grab the i915_error_state file from debugfs and file a bug on
bugs.freedesktop.org against DRM - DRI (Intel). We do know of a few
recent issues introduced around 3.7 kernels, preliminary patches are
floating around. The error state should be good enough to decide whether
you're hitting the same issues.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
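For anyone following the advice above, the error state should be saved promptly, before a reboot discards it. A minimal Python sketch of the copy step is given below, under stated assumptions: debugfs is mounted at /sys/kernel/debug (the usual location; the dmesg line in the report prints the path as /debug/dri/0/i915_error_state, so the mount point varies by system), the card is DRM minor 0, and the script runs with enough privilege to read debugfs. The helper name save_error_state is hypothetical.

```python
from pathlib import Path

# Assumption: debugfs is mounted at /sys/kernel/debug and the GPU is DRM minor 0.
# Adjust the path if debugfs is mounted elsewhere on your system.
DEFAULT_ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(dest_dir, src=DEFAULT_ERROR_STATE):
    """Copy the error state file at `src` into `dest_dir`; return the new path.

    The content is read in full and written out again, because files under
    debugfs report a size of zero and size-based copy shortcuts can misbehave.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    dest.write_bytes(src.read_bytes())
    return dest


# Example call (typically needs root to read debugfs):
#   save_error_state("/tmp/gpu-hang")
```

The saved file is what would then be attached to the bug report.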
* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-08 14:37 UTC
To: Josh Boyer, linux-kernel, dri-devel

On Sun, Jan 06, 2013 at 07:06:52PM +0100, Daniel Vetter wrote:
> On Thu, Jan 03, 2013 at 06:11:23PM -0500, J. Bruce Fields wrote:
> > On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
> > > On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > I got a crash after a few minutes of running 3.8.0-rc2, was able to
> > > > switch to a vt and look at dmesg:
> > > >
> > > > [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > > [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > > > [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > > [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > > > [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
> > > >
> > > > Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
> > > > problem.
> > >
> > > I'm not questioning that you haven't seen that error in F17, but we have
> > > had quite a few bug reports with similar error messages for a while now.
> > > Apparently there are lots of ways GPUs can get hung, so they might be
> > > different from what you're seeing. Just wanted to point out that it
> > > might not be a new 3.8 change that caused it.
> >
> > OK, sure. It reproduced very quickly after the upgrade, so I assumed it
> > was a regression.
> >
> > I'm running 3.7.0 now which hasn't shown any problem.
> >
> > I'll try a newer kernel again to see if it's really that easy for me to
> > reproduce.
>
> If you hit this again (even better if you have a way to reproduce)

Unfortunately I wasn't able to reproduce after working a couple more
hours on 3.8 again. However:

> please grab the i915_error_state file from debugfs

As I said in the original mail, I've already done that:

	http://fieldses.org/~bfields/3.8-hang/

> and file a bug on
> bugs.freedesktop.org against DRM - DRI (Intel).

Would it still be useful for me to file a bug? (Just going through the
new-account confirmation dance now.)

--b.

> We do know of a few recent
> issues introduced around 3.7 kernels, preliminary patches are floating
> around. The error state should be good enough to decide whether you're
> hitting the same issues.
>
> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: Daniel Vetter @ 2013-01-09 11:27 UTC
To: J. Bruce Fields; +Cc: Josh Boyer, linux-kernel, dri-devel

On Tue, Jan 8, 2013 at 3:37 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> please grab the i915_error_state file from debugfs
>
> As I said in the original mail, I've already done that:
>
> http://fieldses.org/~bfields/3.8-hang/

Sorry, missed that the first time around.

>> and file a bug on
>> bugs.freedesktop.org against DRM - DRI (Intel).
>
> Would it still be useful for me to file a bug? (Just going through the
> new-account confirmation dance now.)

Looks like the ilk bug tracked at
https://bugs.freedesktop.org/show_bug.cgi?id=55984
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-09 14:18 UTC
To: Daniel Vetter; +Cc: Josh Boyer, linux-kernel, dri-devel

On Wed, Jan 09, 2013 at 12:27:22PM +0100, Daniel Vetter wrote:
> On Tue, Jan 8, 2013 at 3:37 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >> please grab the i915_error_state file from debugfs
> >
> > As I said in the original mail, I've already done that:
> >
> > http://fieldses.org/~bfields/3.8-hang/
>
> Sorry, missed that the first time around.
>
> >> and file a bug on
> >> bugs.freedesktop.org against DRM - DRI (Intel).
> >
> > Would it still be useful for me to file a bug? (Just going through the
> > new-account confirmation dance now.)
>
> Looks like the ilk bug tracked at
> https://bugs.freedesktop.org/show_bug.cgi?id=55984

OK, I'll add something there if I'm able to find a reproducer.

Thanks!

--b.