"Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2

public inbox for linux-kernel@vger.kernel.org

* "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-03 20:46 UTC
  To: Daniel Vetter; +Cc: linux-kernel, dri-devel

I got a crash after a few minutes of running 3.8.0-rc2, was able to
switch to a vt and look at dmesg:

  [  490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  [  490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
  [  492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  [  492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
  [  492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.

Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
problem.

dmesg, config, and i915_error_state available from:

	http://fieldses.org/~bfields/3.8-hang/

--b.

* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: Josh Boyer @ 2013-01-03 21:16 UTC
  To: J. Bruce Fields; +Cc: Daniel Vetter, linux-kernel, dri-devel

On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> I got a crash after a few minutes of running 3.8.0-rc2, was able to
> switch to a vt and look at dmesg:
>
>   [  490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>   [  490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>   [  492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>   [  492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>   [  492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
>
> Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
> problem.

I'm not questioning that you haven't seen that error in F17, but we have
had quite a few bug reports with similar error messages for a while now.
Apparently there are lots of ways GPUs can get hung, so they might be
different from what you're seeing.  Just wanted to point out that it
might not be a new 3.8 change that caused it.

josh

* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-03 23:11 UTC
  To: Josh Boyer; +Cc: Daniel Vetter, linux-kernel, dri-devel

On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
> I'm not questioning that you haven't seen that error in F17, but we have
> had quite a few bug reports with similar error messages for a while now.
> Apparently there are lots of ways GPUs can get hung, so they might be
> different from what you're seeing.  Just wanted to point out that it
> might not be a new 3.8 change that caused it.

OK, sure.  It reproduced very quickly after the upgrade, so I assumed it
was a regression.

I'm running 3.7.0 now which hasn't shown any problem.

I'll try a newer kernel again to see if it's really that easy for me to
reproduce.

--b.

* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: Daniel Vetter @ 2013-01-06 18:06 UTC
  To: J. Bruce Fields; +Cc: Josh Boyer, Daniel Vetter, linux-kernel, dri-devel

On Thu, Jan 03, 2013 at 06:11:23PM -0500, J. Bruce Fields wrote:
> OK, sure.  It reproduced very quickly after the upgrade, so I assumed it
> was a regression.
> 
> I'm running 3.7.0 now which hasn't shown any problem.
> 
> I'll try a newer kernel again to see if it's really that easy for me to
> reproduce.

If you hit this again (even better if you have a way to reproduce) please
grab the i915_error_state file from debugfs and file a bug on
bugs.freedesktop.org against DRM - DRI (Intel). We do know of a few recent
issues introduced around 3.7 kernels; preliminary patches are floating
around. The error state should be good enough to decide whether you're
hitting the same issues.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-08 14:37 UTC
  To: Josh Boyer, linux-kernel, dri-devel

On Sun, Jan 06, 2013 at 07:06:52PM +0100, Daniel Vetter wrote:
> If you hit this again (even better if you have a way to reproduce)

Unfortunately I wasn't able to reproduce it after running 3.8 again for a
couple more hours.  However:

> please grab the i915_error_state file from debugfs

As I said in the original mail, I've already done that:

	http://fieldses.org/~bfields/3.8-hang/

> and file a bug on
> bugs.freedesktop.org against DRM - DRI (Intel).

Would it still be useful for me to file a bug?  (Just going through the
new-account confirmation dance now.)

--b.

* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: Daniel Vetter @ 2013-01-09 11:27 UTC
  To: J. Bruce Fields; +Cc: Josh Boyer, linux-kernel, dri-devel

On Tue, Jan 8, 2013 at 3:37 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> please grab the i915_error_state file from debugfs
>
> As I said in the original mail, I've already done that:
>
>         http://fieldses.org/~bfields/3.8-hang/

Sorry, missed that the first time around.

>> and file a bug on
>> bugs.freedesktop.org against DRM - DRI (Intel).
>
> Would it still be useful for me to file a bug?  (Just going through the
> new-account confirmation dance now.)

Looks like the Ironlake (ilk) bug tracked at
https://bugs.freedesktop.org/show_bug.cgi?id=55984

-Daniel

* Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2
From: J. Bruce Fields @ 2013-01-09 14:18 UTC
  To: Daniel Vetter; +Cc: Josh Boyer, linux-kernel, dri-devel

On Wed, Jan 09, 2013 at 12:27:22PM +0100, Daniel Vetter wrote:
> Looks like the ilk bug tracked at
> https://bugs.freedesktop.org/show_bug.cgi?id=55984

OK, I'll add something there if I'm able to find a reproducer.
Thanks!

--b.
