public inbox for linux-kernel@vger.kernel.org
* i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
@ 2013-01-08 22:36 Greg KH
  2013-01-09  0:38 ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2013-01-08 22:36 UTC
  To: Chris Wilson, daniel.vetter; +Cc: intel-gfx, linux-kernel, Jesse Barnes

Hi all,

I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:

[11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
[11883.083225] gnome-shell[19396]: segfault at 218 ip 00007feef5f32333 sp 00007ffffc1dc930 error 4 in i965_dri.so[7feef5ecb000+d0000]
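
A minimal sketch (Python, not part of the original report) of how the
messages above could be picked out of dmesg output automatically. The
label names and the function find_hang_events are made up for
illustration; only the message fragments come from the log above.

```python
import re

# Message fragments copied from the log above; the labels are illustrative.
HANG_PATTERNS = {
    "gpu_hung": "Hangcheck timer elapsed... GPU hung",
    "wedged": "declaring wedged",
    "reset_failed": "Failed to reset chip",
}

# Kernel log lines start with a "[seconds.microseconds]" timestamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_hang_events(log_text):
    """Return (timestamp, label) pairs for every i915 hang message found."""
    events = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # not a timestamped kernel log line
        for label, fragment in HANG_PATTERNS.items():
            if fragment in line:
                events.append((float(match.group(1)), label))
    return events
```

Feeding it the text printed by dmesg gives one entry per hang-related
line, which makes it easy to see how close together the hangs were.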

When it happens, gnome-shell dies a horrible death and it requires a
reboot in order to get xorg working properly again (probably because
gnome-shell is hosed.)

The machine does still work to do other things from a text console (I'm
writing this on the machine after the last time this happened.)

It seems to happen when doing a "stressful" thing on the machine (i.e.
multiple kernel builds at the same time).

I also seem to be able to hit this on 3.7.1, but not as regularly, and
not at all on 3.6.y.

Any hints or ideas of what to try out?

thanks,

greg k-h


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-08 22:36 i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree) Greg KH
@ 2013-01-09  0:38 ` Greg KH
  2013-01-09  3:42   ` Dave Airlie
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2013-01-09  0:38 UTC
  To: Chris Wilson, daniel.vetter; +Cc: intel-gfx, linux-kernel, Jesse Barnes

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

On Tue, Jan 08, 2013 at 02:36:11PM -0800, Greg KH wrote:
> Hi all,
> 
> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
> 
> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
> [11883.083225] gnome-shell[19396]: segfault at 218 ip 00007feef5f32333 sp 00007ffffc1dc930 error 4 in i965_dri.so[7feef5ecb000+d0000]

I just hit this again.  And, as the kernel was asking for it, attached
is the i915_error_state file, compressed due to the size of it.
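
A minimal sketch (Python) of capturing and compressing the error state
for attaching to a report. The default source path is an assumption:
the kernel message prints /debug/dri/0/i915_error_state, while debugfs
is commonly mounted at /sys/kernel/debug, so adjust it to your mount
point. The function name save_error_state is hypothetical.

```python
import gzip
import shutil

# Assumption: debugfs is mounted at /sys/kernel/debug on this machine.
DEFAULT_SOURCE = "/sys/kernel/debug/dri/0/i915_error_state"


def save_error_state(source=DEFAULT_SOURCE, dest="i915_error_state.gz"):
    """Copy the error state dump into a gzip-compressed file; return its path."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the dump without loading it all
    return dest
```

Reading debugfs typically requires root, so this would normally be run
with elevated privileges right after the hang is reported.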

thanks,

greg k-h

[-- Attachment #2: i915_error_state.gz --]
[-- Type: application/x-gzip, Size: 200230 bytes --]


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09  0:38 ` Greg KH
@ 2013-01-09  3:42   ` Dave Airlie
  2013-01-09  4:25     ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Airlie @ 2013-01-09  3:42 UTC
  To: Greg KH; +Cc: Chris Wilson, daniel.vetter, intel-gfx, linux-kernel,
	Jesse Barnes

>> Hi all,
>>
>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>
>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>> [11883.083225] gnome-shell[19396]: segfault at 218 ip 00007feef5f32333 sp 00007ffffc1dc930 error 4 in i965_dri.so[7feef5ecb000+d0000]
>
> I just hit this again.  And, as the kernel was asking for it, attached
> is the i915_error_state file, compressed due to the size of it.
>
Welcome to the sinkhole that is
https://bugs.freedesktop.org/show_bug.cgi?id=55984

3 months and ticking, Intel guys are all running away from it saying
they can't reproduce, everyone else on planet seems to reproduce quite
easily.

It's generally considered a bug in the relocation/shrinker/no idea category.

Assuming you have an Ironlake machine which I'm going to guess you do.

Dave.


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09  3:42   ` Dave Airlie
@ 2013-01-09  4:25     ` Greg KH
  2013-01-09  5:31       ` Dave Airlie
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2013-01-09  4:25 UTC
  To: Dave Airlie
  Cc: Chris Wilson, daniel.vetter, intel-gfx, linux-kernel,
	Jesse Barnes

On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
> >> Hi all,
> >>
> >> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
> >>
> >> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> >> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> >> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> >> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> >> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
> >> [11883.083225] gnome-shell[19396]: segfault at 218 ip 00007feef5f32333 sp 00007ffffc1dc930 error 4 in i965_dri.so[7feef5ecb000+d0000]
> >
> > I just hit this again.  And, as the kernel was asking for it, attached
> > is the i915_error_state file, compressed due to the size of it.
> >
> Welcome to the sinkhole that is
> https://bugs.freedesktop.org/show_bug.cgi?id=55984
> 
> 3 months and ticking, Intel guys are all running away from it saying
> they can't reproduce, everyone else on planet seems to reproduce quite
> easily.
> 
> It's generally considered a bug in the relocation/shrinker/no idea category.

Ugh, what a mess.

> Assuming you have an Ironlake machine which I'm going to guess you do.

I don't know, it's an old i5 machine that has never had any video
problems for many years now.  How do I tell?

thanks,

greg k-h


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09  4:25     ` Greg KH
@ 2013-01-09  5:31       ` Dave Airlie
  2013-01-09  7:28         ` Lijo Antony
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Airlie @ 2013-01-09  5:31 UTC
  To: Greg KH; +Cc: Chris Wilson, daniel.vetter, intel-gfx, linux-kernel,
	Jesse Barnes

On Wed, Jan 9, 2013 at 2:25 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
>> >> Hi all,
>> >>
>> >> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>> >>
>> >> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>> >> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>> >> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>> >> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>> >> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>> >> [11883.083225] gnome-shell[19396]: segfault at 218 ip 00007feef5f32333 sp 00007ffffc1dc930 error 4 in i965_dri.so[7feef5ecb000+d0000]
>> >
>> > I just hit this again.  And, as the kernel was asking for it, attached
>> > is the i915_error_state file, compressed due to the size of it.
>> >
>> Welcome to the sinkhole that is
>> https://bugs.freedesktop.org/show_bug.cgi?id=55984
>>
>> 3 months and ticking, Intel guys are all running away from it saying
>> they can't reproduce, everyone else on planet seems to reproduce quite
>> easily.
>>
>> It's generally considered a bug in the relocation/shrinker/no idea category.
>
> Ugh, what a mess.
>
>> Assuming you have an Ironlake machine which I'm going to guess you do.
>
> I don't know, it's an old i5 machine that has never had any video
> problems for many years now.  How do I tell?

Run lspci -nn; it's probably an 8086:0046 device.

An old i5 probably means the original i5, which means Ironlake.
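
A minimal sketch (Python) of checking lspci -nn output for that ID. The
function name has_pci_device is hypothetical, and 8086:0046 is simply
the ID mentioned above; other Ironlake variants may report a different
device ID.

```python
import re

# Vendor:device ID mentioned above for Ironlake graphics.
IRONLAKE_ID = "8086:0046"

# lspci -nn prints IDs as "[vvvv:dddd]" after the device name.
PCI_ID = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]")


def has_pci_device(lspci_output, wanted=IRONLAKE_ID):
    """Return True if any [vendor:device] pair in the output matches."""
    return wanted.lower() in PCI_ID.findall(lspci_output.lower())
```

The text to check can be obtained with
subprocess.check_output(["lspci", "-nn"], text=True). Class codes such
as [0300] are ignored because they contain no colon.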

Dave.


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09  5:31       ` Dave Airlie
@ 2013-01-09  7:28         ` Lijo Antony
  2013-01-09 19:44           ` Dave Kleikamp
  0 siblings, 1 reply; 18+ messages in thread
From: Lijo Antony @ 2013-01-09  7:28 UTC
  To: Dave Airlie
  Cc: Greg KH, Chris Wilson, daniel.vetter, intel-gfx, linux-kernel,
	Jesse Barnes

On 01/09/2013 09:31 AM, Dave Airlie wrote:
> On Wed, Jan 9, 2013 at 2:25 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
>>>>> Hi all,
>>>>>
>>>>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>>>>
>>>>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>>> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>>>>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>>>>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>>>>> [11883.083225] gnome-shell[19396]: segfault at 218 ip 00007feef5f32333 sp 00007ffffc1dc930 error 4 in i965_dri.so[7feef5ecb000+d0000]
>>>>
>>>> I just hit this again.  And, as the kernel was asking for it, attached
>>>> is the i915_error_state file, compressed due to the size of it.
>>>>
>>> Welcome to the sinkhole that is
>>> https://bugs.freedesktop.org/show_bug.cgi?id=55984
>>>
>>> 3 months and ticking, Intel guys are all running away from it saying
>>> they can't reproduce, everyone else on planet seems to reproduce quite
>>> easily.
>>>
>>> It's generally considered a bug in the relocation/shrinker/no idea category.
>>
>> Ugh, what a mess.
>>
>>> Assuming you have an Ironlake machine which I'm going to guess you do.
>>
>> I don't know, it's an old i5 machine that has never had any video
>> problems for many years now.  How do I tell?
>
> lspci -nn probably an 8086:0046 device.
>
> Old i5 probably means original i5 which means ironlake.
>

I have also seen this a couple of times on 3.7 and 3.8-rc1.
Most of the time I was watching a YouTube video in Chrome. Nothing
crashed, though (I am not running gnome-shell). The system recovered
after a few seconds.

I haven't seen this on 3.8-rc2 yet, probably because I haven't watched
any video.

-lijo





* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09  7:28         ` Lijo Antony
@ 2013-01-09 19:44           ` Dave Kleikamp
  2013-01-09 20:12             ` Dave Kleikamp
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Kleikamp @ 2013-01-09 19:44 UTC
  To: Lijo Antony
  Cc: Dave Airlie, Greg KH, Chris Wilson, daniel.vetter, intel-gfx,
	linux-kernel, Jesse Barnes

On 01/09/2013 01:28 AM, Lijo Antony wrote:
> On 01/09/2013 09:31 AM, Dave Airlie wrote:
>> On Wed, Jan 9, 2013 at 2:25 PM, Greg KH <gregkh@linuxfoundation.org>
>> wrote:
>>> On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>>>>>
>>>>>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
>>>>>> elapsed... GPU hung
>>>>>> [11868.414655] [drm] capturing error event; look for more
>>>>>> information in /debug/dri/0/i915_error_state
>>>>>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
>>>>>> elapsed... GPU hung
>>>>>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast,
>>>>>> declaring wedged!
>>>>>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>>>>>> [11883.083225] gnome-shell[19396]: segfault at 218 ip
>>>>>> 00007feef5f32333 sp 00007ffffc1dc930 error 4 in
>>>>>> i965_dri.so[7feef5ecb000+d0000]
>>>>>
>>>>> I just hit this again.  And, as the kernel was asking for it, attached
>>>>> is the i915_error_state file, compressed due to the size of it.
>>>>>
>>>> Welcome to the sinkhole that is
>>>> https://bugs.freedesktop.org/show_bug.cgi?id=55984
>>>>
>>>> 3 months and ticking, Intel guys are all running away from it saying
>>>> they can't reproduce, everyone else on planet seems to reproduce quite
>>>> easily.
>>>>
>>>> It's generally considered a bug in the relocation/shrinker/no idea
>>>> category.
>>>
>>> Ugh, what a mess.
>>>
>>>> Assuming you have an Ironlake machine which I'm going to guess you do.
>>>
>>> I don't know, it's an old i5 machine that has never had any video
>>> problems for many years now.  How do I tell?
>>
>> lspci -nn probably an 8086:0046 device.
>>
>> Old i5 probably means original i5 which means ironlake.
>>
> 
> I have also seen this a couple of times on 3.7 and 3.8-rc1.
> Most of the times I was watching youtube video in chrome. Nothing
> crashed though(I am not running gnome shell). System recovered after few
> seconds.
> 
> I didn't see this on 3.8-rc2 yet, probably because I haven't watched any
> video.

I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.

00:02.0 VGA compatible controller [0300]: Intel Corporation Core
Processor Integrated Graphics Controller [8086:0046] (rev 02)

Thinkpad T410

Shaggy

> 
> -lijo
> 
> 
> 


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09 19:44           ` Dave Kleikamp
@ 2013-01-09 20:12             ` Dave Kleikamp
  2013-01-09 21:08               ` Greg KH
                                 ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Dave Kleikamp @ 2013-01-09 20:12 UTC
  Cc: Lijo Antony, Dave Airlie, Greg KH, Chris Wilson, daniel.vetter,
	intel-gfx, linux-kernel, Jesse Barnes

On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
> 
> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
> 
> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
> Processor Integrated Graphics Controller [8086:0046] (rev 02)
> 
> Thinkpad T410
> 
> Shaggy

Daniel's patch:

drm/i915: Revert shrinker changes from "Track unbound pages"

fixes the problem for me.

Thanks,
Shaggy


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09 20:12             ` Dave Kleikamp
@ 2013-01-09 21:08               ` Greg KH
  2013-01-10  0:40               ` Greg KH
  2013-01-11 17:26               ` Nikola Pajkovsky
  2 siblings, 0 replies; 18+ messages in thread
From: Greg KH @ 2013-01-09 21:08 UTC
  To: Dave Kleikamp
  Cc: Lijo Antony, Dave Airlie, Chris Wilson, daniel.vetter, intel-gfx,
	linux-kernel, Jesse Barnes

On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
> > 
> > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
> > 
> > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
> > Processor Integrated Graphics Controller [8086:0046] (rev 02)
> > 
> > Thinkpad T410
> > 
> > Shaggy
> 
> Daniel's patch:
> 
> drm/i915: Revert shrinker changes from "Track unbound pages"
> 
> fixes the problem for me.

Thanks for the hint, I'll go try that right now...

greg k-h


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09 20:12             ` Dave Kleikamp
  2013-01-09 21:08               ` Greg KH
@ 2013-01-10  0:40               ` Greg KH
  2013-01-10  1:07                 ` Chris Wilson
  2013-01-11 17:26               ` Nikola Pajkovsky
  2 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2013-01-10  0:40 UTC
  To: Dave Kleikamp
  Cc: Lijo Antony, Dave Airlie, Chris Wilson, daniel.vetter, intel-gfx,
	linux-kernel, Jesse Barnes

On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
> > 
> > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
> > 
> > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
> > Processor Integrated Graphics Controller [8086:0046] (rev 02)
> > 
> > Thinkpad T410
> > 
> > Shaggy
> 
> Daniel's patch:
> 
> drm/i915: Revert shrinker changes from "Track unbound pages"
> 
> fixes the problem for me.

After an afternoon of multiple kernel builds and other stressful things,
it looks like it fixes it for me as well.  Chris, this will be going to
Linus soon, right?

thanks,

greg k-h


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-10  0:40               ` Greg KH
@ 2013-01-10  1:07                 ` Chris Wilson
  2013-01-10  1:19                   ` Dave Airlie
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2013-01-10  1:07 UTC
  To: Greg KH, Dave Kleikamp
  Cc: Lijo Antony, Dave Airlie, daniel.vetter, intel-gfx, linux-kernel,
	Jesse Barnes

On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
> > On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
> > > 
> > > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
> > > 
> > > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
> > > Processor Integrated Graphics Controller [8086:0046] (rev 02)
> > > 
> > > Thinkpad T410
> > > 
> > > Shaggy
> > 
> > Daniel's patch:
> > 
> > drm/i915: Revert shrinker changes from "Track unbound pages"
> > 
> > fixes the problem for me.
> 
> After an afternoon of multiple kernel builds and other stressful things,
> it looks like it fixes it for me as well.  Chris, this will be going to
> Linus soon, right?

Daniel will send it on. I hope before he does so, he will clarify the
changelog to note that it is just papering over the issue. If the
conjecture is right, it will not prevent that path from triggering the
hang, nor does it prevent other eviction paths from potentially causing
the same issue.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-10  1:07                 ` Chris Wilson
@ 2013-01-10  1:19                   ` Dave Airlie
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2013-01-10  1:19 UTC
  To: Chris Wilson
  Cc: Greg KH, Dave Kleikamp, Lijo Antony, daniel.vetter, intel-gfx,
	linux-kernel, Jesse Barnes

On Thu, Jan 10, 2013 at 11:07 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH <gregkh@linuxfoundation.org> wrote:
>> On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
>> > On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>> > >
>> > > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>> > >
>> > > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>> > > Processor Integrated Graphics Controller [8086:0046] (rev 02)
>> > >
>> > > Thinkpad T410
>> > >
>> > > Shaggy
>> >
>> > Daniel's patch:
>> >
>> > drm/i915: Revert shrinker changes from "Track unbound pages"
>> >
>> > fixes the problem for me.
>>
>> After an afternoon of multiple kernel builds and other stressful things,
>> it looks like it fixes it for me as well.  Chris, this will be going to
>> Linus soon, right?
>
> Daniel will send it on. I hope before he does so, he will clarify the
> changelog to note that it is just papering over the issue. If the
> conjecture is right, it will not prevent that path from triggering the
> hang, nor does it prevent other eviction paths from potentially causing
> the same issue.

In this case, since the issue was papered over in all the kernels up
until 3.7, I think repapering is the answer for now. I have a novel
idea: maybe someone could spend some time working out what is broken in
private on a test box, instead of making everyone who runs 3.7 and 3.8
on ILK deal with it. Of course, I know this won't happen and I'll be
reverting patches from you guys that cause Ironlake flakiness
forever.

Dave.


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-09 20:12             ` Dave Kleikamp
  2013-01-09 21:08               ` Greg KH
  2013-01-10  0:40               ` Greg KH
@ 2013-01-11 17:26               ` Nikola Pajkovsky
  2013-01-11 18:42                 ` Daniel Vetter
  2 siblings, 1 reply; 18+ messages in thread
From: Nikola Pajkovsky @ 2013-01-11 17:26 UTC
  To: Dave Kleikamp
  Cc: Lijo Antony, Dave Airlie, Greg KH, Chris Wilson, daniel.vetter,
	intel-gfx, linux-kernel, Jesse Barnes

Dave Kleikamp <dave.kleikamp@oracle.com> writes:

> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>> 
>> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>> 
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>> Processor Integrated Graphics Controller [8086:0046] (rev 02)
>> 
>> Thinkpad T410
>> 
>> Shaggy
>
> Daniel's patch:
>
> drm/i915: Revert shrinker changes from "Track unbound pages"
>
> fixes the problem for me.

The bug is still kicking even with (drm/i915: Revert shrinker changes
from "Track unbound pages") applied.

$ glxgears

[  429.656459] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  429.656463] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  429.665762] [drm:kick_ring] *ERROR* Kicking stuck wait on render ring

-- 
Nikola


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-11 17:26               ` Nikola Pajkovsky
@ 2013-01-11 18:42                 ` Daniel Vetter
  2013-01-14  6:58                   ` Nikola Pajkovsky
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2013-01-11 18:42 UTC
  To: Nikola Pajkovsky
  Cc: Dave Kleikamp, Lijo Antony, Dave Airlie, Greg KH, Chris Wilson,
	intel-gfx, linux-kernel, Jesse Barnes

On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
> bug still kicking even w/ (drm/i915: Revert shrinker changes from "Track
> unbound pages")

Could be a different bug, can you please attach the error_state somewhere?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-11 18:42                 ` Daniel Vetter
@ 2013-01-14  6:58                   ` Nikola Pajkovsky
  2013-01-14  9:06                     ` Daniel Vetter
  0 siblings, 1 reply; 18+ messages in thread
From: Nikola Pajkovsky @ 2013-01-14  6:58 UTC
  To: Daniel Vetter
  Cc: Dave Kleikamp, Lijo Antony, Dave Airlie, Greg KH, Chris Wilson,
	intel-gfx, linux-kernel, Jesse Barnes

[-- Attachment #1: Type: text/plain, Size: 412 bytes --]

Daniel Vetter <daniel.vetter@ffwll.ch> writes:

> On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
>> bug still kicking even w/ (drm/i915: Revert shrinker changes from "Track
>> unbound pages")
>
> Could be a different bug, can you please attach the error_state somewhere?

Yep, i915_error_state is attached. BTW, I'm going to bisect the kernel,
so hopefully I will come back with a commit.


[-- Attachment #2: i915_error_state --]
[-- Type: application/octet-stream, Size: 278740 bytes --]

[-- Attachment #3: Type: text/plain, Size: 12 bytes --]


-- 
Nikola


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-14  6:58                   ` Nikola Pajkovsky
@ 2013-01-14  9:06                     ` Daniel Vetter
  2013-01-14  9:49                       ` Nikola Pajkovsky
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2013-01-14  9:06 UTC
  To: Nikola Pajkovsky
  Cc: Dave Kleikamp, Lijo Antony, Dave Airlie, Greg KH, Chris Wilson,
	intel-gfx, linux-kernel, Jesse Barnes

On Mon, Jan 14, 2013 at 7:58 AM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
> Daniel Vetter <daniel.vetter@ffwll.ch> writes:
>
>> On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
>>> bug still kicking even w/ (drm/i915: Revert shrinker changes from "Track
>>> unbound pages")
>>
>> Could be a different bug, can you please attach the error_state somewhere?
>
> yep, i915_error_state is attached. btw, I'm going to bisect kernel, so
> hopefully I will bring some commit.

Different bug; on a quick look this could be a dupe of
https://bugzilla.kernel.org/show_bug.cgi?id=52311

Chris should know the details.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-14  9:06                     ` Daniel Vetter
@ 2013-01-14  9:49                       ` Nikola Pajkovsky
  2013-01-14 12:47                         ` Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Nikola Pajkovsky @ 2013-01-14  9:49 UTC
  To: Daniel Vetter
  Cc: Dave Kleikamp, Lijo Antony, Dave Airlie, Greg KH, Chris Wilson,
	intel-gfx, linux-kernel, Jesse Barnes

Daniel Vetter <daniel.vetter@ffwll.ch> writes:

> On Mon, Jan 14, 2013 at 7:58 AM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
>> Daniel Vetter <daniel.vetter@ffwll.ch> writes:
>>
>>> On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
>>>> bug still kicking even w/ (drm/i915: Revert shrinker changes from "Track
>>>> unbound pages")
>>>
>>> Could be a different bug, can you please attach the error_state somewhere?
>>
>> yep, i915_error_state is attached. btw, I'm going to bisect kernel, so
>> hopefully I will bring some commit.
>
> Different bug; on a quick look this could be a dupe of
> https://bugzilla.kernel.org/show_bug.cgi?id=52311

ok

> Chris should know the details.

Thanks, bisection leads me to commit d7d4eed ("drm/i915: Allow
DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers"). It's not
possible to simply revert the commit and test, and I have no idea how
i915 works.
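
For readers unfamiliar with bisection, here is a minimal sketch
(Python) of the search that git bisect automates on a linear history:
repeatedly test the midpoint between a known-good and a known-bad
commit. The function first_bad and the is_bad callback are
hypothetical; in practice is_bad stands for "build, boot, run the
reproducer and see whether the GPU hangs".

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad every later commit is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return bad
```

Each round halves the remaining range, so even a few thousand commits
between two releases need only about a dozen test builds.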

Chris, any ideas?

-- 
Nikola


* Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
  2013-01-14  9:49                       ` Nikola Pajkovsky
@ 2013-01-14 12:47                         ` Chris Wilson
  0 siblings, 0 replies; 18+ messages in thread
From: Chris Wilson @ 2013-01-14 12:47 UTC
  To: Nikola Pajkovsky, Daniel Vetter
  Cc: Dave Kleikamp, Lijo Antony, Dave Airlie, Greg KH, intel-gfx,
	linux-kernel, Jesse Barnes

On Mon, 14 Jan 2013 10:49:08 +0100, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
> Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> 
> > On Mon, Jan 14, 2013 at 7:58 AM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
> >> Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> >>
> >>> On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky <npajkovs@redhat.com> wrote:
> >>>> bug still kicking even w/ (drm/i915: Revert shrinker changes from "Track
> >>>> unbound pages")
> >>>
> >>> Could be a different bug, can you please attach the error_state somewhere?
> >>
> >> yep, i915_error_state is attached. btw, I'm going to bisect kernel, so
> >> hopefully I will bring some commit.
> >
> > Different bug; on a quick look this could be a dupe of
> > https://bugzilla.kernel.org/show_bug.cgi?id=52311
> 
> ok
> 
> > Chris should know the details.
> 
> thanks, bisection leads me to commit d7d4eed ("drm/i915: Allow
> DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers"). It's not
> possible to simply revert/test commit and I have no idea how i915 works.
> 
> Chris any ideas?

Userspace is failing to prepare the GPU to execute a WAIT_FOR_EVENT
command, which it can only try if the kernel allows execution of
privileged batch buffers.

Setting Option "SwapbuffersWait" "false" in xorg.conf will prevent the
ddx from issuing the hanging command sequence. It is not clear yet what
the missing ingredient is; I suspect the ddx needs to be more careful
about not setting conditions that can never be met.
-Chris
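
A minimal sketch of where that option would go in xorg.conf, assuming
the xf86-video-intel ddx is in use. The Identifier string is a
placeholder and should match whatever the existing Device section (if
any) already uses.

```
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "SwapbuffersWait" "false"
EndSection
```

The X server has to be restarted for the change to take effect. As
noted above, this only avoids the hanging command sequence; it does not
address the underlying cause.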

-- 
Chris Wilson, Intel Open Source Technology Centre


Thread overview: 18+ messages
2013-01-08 22:36 i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree) Greg KH
2013-01-09  0:38 ` Greg KH
2013-01-09  3:42   ` Dave Airlie
2013-01-09  4:25     ` Greg KH
2013-01-09  5:31       ` Dave Airlie
2013-01-09  7:28         ` Lijo Antony
2013-01-09 19:44           ` Dave Kleikamp
2013-01-09 20:12             ` Dave Kleikamp
2013-01-09 21:08               ` Greg KH
2013-01-10  0:40               ` Greg KH
2013-01-10  1:07                 ` Chris Wilson
2013-01-10  1:19                   ` Dave Airlie
2013-01-11 17:26               ` Nikola Pajkovsky
2013-01-11 18:42                 ` Daniel Vetter
2013-01-14  6:58                   ` Nikola Pajkovsky
2013-01-14  9:06                     ` Daniel Vetter
2013-01-14  9:49                       ` Nikola Pajkovsky
2013-01-14 12:47                         ` Chris Wilson
