linux-tegra.vger.kernel.org archive mirror
* Re: i915 freakout with latest 3.7 git
       [not found]               ` <20121207204406.GA26309-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
@ 2012-12-07 21:08                 ` Daniel Vetter
       [not found]                   ` <CAKMK7uEd_KY+FS9UKqRcFnKuUo5_8TnWnAVgLm7ogtbKU7OVNg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Vetter @ 2012-12-07 21:08 UTC
  To: Heinz Diehl
  Cc: Chris Wilson, dri-devel, intel-gfx,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Fri, Dec 7, 2012 at 9:44 PM, Heinz Diehl <htd-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org> wrote:
> On 07.12.2012, Daniel Vetter wrote:
>> > I think I can reliably reproduce the hang on my machine now. I have to
>> > try some HD-videos on Youtube while writing a big file with dd. The
>> > hang often occurs within max. 5 min.
>
>> That sounds pretty awesome: Just to check, is this already with rc6
>> disabled? Also, which gpu chip?
>
> This is with latest 3.7-git and i915.i915_enable_rc6=0.
> Attached is a logfile/dmesg after booting with debug options on which
> hopefully shows you the gpu chip.
>
>> Sure, always glad to help excellent bug reporters along. My usual
>> kernel bisect howto is:
>> http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/
>> It seems to serve rather well thus far.

ilk (Ironlake) with rc6 disabled, and the two hangs you've attached both
die on the MI_FLUSH in between a 3D primitive and a 2D blit, like all
the other non-rc6 hangs we've seen thus far that indicate that 3.7
regressed. This looks _very_ good. I'm adding lists again so that
people are updated and can check whether I've analyzed the
error_states correctly. For reference I've uploaded your dmesg and
error_states at

http://people.freedesktop.org/~danvet/stuff/gpu-hang.tar.bz2

> Thanks, I've read it and think that will be pretty easy (technically).
> Am I right that I should download Linus' tree first and compile a
> 3.7.0-rc1, and that if I can reproduce the bug with it, bisecting the
> offending patch should be a little bit shorter?
>
>> Good luck with the exams!
>
> Thanks! :-)
>
>> Yeah, something in 3.7 seems to have blown up - we have a few reports
>> all claiming that 3.6 is solid, while 3.7 is not :(
>
> I'll try my very best to detect the offending patch. So stay tuned ;-)

Yeah, this would be very good information to move forward with this bug.
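
For reference, the bisect workflow discussed above could look roughly
like the sketch below. It assumes a clone of Linus' tree, that v3.6 is
known good, and that the hang reproduces on v3.7-rc1 (if it does, that
tag can serve as the bad end and shortens the range). The helper name
start_bisect is made up for illustration; only the git commands
themselves are real.

```shell
# Hypothetical helper: start a bisect between a known-bad and a
# known-good revision. Run it inside a clone of Linus' tree.
start_bisect() {
    bad_rev=$1
    good_rev=$2
    git bisect start "$bad_rev" "$good_rev"
}

# Assumed usage for this thread:
#   start_bisect v3.7-rc1 v3.6
#   ... build and boot the kernel git checked out, try to trigger the
#   hang, then report the result of that step:
#   git bisect good    # no hang within the test window
#   git bisect bad     # GPU hang reproduced
#   ... repeat until git prints "<sha> is the first bad commit", then:
#   git bisect reset   # return to the branch you started from
```

Since the hang is intermittent, a "good" verdict is only as reliable as
the time spent trying to trigger it, so it is worth testing each step
for noticeably longer than the usual time to failure.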

Thanks a lot for your hard work in helping with reproducing this bug.

Yours, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: i915 freakout with latest 3.7 git
       [not found]                   ` <CAKMK7uEd_KY+FS9UKqRcFnKuUo5_8TnWnAVgLm7ogtbKU7OVNg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-07 21:53                     ` Chris Wilson
       [not found]                       ` <f5ae8a$7ojl8o-vB7d4uKqMByEUgXnM9ftUFDQ4js95KgL@public.gmane.org>
  2012-12-08 13:06                     ` Heinz Diehl
  1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2012-12-07 21:53 UTC
  To: Daniel Vetter, Heinz Diehl
  Cc: dri-devel, intel-gfx, linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Fri, 7 Dec 2012 22:08:13 +0100, Daniel Vetter <daniel-/w4YWyX8dFk@public.gmane.org> wrote:
> ilk with rc6 disabled, and the two hangs you've attached both die on
> the MI_FLUSH in between a 3D primitive and a 2D blit, like all the
> other non-rc6 hangs we've seen thus far that indicate that 3.7
> regressed. This looks _very_ good. I'm adding lists again so that
> people are updated and can check whether I've analyzed the
> error_states correctly. For reference I've uploaded your dmesg and
> error_states at

The error states do disappear into a black hole during the execution of
a 3DPRIMITIVE. The similarity between the two is that the WM kernels
loaded for the 3DPRIMITIVE both appear to be recently bound, and were
the last kernels to be bound in the batch. Coincidence? Maybe; the
INSTDONE in both cases is again the same highly unusual condition,
suggesting that the EU died.  However, both error-states also suggest
that a fresh surface was uploaded for the same 3DPRIMITIVE - but I'm
having to guess since the error-state doesn't include the auxiliary
state for me to check. One thing you can try is SNA, which packs its
batches differently with the advantage that more auxiliary state is
included in the error-state. It also packs all the kernels into a
single buffer which will reduce the frequency at which it is paged
out/in. So if you can reproduce with SNA (use Option "AccelMethod"
"SNA" in a device section of your xorg.conf snippet) I expect the
error-state to be quite different and hopefully shed some more light on
the issue.
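
A minimal sketch of such an xorg.conf snippet, written out by a small
shell helper. The helper name, the file name 20-intel-sna.conf and the
Identifier string are arbitrary choices for illustration; only the
"AccelMethod" "SNA" option comes from the suggestion above, and it
assumes the xf86-video-intel driver was built with SNA support.

```shell
# Hypothetical helper: write an xorg.conf snippet that selects SNA.
# $1 = target directory (for example /etc/X11/xorg.conf.d, needs root).
write_sna_snippet() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/20-intel-sna.conf" <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "AccelMethod" "SNA"
EndSection
EOF
}

# Assumed usage (as root), followed by a restart of the X server:
#   write_sna_snippet /etc/X11/xorg.conf.d
```

After restarting X, the Xorg log should state which acceleration method
the intel driver picked, which is a quick way to confirm the option
took effect.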
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: i915 freakout with latest 3.7 git
       [not found]                   ` <CAKMK7uEd_KY+FS9UKqRcFnKuUo5_8TnWnAVgLm7ogtbKU7OVNg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-12-07 21:53                     ` Chris Wilson
@ 2012-12-08 13:06                     ` Heinz Diehl
       [not found]                       ` <20121208130648.GA3311-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
  1 sibling, 1 reply; 8+ messages in thread
From: Heinz Diehl @ 2012-12-08 13:06 UTC
  To: Daniel Vetter
  Cc: Chris Wilson, dri-devel, intel-gfx,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On 07.12.2012, Daniel Vetter wrote: 

[....]

I did a "git bisect" between 3.6 and 3.7-rc8 and ended up with
this. Unfortunately, git can't revert this patch on top of master, so
I have not been able to test if a revert will cure the problem.
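
For anyone repeating the attempt, a small sketch of trying the revert
and backing out cleanly when it conflicts. The helper name try_revert
is made up for illustration; it only wraps plain git revert and
git revert --abort.

```shell
# Hypothetical helper: try to revert one commit; if the revert
# conflicts, abort it so the working tree is clean again.
try_revert() {
    commit=$1
    if git revert --no-edit "$commit"; then
        echo "revert applied"
    else
        git revert --abort
        echo "revert conflicts"
        return 1
    fi
}

# Assumed usage on top of master:
#   try_revert 6c085a728cf000ac1865d66f8c9b52935558b328
```

A non-zero return only says that later changes touch the same code; it
says nothing about whether the commit is the real culprit.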

After reading on the net that Peter (Lekensteyn) had already bisected to
the same patch, and that reverting it on top of 3.7-rc4 didn't work for
him, I'm somewhat clueless..

What else can I do to help find the cause?

Heinz


[root@wildsau linux-git]# git bisect good
6c085a728cf000ac1865d66f8c9b52935558b328 is the first bad commit
commit 6c085a728cf000ac1865d66f8c9b52935558b328
Author: Chris Wilson <chris-Y6uKTt2uX1cEflXRtASbqLVCufUGDwFn@public.gmane.org>
Date:   Mon Aug 20 11:40:46 2012 +0200

    drm/i915: Track unbound pages
    
    When dealing with a working set larger than the GATT, or even the
    mappable aperture when touching through the GTT, we end up with evicting
    objects only to rebind them at a new offset again later. Moving an
    object into and out of the GTT requires clflushing the pages, thus
    causing a double-clflush penalty for rebinding.
    
    To avoid having to clflush on rebinding, we can track the pages as they
    are evicted from the GTT and only relinquish those pages on memory
    pressure.
    
    As usual, if it were not for the handling of out-of-memory condition and
    having to manually shrink our own bo caches, it would be a net reduction
    of code. Alas.
    
    Note: The patch also contains a few changes to the last-hope
    evict_everything logic in i915_gem_execbuffer.c - we no longer try to
    only evict the purgeable stuff in a first try (since that's superfluous
    and only helps in OOM corner-cases, not fragmented-gtt thrashing
    situations).
    
    Also, the extraction of the get_pages retry loop from bind_to_gtt (and
    other callsites) to get_pages should imo have been a separate patch.
    
    v2: Ditch the newly added put_pages (for unbound objects only) in
    i915_gem_reset. A quick irc discussion hasn't revealed any important
    reason for this, so if we need this, I'd like to have a git blame'able
    explanation for it.
    
    v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
    
    Signed-off-by: Chris Wilson <chris-Y6uKTt2uX1cEflXRtASbqLVCufUGDwFn@public.gmane.org>
    [danvet: Split out code movements and rant a bit in the commit message
    with a few Notes. Done v2]
    Signed-off-by: Daniel Vetter <daniel.vetter-/w4YWyX8dFk@public.gmane.org>

:040000 040000 c4f02e0d05a570d0baf9d2f19a6c276c06a55142 df93a56308637e3840353c3c9425ec96c3422dcc M	drivers
[root@wildsau linux-git]# 


* Re: i915 freakout with latest 3.7 git
       [not found]                       ` <f5ae8a$7ojl8o-vB7d4uKqMByEUgXnM9ftUFDQ4js95KgL@public.gmane.org>
@ 2012-12-08 14:30                         ` Heinz Diehl
       [not found]                           ` <20121208143053.GA3376-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Heinz Diehl @ 2012-12-08 14:30 UTC
  To: Chris Wilson
  Cc: Daniel Vetter, dri-devel, intel-gfx,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On 08.12.2012, Chris Wilson wrote: 

> One thing you can try is SNA, which packs its
> batches differently with the advantage that more auxiliary state is
> included in the error-state. It also packs all the kernels into a
> single buffer which will reduce the frequency at which it is paged
> out/in. So if you can reproduce with SNA (use Option "AccelMethod"
> "SNA" in a device section of your xorg.conf snippet) I expect the
> error-state to be quite different and hopefully shed some more light on
> the issue.

I tried this with latest 3.7-rc8 git, but no matter how hard I try, I
can't get the gpu to hang (with i915.i915_enable_rc6=0). Will use this
as my default kernel the next few days and see if the hang occurs by
chance.

Heinz


* Re: i915 freakout with latest 3.7 git
       [not found]                       ` <20121208130648.GA3311-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
@ 2012-12-11 10:22                         ` Daniel Vetter
       [not found]                           ` <20121211102225.GP11556-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Vetter @ 2012-12-11 10:22 UTC
  To: Heinz Diehl
  Cc: Daniel Vetter, Chris Wilson, dri-devel, intel-gfx,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Sat, Dec 08, 2012 at 02:06:48PM +0100, Heinz Diehl wrote:
> On 07.12.2012, Daniel Vetter wrote: 
> 
> [....]
> 
> I did a "git bisect" between 3.6 and 3.7-rc8 and ended up with
> this. Unfortunately, git can't revert this patch on top of master, so
> I have not been able to test if a revert will cure the problem.
> 
> After reading on the net that Peter (Lekensteyn) had already bisected to
> the same patch, and that reverting it on top of 3.7-rc4 didn't work for
> him, I'm somewhat clueless..
> 
> What else can I do to help find the cause?

Can you please test the patch at

https://bugs.freedesktop.org/attachment.cgi?id=70111

That one should disable all effects of the unbound tracking, since a
revert of the below commit conflicts.
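
One way to fetch and apply the attachment is sketched below. The
helper name apply_test_patch and the local file name are made up for
illustration, and the sketch assumes the attachment is a plain diff
that applies from the top of the kernel tree.

```shell
# Hypothetical helper: apply a patch only if it applies cleanly.
# $1 = path to the patch file; run from the top of the kernel tree.
apply_test_patch() {
    patch_file=$1
    # --check is a dry run: it reports problems without touching files.
    git apply --check "$patch_file" && git apply "$patch_file"
}

# Assumed usage:
#   curl -L -o unbound-tracking.patch \
#       'https://bugs.freedesktop.org/attachment.cgi?id=70111'
#   apply_test_patch unbound-tracking.patch
```

The dry run first means a patch that does not fit the tree leaves it
untouched instead of half-applied.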

Thanks, Daniel
> 
> Heinz
> 
> 
> [root@wildsau linux-git]# git bisect good
> 6c085a728cf000ac1865d66f8c9b52935558b328 is the first bad commit
> commit 6c085a728cf000ac1865d66f8c9b52935558b328
> Author: Chris Wilson <chris-Y6uKTt2uX1cEflXRtASbqLVCufUGDwFn@public.gmane.org>
> Date:   Mon Aug 20 11:40:46 2012 +0200
> 
>     drm/i915: Track unbound pages
>     
>     When dealing with a working set larger than the GATT, or even the
>     mappable aperture when touching through the GTT, we end up with evicting
>     objects only to rebind them at a new offset again later. Moving an
>     object into and out of the GTT requires clflushing the pages, thus
>     causing a double-clflush penalty for rebinding.
>     
>     To avoid having to clflush on rebinding, we can track the pages as they
>     are evicted from the GTT and only relinquish those pages on memory
>     pressure.
>     
>     As usual, if it were not for the handling of out-of-memory condition and
>     having to manually shrink our own bo caches, it would be a net reduction
>     of code. Alas.
>     
>     Note: The patch also contains a few changes to the last-hope
>     evict_everything logic in i915_gem_execbuffer.c - we no longer try to
>     only evict the purgeable stuff in a first try (since that's superfluous
>     and only helps in OOM corner-cases, not fragmented-gtt thrashing
>     situations).
>     
>     Also, the extraction of the get_pages retry loop from bind_to_gtt (and
>     other callsites) to get_pages should imo have been a separate patch.
>     
>     v2: Ditch the newly added put_pages (for unbound objects only) in
>     i915_gem_reset. A quick irc discussion hasn't revealed any important
>     reason for this, so if we need this, I'd like to have a git blame'able
>     explanation for it.
>     
>     v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
>     
>     Signed-off-by: Chris Wilson <chris-Y6uKTt2uX1cEflXRtASbqLVCufUGDwFn@public.gmane.org>
>     [danvet: Split out code movements and rant a bit in the commit message
>     with a few Notes. Done v2]
>     Signed-off-by: Daniel Vetter <daniel.vetter-/w4YWyX8dFk@public.gmane.org>
> 
> :040000 040000 c4f02e0d05a570d0baf9d2f19a6c276c06a55142 df93a56308637e3840353c3c9425ec96c3422dcc M	drivers
> [root@wildsau linux-git]# 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: i915 freakout with latest 3.7 git
       [not found]                           ` <20121208143053.GA3376-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
@ 2012-12-11 10:24                             ` Chris Wilson
       [not found]                               ` <84c8a8$6tcpes-zyQnk7H6ZEMLll3ZsUKC9FDQ4js95KgL@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2012-12-11 10:24 UTC
  To: Heinz Diehl
  Cc: Daniel Vetter, dri-devel, intel-gfx,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Sat, 8 Dec 2012 15:30:53 +0100, Heinz Diehl <htd-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org> wrote:
> On 08.12.2012, Chris Wilson wrote: 
> 
> > One thing you can try is SNA, which packs its
> > batches differently with the advantage that more auxiliary state is
> > included in the error-state. It also packs all the kernels into a
> > single buffer which will reduce the frequency at which it is paged
> > out/in. So if you can reproduce with SNA (use Option "AccelMethod"
> > "SNA" in a device section of your xorg.conf snippet) I expect the
> > error-state to be quite different and hopefully shed some more light on
> > the issue.
> 
> I tried this with latest 3.7-rc8 git, but no matter how hard I try, I
> can't get the gpu to hang (with i915.i915_enable_rc6=0). Will use this
> as my default kernel the next few days and see if the hang occurs by
> chance.

Can you confirm one thing: are you able to reproduce the hangs at all on
3.7-rc8, using your original setup?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: i915 freakout with latest 3.7 git
       [not found]                               ` <84c8a8$6tcpes-zyQnk7H6ZEMLll3ZsUKC9FDQ4js95KgL@public.gmane.org>
@ 2012-12-11 15:34                                 ` Heinz Diehl
  0 siblings, 0 replies; 8+ messages in thread
From: Heinz Diehl @ 2012-12-11 15:34 UTC
  To: Chris Wilson
  Cc: Daniel Vetter, dri-devel, intel-gfx,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On 11.12.2012, Chris Wilson wrote: 

> Can you confirm one thing: are you able to reproduce the hangs at all on
> 3.7-rc8, using your original setup?

I can reproduce the hang with both 3.7-rc8 and 3.7 final, incl. latest
Linus-git. All with i915.i915_enable_rc6=0.
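
Because the parameter name is easy to mistype, a quick check that the
running kernel was really booted with it can help. This is only a
sketch: the helper name rc6_disabled is made up, and it does nothing
more than look for the exact string on the kernel command line.

```shell
# Hypothetical helper: succeed if a kernel command line disables rc6.
# $1 = command line text; defaults to the running kernel's /proc/cmdline.
rc6_disabled() {
    cmdline=${1:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" i915.i915_enable_rc6=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed usage:
#   rc6_disabled && echo "rc6 is disabled on the command line"
```

A misspelled variant such as i915.915_enable_rc6=0 does not match and
would not be picked up as the i915_enable_rc6 option either, leaving
rc6 at its default.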

Heinz


* Re: i915 freakout with latest 3.7 git
       [not found]                           ` <20121211102225.GP11556-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2012-12-11 16:11                             ` Heinz Diehl
  0 siblings, 0 replies; 8+ messages in thread
From: Heinz Diehl @ 2012-12-11 16:11 UTC
  To: Daniel Vetter
  Cc: Chris Wilson, dri-devel, intel-gfx,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On 11.12.2012, Daniel Vetter wrote: 

> Can you please test the patch at
> 
> https://bugs.freedesktop.org/attachment.cgi?id=70111
> 
> That one should disable all effects of the unbound tracking, since a
> revert of the below commit conflicts.

I applied this patch to Linus' git from today. "Boom" after about 1
min.

The errorstate file is here:

http://www.fritha.org/i915/errorstate3.tar.bz2
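
For others who hit the hang, a sketch of saving the error state right
after it happens. It assumes debugfs is mounted at /sys/kernel/debug
and that the GPU is DRM card 0; the helper name and the output file
name are arbitrary.

```shell
# Hypothetical helper: copy the i915 error state to a regular file.
# $1 = source (assumed default: debugfs at /sys/kernel/debug, card 0)
# $2 = destination file (arbitrary default name)
save_error_state() {
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    dest=${2:-i915_error_state.txt}
    if [ ! -r "$src" ]; then
        echo "cannot read $src (are you root, is debugfs mounted?)" >&2
        return 1
    fi
    cat "$src" > "$dest"
}

# Assumed usage (as root), before rebooting:
#   save_error_state
```

The file can get large, so compressing it before uploading is a good
idea.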

Heinz


end of thread, other threads:[~2012-12-11 16:11 UTC | newest]

Thread overview: 8+ messages
     [not found] <20121203173951.GA25384@fancy-poultry.org>
     [not found] ` <CAKMK7uEGTWgPO-wf7aNLC+y9zLYPyy6f_djn6D-Gx-n6bFqcJA@mail.gmail.com>
     [not found]   ` <20121204123522.GA32419@fritha.org>
     [not found]     ` <1498848.rM1KKK5er8@al>
     [not found]       ` <CAKMK7uHH8U64w5kWVp1oOnhTHkStGoJHJx1KazCQ2-gaCAm0Sg@mail.gmail.com>
     [not found]         ` <20121207170704.GA24395@fancy-poultry.org>
     [not found]           ` <CAKMK7uHNn9b1TdUaUn=4DFhF=zBtZToOtF2uSRf2doVCTOdzaQ@mail.gmail.com>
     [not found]             ` <20121207204406.GA26309@fritha.org>
     [not found]               ` <20121207204406.GA26309-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
2012-12-07 21:08                 ` i915 freakout with latest 3.7 git Daniel Vetter
     [not found]                   ` <CAKMK7uEd_KY+FS9UKqRcFnKuUo5_8TnWnAVgLm7ogtbKU7OVNg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-07 21:53                     ` Chris Wilson
     [not found]                       ` <f5ae8a$7ojl8o-vB7d4uKqMByEUgXnM9ftUFDQ4js95KgL@public.gmane.org>
2012-12-08 14:30                         ` Heinz Diehl
     [not found]                           ` <20121208143053.GA3376-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
2012-12-11 10:24                             ` Chris Wilson
     [not found]                               ` <84c8a8$6tcpes-zyQnk7H6ZEMLll3ZsUKC9FDQ4js95KgL@public.gmane.org>
2012-12-11 15:34                                 ` Heinz Diehl
2012-12-08 13:06                     ` Heinz Diehl
     [not found]                       ` <20121208130648.GA3311-iEI8Y0CNJBYdnm+yROfE0A@public.gmane.org>
2012-12-11 10:22                         ` Daniel Vetter
     [not found]                           ` <20121211102225.GP11556-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2012-12-11 16:11                             ` Heinz Diehl
