g33: GPU hangs

public inbox for linux-kernel@vger.kernel.org

* g33: GPU hangs
From: Jiri Slaby @ 2011-12-01 12:30 UTC
  To: Keith Packard; +Cc: dri-devel, LKML, Jiri Slaby

Hi,

Both yesterday and today, my GPU hung. Both hangs happened when I opened
the Google front page in Firefox.

I'm running 3.2.0-rc3-next-20111130. Given it happened twice in the past
24 hours, it looks like a regression from next-20111124. Or is this a
userspace issue (I might have updated some packages)?

i915_error_state dumps from the two hangs are here:
http://www.fi.muni.cz/~xslaby/sklad/panics/915_error_state_0
http://www.fi.muni.cz/~xslaby/sklad/panics/915_error_state_second

# rpm -q Mesa libdrm xorg-x11-server
Mesa-7.11.1-5.1.x86_64
libdrm-2.4.27-2.1.x86_64
xorg-x11-server-7.6_1.10.4-39.1.x86_64

00:02.0 VGA compatible controller [0300]: Intel Corporation 82G33/G31
Express Integrated Graphics Controller [8086:29c2] (rev 02) (prog-if 00
[VGA controller])
        Subsystem: Intel Corporation 82G33/G31 Express Integrated
Graphics Controller [8086:29c2]
        Flags: bus master, fast devsel, latency 0, IRQ 42
        Memory at feb80000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at ec00 [size=8]
        Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Memory at fea00000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Kernel driver in use: i915

regards,
-- 
js
suse labs

* Re: g33: GPU hangs
From: Chris Wilson @ 2011-12-01 12:47 UTC
  To: Jiri Slaby, Keith Packard; +Cc: LKML, dri-devel, Jiri Slaby

On Thu, 01 Dec 2011 13:30:18 +0100, Jiri Slaby <jslaby@suse.cz> wrote:
> Hi,
> 
> Both yesterday and today, my GPU hung. Both hangs happened when I opened
> the Google front page in Firefox.
> 
> I'm running 3.2.0-rc3-next-20111130. Given it happened twice in the past
> 24 hours, it looks like a regression from next-20111124. Or is this a
> userspace issue (I might have updated some packages)?
> 
> i915_error_state dumps from the two hangs are here:
> http://www.fi.muni.cz/~xslaby/sklad/panics/915_error_state_0
> http://www.fi.muni.cz/~xslaby/sklad/panics/915_error_state_second

Both error states contain the same bug: a fence register in conflict
with the command stream. The batch is using the buffer at 0x03d00000
as an untiled 40x40 rgba buffer with pitch 192. However, a fence
register is programmed to
  fence[3] = 03d00001
    valid, x-tiled, pitch: 512, start: 0x03d00000, size: 1048576
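
For readers who want to check the decode by hand, here is a minimal
sketch of how a raw value such as 03d00001 could be split into the fields
listed above. The bit positions, the field widths and the tile widths
(512 bytes for X, 128 bytes for Y) are assumptions about a gen3-style
fence layout, not something stated in this thread, and toy_fence and
toy_decode_fence are hypothetical names, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a gen3-style fence register. */
struct toy_fence {
	bool valid;      /* assumed bit 0: fence is enabled */
	bool y_tiled;    /* assumed bit 12: Y tiling if set, X otherwise */
	uint32_t start;  /* assumed bits 31:20: 1 MiB-aligned start address */
	uint32_t size;   /* bytes: 1 MiB << size field (assumed bits 10:8) */
	uint32_t pitch;  /* bytes: tile width << pitch field (assumed bits 6:4) */
};

static struct toy_fence toy_decode_fence(uint32_t reg)
{
	struct toy_fence f;
	uint32_t tile_width;

	f.valid = (reg & 0x1u) != 0;
	f.y_tiled = ((reg >> 12) & 0x1u) != 0;
	f.start = reg & 0xfff00000u;
	f.size = (UINT32_C(1) << 20) << ((reg >> 8) & 0x7u);

	/* Assumed tile widths; the Y width differs between chipsets. */
	tile_width = f.y_tiled ? 128u : 512u;
	f.pitch = tile_width << ((reg >> 4) & 0x7u);
	return f;
}
```

Under those assumptions, toy_decode_fence(0x03d00001) reports a valid,
X-tiled fence starting at 0x03d00000 with size 1048576 and pitch 512,
which is the state that conflicts with the batch treating the same
address as an untiled buffer with pitch 192.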

Note also that the buffer is not listed as currently active, so
presumably we reused the buffer as tiled (and so reprogrammed the
fence register) before the GPU retired the batch. That sounds eerily
similar to this bug:
From 2b76187d2f5fc2352e391914b1828f91f93bb356 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue, 29 Nov 2011 15:12:16 +0000
Subject: [PATCH] drm/i915: Only clear the GPU domains upon a successful
 finish

By clearing the GPU read domains before waiting upon the buffer, we run
the risk of the wait being interrupted and the domains prematurely
cleared. The next time we attempt to wait upon the buffer (after
userspace handles the signal), we believe that the buffer is idle and so
skip the wait.

There are a number of bugs across all generations which show signs of an
overly hasty reuse of active buffers.

Such as:

  https://bugs.freedesktop.org/show_bug.cgi?id=29046
  https://bugs.freedesktop.org/show_bug.cgi?id=35863
  https://bugs.freedesktop.org/show_bug.cgi?id=38952
  https://bugs.freedesktop.org/show_bug.cgi?id=40282
  https://bugs.freedesktop.org/show_bug.cgi?id=41098
  https://bugs.freedesktop.org/show_bug.cgi?id=41102
  https://bugs.freedesktop.org/show_bug.cgi?id=41284
  https://bugs.freedesktop.org/show_bug.cgi?id=42141

A couple of those pre-date i915_gem_object_finish_gpu(), so may be
unrelated (such as a wild write from a userspace command buffer), but
this does look like a convincing cause for most of those bugs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d560175..036bc58 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3087,10 +3087,13 @@ i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj)
 			return ret;
 	}
 
+	ret = i915_gem_object_wait_rendering(obj);
+	if (ret)
+		return ret;
+
 	/* Ensure that we invalidate the GPU's caches and TLBs. */
 	obj->base.read_domains &= ~I915_GEM_GPU_DOMAINS;
-
-	return i915_gem_object_wait_rendering(obj);
+	return 0;
 }
 
 /**
-- 
1.7.7.3
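
The ordering problem the patch describes can be illustrated with a small
userspace model. This is only a sketch: toy_obj, toy_wait_rendering,
TOY_GPU_DOMAINS and TOY_ERESTARTSYS are hypothetical stand-ins, and the
early return when no GPU domains are set is an assumption about how a
later caller ends up skipping the wait. It is not the driver code.

```c
#include <stdbool.h>

#define TOY_GPU_DOMAINS 0x3eu  /* hypothetical stand-in for I915_GEM_GPU_DOMAINS */
#define TOY_ERESTARTSYS 512    /* stand-in for an interrupted-wait error code */

struct toy_obj {
	unsigned int read_domains;
	bool gpu_busy;        /* the GPU has not retired the batch yet */
	bool signal_pending;  /* the next wait will be interrupted */
};

/* Wait for the GPU; fails once if a signal is pending. */
static int toy_wait_rendering(struct toy_obj *obj)
{
	if (obj->signal_pending) {
		obj->signal_pending = false;
		return -TOY_ERESTARTSYS;
	}
	obj->gpu_busy = false;
	return 0;
}

/* Old ordering: domains are cleared even if the wait is interrupted. */
static int toy_finish_gpu_old(struct toy_obj *obj)
{
	if ((obj->read_domains & TOY_GPU_DOMAINS) == 0)
		return 0;  /* assumed early-out: object looks idle */

	obj->read_domains &= ~TOY_GPU_DOMAINS;
	return toy_wait_rendering(obj);
}

/* Patched ordering: domains are cleared only after a successful wait. */
static int toy_finish_gpu_new(struct toy_obj *obj)
{
	int ret;

	if ((obj->read_domains & TOY_GPU_DOMAINS) == 0)
		return 0;

	ret = toy_wait_rendering(obj);
	if (ret)
		return ret;

	obj->read_domains &= ~TOY_GPU_DOMAINS;
	return 0;
}
```

In this model, an interrupted call to toy_finish_gpu_old() leaves the
object with no GPU read domains while gpu_busy is still true, so the
retry returns success without waiting. With toy_finish_gpu_new() the
domains survive the interrupted call and the retry waits as intended.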

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: g33: GPU hangs
From: Jiri Slaby @ 2011-12-03 22:58 UTC
  To: Chris Wilson; +Cc: Jiri Slaby, Keith Packard, LKML, dri-devel

On 12/01/2011 01:47 PM, Chris Wilson wrote:
> Both error states contain the same bug: a fence register in conflict
> with the command stream. The batch is using the buffer at 0x03d00000
> as an untiled 40x40 rgba buffer with pitch 192. However, a fence
> register is programmed to
>   fence[3] = 03d00001
>     valid, x-tiled, pitch: 512, start: 0x03d00000, size: 1048576
> 
> Note also that the buffer is not listed as currently active, so
> presumably we reused the buffer as tiled (and so reprogrammed the
> fence register) before the GPU retired the batch. That sounds eerily
> similar to this bug:

Hi, it seems like it's fixed by the patch. Thanks.

> From 2b76187d2f5fc2352e391914b1828f91f93bb356 Mon Sep 17 00:00:00 2001
> From: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue, 29 Nov 2011 15:12:16 +0000
> Subject: [PATCH] drm/i915: Only clear the GPU domains upon a successful
>  finish

-- 
js
suse labs

* Re: g33: GPU hangs
From: Jiri Slaby @ 2012-01-18  9:31 UTC
  To: Chris Wilson; +Cc: Jiri Slaby, Keith Packard, LKML, dri-devel

On 12/01/2011 01:47 PM, Chris Wilson wrote:
> From 2b76187d2f5fc2352e391914b1828f91f93bb356 Mon Sep 17 00:00:00 2001
> From: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue, 29 Nov 2011 15:12:16 +0000
> Subject: [PATCH] drm/i915: Only clear the GPU domains upon a successful
>  finish

Hi, do you plan to push this patch upstream? Or am I supposed to not use
it anymore?


-- 
js

* Re: g33: GPU hangs
From: Daniel Vetter @ 2012-01-18 11:43 UTC
  To: Jiri Slaby; +Cc: Chris Wilson, Jiri Slaby, LKML, dri-devel

On Wed, Jan 18, 2012 at 10:31:42AM +0100, Jiri Slaby wrote:
> On 12/01/2011 01:47 PM, Chris Wilson wrote:
> > From 2b76187d2f5fc2352e391914b1828f91f93bb356 Mon Sep 17 00:00:00 2001
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Tue, 29 Nov 2011 15:12:16 +0000
> > Subject: [PATCH] drm/i915: Only clear the GPU domains upon a successful
> >  finish
> 
> Hi, do you plan to push this patch upstream? Or am I supposed to not use
> it anymore?

It's on track to get merged to drm-intel-next. I'll probably pick it up in a week
or so.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

* new GPU hang [was: g33: GPU hangs]
From: Jiri Slaby @ 2012-01-24 14:51 UTC
  To: Chris Wilson, Jiri Slaby, LKML, dri-devel, daniel

On 01/18/2012 12:43 PM, Daniel Vetter wrote:
>>> From 2b76187d2f5fc2352e391914b1828f91f93bb356 Mon Sep 17 00:00:00 2001
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>> Date: Tue, 29 Nov 2011 15:12:16 +0000
>>> Subject: [PATCH] drm/i915: Only clear the GPU domains upon a successful
>>>  finish
>>
>> Hi, do you plan to push this patch upstream? Or am I supposed to not use
>> it anymore?
> 
> It's on track to get merged to drm-intel-next. I'll probably pick it up in a week
> or so.

OK, thanks.

Even though I use the patch, today another hang occurred:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in
/debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting
910502 at 910475, next 910519)
[drm:i915_reset] *ERROR* Failed to reset chip.

I was cruising through a map at openstreetmaps.org. This was
3.2.0-next-20120118_64+. FWIW I updated to 3.3.0-rc1-next-20120124_64+
right now.

Is it related to the bug I reported at:
https://bugs.freedesktop.org/show_bug.cgi?id=43427
?

dmesg:
http://www.fi.muni.cz/~xslaby/sklad/panics/915_error_state.dmesg

error_state:
http://www.fi.muni.cz/~xslaby/sklad/panics/915_error_state

thanks,
-- 
js
