public inbox for intel-gfx@lists.freedesktop.org
* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-09-30 23:04 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

This seems to have been accidentally lost in

commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Aug 30 16:19:28 2013 +0300

    drm/i915: ban badly behaving contexts

Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

--

Note, totally untested since it's a bit late here ;-)
-Daniel
---
 drivers/gpu/drm/i915/i915_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6948877c881c..1870759a5ed8 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		DRM_ERROR("Reset chip after gpu hang\n");
 	}
 
 	if (ret) {
-- 
2.1.1

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Chris Wilson @ 2014-10-01  6:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala

On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> This seems to have been accidentally lost in
> 
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date:   Fri Aug 30 16:19:28 2013 +0300
> 
>     drm/i915: ban badly behaving contexts
> 
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.

A successful GPU hang is not an error. Might be a warn or a notice, but
it certainly isn't a driver error.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-10-01  8:13 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala, Kenneth Graunke, Daniel Vetter

On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > This seems to have been accidentally lost in
> > 
> > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Date:   Fri Aug 30 16:19:28 2013 +0300
> > 
> >     drm/i915: ban badly behaving contexts
> > 
> > Without this real gpu hangs only log output at info level, which gets
> > filtered away by piglit's testrunner.
> 
> A successful GPU hang is not an error. Might be a warn or a notice, but
> it certainly isn't a driver error.

Well not of the kernel driver, but might very well be a bug in the
userspace driver. With this piglit marks tests that hung the gpu as
dmesg-fail, without this they might even pass. Ken raised this on irc and
I agree that it's a must-have feature for developers that their testsuite
can tell them when stuff broke. Providing this some other way is a lot more
work and imo should be done in a separate patch, this here is just the
minimal fix for this regression.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Chris Wilson @ 2014-10-01  8:19 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala

On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > This seems to have been accidentally lost in
> > > 
> > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Date:   Fri Aug 30 16:19:28 2013 +0300
> > > 
> > >     drm/i915: ban badly behaving contexts
> > > 
> > > Without this real gpu hangs only log output at info level, which gets
> > > filtered away by piglit's testrunner.
> > 
> > A successful GPU hang is not an error. Might be a warn or a notice, but
> > it certainly isn't a driver error.
> 
> Well not of the kernel driver, but might very well be a bug in the
> userspace driver. With this piglit marks tests that hung the gpu as
> dmesg-fail, without this they might even pass. Ken raised this on irc and
> I agree that it's a must-have feature for developers that their testsuite
> can tell them when stuff broke. Providing this some other way is a lot more
> work and imo should be done in a separate patch, this here is just the
> minimal fix for this regression.

I strongly disagree that we should be working around self-imposed
limitations of the test suite by making users believe their kernel is
broken.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-10-01  8:29 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Daniel Vetter,
	Intel Graphics Development, Mika Kuoppala, Kenneth Graunke,
	Daniel Vetter

On Wed, Oct 01, 2014 at 09:19:50AM +0100, Chris Wilson wrote:
> On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> > On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > > This seems to have been accidentally lost in
> > > > 
> > > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Date:   Fri Aug 30 16:19:28 2013 +0300
> > > > 
> > > >     drm/i915: ban badly behaving contexts
> > > > 
> > > > Without this real gpu hangs only log output at info level, which gets
> > > > filtered away by piglit's testrunner.
> > > 
> > > A successful GPU hang is not an error. Might be a warn or a notice, but
> > > it certainly isn't a driver error.
> > 
> > Well not of the kernel driver, but might very well be a bug in the
> > userspace driver. With this piglit marks tests that hung the gpu as
> > dmesg-fail, without this they might even pass. Ken raised this on irc and
> > I agree that it's a must-have feature for developers that their testsuite
> > can tell them when stuff broke. Providing this some other way is a lot more
> > work and imo should be done in a separate patch, this here is just the
> > minimal fix for this regression.
> 
> I strongly disagree that we should be working around self-imposed
> limitations of the test suite by making users believe their kernel is
> broken.

So what else should piglit do then?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Kenneth Graunke @ 2014-10-01  8:52 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala


On Wednesday, October 01, 2014 10:29:07 AM Daniel Vetter wrote:
> On Wed, Oct 01, 2014 at 09:19:50AM +0100, Chris Wilson wrote:
> > On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > > > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > > > This seems to have been accidentally lost in
> > > > > 
> > > > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > > Date:   Fri Aug 30 16:19:28 2013 +0300
> > > > > 
> > > > >     drm/i915: ban badly behaving contexts
> > > > > 
> > > > > Without this real gpu hangs only log output at info level, which gets
> > > > > filtered away by piglit's testrunner.
> > > > 
> > > > A successful GPU hang is not an error. Might be a warn or a notice, but
> > > > it certainly isn't a driver error.
> > > 
> > > Well not of the kernel driver, but might very well be a bug in the
> > > userspace driver. With this piglit marks tests that hung the gpu as
> > > dmesg-fail, without this they might even pass. Ken raised this on irc and
> > > I agree that it's a must-have feature for developers that their testsuite
> > > can tell them when stuff broke. Providing this some other way is a lot more
> > > work and imo should be done in a separate patch, this here is just the
> > > minimal fix for this regression.
> > 
> > I strongly disagree that we should be working around self-imposed
> > limitations of the test suite by making users believe their kernel is
> > broken.
> 
> So what else should piglit do then?
> -Daniel

Your GPU hanging is clearly more severe than "info" - it may impact your system stability, and likely represents a bug somewhere in the graphics drivers (whether kernel or userspace).  I think we all agree on that.

Piglit runs "dmesg --level emerg,alert,crit,err,warn,notice", which covers everything except "info" and "debug".  So anything other than info/debug would be just fine.

--Ken
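
A minimal C sketch of the level filtering described above. It assumes the record format documented for /dev/kmsg (`<prefix>,<seq>,<usec>,<flags>;<message>`, where the low three bits of the prefix are the syslog level) and the standard `<syslog.h>` priority constants, which share the 0-7 numbering of the kernel's KERN_* levels. It is not piglit's or dmesg's actual code; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <syslog.h> /* LOG_EMERG (0) ... LOG_NOTICE (5), LOG_INFO (6), LOG_DEBUG (7) */

/* "dmesg --level emerg,alert,crit,err,warn,notice" keeps every priority
 * from LOG_EMERG through LOG_NOTICE and drops info and debug. */
static bool kept_by_piglit_filter(int level)
{
	return level >= LOG_EMERG && level <= LOG_NOTICE;
}

/* Assumed /dev/kmsg record layout: "<prefix>,<seq>,<usec>,<flags>;<message>"
 * with prefix = facility * 8 + level.  Returns the level, or -1 if the
 * record does not start with a non-negative number followed by a comma. */
static int kmsg_record_level(const char *record)
{
	char *end;
	long prefix;

	assert(record != NULL);
	prefix = strtol(record, &end, 10);
	if (end == record || *end != ',' || prefix < 0)
		return -1;
	return (int)(prefix & 7); /* equivalent to LOG_PRI(prefix) */
}
```

With this check, a record such as `5,1002,5000100,-;drm/i915: Resetting chip after gpu hang` (sequence number and timestamp invented) parses to level 5 (notice) and is kept, while the same text logged at level 6 (info) is dropped.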

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Chris Wilson @ 2014-10-01  9:10 UTC (permalink / raw)
  To: Kenneth Graunke
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala

On Wed, Oct 01, 2014 at 01:52:20AM -0700, Kenneth Graunke wrote:
> On Wednesday, October 01, 2014 10:29:07 AM Daniel Vetter wrote:
> > On Wed, Oct 01, 2014 at 09:19:50AM +0100, Chris Wilson wrote:
> > > On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> > > > On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > > > > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > > > > This seems to have been accidentally lost in
> > > > > > 
> > > > > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > > > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > > > Date:   Fri Aug 30 16:19:28 2013 +0300
> > > > > > 
> > > > > >     drm/i915: ban badly behaving contexts
> > > > > > 
> > > > > > Without this real gpu hangs only log output at info level, which gets
> > > > > > filtered away by piglit's testrunner.
> > > > > 
> > > > > A successful GPU hang is not an error. Might be a warn or a notice, but
> > > > > it certainly isn't a driver error.
> > > > 
> > > > Well not of the kernel driver, but might very well be a bug in the
> > > > userspace driver. With this piglit marks tests that hung the gpu as
> > > > dmesg-fail, without this they might even pass. Ken raised this on irc and
> > > > I agree that it's a must-have feature for developers that their testsuite
> > > > can tell them when stuff broke. Providing this some other way is a lot more
> > > > work and imo should be done in a separate patch, this here is just the
> > > > minimal fix for this regression.
> > > 
> > > I strongly disagree that we should be working around self-imposed
> > > limitations of the test suite by making users believe their kernel is
> > > broken.
> > 
> > So what else should piglit do then?
> > -Daniel
> 
> Your GPU hanging is clearly more severe than "info" - it may impact your system stability, and likely represents a bug somewhere in the graphics drivers (whether kernel or userspace).  I think we all agree on that.
> 
> Piglit runs "dmesg --level emerg,alert,crit,err,warn,notice", which covers everything except "info" and "debug".  So anything other than info/debug would be just fine.

If we are happy with KERN_NOTICE (normal, but significant condition), that
is what I would suggest. Actually, we should make the GPU hang detection
itself the notice (to aid regular users). But that probably runs into
complications with the simulated hangs with igt causing a WARN test
failure - but again, I'd rather our interface with the user (and
userfacing bug reporting tools) be correct.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-10-01  9:15 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

This seems to have been accidentally lost in

commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Aug 30 16:19:28 2013 +0300

    drm/i915: ban badly behaving contexts

Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.

v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..57c1f5608462 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		DRM_ERROR("Reset chip after gpu hang\n");
 	}
 
 	if (ret) {
-- 
2.1.1

* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-10-01  9:28 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

This seems to have been accidentally lost in

commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Aug 30 16:19:28 2013 +0300

    drm/i915: ban badly behaving contexts

Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.

v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.

v3: git add and lack of coffee don't mix well.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..c9703816da28 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		pr_notice("drm/i915: Reset chip after gpu hang\n");
 	}
 
 	if (ret) {
-- 
2.1.1

* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-10-01  9:32 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

This seems to have been accidentally lost in

commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Aug 30 16:19:28 2013 +0300

    drm/i915: ban badly behaving contexts

Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.

v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.

v3: git add and lack of coffee don't mix well.

v4: Message is in between hw and sw reset, so switch verb to
continuous form.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..a77b6608a271 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		pr_notice("drm/i915: Resetting chip after gpu hang\n");
 	}
 
 	if (ret) {
-- 
2.1.1

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Chris Wilson @ 2014-10-01  9:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala

On Wed, Oct 01, 2014 at 11:28:47AM +0200, Daniel Vetter wrote:
> This seems to have been accidentally lost in
> 
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date:   Fri Aug 30 16:19:28 2013 +0300
> 
>     drm/i915: ban badly behaving contexts
> 
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.
> 
> v2: Tune down to notice level. Note that we need to add drm/i915 so
> that at least the automatic igt dmesg filtering still picks it up.
> 
> v3: git add and lack of coffee don't mix well.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Hmm, in my igt hang tests, I don't use dev_priv->gpu_error.stop_rings as
it is itself incompatible with the tests. This now causes those to fail
with a WARN.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-10-01  9:41 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala, Kenneth Graunke, Daniel Vetter

On Wed, Oct 01, 2014 at 10:37:38AM +0100, Chris Wilson wrote:
> On Wed, Oct 01, 2014 at 11:28:47AM +0200, Daniel Vetter wrote:
> > This seems to have been accidentally lost in
> > 
> > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Date:   Fri Aug 30 16:19:28 2013 +0300
> > 
> >     drm/i915: ban badly behaving contexts
> > 
> > Without this real gpu hangs only log output at info level, which gets
> > filtered away by piglit's testrunner.
> > 
> > v2: Tune down to notice level. Note that we need to add drm/i915 so
> > that at least the automatic igt dmesg filtering still picks it up.
> > 
> > v3: git add and lack of coffee don't mix well.
> > 
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Kenneth Graunke <kenneth@whitecape.org>
> > Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> Hmm, in my igt hang tests, I don't use dev_priv->gpu_error.stop_rings as
> it is itself incompatible with the tests. This now causes those to fail
> with a WARN.

In Mika's reset stats test we've set stop_rings after execbuf so that we
could submit fancy stuff like endless loops with batch-chaining while
still shutting up the kernel's output for real hangs.

The nice benefit is that looking at stop_rings then gives you an easy way
to double-check from the test that it all worked out since it's getting
auto-cleared.

In any case we have a fairly great mess here :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Chris Wilson @ 2014-10-01  9:50 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
	Mika Kuoppala

On Wed, Oct 01, 2014 at 11:41:31AM +0200, Daniel Vetter wrote:
> On Wed, Oct 01, 2014 at 10:37:38AM +0100, Chris Wilson wrote:
> > On Wed, Oct 01, 2014 at 11:28:47AM +0200, Daniel Vetter wrote:
> > > This seems to have been accidentally lost in
> > > 
> > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Date:   Fri Aug 30 16:19:28 2013 +0300
> > > 
> > >     drm/i915: ban badly behaving contexts
> > > 
> > > Without this real gpu hangs only log output at info level, which gets
> > > filtered away by piglit's testrunner.
> > > 
> > > v2: Tune down to notice level. Note that we need to add drm/i915 so
> > > that at least the automatic igt dmesg filtering still picks it up.
> > > 
> > > v3: git add and lack of coffee don't mix well.
> > > 
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Kenneth Graunke <kenneth@whitecape.org>
> > > Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > 
> > Hmm, in my igt hang tests, I don't use dev_priv->gpu_error.stop_rings as
> > it is itself incompatible with the tests. This now causes those to fail
> > with a WARN.
> 
> In Mika's reset stats test we've set stop_rings after execbuf so that we
> could submit fancy stuff like endless loops with batch-chaining while
> still shutting up the kernel's output for real hangs.

That doesn't work for me when I try hanging from one context and running
normal coherency tests in another...
 
> The nice benefit is that looking at stop_rings then gives you an easy way
> to double-check from the test that it all worked out since it's getting
> auto-cleared.

A single global value when using multiple concurrent rendering contexts
is not so nice. Plus the interface is a pita.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Daniel Vetter @ 2014-10-01 12:03 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

This seems to have been accidentally lost in

commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Fri Aug 30 16:19:28 2013 +0300

    drm/i915: ban badly behaving contexts

Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.

v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.

v3: git add and lack of coffee don't mix well.

v4: Message is in between hw and sw reset, so switch verb to
continuous form.

v5: Use i915_stop_rings_allow_warn for consistency. For Chris' case of
injecting lots of hangs I guess we need to revamp this all anyway when
merging. For now this should plug the regression for piglit testing
mesa.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..fec4afe526c7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -844,6 +844,9 @@ int i915_reset(struct drm_device *dev)
 		}
 	}
 
+	if (i915_stop_rings_allow_warn(dev_priv))
+		pr_notice("drm/i915: Resetting chip after gpu hang\n");
+
 	if (ret) {
 		DRM_ERROR("Failed to reset chip: %i\n", ret);
 		mutex_unlock(&dev->struct_mutex);
-- 
2.1.1

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Chris Wilson @ 2014-10-01 13:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala

On Wed, Oct 01, 2014 at 02:03:10PM +0200, Daniel Vetter wrote:
> This seems to have been accidentally lost in
> 
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date:   Fri Aug 30 16:19:28 2013 +0300
> 
>     drm/i915: ban badly behaving contexts
> 
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.
> 
> v2: Tune down to notice level. Note that we need to add drm/i915 so
> that at least the automatic igt dmesg filtering still picks it up.
> 
> v3: git add and lack of coffee don't mix well.
> 
> v4: Message is in between hw and sw reset, so switch verb to
> continuous form.
> 
> v5: Use i915_stop_rings_allow_warn for consistency. For Chris' case of
> injecting lots of hangs I guess we need to revamp this all anyway when
> merging. For now this should plug the regression for piglit testing
> mesa.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Whilst I am unhappy with stop_rings in principle, I like this compromise.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
From: Mika Kuoppala @ 2014-10-01 14:40 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter

Daniel Vetter <daniel.vetter@ffwll.ch> writes:

> This seems to have been accidentally lost in
>
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date:   Fri Aug 30 16:19:28 2013 +0300
>
>     drm/i915: ban badly behaving contexts
>
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.
>
> v2: Tune down to notice level. Note that we need to add drm/i915 so
> that at least the automatic igt dmesg filtering still picks it up.
>
> v3: git add and lack of coffee don't mix well.
>
> v4: Message is in between hw and sw reset, so switch verb to
> continuous form.
>
> v5: Use i915_stop_rings_allow_warn for consistency. For Chris' case of
> injecting lots of hangs I guess we need to revamp this all anyway when
> merging. For now this should plug the regression for piglit testing
> mesa.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index ea93ff151a74..fec4afe526c7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -844,6 +844,9 @@ int i915_reset(struct drm_device *dev)
>  		}
>  	}
>  
> +	if (i915_stop_rings_allow_warn(dev_priv))
> +		pr_notice("drm/i915: Resetting chip after gpu hang\n");
> +

I would have added also:

"As of now, further functionality or performance testing beyond this point is
utterly pointless."

Perhaps in caps.

-Mika

>  	if (ret) {
>  		DRM_ERROR("Failed to reset chip: %i\n", ret);
>  		mutex_unlock(&dev->struct_mutex);
> -- 
> 2.1.1
