* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
@ 2014-09-30 23:04 Daniel Vetter
2014-10-01 6:28 ` Chris Wilson
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-09-30 23:04 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala
This seems to have been accidentally lost in
commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Aug 30 16:19:28 2013 +0300
drm/i915: ban badly behaving contexts
Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
--
Note, totally untested since it's a bit late here ;-)
-Daniel
---
drivers/gpu/drm/i915/i915_drv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6948877c881c..1870759a5ed8 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		DRM_ERROR("Reset chip after gpu hang\n");
 	}
 
 	if (ret) {
--
2.1.1
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-09-30 23:04 [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs Daniel Vetter
@ 2014-10-01 6:28 ` Chris Wilson
2014-10-01 8:13 ` Daniel Vetter
2014-10-01 9:28 ` Daniel Vetter
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-10-01 6:28 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala
On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> This seems to have been accidentally lost in
>
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date: Fri Aug 30 16:19:28 2013 +0300
>
> drm/i915: ban badly behaving contexts
>
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.
A successful GPU hang is not an error. Might be a warn or a notice, but
it certainly isn't a driver error.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 6:28 ` Chris Wilson
@ 2014-10-01 8:13 ` Daniel Vetter
2014-10-01 8:19 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-10-01 8:13 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala, Kenneth Graunke, Daniel Vetter
On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > This seems to have been accidentally lost in
> >
> > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Date: Fri Aug 30 16:19:28 2013 +0300
> >
> > drm/i915: ban badly behaving contexts
> >
> > Without this real gpu hangs only log output at info level, which gets
> > filtered away by piglit's testrunner.
>
> A successful GPU hang is not an error. Might be a warn or a notice, but
> it certainly isn't a driver error.
Well not of the kernel driver, but might very well be a bug in the
userspace driver. With this piglit marks tests that hung the gpu as
dmesg-fail, without this they might even pass. Ken raised this on irc and
I agree that it's a must-have feature for developers that their testsuite
can tell them when stuff broke. Providing this some other way is a lot more
work and imo should be done in a separate patch, this here is just the
minimal fix for this regression.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 8:13 ` Daniel Vetter
@ 2014-10-01 8:19 ` Chris Wilson
2014-10-01 8:29 ` Daniel Vetter
0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-10-01 8:19 UTC (permalink / raw)
To: Daniel Vetter
Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala
On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > This seems to have been accidentally lost in
> > >
> > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Date: Fri Aug 30 16:19:28 2013 +0300
> > >
> > > drm/i915: ban badly behaving contexts
> > >
> > > Without this real gpu hangs only log output at info level, which gets
> > > filtered away by piglit's testrunner.
> >
> > A successful GPU hang is not an error. Might be a warn or a notice, but
> > it certainly isn't a driver error.
>
> Well not of the kernel driver, but might very well be a bug in the
> userspace driver. With this piglit marks tests that hung the gpu as
> dmesg-fail, without this they might even pass. Ken raised this on irc and
> I agree that it's a must-have feature for developers that their testsuite
> can tell them when stuff broke. Provding this some other way is a lot more
> work and imo should be done in a separate patch, this here is just the
> minimal fix for this regression.
I strongly disagree that we should be working around self-imposed
limitations of the test suite by making users believe their kernel is
broken.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 8:19 ` Chris Wilson
@ 2014-10-01 8:29 ` Daniel Vetter
2014-10-01 8:52 ` Kenneth Graunke
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-10-01 8:29 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Daniel Vetter,
Intel Graphics Development, Mika Kuoppala, Kenneth Graunke,
Daniel Vetter
On Wed, Oct 01, 2014 at 09:19:50AM +0100, Chris Wilson wrote:
> On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> > On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > > This seems to have been accidentally lost in
> > > >
> > > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Date: Fri Aug 30 16:19:28 2013 +0300
> > > >
> > > > drm/i915: ban badly behaving contexts
> > > >
> > > > Without this real gpu hangs only log output at info level, which gets
> > > > filtered away by piglit's testrunner.
> > >
> > > A successful GPU hang is not an error. Might be a warn or a notice, but
> > > it certainly isn't a driver error.
> >
> > Well not of the kernel driver, but might very well be a bug in the
> > userspace driver. With this piglit marks tests that hung the gpu as
> > dmesg-fail, without this they might even pass. Ken raised this on irc and
> > I agree that it's a must-have feature for developers that their testsuite
> > can tell them when stuff broke. Provding this some other way is a lot more
> > work and imo should be done in a separate patch, this here is just the
> > minimal fix for this regression.
>
> I strongly disagree that we should be working around self-imposed
> limitations of the test suite by making users believe their kernel is
> broken.
So what else should piglit do then?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 8:29 ` Daniel Vetter
@ 2014-10-01 8:52 ` Kenneth Graunke
2014-10-01 9:10 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Kenneth Graunke @ 2014-10-01 8:52 UTC (permalink / raw)
To: Daniel Vetter
Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala
On Wednesday, October 01, 2014 10:29:07 AM Daniel Vetter wrote:
> On Wed, Oct 01, 2014 at 09:19:50AM +0100, Chris Wilson wrote:
> > On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > > > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > > > This seems to have been accidentally lost in
> > > > >
> > > > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > > Date: Fri Aug 30 16:19:28 2013 +0300
> > > > >
> > > > > drm/i915: ban badly behaving contexts
> > > > >
> > > > > Without this real gpu hangs only log output at info level, which gets
> > > > > filtered away by piglit's testrunner.
> > > >
> > > > A successful GPU hang is not an error. Might be a warn or a notice, but
> > > > it certainly isn't a driver error.
> > >
> > > Well not of the kernel driver, but might very well be a bug in the
> > > userspace driver. With this piglit marks tests that hung the gpu as
> > > dmesg-fail, without this they might even pass. Ken raised this on irc and
> > > I agree that it's a must-have feature for developers that their testsuite
> > > can tell them when stuff broke. Provding this some other way is a lot more
> > > work and imo should be done in a separate patch, this here is just the
> > > minimal fix for this regression.
> >
> > I strongly disagree that we should be working around self-imposed
> > limitations of the test suite by making users believe their kernel is
> > broken.
>
> So what else should piglit do then?
> -Daniel
Your GPU hanging is clearly more severe than "info" - it may impact your system stability, and likely represents a bug somewhere in the graphics drivers (whether kernel or userspace). I think we all agree on that.
Piglit runs "dmesg --level emerg,alert,crit,err,warn,notice", which covers everything except "info" and "debug". So anything other than info/debug would be just fine.
--Ken
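To illustrate why the log level matters here: the filter Ken quotes selects six of the eight kernel log levels, so a message logged at notice is reported while one logged at info is dropped. The following is a minimal sketch in C of that selection, not piglit's actual implementation. `level_selected` and the `LOG_*` names are hypothetical; only the level names and the filter string come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Kernel log levels, numbered as in printk (0 is the most severe). */
enum {
	LOG_EMERG = 0, LOG_ALERT, LOG_CRIT, LOG_ERR,
	LOG_WARNING, LOG_NOTICE, LOG_INFO, LOG_DEBUG,
};

/* Level names as accepted by `dmesg --level`, indexed by numeric level. */
static const char *const level_names[] = {
	"emerg", "alert", "crit", "err", "warn", "notice", "info", "debug",
};

/*
 * Hypothetical helper: return true if a message logged at `level` would be
 * selected by a comma-separated filter such as
 * "emerg,alert,crit,err,warn,notice".
 */
static bool level_selected(const char *filter, int level)
{
	assert(level >= LOG_EMERG && level <= LOG_DEBUG);

	const char *name = level_names[level];
	size_t len = strlen(name);

	for (const char *p = filter; *p; ) {
		size_t tok = strcspn(p, ",");	/* length of the current token */

		if (tok == len && strncmp(p, name, len) == 0)
			return true;
		p += tok;
		if (*p == ',')
			p++;			/* skip the separator */
	}
	return false;
}
```

With the filter string from the message, `level_selected(filter, LOG_NOTICE)` is true and `level_selected(filter, LOG_INFO)` is false, which is the distinction the thread turns on.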
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 8:52 ` Kenneth Graunke
@ 2014-10-01 9:10 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2014-10-01 9:10 UTC (permalink / raw)
To: Kenneth Graunke
Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala
On Wed, Oct 01, 2014 at 01:52:20AM -0700, Kenneth Graunke wrote:
> On Wednesday, October 01, 2014 10:29:07 AM Daniel Vetter wrote:
> > On Wed, Oct 01, 2014 at 09:19:50AM +0100, Chris Wilson wrote:
> > > On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> > > > On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > > > > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > > > > This seems to have been accidentally lost in
> > > > > >
> > > > > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > > > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > > > Date: Fri Aug 30 16:19:28 2013 +0300
> > > > > >
> > > > > > drm/i915: ban badly behaving contexts
> > > > > >
> > > > > > Without this real gpu hangs only log output at info level, which gets
> > > > > > filtered away by piglit's testrunner.
> > > > >
> > > > > A successful GPU hang is not an error. Might be a warn or a notice, but
> > > > > it certainly isn't a driver error.
> > > >
> > > > Well not of the kernel driver, but might very well be a bug in the
> > > > userspace driver. With this piglit marks tests that hung the gpu as
> > > > dmesg-fail, without this they might even pass. Ken raised this on irc and
> > > > I agree that it's a must-have feature for developers that their testsuite
> > > > can tell them when stuff broke. Provding this some other way is a lot more
> > > > work and imo should be done in a separate patch, this here is just the
> > > > minimal fix for this regression.
> > >
> > > I strongly disagree that we should be working around self-imposed
> > > limitations of the test suite by making users believe their kernel is
> > > broken.
> >
> > So what else should piglit do then?
> > -Daniel
>
> Your GPU hanging is clearly more severe than "info" - it may impact your system stability, and likely represents a bug somewhere in the graphics drivers (whether kernel or userspace). I think we all agree on that.
>
> Piglit runs "dmesg --level emerg,alert,crit,err,warn,notice", which covers everything except "info" and "debug". So anything other than info/debug would be just fine.
If we are happy with KERN_NOTICE (normal, but significant condition), that
is what I would suggest. Actually, we should make the GPU hang detection
itself the notice (to aid regular users). But that probably runs into
complications with the simulated hangs with igt causing a WARN test
failure - but again, I'd rather our interface with the user (and
userfacing bug reporting tools) be correct.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
[not found] <1412118259-4860-1-git-send-email-daniel.vetter@ffwll.c>
@ 2014-10-01 9:15 ` Daniel Vetter
0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-10-01 9:15 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala
This seems to have been accidentally lost in
commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Aug 30 16:19:28 2013 +0300
drm/i915: ban badly behaving contexts
Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.
v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..57c1f5608462 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		DRM_ERROR("Reset chip after gpu hang\n");
 	}
 
 	if (ret) {
--
2.1.1
* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-09-30 23:04 [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs Daniel Vetter
2014-10-01 6:28 ` Chris Wilson
@ 2014-10-01 9:28 ` Daniel Vetter
2014-10-01 9:37 ` Chris Wilson
2014-10-01 9:32 ` Daniel Vetter
2014-10-01 12:03 ` Daniel Vetter
3 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-10-01 9:28 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala
This seems to have been accidentally lost in
commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Aug 30 16:19:28 2013 +0300
drm/i915: ban badly behaving contexts
Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.
v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.
v3: git add and lack of coffee don't mix well.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..c9703816da28 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		pr_notice("drm/i915: Reset chip after gpu hang\n");
 	}
 
 	if (ret) {
--
2.1.1
* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-09-30 23:04 [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs Daniel Vetter
2014-10-01 6:28 ` Chris Wilson
2014-10-01 9:28 ` Daniel Vetter
@ 2014-10-01 9:32 ` Daniel Vetter
2014-10-01 12:03 ` Daniel Vetter
3 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-10-01 9:32 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala
This seems to have been accidentally lost in
commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Aug 30 16:19:28 2013 +0300
drm/i915: ban badly behaving contexts
Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.
v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.
v3: git add and lack of coffee don't mix well.
v4: Message is in between hw and sw reset, so switch verb to
continuous form.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..a77b6608a271 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -842,6 +842,8 @@ int i915_reset(struct drm_device *dev)
 				 "error for simulated gpu hangs\n");
 			ret = 0;
 		}
+	} else {
+		pr_notice("drm/i915: Resetting chip after gpu hang\n");
 	}
 
 	if (ret) {
--
2.1.1
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 9:28 ` Daniel Vetter
@ 2014-10-01 9:37 ` Chris Wilson
2014-10-01 9:41 ` Daniel Vetter
0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-10-01 9:37 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala
On Wed, Oct 01, 2014 at 11:28:47AM +0200, Daniel Vetter wrote:
> This seems to have been accidentally lost in
>
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date: Fri Aug 30 16:19:28 2013 +0300
>
> drm/i915: ban badly behaving contexts
>
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.
>
> v2: Tune down to notice level. Note that we need to add drm/i915 so
> that at least the automatic igt dmesg filtering still picks it up.
>
> v3: git add and lack of coffee don't mix well.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Hmm, in my igt hang tests, I don't use dev_priv->gpu_error.stop_rings as
it is itself incompatible with the tests. This now causes those to fail
with a WARN.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 9:37 ` Chris Wilson
@ 2014-10-01 9:41 ` Daniel Vetter
2014-10-01 9:50 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2014-10-01 9:41 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala, Kenneth Graunke, Daniel Vetter
On Wed, Oct 01, 2014 at 10:37:38AM +0100, Chris Wilson wrote:
> On Wed, Oct 01, 2014 at 11:28:47AM +0200, Daniel Vetter wrote:
> > This seems to have been accidentally lost in
> >
> > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Date: Fri Aug 30 16:19:28 2013 +0300
> >
> > drm/i915: ban badly behaving contexts
> >
> > Without this real gpu hangs only log output at info level, which gets
> > filtered away by piglit's testrunner.
> >
> > v2: Tune down to notice level. Note that we need to add drm/i915 so
> > that at least the automatic igt dmesg filtering still picks it up.
> >
> > v3: git add and lack of coffee don't mix well.
> >
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Kenneth Graunke <kenneth@whitecape.org>
> > Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>
> Hmm, in my igt hang tests, I don't use dev_priv->gpu_error.stop_rings as
> it is itself incompatible with the tests. This now causes those to fail
> with a WARN.
In Mika's reset stats test we've set stop_rings after execbuf so that we
could submit fancy stuff like endless loops with batch-chaining while
still shutting up the kernel's output for real hangs.
The nice benefit is that looking at stop_rings then gives you an easy way
to double-check from the test that it all worked out since it's getting
auto-cleared.
In any case we have a fairly great mess here :(
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
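The auto-clear check described in this message can be modelled in a few lines of C. This is a sketch under assumptions, not the driver's code: `struct gpu_model` and `model_reset` are hypothetical names, and the rule that any non-zero `stop_rings` value marks a hang as simulated is inferred from the discussion rather than taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state that holds stop_rings. */
struct gpu_model {
	unsigned int stop_rings;	/* non-zero: a test requested a hang */
};

/*
 * Model of the reset path: treat the hang as simulated if stop_rings was
 * set, and clear the field so the requesting test can observe the reset.
 * Returns true for a simulated hang, false for a real one.
 */
static bool model_reset(struct gpu_model *gpu)
{
	assert(gpu != NULL);

	bool simulated = gpu->stop_rings != 0;

	if (simulated)
		gpu->stop_rings = 0;	/* auto-clear on reset */

	return simulated;
}
```

A test following this pattern would set `stop_rings`, wait for the hang to be handled, and then confirm that the field reads back as zero.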
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 9:41 ` Daniel Vetter
@ 2014-10-01 9:50 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2014-10-01 9:50 UTC (permalink / raw)
To: Daniel Vetter
Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala
On Wed, Oct 01, 2014 at 11:41:31AM +0200, Daniel Vetter wrote:
> On Wed, Oct 01, 2014 at 10:37:38AM +0100, Chris Wilson wrote:
> > On Wed, Oct 01, 2014 at 11:28:47AM +0200, Daniel Vetter wrote:
> > > This seems to have been accidentally lost in
> > >
> > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Date: Fri Aug 30 16:19:28 2013 +0300
> > >
> > > drm/i915: ban badly behaving contexts
> > >
> > > Without this real gpu hangs only log output at info level, which gets
> > > filtered away by piglit's testrunner.
> > >
> > > v2: Tune down to notice level. Note that we need to add drm/i915 so
> > > that at least the automatic igt dmesg filtering still picks it up.
> > >
> > > v3: git add and lack of coffee don't mix well.
> > >
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Kenneth Graunke <kenneth@whitecape.org>
> > > Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >
> > Hmm, in my igt hang tests, I don't use dev_priv->gpu_error.stop_rings as
> > it is itself incompatible with the tests. This now causes those to fail
> > with a WARN.
>
> In Mika's reset stats test we've set stop_rings after execbuf so that we
> could submit fancy stuff like endless loops with batch-chaining while
> still shutting up the kernel's output for real hangs.
That doesn't work for me when I try hanging from one context and running
normal coherency tests in another...
> The nice benefit is that looking at stop_rings then gives you an easy way
> to double-check from the test that it all worked out since it's getting
> auto-cleared.
A single global value when using multiple concurrent rendering contexts
is not so nice. Plus the interface is a pita.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-09-30 23:04 [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs Daniel Vetter
` (2 preceding siblings ...)
2014-10-01 9:32 ` Daniel Vetter
@ 2014-10-01 12:03 ` Daniel Vetter
2014-10-01 13:54 ` Chris Wilson
2014-10-01 14:40 ` Mika Kuoppala
3 siblings, 2 replies; 16+ messages in thread
From: Daniel Vetter @ 2014-10-01 12:03 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala
This seems to have been accidentally lost in
commit be62acb4cce1389a28296852737e3917d9cc5b25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Fri Aug 30 16:19:28 2013 +0300
drm/i915: ban badly behaving contexts
Without this real gpu hangs only log output at info level, which gets
filtered away by piglit's testrunner.
v2: Tune down to notice level. Note that we need to add drm/i915 so
that at least the automatic igt dmesg filtering still picks it up.
v3: git add and lack of coffee don't mix well.
v4: Message is in between hw and sw reset, so switch verb to
continuous form.
v5: Use i915_stop_rings_allow_warn for consistency. For Chris' case of
injecting lots of hangs I guess we need to revamp this all anyway when
merging. For now this should plug the regression for piglit testing
mesa.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ea93ff151a74..fec4afe526c7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -844,6 +844,9 @@ int i915_reset(struct drm_device *dev)
 		}
 	}
 
+	if (i915_stop_rings_allow_warn(dev_priv))
+		pr_notice("drm/i915: Resetting chip after gpu hang\n");
+
 	if (ret) {
 		DRM_ERROR("Failed to reset chip: %i\n", ret);
 		mutex_unlock(&dev->struct_mutex);
--
2.1.1
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 12:03 ` Daniel Vetter
@ 2014-10-01 13:54 ` Chris Wilson
2014-10-01 14:40 ` Mika Kuoppala
1 sibling, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2014-10-01 13:54 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala
On Wed, Oct 01, 2014 at 02:03:10PM +0200, Daniel Vetter wrote:
> This seems to have been accidentally lost in
>
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date: Fri Aug 30 16:19:28 2013 +0300
>
> drm/i915: ban badly behaving contexts
>
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.
>
> v2: Tune down to notice level. Note that we need to add drm/i915 so
> that at least the automatic igt dmesg filtering still picks it up.
>
> v3: git add and lack of coffee don't mix well.
>
> v4: Message is in between hw and sw reset, so switch verb to
> continuous form.
>
> v5: Use i915_stop_rings_allow_warn for consistency. For Chris' case of
> injecting lots of hangs I guess we need to revamp this all anyway when
> merging. For now this should plug the regression for piglit testing
> mesa.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Whilst I am unhappy with stop_rings in principle, I like this compromise.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs
2014-10-01 12:03 ` Daniel Vetter
2014-10-01 13:54 ` Chris Wilson
@ 2014-10-01 14:40 ` Mika Kuoppala
1 sibling, 0 replies; 16+ messages in thread
From: Mika Kuoppala @ 2014-10-01 14:40 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter
Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> This seems to have been accidentally lost in
>
> commit be62acb4cce1389a28296852737e3917d9cc5b25
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date: Fri Aug 30 16:19:28 2013 +0300
>
> drm/i915: ban badly behaving contexts
>
> Without this real gpu hangs only log output at info level, which gets
> filtered away by piglit's testrunner.
>
> v2: Tune down to notice level. Note that we need to add drm/i915 so
> that at least the automatic igt dmesg filtering still picks it up.
>
> v3: git add and lack of coffee don't mix well.
>
> v4: Message is in between hw and sw reset, so switch verb to
> continuous form.
>
> v5: Use i915_stop_rings_allow_warn for consistency. For Chris' case of
> injecting lots of hangs I guess we need to revamp this all anyway when
> merging. For now this should plug the regression for piglit testing
> mesa.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index ea93ff151a74..fec4afe526c7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -844,6 +844,9 @@ int i915_reset(struct drm_device *dev)
> }
> }
>
> + if (i915_stop_rings_allow_warn(dev_priv))
> + pr_notice("drm/i915: Resetting chip after gpu hang\n");
> +
I would have added also:
"As of now, further functionality or performance testing beyond this point is
utterly pointless."
Perhaps in caps.
-Mika
> if (ret) {
> DRM_ERROR("Failed to reset chip: %i\n", ret);
> mutex_unlock(&dev->struct_mutex);
> --
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx