public inbox for intel-gfx@lists.freedesktop.org
* [PATCH] drm/i915: Stop requesting error_state reports.
@ 2015-02-10 19:27 Rodrigo Vivi
  2015-02-10 21:14 ` Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Rodrigo Vivi @ 2015-02-10 19:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: Rodrigo Vivi

These error states are great for knowing the gpu state when it hangs.

But since we don't have automated tools to do analysis we are
facing a lot of noise on bugzilla with end users reporting just
because "log asked to", while gpu reset worked and users probably
never noticed any screen issue. Most of these reporters don't know
when it happened or how to retrigger the issue and sometimes
they are not even in the mood to retest.

So, let's minimize our and end users' noise and protect this shorter
message with drm.debug. Developers, OSVs and users that face
real screen issues should always enable this debug and will see
the message when the error state gets dumped.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 48ddbf4..77d63be 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1297,11 +1297,7 @@ void i915_capture_error_state(struct drm_device *dev, bool wedged,
 	}
 
 	if (!warned) {
-		DRM_INFO("GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.\n");
-		DRM_INFO("Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel\n");
-		DRM_INFO("drm/i915 developers can then reassign to the right component if it's not a kernel issue.\n");
-		DRM_INFO("The gpu crash dump is required to analyze gpu hangs, so please always attach it.\n");
-		DRM_INFO("GPU crash dump saved to /sys/class/drm/card%d/error\n", dev->primary->index);
+		DRM_DEBUG_DRIVER("GPU crash dump saved to /sys/class/drm/card%d/error\n", dev->primary->index);
 		warned = true;
 	}
 }
-- 
1.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
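
For readers following the diff above, the behaviour of the patched hunk can be modelled in plain userspace C. This is only a sketch under stated assumptions: SKETCH_DEBUG_DRIVER, announce_dump() and the explicit debug_mask argument are hypothetical stand-ins for DRM_DEBUG_DRIVER and the drm.debug module parameter, not the kernel's actual implementation. It mirrors one detail of the hunk as posted: the warned flag is set whether or not the message was actually emitted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the "driver" bit of the drm.debug bitmask. */
#define SKETCH_DEBUG_DRIVER 0x02u

/*
 * Model of the patched "if (!warned) { ... }" block.
 * Returns the number of lines written to 'out'.
 * 'warned' plays the role of the function-local static flag in the diff.
 */
static int announce_dump(FILE *out, bool *warned,
			 unsigned int debug_mask, int card)
{
	if (*warned)
		return 0;

	/* As in the patched hunk, the flag is consumed unconditionally. */
	*warned = true;

	/* The message is only visible when the debug bit is set. */
	if (!(debug_mask & SKETCH_DEBUG_DRIVER))
		return 0;

	fprintf(out, "GPU crash dump saved to /sys/class/drm/card%d/error\n",
		card);
	return 1;
}
```

One consequence visible in the sketch: if the first hang happens with the debug bit cleared, the flag is already consumed, so setting the bit afterwards does not bring the message back for later hangs.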

* Re: [PATCH] drm/i915: Stop requesting error_state reports.
  2015-02-10 19:27 [PATCH] drm/i915: Stop requesting error_state reports Rodrigo Vivi
@ 2015-02-10 21:14 ` Chris Wilson
  2015-02-10 22:02 ` Daniel Vetter
  2015-02-11  5:45 ` shuang.he
  2 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2015-02-10 21:14 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Tue, Feb 10, 2015 at 11:27:30AM -0800, Rodrigo Vivi wrote:
> These error states are great for knowing the gpu state when it hangs.
> 
> But since we don't have automated tools to do analysis we are
> facing a lot of noise on bugzilla with end users reporting just
> because "log asked to", while gpu reset worked and users probably
> never noticed any screen issue.

Other than the corruption and the machine stuttering for a number of
seconds, sometimes many times per hour. I think that justifies having
some form of user visible message about what just happened.

> Most of these reporters don't know
> when it happened or how to retrigger the issue and sometimes
> they are not even in the mood to retest.

It is the goal of the error state to capture exactly enough information
for post-mortem analysis. As you feel it is inadequate, please augment
it.
 
> So, let's minimize our and end users' noise and protect this shorter
> message with drm.debug. Developers, OSVs and users that face
> real screen issues should always enable this debug and will see
> the message when the error state gets dumped.

I don't think the tradeoff is worth it. If you have automatic bug
reporting, then yes you can remove the plea to have the user do it, but
not before.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Stop requesting error_state reports.
  2015-02-10 19:27 [PATCH] drm/i915: Stop requesting error_state reports Rodrigo Vivi
  2015-02-10 21:14 ` Chris Wilson
@ 2015-02-10 22:02 ` Daniel Vetter
  2015-02-10 22:32   ` Rodrigo Vivi
  2015-02-11  5:45 ` shuang.he
  2 siblings, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2015-02-10 22:02 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Tue, Feb 10, 2015 at 11:27:30AM -0800, Rodrigo Vivi wrote:
> These error states are great for knowing the gpu state when it hangs.
> 
> But since we don't have automated tools to do analysis we are
> facing a lot of noise on bugzilla with end users reporting just
> because "log asked to", while gpu reset worked and users probably
> never noticed any screen issue. Most of these reporters don't know
> when it happened or how to retrigger the issue and sometimes
> they are not even in the mood to retest.

Hm, maybe we should reword it to make sure we only get good testers?

Instead of "Please file ..." do a "If you can build&test kernels and see
other issues together with this gpu hang notice file ..."?

I agree with Chris that we can't just mute these, overall the information
is imo valuable. We just need to get better at filtering them, and have
better information in the error states. E.g. with dri3 the pid/comm is
always the one for X, Mika has a small patch to fix that.

Closing our eyes won't make the bugs go away.
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Stop requesting error_state reports.
  2015-02-10 22:02 ` Daniel Vetter
@ 2015-02-10 22:32   ` Rodrigo Vivi
  2015-02-10 22:51     ` Daniel Vetter
  2015-02-11  9:14     ` Jani Nikula
  0 siblings, 2 replies; 7+ messages in thread
From: Rodrigo Vivi @ 2015-02-10 22:32 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, 2015-02-10 at 23:02 +0100, Daniel Vetter wrote:
> On Tue, Feb 10, 2015 at 11:27:30AM -0800, Rodrigo Vivi wrote:
> > These error states are great for knowing the gpu state when it hangs.
> > 
> > But since we don't have automated tools to do analysis we are
> > facing a lot of noise on bugzilla with end users reporting just
> > because "log asked to", while gpu reset worked and users probably
> > never noticed any screen issue. Most of these reporters don't know
> > when it happened or how to retrigger the issue and sometimes
> > they are not even in the mood to retest.
> 
> Hm, maybe we should reword it to make sure we only get good testers?
> 
> Instead of "Please file ..." do a "If you can build&test kernels and see
> other issues together with this gpu hang notice file ..."?

This is just one thing. Some OSVs complain that we have too much noise for
end users. So there are two kinds of noise: bugzilla and logs.

> 
> I agree with Chris that we can't just mute these, overall the information
> is imo valuable. We just need to get better at filtering them, and have
> better information in the error states. E.g. with dri3 the pid/comm is
> always the one for X, Mika has a small patch to fix that.

I do agree this error state is valuable. I really like it.

> 
> Closing our eyes won't make the bugs go away.

Indeed. But this patch doesn't intend to close our eyes, just to open
them when they have to be open, i.e. when drm.debug is set.

When users face an issue that bothers them or matters to them, they would
enable debug and then we would receive the report. And QA/OSVs/Devs should
always leave drm.debug enabled at some level anyway.

> -Daniel

Thanks,
Rodrigo.

* Re: [PATCH] drm/i915: Stop requesting error_state reports.
  2015-02-10 22:32   ` Rodrigo Vivi
@ 2015-02-10 22:51     ` Daniel Vetter
  2015-02-11  9:14     ` Jani Nikula
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2015-02-10 22:51 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Tue, Feb 10, 2015 at 02:32:48PM -0800, Rodrigo Vivi wrote:
> On Tue, 2015-02-10 at 23:02 +0100, Daniel Vetter wrote:
> > On Tue, Feb 10, 2015 at 11:27:30AM -0800, Rodrigo Vivi wrote:
> > > These error states are great for knowing the gpu state when it hangs.
> > > 
> > > But since we don't have automated tools to do analysis we are
> > > facing a lot of noise on bugzilla with end users reporting just
> > > because "log asked to", while gpu reset worked and users probably
> > > never noticed any screen issue. Most of these reporters don't know
> > > when it happened or how to retrigger the issue and sometimes
> > > they are not even in the mood to retest.
> > 
> > Hm, maybe we should reword it to make sure we only get good testers?
> > 
> > Instead of "Please file ..." do a "If you can build&test kernels and see
> > other issues together with this gpu hang notice file ..."?
> 
> This is just one thing. Some OSVs complain that we have too much noise for
> end users. So there are two kinds of noise: bugzilla and logs.

OSV noise is a different issue. We already have
i915.verbose_state_checks for that, maybe we need to add another one for
gpu hangs. But imo that should actually come from OSV, not development
(like Rob Clark has done for the state checker).

> > I agree with Chris that we can't just mute these, overall the information
> > is imo valuable. We just need to get better at filtering them, and have
> > better information in the error states. E.g. with dri3 the pid/comm is
> > always the one for X, Mika has a small patch to fix that.
> 
> I do agree this error state is valuable. I really like it.
> 
> > 
> > Closing our eyes won't make the bugs go away.
> 
> Indeed. But this patch doesn't intend to close our eyes, just to open
> them when they have to be open, i.e. when drm.debug is set.

Imo drm.debug is too silent, we've had that ages ago and it resulted in
lots of unnecessary roundtrips in bug reports because reporters almost
never attach the error state if there's a gpu hang. So that imo doesn't
help either with reducing bug team workload. The message is this verbose
because a single line was not good enough ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
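
The idea of a dedicated knob for gpu hang messages, analogous to i915.verbose_state_checks, might look roughly like the following userspace C sketch. The parameter name verbose_hang_report and the function print_hang_notice() are hypothetical; nothing in this thread defines them, and a real driver would use a module parameter and the DRM logging macros instead of fprintf(). The message strings are the ones from the diff under discussion.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of a per-driver verbosity knob for hang notices.
 * When 'verbose_hang_report' is true the full request to file a bug is
 * printed; otherwise only the location of the crash dump is logged.
 * Returns the number of lines written to 'out'.
 */
static int print_hang_notice(FILE *out, bool verbose_hang_report, int card)
{
	int lines = 0;

	if (verbose_hang_report) {
		fputs("GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.\n", out);
		fputs("Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel\n", out);
		fputs("drm/i915 developers can then reassign to the right component if it's not a kernel issue.\n", out);
		fputs("The gpu crash dump is required to analyze gpu hangs, so please always attach it.\n", out);
		lines += 4;
	}

	/* The dump location stays visible in both modes. */
	fprintf(out, "GPU crash dump saved to /sys/class/drm/card%d/error\n",
		card);
	return lines + 1;
}
```

With such a knob the default could stay verbose for upstream users, while an OSV that ships its own reporting path could switch it off without hiding the dump location.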

* Re: [PATCH] drm/i915: Stop requesting error_state reports.
  2015-02-10 19:27 [PATCH] drm/i915: Stop requesting error_state reports Rodrigo Vivi
  2015-02-10 21:14 ` Chris Wilson
  2015-02-10 22:02 ` Daniel Vetter
@ 2015-02-11  5:45 ` shuang.he
  2 siblings, 0 replies; 7+ messages in thread
From: shuang.he @ 2015-02-11  5:45 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, rodrigo.vivi

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5751
-------------------------------------Summary-------------------------------------
Platform  Delta  drm-intel-nightly  Series Applied
PNV       +4-4   275/283            275/283
ILK              310/315            310/315
SNB       +3     320/346            323/346
IVB       -1     380/384            379/384
BYT              296/296            296/296
HSW       +3-1   422/428            424/428
BDW              318/333            318/333
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gem_fence_thrash_bo-write-verify-none      PASS(4, M7)      FAIL(1, M7)
*PNV  igt_gem_fence_thrash_bo-write-verify-x      PASS(4, M7)      FAIL(1, M7)
*PNV  igt_gem_fence_thrash_bo-write-verify-y      PASS(5, M7)      FAIL(1, M7)
 PNV  igt_gem_userptr_blits_coherency-sync      CRASH(4, M7)PASS(2, M7)      PASS(1, M7)
 PNV  igt_gem_userptr_blits_coherency-unsync      CRASH(4, M7)PASS(3, M7)      PASS(1, M7)
 PNV  igt_gem_userptr_blits_create-destroy-sync      NRUN(1, M7)PASS(7, M7)      PASS(1, M7)
*PNV  igt_gem_userptr_blits_forked-unsync-swapping-mempressure-interruptible      PASS(2, M7)      NO_RESULT(1, M7)
 PNV  igt_gen3_render_tiledx_blits      FAIL(3, M7)TIMEOUT(1, M7)PASS(4, M7)      FAIL(1, M7)
 PNV  igt_gen3_render_tiledy_blits      FAIL(3, M7)PASS(3, M7)      PASS(1, M7)
*SNB  igt_kms_flip_bo-too-big      BLACKLIST(1, M35)      PASS(1, M35)
*SNB  igt_kms_flip_bo-too-big-interruptible      BLACKLIST(1, M35)      PASS(1, M35)
*SNB  igt_kms_flip_event_leak      NSPT(5, M35)      PASS(1, M35)
*IVB  igt_gem_storedw_batches_loop_secure-dispatch      PASS(2, M4)      DMESG_WARN(1, M4)
*HSW  igt_gem_pwrite_pread_snooped-pwrite-blt-cpu_mmap-performance      PASS(3, M40)      DMESG_WARN(1, M40)
*HSW  igt_kms_flip_bo-too-big      BLACKLIST(1, M40)      PASS(1, M40)
*HSW  igt_kms_flip_bo-too-big-interruptible      BLACKLIST(1, M40)      PASS(1, M40)
 HSW  igt_kms_flip_plain-flip-fb-recreate-interruptible      TIMEOUT(5, M40)PASS(4, M40)      PASS(1, M40)
Note: Pay more attention to lines starting with '*'

* Re: [PATCH] drm/i915: Stop requesting error_state reports.
  2015-02-10 22:32   ` Rodrigo Vivi
  2015-02-10 22:51     ` Daniel Vetter
@ 2015-02-11  9:14     ` Jani Nikula
  1 sibling, 0 replies; 7+ messages in thread
From: Jani Nikula @ 2015-02-11  9:14 UTC (permalink / raw)
  To: Rodrigo Vivi, Daniel Vetter; +Cc: intel-gfx

On Wed, 11 Feb 2015, Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> When users face an issue that bothers them or matters to them, they would
> enable debug and then we would receive the report. And QA/OSVs/Devs should
> always leave drm.debug enabled at some level anyway.

Judging by the bug reports, most users won't enable drm.debug when they
face issues and report bugs, if they attach dmesg at all.

If a GPU hang does not warrant an error in the logs, what does?


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center
