From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Vetter
Subject: Re: [PATCH] [RFC] drm/i915: Generate a hang error code
Date: Wed, 5 Feb 2014 17:18:30 +0100
Message-ID: <20140205161830.GJ17001@phenom.ffwll.local>
References: <1391516335-2723-1-git-send-email-benjamin.widawsky@intel.com>
 <20140205145908.50978978@jbarnes-t420>
 <20140205151502.GG17001@phenom.ffwll.local>
 <20140205160345.7cc020ca@jbarnes-t420>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20140205160345.7cc020ca@jbarnes-t420>
Sender: intel-gfx-bounces@lists.freedesktop.org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
To: Jesse Barnes
Cc: Intel GFX, Ben Widawsky, Ben Widawsky
List-Id: intel-gfx@lists.freedesktop.org

On Wed, Feb 05, 2014 at 04:03:45PM +0000, Jesse Barnes wrote:
> On Wed, 5 Feb 2014 16:15:02 +0100
> Daniel Vetter wrote:
>
> > On Wed, Feb 05, 2014 at 02:59:08PM +0000, Jesse Barnes wrote:
> > > On Tue, 4 Feb 2014 12:18:55 +0000
> > > Ben Widawsky wrote:
> > >
> > > > We get a large number of bugs which have a, "hey I have that too"
> > > > because they see a GPU hang in dmesg. While two machines of the same
> > > > model having a GPU hang is indeed a coincidence, it is far from enough
> > > > evidence to suggest they are the same.
> > > >
> > > > In order to reduce this effect, and hopefully get people to file new bug
> > > > reports, clearly the error message itself has been insufficient (see ref
> > > > at the bottom for a new bug report with this characteristic).
> > > >
> > > > The algorithm is purposely pretty naive. I don't think we need much in
> > > > order to avoid the problem I am trying to solve, and keeping it naive
> > > > gives us some ability to make a decent test case.
> > >
> > > I like the direction of this. If we can get some basic info into the
> > > dmesg part of things (the only part regular users will actually look
> > > at) we can probably avoid some of the "me too" action we see on general
> > > GPU hangs. Having PID, comm, and some sort of hang signature are all
> > > good steps in that direction imo.
> >
> > tbh I don't see much value in regular users trying to triage gpu hang. If
> > they're not damn sure that they have a dupe (which means same platform,
> > versions of the software stack and crashing games) I much prefer if they
> > just send in a duplicate bug for us to triage.
> >
> > With the mis-design of bugzilla it's much harder to untangle a wrong
> > me-too than mark something as duplicate. And especially long-running bugs
> > are a royal pain if there's too much wrong me-too noise in there.
> >
> > Not a comment on the patch itself, just a general comment wrt avoiding
> > me-too gpu hang reports.
>
> So you're saying the GPU error decode tool should create a bug template
> for people so we don't get the "me too" reports?
>
> What I see above is that it's really important to avoid the "me too"
> stuff, and to do it in such a way that false positives are minimized
> (e.g. the IPEHR bit Ubuntu used to use). So I guess I don't see what's
> unconvincing here. Today we have no way of differentiating w/o digging
> in to the error record, which users definitely won't do, and this patch
> seems like it could only help with that... so count me confused.

We have a full paragraph explaining to users exactly what they need to do.
They still me-too and fail to attach the error state. I don't see how adding
even more helps, since it never really did.

Anyway, patch merged since meh.
I'd still like to see the same information dumped into the error state
though.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch