From: Ben Widawsky
Subject: Re: [benjamin.widawsky@intel.com: intel_gpu_top broken for HSW. Ideas needed]
Date: Fri, 12 Jul 2013 10:27:06 -0700
Message-ID: <20130712172706.GA15384@bwidawsk.net>
References: <20130712171239.GA15328@intel.com>
To: Daniel Vetter
Cc: mesa-dev@lists.freedesktop.org, Intel GFX, "Tamminen, Eero T", Ben Widawsky
List-Id: intel-gfx@lists.freedesktop.org

On Fri, Jul 12, 2013 at 07:16:37PM +0200, Daniel Vetter wrote:
> On Fri, Jul 12, 2013 at 7:12 PM, Ben Widawsky wrote:
> > FWD'd from our internal list now that we have more insight.
> >
> > ----- Forwarded message from Ben Widawsky -----
> >
> > Date: Thu, 11 Jul 2013 10:32:03 -0700
> > From: Ben Widawsky
> > To: linux-gfx@linux.intel.com
> > Subject: intel_gpu_top broken for HSW. Ideas needed
> > Message-ID: <20130711173202.GB8802@intel.com>
> >
> > Hi everybody.
> >
> > While investigating a hard hang on Haswell, Eero noticed that
> > intel_gpu_top helped to invoke the hang faster. I used this in the
> > test case I gave to validation, and they suspect it is a known issue
> > which we have not yet worked around (and cannot reasonably work
> > around).
> >
> > [internal bug sighting redacted]
> >
> > To sum up, we cannot concurrently access registers within the same
> > cacheline. It has the potential to hit a known bug.
> >
> > I see some choices:
> > 1. Don't do anything.
> > 2. Try to eliminate shared registers as much as possible. Instdone is
> >    used by the hangcheck, and we can eliminate hangcheck with a
> >    module parameter. Eero, can you try this as a workaround, btw?
> > 3. Somehow make the kernel collect the top data and serialize access
> >    there.
> >
> > Anyone else have input? I personally do not use top very much, so I
> > won't be volunteering to do any of these.
> >
>
> For now I'd just vote for a warning on gen6+ on the intel-gpu-top
> screen that this might hang hw. If anyone cares we could add a debugfs
> interface (or finally get real approval for the performance counters
> the hw has and expose them properly). Not an intel_gpu_top user myself
> though.
> -Daniel
>

Eero: I meant to add, by the way, that ring head/tail are also used as
much as instdone. So maybe we can get rid of that for the ring fullness
check. We're *very* likely to hit that one.

-- 
Ben Widawsky, Intel Open Source Technology Center
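
For illustration only, the idea behind option 3 in the quoted list (funnel
every register read through one serialized path) might look roughly like
the sketch below. It is a userspace toy under stated assumptions, not i915
code: `fake_mmio`, `NUM_REGS`, `reg_trylock`, `reg_lock_acquire`,
`reg_unlock` and `reg_read_serialized` are all hypothetical names, and a
plain array stands in for the real MMIO space.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the MMIO register file; not real hardware. */
#define NUM_REGS 16
static volatile uint32_t fake_mmio[NUM_REGS];

/* One lock guards every register read, so two readers never overlap. */
static atomic_flag reg_lock = ATOMIC_FLAG_INIT;

static bool reg_trylock(void)
{
    /* test_and_set returns the previous value: false means we took it. */
    return !atomic_flag_test_and_set_explicit(&reg_lock,
                                              memory_order_acquire);
}

static void reg_lock_acquire(void)
{
    while (!reg_trylock())
        ; /* spin until the current holder releases the lock */
}

static void reg_unlock(void)
{
    atomic_flag_clear_explicit(&reg_lock, memory_order_release);
}

/*
 * Single entry point for all readers (a hangcheck-style poller and a
 * top-style sampler would both call this instead of reading directly).
 */
static uint32_t reg_read_serialized(unsigned int index)
{
    uint32_t value;

    assert(index < NUM_REGS);
    reg_lock_acquire();
    value = fake_mmio[index];
    reg_unlock();
    return value;
}
```

The point of the single entry point is that it does not matter which
registers share a cacheline: as long as every reader goes through
`reg_read_serialized()`, no two accesses are in flight at once. A tool
that reads the registers directly from userspace bypasses any such lock,
which is why the sketch only helps if the sampling itself moves behind the
same path.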