From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Vetter
Subject: Re: I've got the RC6 bug
Date: Wed, 18 Jan 2012 21:09:37 +0100
Message-ID: <20120118200937.GE4002@phenom.ffwll.local>
References: <20120116163338.GA3627@phenom.ffwll.local>
 <20120118002426.GB4093@phenom.ffwll.local>
 <87lip4ew65.fsf@eliezer.anholt.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <87lip4ew65.fsf@eliezer.anholt.net>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Eric Anholt
Cc: intel-gfx@lists.freedesktop.org, Ben Widawsky
List-Id: intel-gfx@lists.freedesktop.org

On Wed, Jan 18, 2012 at 09:51:30AM -0800, Eric Anholt wrote:
> On Wed, 18 Jan 2012 11:17:52 +0000, Chris Wilson wrote:
> > On Wed, 18 Jan 2012 01:24:26 +0100, Daniel Vetter wrote:
> > > On Wed, Jan 18, 2012 at 01:16:02AM +0100, CC wrote:
> > > > I attached the error state.
> > >
> > > Nice one, your gpu seems to have simply disappeared. And the ringbuffer
> > > contains a rather peculiar cmd sequence. Putting Chris (maybe he
> > > recognizes the pattern) and Ben (he's got a patch in the works to dump a
> > > debug register that might be interesting here) on cc. It's too late atm
> > > for me to think about this some more.
> >
> > Not simply disappeared, someone clobbered it with an extremely large
> > hammer. The GPU was killed by a stray write to address 0 which took out
> > the render ring buffer and its hws page. So my first thought is a
> > missing relocation, and i965g springs to mind.
> > -Chris
>
> At one point there was a bug in Mesa that wrote to 0:
>
> commit dfada714f8db3deea2fea3583c3c166a78db1117
> Author: Eric Anholt
> Date:   Fri Jun 17 18:20:36 2011 -0700
>
>     i965/gen6: Use an BO instead of writing to address 0 for PIPE_CONTROL W/A.
>
>     This was spectacularly unsafe. On my system, address 0 happens to be
>     the hardware status page for the render ring, and the first quadword
>     of that happens to contain nothing we ever look at, but I sure didn't
>     look forward to having to debug some day when, for example, the kernel
>     happened to bind the ringbuffer before binding the hwsp.

Unfortunately the error_state contains more garbage than just one stray 0
write. So yeah, if this is due to the i965g gallium driver, that would
explain things - otherwise I'm hoping for Ben's reworked gt fifo patch. The
CS regs are all 0, indicating that the gpu isn't getting out of deep sleep
anymore.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48