From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Widawsky
Subject: Re: [PATCH 5/5] drm/i915: Fix error capture on BYT/BDW
Date: Mon, 27 Jan 2014 12:31:08 -0800
Message-ID: <20140127203108.GC5369@bwidawsk.net>
References: <1390616265-4329-1-git-send-email-benjamin.widawsky@intel.com>
 <1390616265-4329-5-git-send-email-benjamin.widawsky@intel.com>
 <20140126114740.GF23557@nuc-i3427.alporthouse.com>
 <20140126190540.GE894@bwidawsk.net>
 <20140126195559.GB5258@nuc-i3427.alporthouse.com>
 <20140126214728.GA1089@bwidawsk.net>
 <20140127134522.GD2508@nuc-i3427.alporthouse.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail.bwidawsk.net (bwidawsk.net [166.78.191.112])
 by gabe.freedesktop.org (Postfix) with ESMTP id 48A06FBC47
 for ; Mon, 27 Jan 2014 12:31:20 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <20140127134522.GD2508@nuc-i3427.alporthouse.com>
Sender: intel-gfx-bounces@lists.freedesktop.org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
To: Chris Wilson, Ben Widawsky, Intel GFX, Mika Kuoppala
List-Id: intel-gfx@lists.freedesktop.org

On Mon, Jan 27, 2014 at 01:45:22PM +0000, Chris Wilson wrote:
> On Sun, Jan 26, 2014 at 01:47:29PM -0800, Ben Widawsky wrote:
> > On Sun, Jan 26, 2014 at 07:55:59PM +0000, Chris Wilson wrote:
> > > On Sun, Jan 26, 2014 at 11:05:40AM -0800, Ben Widawsky wrote:
> > > > On Sun, Jan 26, 2014 at 11:47:40AM +0000, Chris Wilson wrote:
> > > > > On Fri, Jan 24, 2014 at 06:17:45PM -0800, Ben Widawsky wrote:
> > > > > > The previous check during error capture of whether or not the current VM
> > > > > > should be scanned used, gen < 7. That was more or less trying to
> > > > > > determine if there was a full PPGTT. At the time, this was sort of what
> > > > > > I meant to do because I was more interested in working backwards from
> > > > > > hardware state.
> > > > > > However, this is incorrect because it will not include
> > > > > > platforms that are greater than gen7, and not having PPGTT. Example
> > > > > > would be BYT which is gen7 but doesn't have PPGTT, BDW, or any platform
> > > > > > greater than gen7 with the PPGTT module parameter invoked.
> > > > > >
> > > > > > I am /assuming/ BYT was broken, I have not actually checked.
> > > > > >
> > > > > > While here, clean up the file a bit to avoid duplicate reads (now that
> > > > > > the PPGTT info is in the error state).
> > > > > >
> > > > > > I think Mika/Chris may have been looking at this too.
> > > > >
> > > > > Sure, we are looking (for identifying the guilty request/batch) by using
> > > > > the older, simpler mechanism of finding the first incomplete request. I
> > > > > think that search is now definite since we preallocate the request and no
> > > > > longer do request coalescing if ENOMEM (i.e. there is a 1:1 relationship
> > > > > between seqno/batch/request).
> > > > >
> > > > > That should also apply here and be much simpler.
> > > >
> > > > How does that solve hangs which aren't caused by requests?
> > >
> > > Was that an intentional rhetorical question?
> > >
> > > The code you touch here only deals with requests - finding the current
> > > batchbuffer if any.
> > > -Chris
> > >
> > >
> > It wasn't rhetorical. I temporarily ignored that all batches are tied to
> > a request.
> >
> > So what's the plan now? Just looking at the callers, we seem to have a
> > couple of callers that can't easily identify the bad request.
>
> I was thinking along the lines of:
>
> @@ -737,31 +709,16 @@ i915_error_first_batchbuffer(struct drm_i915_private *dev_priv,
>  	}
>  
>  	seqno = ring->get_seqno(ring, false);
> -	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
> -		if (!is_active_vm(vm, ring))
> +	list_for_each_entry(request, &ring->request_list, list) {
> +		if (i915_seqno_passed(seqno, request->seqno))
>  			continue;
>  
> -		found_active = true;
> -
> -		list_for_each_entry(vma, &vm->active_list, mm_list) {
> -			obj = vma->obj;
> -			if (obj->ring != ring)
> -				continue;
> -
> -			if (i915_seqno_passed(seqno, obj->last_read_seqno))
> -				continue;
> -
> -			if ((obj->base.read_domains & I915_GEM_DOMAIN_COMMAND) == 0)
> -				continue;
> -
> -			/* We need to copy these to an anonymous buffer as the simplest
> -			 * method to avoid being overwritten by userspace.
> -			 */
> -			return i915_error_object_create(dev_priv, obj, vm);
> -		}
> +		/* We need to copy these to an anonymous buffer as the simplest
> +		 * method to avoid being overwritten by userspace.
> +		 */
> +		return i915_error_object_create(dev_priv, request->batch_obj, request->ctx->vm);
>  	}
>  
> -	WARN_ON(!found_active);
>

The other issue is that the existing method doesn't rely as much on
proper request handling, i.e. it could be more resilient to driver bugs.
I kind of want to keep both...

-- 
Ben Widawsky, Intel Open Source Technology Center
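
For anyone reading along, here is a minimal userspace sketch of the
"first incomplete request" search from the diff above. It is not driver
code: struct fake_request, seqno_passed() and first_incomplete() are
hypothetical names, a plain array stands in for ring->request_list, and
the comparison assumes the usual wraparound-safe signed-difference form
for seqnos.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a request; not the driver's struct. */
struct fake_request {
	uint32_t seqno;	/* value the ring reports once this request completes */
	int batch_id;	/* stand-in for request->batch_obj */
};

/* Has seq1 reached seq2? Assumed signed-difference form, safe across wrap. */
static int seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * Walk the requests in submission order and return the first one the
 * hardware has not completed yet, or NULL if everything has completed.
 */
static const struct fake_request *
first_incomplete(const struct fake_request *reqs, size_t n, uint32_t hw_seqno)
{
	size_t i;

	assert(reqs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (seqno_passed(hw_seqno, reqs[i].seqno))
			continue;	/* already completed, so not the guilty one */
		return &reqs[i];
	}
	return NULL;
}
```

The sketch only holds if requests are kept in submission order and each
batch maps to exactly one request, which is the 1:1 relationship Chris
describes above.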