From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= Subject: Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset Date: Fri, 24 Oct 2014 17:39:59 +0300 Message-ID: <20141024143959.GF4284@intel.com> References: <1401215590-24141-1-git-send-email-ville.syrjala@linux.intel.com> <20141024142850.GC26941@phenom.ffwll.local> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTP id F22106E5DF for ; Fri, 24 Oct 2014 07:40:02 -0700 (PDT) Content-Disposition: inline In-Reply-To: <20141024142850.GC26941@phenom.ffwll.local> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Daniel Vetter Cc: intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On Fri, Oct 24, 2014 at 04:28:50PM +0200, Daniel Vetter wrote: > On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@linux.intel.com w= rote: > > From: Ville Syrj=E4l=E4 > > = > > During a GPU reset we need to get pending page flip cleared out > > since the ring contents are gone and flip will never complete > > on its own. This used to work until the mmio vs. CS flip race > > detection came about. That piece of code is looking for a > > specific surface address in the SURFLIVE register, but as > > a flip to that address may never happen the check may never > > pass. So we should just skip the SURFLIVE and flip counter > > checks when the GPU gets reset. > > = > > intel_display_handle_reset() tries to effectively complete > > the flip anyway by calling .update_primary_plane(). But that > > may not satisfy the conditions of the mmio vs. CS race > > detection since there's no guarantee that a modeset didn't > > sneak in between the GPU reset and intel_display_handle_reset(). Such a modeset will not wait for pending flips due to the ongoing GPU reset, and then the primary plane updates ... > > When that happens the primary plane updates performed by > > intel_display_handle_reset() will already use the new surface > > address, and thus the surface address the flip is waiting for > > might never appear in SURFLIVE. The result is that the flip > > will never complete and attempts to perform further page flips > > will fail with -EBUSY. > > = > > During the GPU reset intel_crtc_has_pending_flip() will return > > false regardless, so the deadlock with a modeset vs. the error > > work acquiring crtc->mutex was avoided. And the reset_counter > > check in intel_crtc_has_pending_flip() actually made this bug > > even less severe since it allowed normal modesets to go through > > even though there's a pending flip. > > = > > This is a regression introduced by me here: > > commit 75f7f3ec600524c9544cc31695155f1a9ddbe1d9 > > Author: Ville Syrj=E4l=E4 > > Date: Tue Apr 15 21:41:34 2014 +0300 > > = > > drm/i915: Fix mmio vs. CS flip race on ILK+ > > = > > Testcase: igt/kms_flip/flip-vs-panning-vs-hang > > Signed-off-by: Ville Syrj=E4l=E4 > = > So I wondered first how we'll get there since the set_base code waits for > pageflips before updating anything. Except it doesn't when there's a > pending gpu reset. I don't think we need Chris' additional paranoi since > we must complete the pageflip anyway, and we do it with hacks when there's > a gpu reset. So if the commit message gets augmented with some note that > the pageflip wait in set_base is short-circuited for resets this is > = > Reviewed-by: Daniel Vetter > Cc: stable@vger.kernel.org > = > Cheers, Daniel > = > > --- > > drivers/gpu/drm/i915/intel_display.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > = > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i91= 5/intel_display.c > > index 94ac51f..cb9dd8e 100644 > > --- a/drivers/gpu/drm/i915/intel_display.c > > +++ b/drivers/gpu/drm/i915/intel_display.c > > @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc= *crtc) > > struct drm_device *dev =3D crtc->base.dev; > > struct drm_i915_private *dev_priv =3D dev->dev_private; > > = > > + if (i915_reset_in_progress(&dev_priv->gpu_error) || > > + crtc->reset_counter !=3D atomic_read(&dev_priv->gpu_error.reset_c= ounter)) > > + return true; > > + > > /* > > * The relevant registers doen't exist on pre-ctg. > > * As the flip done interrupt doesn't trigger for mmio > > -- = > > 1.8.5.5 > > = > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > = > -- = > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- = Ville Syrj=E4l=E4 Intel OTC