From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mika Kuoppala Subject: Re: [PATCH] drm/i915: Try harder to reset the gen8+ engines Date: Tue, 06 Sep 2016 16:25:11 +0300 Message-ID: <87h99t88js.fsf@gaia.fi.intel.com> References: <20160905093015.21820-1-chris@chris-wilson.co.uk> <87oa418eq4.fsf@gaia.fi.intel.com> <20160906112533.GA16682@nuc-i3427.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by gabe.freedesktop.org (Postfix) with ESMTPS id BEC426E2CA for ; Tue, 6 Sep 2016 13:27:08 +0000 (UTC) In-Reply-To: <20160906112533.GA16682@nuc-i3427.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Chris Wilson Cc: Daniel Vetter , intel-gfx@lists.freedesktop.org, stable@vger.kernel.org List-Id: intel-gfx@lists.freedesktop.org Q2hyaXMgV2lsc29uIDxjaHJpc0BjaHJpcy13aWxzb24uY28udWs+IHdyaXRlczoKCj4gT24gVHVl LCBTZXAgMDYsIDIwMTYgYXQgMDI6MTE6NDdQTSArMDMwMCwgTWlrYSBLdW9wcGFsYSB3cm90ZToK Pj4gCj4+IFJlc2VuZGluZyBteSByLWIuLi4KPgo+IEl0J3Mgbm90IGVub3VnaCwgZXZlbiByZXRy eWluZyB0aGUgcmVzZXQgYSBmZXcgdGltZXMsIHdlIHN0aWxsCj4gZXZlbnR1YWxseSBnZXQgYSB0 aW1lb3V0Lgo+Cj4gVGhpcyBpcyBqdXN0IHRoZSB1c3VhbAo+IAkxMDogMHhmZmZmZmZmZgo+IAky MDogZ290byAxMAo+IGhhbmdpbmcgYmF0Y2gKPj4gCj4+IENocmlzIFdpbHNvbiA8Y2hyaXNAY2hy aXMtd2lsc29uLmNvLnVrPiB3cml0ZXM6Cj4+IAo+PiA+IElmIGF0IGZpcnN0IHdlIGRvbid0IHN1 Y2NlZWQsIHRyeSBhZ2Fpbi4KPj4gPgo+PiA+IFJ1bm5pbmcgdGhlIHJlc2V0IGFuZCByZWNvdmVy eSByb3V0aW5lcyBpbiBhIGxvb3AgZW5kcyBpbiBhICJyZXNldAo+PiA+IHJlcXVlc3QgdGltZW91 dCIgd2l0aCBhIG10YmYgb2YgYW4gaG91ciBvbiBCcmFzd2VsbC4gVGhpcyBpcyBlZXJpbHkKPj4g PiBzaW1pbGFyIHRvIHRoZSB1bnJlY292ZXJhYmxlIHJlc2V0IGNvbmRpdGlvbiB0aGF0IGZpcnN0 IHByb21wdGVkIHVzIHRvCj4+ID4gdXNlIHRoZSByZXNldC1yZXF1ZXN0IG1lY2hhbmlzbSBpbiBj b21taXQgN2ZkMmQyNjkyMWQxICgiZHJtL2k5MTU6IFJlc2V0Cj4+ID4gcmVxdWVzdCBoYW5kbGlu ZyBmb3IgZ2VuOCsiKS4gUmVwZWF0aW5nIHRoZSByZXNldCByZXF1ZXN0IG1ha2VzIHRoZQo+PiA+ IGZhaWx1cmUgbXVjaCBoYXJkZXIgdG8gcmVwcm9kdWNlIChidXQgdGhlcmUgaXMgbm8gcmVhc29u IHRvIGJlbGlldmUgdGhhdAo+PiA+IGl0IGlzIG1vcmUgdGhhbiBtZXJlIHBhcGVyIG92ZXIgYSB0 aW1pbmcgb3Igb3RoZXIgaXNzdWUpLgo+PiA+Cj4+ID4gU2lnbmVkLW9mZi1ieTogQ2hyaXMgV2ls c29uIDxjaHJpc0BjaHJpcy13aWxzb24uY28udWs+Cj4+ID4gQ2M6IE1pa2EgS3VvcHBhbGEgPG1p a2Eua3VvcHBhbGFAaW50ZWwuY29tPgo+PiA+IENjOiBBcnVuIFNpbHV2ZXJ5IDxhcnVuLnNpbHV2 ZXJ5QGxpbnV4LmludGVsLmNvbT4KPj4gPiBDYzogRGFuaWVsIFZldHRlciA8ZGFuaWVsLnZldHRl ckBmZndsbC5jaD4KPj4gPiBDYzogc3RhYmxlQHZnZXIua2VybmVsLm9yZwo+PiA+IC0tLQo+PiA+ ICBkcml2ZXJzL2dwdS9kcm0vaTkxNS9pbnRlbF91bmNvcmUuYyB8IDI0ICsrKysrKysrKysrKyst LS0tLS0tLS0tLQo+PiA+ICAxIGZpbGUgY2hhbmdlZCwgMTMgaW5zZXJ0aW9ucygrKSwgMTEgZGVs ZXRpb25zKC0pCj4+ID4KPj4gPiBkaWZmIC0tZ2l0IGEvZHJpdmVycy9ncHUvZHJtL2k5MTUvaW50 ZWxfdW5jb3JlLmMgYi9kcml2ZXJzL2dwdS9kcm0vaTkxNS9pbnRlbF91bmNvcmUuYwo+PiA+IGlu ZGV4IGU5ZjY4Y2Q1NmUzMi4uMWJlOGNlZDAzYmE1IDEwMDY0NAo+PiA+IC0tLSBhL2RyaXZlcnMv Z3B1L2RybS9pOTE1L2ludGVsX3VuY29yZS5jCj4+ID4gKysrIGIvZHJpdmVycy9ncHUvZHJtL2k5 MTUvaW50ZWxfdW5jb3JlLmMKPj4gPiBAQCAtMTY4OCwyMCArMTY4OCwyMiBAQCBpbnQgaW50ZWxf d2FpdF9mb3JfcmVnaXN0ZXIoc3RydWN0IGRybV9pOTE1X3ByaXZhdGUgKmRldl9wcml2LAo+PiA+ ICBzdGF0aWMgaW50IGdlbjhfcmVxdWVzdF9lbmdpbmVfcmVzZXQoc3RydWN0IGludGVsX2VuZ2lu ZV9jcyAqZW5naW5lKQo+PiA+ICB7Cj4+ID4gIAlzdHJ1Y3QgZHJtX2k5MTVfcHJpdmF0ZSAqZGV2 X3ByaXYgPSBlbmdpbmUtPmk5MTU7Cj4+ID4gLQlpbnQgcmV0Owo+PiA+ICsJaW50IGxvb3AgPSAz Owo+PiA+Cj4+IAo+PiByZXRyaWVzPwo+PiAKPj4gPiAtCUk5MTVfV1JJVEVfRlcoUklOR19SRVNF VF9DVEwoZW5naW5lLT5tbWlvX2Jhc2UpLAo+PiA+IC0JCSAgICAgIF9NQVNLRURfQklUX0VOQUJM RShSRVNFVF9DVExfUkVRVUVTVF9SRVNFVCkpOwo+PiA+ICsJZG8gewo+PiA+ICsJCUk5MTVfV1JJ VEVfRlcoUklOR19SRVNFVF9DVEwoZW5naW5lLT5tbWlvX2Jhc2UpLAo+PiA+ICsJCQkgICAgICBf TUFTS0VEX0JJVF9FTkFCTEUoUkVTRVRfQ1RMX1JFUVVFU1RfUkVTRVQpKTsKPj4gPiAgCj4+ID4g LQlyZXQgPSBpbnRlbF93YWl0X2Zvcl9yZWdpc3Rlcl9mdyhkZXZfcHJpdiwKPj4gPiAtCQkJCQkg UklOR19SRVNFVF9DVEwoZW5naW5lLT5tbWlvX2Jhc2UpLAo+PiA+IC0JCQkJCSBSRVNFVF9DVExf UkVBRFlfVE9fUkVTRVQsCj4+ID4gLQkJCQkJIFJFU0VUX0NUTF9SRUFEWV9UT19SRVNFVCwKPj4g PiAtCQkJCQkgNzAwKTsKPj4gPiAtCWlmIChyZXQpCj4+ID4gLQkJRFJNX0VSUk9SKCIlczogcmVz ZXQgcmVxdWVzdCB0aW1lb3V0XG4iLCBlbmdpbmUtPm5hbWUpOwo+PiA+ICsJCWlmICghaW50ZWxf d2FpdF9mb3JfcmVnaXN0ZXJfZncoZGV2X3ByaXYsCj4+ID4gKwkJCQkJCVJJTkdfUkVTRVRfQ1RM KGVuZ2luZS0+bW1pb19iYXNlKSwKPj4gPiArCQkJCQkJUkVTRVRfQ1RMX1JFQURZX1RPX1JFU0VU LAo+PiA+ICsJCQkJCQlSRVNFVF9DVExfUkVBRFlfVE9fUkVTRVQsCj4+ID4gKwkJCQkJCTcwMCkp Cj4+IAo+PiAKPj4gRGlkIHlvdSBjaGVjayBiZXR3ZWVuIHRoZSB3cml0ZSBkaWRuJ3Qgc3RpY2sg dnMgdGhlIHJlYWR5bmVzcyBkaWRuJ3QKPj4gc2lnbmFsPwo+Cj4gVGhlIHJlYWQgd2lsbCBmbHVz aCB0aGUgd3JpdGUuIFNvIHJlYWRpbmcgYmFjayB0aGUgYml0IHdlIGp1c3Qgc2V0IHNob3dzCj4g dXMgdGhhdCB0aGUgaHcgaXMgbm90IHlldCByZWFkeS4KPgo+IFNob3VsZG4ndCB3ZSBhbHNvIGJl IHdhaXRpbmcgZm9yIHRoZSByZWVzdCBiaXQgdG8gY2xlYXIgb24gY2FuY2VsPyBBbmQKPiB2ZXJp ZnlpbmcgdGhhdCB0aGUgcmVzZXQgaXRzZWxmIGRpZD8KPgoKUmVxdWVzdCAtPiB0aW1lb3V0IC0+ IGNhbmNlbCBhbmQgd2FpdCAyIGxvd2VzdCBiaXQgdG8gYmUgemVybyBhbmQKdGhlbiBvbmx5IGFm dGVyIHRoYXQgcmV0cnk/CgpBbmQgd291bGQgbm90IGh1cnQgdG8gY2hlY2sgdGhleSBhcmUgemVy byBiZWZvcmUgY29udGludWluZyBhZnRlcgphIHN1Y2Nlc2Z1bCByZXNldCBhbHNvLgoKLU1pa2EK Cj4gZGlmZiAtLWdpdCBhL2RyaXZlcnMvZ3B1L2RybS9pOTE1L2ludGVsX3VuY29yZS5jIGIvZHJp dmVycy9ncHUvZHJtL2k5MTUvaW50ZWxfdW5jb3JlLmMKPiBpbmRleCA2Yjg0YmQ2MzMxMGMuLjM2 ZDlhNDg1Yjc4OCAxMDA2NDQKPiAtLS0gYS9kcml2ZXJzL2dwdS9kcm0vaTkxNS9pbnRlbF91bmNv cmUuYwo+ICsrKyBiL2RyaXZlcnMvZ3B1L2RybS9pOTE1L2ludGVsX3VuY29yZS5jCj4gQEAgLTE3 MDEsNiArMTcwMSw5IEBAIHN0YXRpYyB2b2lkIGdlbjhfdW5yZXF1ZXN0X2VuZ2luZV9yZXNldChz dHJ1Y3QgaW50ZWxfZW5naW5lX2NzICplbmdpbmUpCj4gIAo+ICAgICAgICAgSTkxNV9XUklURV9G VyhSSU5HX1JFU0VUX0NUTChlbmdpbmUtPm1taW9fYmFzZSksCj4gICAgICAgICAgICAgICAgICAg ICAgIF9NQVNLRURfQklUX0RJU0FCTEUoUkVTRVRfQ1RMX1JFUVVFU1RfUkVTRVQpKTsKPiArICAg ICAgIGludGVsX3dhaXRfZm9yX3JlZ2lzdGVyX2Z3KGRldl9wcml2LAo+ICsgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgUklOR19SRVNFVF9DVEwoZW5naW5lLT5tbWlvX2Jhc2UpLAo+ ICsgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgUkVTRVRfQ1RMX1JFQURZX1RPX1JF U0VULCAwLCA3MDApOwo+ICB9Cj4gIAo+ICBzdGF0aWMgaW50IGdlbjhfcmVzZXRfZW5naW5lcyhz dHJ1Y3QgZHJtX2k5MTVfcHJpdmF0ZSAqZGV2X3ByaXYsCj4gQEAgLTE3MDgsMTggKzE3MTEsMTgg QEAgc3RhdGljIGludCBnZW44X3Jlc2V0X2VuZ2luZXMoc3RydWN0IGRybV9pOTE1X3ByaXZhdGUg KmRldl9wcml2LAo+ICB7Cj4gICAgICAgICBzdHJ1Y3QgaW50ZWxfZW5naW5lX2NzICplbmdpbmU7 Cj4gICAgICAgICB1bnNpZ25lZCBpbnQgdG1wOwo+ICsgICAgICAgaW50IHJldCA9IC1FSU87Cj4g IAo+ICAgICAgICAgZm9yX2VhY2hfZW5naW5lX21hc2tlZChlbmdpbmUsIGRldl9wcml2LCBlbmdp bmVfbWFzaywgdG1wKQo+ICAgICAgICAgICAgICAgICBpZiAoZ2VuOF9yZXF1ZXN0X2VuZ2luZV9y ZXNldChlbmdpbmUpKQo+ICAgICAgICAgICAgICAgICAgICAgICAgIGdvdG8gbm90X3JlYWR5Owo+ ICAKPiAtICAgICAgIHJldHVybiBnZW42X3Jlc2V0X2VuZ2luZXMoZGV2X3ByaXYsIGVuZ2luZV9t YXNrKTsKPiArICAgICAgIHJldCA9IGdlbjZfcmVzZXRfZW5naW5lcyhkZXZfcHJpdiwgZW5naW5l X21hc2spOwo+ICAKPiAtbm90X3JlYWR5Ogo+ICAgICAgICAgZm9yX2VhY2hfZW5naW5lX21hc2tl ZChlbmdpbmUsIGRldl9wcml2LCBlbmdpbmVfbWFzaywgdG1wKQo+ICAgICAgICAgICAgICAgICBn ZW44X3VucmVxdWVzdF9lbmdpbmVfcmVzZXQoZW5naW5lKTsKPiAgCj4gLSAgICAgICByZXR1cm4g LUVJTzsKPiArICAgICAgIHJldHVybiByZXQ7Cj4gIH0KPiAgCj4gIHR5cGVkZWYgaW50ICgqcmVz ZXRfZnVuYykoc3RydWN0IGRybV9pOTE1X3ByaXZhdGUgKiwgdW5zaWduZWQgZW5naW5lX21hc2sp Owo+Cj4gLS0gCj4gQ2hyaXMgV2lsc29uLCBJbnRlbCBPcGVuIFNvdXJjZSBUZWNobm9sb2d5IENl bnRyZQpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwpJbnRl bC1nZnggbWFpbGluZyBsaXN0CkludGVsLWdmeEBsaXN0cy5mcmVlZGVza3RvcC5vcmcKaHR0cHM6 Ly9saXN0cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9pbnRlbC1nZngK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga07.intel.com ([134.134.136.100]:53524 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934863AbcIFN1T (ORCPT ); Tue, 6 Sep 2016 09:27:19 -0400 From: Mika Kuoppala To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org, Arun Siluvery , Daniel Vetter , stable@vger.kernel.org Subject: Re: [PATCH] drm/i915: Try harder to reset the gen8+ engines In-Reply-To: <20160906112533.GA16682@nuc-i3427.alporthouse.com> References: <20160905093015.21820-1-chris@chris-wilson.co.uk> <87oa418eq4.fsf@gaia.fi.intel.com> <20160906112533.GA16682@nuc-i3427.alporthouse.com> Date: Tue, 06 Sep 2016 16:25:11 +0300 Message-ID: <87h99t88js.fsf@gaia.fi.intel.com> MIME-Version: 1.0 Content-Type: text/plain Sender: stable-owner@vger.kernel.org List-ID: Chris Wilson writes: > On Tue, Sep 06, 2016 at 02:11:47PM +0300, Mika Kuoppala wrote: >> >> Resending my r-b... > > It's not enough, even retrying the reset a few times, we still > eventually get a timeout. > > This is just the usual > 10: 0xffffffff > 20: goto 10 > hanging batch >> >> Chris Wilson writes: >> >> > If at first we don't succeed, try again. >> > >> > Running the reset and recovery routines in a loop ends in a "reset >> > request timeout" with a mtbf of an hour on Braswell. This is eerily >> > similar to the unrecoverable reset condition that first prompted us to >> > use the reset-request mechanism in commit 7fd2d26921d1 ("drm/i915: Reset >> > request handling for gen8+"). Repeating the reset request makes the >> > failure much harder to reproduce (but there is no reason to believe that >> > it is more than mere paper over a timing or other issue). >> > >> > Signed-off-by: Chris Wilson >> > Cc: Mika Kuoppala >> > Cc: Arun Siluvery >> > Cc: Daniel Vetter >> > Cc: stable@vger.kernel.org >> > --- >> > drivers/gpu/drm/i915/intel_uncore.c | 24 +++++++++++++----------- >> > 1 file changed, 13 insertions(+), 11 deletions(-) >> > >> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c >> > index e9f68cd56e32..1be8ced03ba5 100644 >> > --- a/drivers/gpu/drm/i915/intel_uncore.c >> > +++ b/drivers/gpu/drm/i915/intel_uncore.c >> > @@ -1688,20 +1688,22 @@ int intel_wait_for_register(struct drm_i915_private *dev_priv, >> > static int gen8_request_engine_reset(struct intel_engine_cs *engine) >> > { >> > struct drm_i915_private *dev_priv = engine->i915; >> > - int ret; >> > + int loop = 3; >> > >> >> retries? >> >> > - I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base), >> > - _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET)); >> > + do { >> > + I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base), >> > + _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET)); >> > >> > - ret = intel_wait_for_register_fw(dev_priv, >> > - RING_RESET_CTL(engine->mmio_base), >> > - RESET_CTL_READY_TO_RESET, >> > - RESET_CTL_READY_TO_RESET, >> > - 700); >> > - if (ret) >> > - DRM_ERROR("%s: reset request timeout\n", engine->name); >> > + if (!intel_wait_for_register_fw(dev_priv, >> > + RING_RESET_CTL(engine->mmio_base), >> > + RESET_CTL_READY_TO_RESET, >> > + RESET_CTL_READY_TO_RESET, >> > + 700)) >> >> >> Did you check between the write didn't stick vs the readyness didn't >> signal? > > The read will flush the write. So reading back the bit we just set shows > us that the hw is not yet ready. > > Shouldn't we also be waiting for the reest bit to clear on cancel? And > verifying that the reset itself did? > Request -> timeout -> cancel and wait 2 lowest bit to be zero and then only after that retry? And would not hurt to check they are zero before continuing after a succesful reset also. -Mika > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 6b84bd63310c..36d9a485b788 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -1701,6 +1701,9 @@ static void gen8_unrequest_engine_reset(struct intel_engine_cs *engine) > > I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base), > _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET)); > + intel_wait_for_register_fw(dev_priv, > + RING_RESET_CTL(engine->mmio_base), > + RESET_CTL_READY_TO_RESET, 0, 700); > } > > static int gen8_reset_engines(struct drm_i915_private *dev_priv, > @@ -1708,18 +1711,18 @@ static int gen8_reset_engines(struct drm_i915_private *dev_priv, > { > struct intel_engine_cs *engine; > unsigned int tmp; > + int ret = -EIO; > > for_each_engine_masked(engine, dev_priv, engine_mask, tmp) > if (gen8_request_engine_reset(engine)) > goto not_ready; > > - return gen6_reset_engines(dev_priv, engine_mask); > + ret = gen6_reset_engines(dev_priv, engine_mask); > > -not_ready: > for_each_engine_masked(engine, dev_priv, engine_mask, tmp) > gen8_unrequest_engine_reset(engine); > > - return -EIO; > + return ret; > } > > typedef int (*reset_func)(struct drm_i915_private *, unsigned engine_mask); > > -- > Chris Wilson, Intel Open Source Technology Centre