From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore Date: Thu, 29 Jun 2017 22:26:08 +0300 Message-ID: <20170629192608.GF12629@intel.com> References: <20170629134948.5614-6-ville.syrjala@linux.intel.com> <20170629143642.28221-1-ville.syrjala@linux.intel.com> <149875905040.25198.11560401593791605431@mail.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Content-Disposition: inline In-Reply-To: <149875905040.25198.11560401593791605431@mail.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Chris Wilson Cc: Daniel Vetter , intel-gfx@lists.freedesktop.org, stable@vger.kernel.org, dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org T24gVGh1LCBKdW4gMjksIDIwMTcgYXQgMDY6NTc6MzBQTSArMDEwMCwgQ2hyaXMgV2lsc29uIHdy b3RlOgo+IFF1b3RpbmcgdmlsbGUuc3lyamFsYUBsaW51eC5pbnRlbC5jb20gKDIwMTctMDYtMjkg MTU6MzY6NDIpCj4gPiBGcm9tOiBWaWxsZSBTeXJqw6Rsw6QgPHZpbGxlLnN5cmphbGFAbGludXgu aW50ZWwuY29tPgo+ID4gCj4gPiBJbnRyb2R1Y2UgYW4gcndfc2VtYXBob3JlIHRvIHByb3RlY3Qg dGhlIGRpc3BsYXkgY29tbWl0cy4gQWxsIG5vcm1hbAo+ID4gY29tbWl0cyB1c2UgZG93bl9yZWFk KCkgYW5kIGhlbmNlIGNhbiBwcm9jZWVkIGluIHBhcmFsbGVsLCBidXQgR1BVIHJlc2V0Cj4gPiB3 aWxsIHVzZSBkb3duX3dyaXRlKCkgbWFraW5nIHN1cmUgbm8gb3RoZXIgY29tbWl0cyBhcmUgaW4g cHJvZ3Jlc3Mgd2hlbgo+ID4gd2UgaGF2ZSB0byBwdWxsIHRoZSBwbHVnIG9uIHRoZSBkaXNwbGF5 IGVuZ2luZSBvbiBwcmUtZzR4IHBsYXRmb3Jtcy4KPiA+IFRoZXJlIGFyZSBubyBtb2Rlc2V0L2dl bSBsb2NrcyB0YWtlbiBpbnNpZGUgX19pbnRlbF9hdG9taWNfY29tbWl0X3RhaWwoKQo+ID4gaXRz ZWxmLCBhbmQgd2Ugd2FpdCBmb3IgYWxsIGRlcGVuZGVuY2llcyBiZWZvcmUgdGhlIGRvd25fcmVh ZCgpLCBhbmQKPiA+IHRodXMgdGhlcmUgaXMgbm8gY2hhbmNlIG9mIGRlYWRsb2NrcyB3aXRoIHRo aXMgc2NoZW1lLgo+IAo+IFRoaXMgbWF0Y2hlcyB3aGF0IEkgdGhvdWdodCBzaG91bGQgYmUgZG9u ZSAoSSBkaWRuJ3QgdGhpbmsgb2YgdXNpbmcKPiByd3NlbSBqdXN0IGEgbXV0ZXgsIG5pY2UgdG91 Y2gpLiBUaGUgcG9pbnQgSSBnb3Qgc3R1Y2sgb24gd2FzIHdoYXQKPiBzaG91bGQgYmUgZG9uZSBh ZnRlciB0aGUgcmVzZXQ/IEkgZXhwZWN0ZWQgYW5vdGhlciBtb2Rlc2V0IHRvIHJldHVybiB0aGUK PiBzdGF0ZSBiYWNrIG9yIG90aGVyd2lzZSB0aGUgaW5mbGlnaHQgd291bGQgZ2V0IGNvbmZ1c2Vk PwoKSSBndWVzcyB0aGF0IGNhbiBoYXBwZW4uIEZvciBpbnN0YW5jZSwgaWYgd2UgaGF2ZSBhIGNy dGNfZW5hYmxlKCkgaW4gZmxpZ2h0LAphbmQgdGhlbiB3ZSBkbyBhIHJlc2V0IGJlZm9yZSBpdCBn ZXRzIGNvbW1pdHRlZCB3ZSB3b3VsZCBlbmQgdXAgZG9pbmcKY3J0Y19lbmFibGUoKSB0d2ljZSBp biBhIHJvdyB3aXRob3V0IGEgY3J0Y19kaXNhYmxlIGluIGJldHdlZW4uIEZvciBwYWdlCmZsaXBz IGFuZCBzdWNoIHRoaXMgc2hvdWxkbid0IGJlIGEgYmlnIGRlYWwgaW4gZ2VuZXJhbC4KCj4gIAo+ ID4gRHVyaW5nIHJlc2V0IHdlIHNob3VsZCBiZSByZWNvbW1pdGluZyB0aGUgc3RhdGUgdGhhdCB3 YXMgY29tbWl0dGVkIGxhc3QuCj4gPiBCdXQgZm9yIG5vdyB3ZSdsbCBzZXR0bGUgZm9yIHJlY29t bWl0aW5nIHRoZSBsYXN0IHN0YXRlIGZvciBlYWNoIG9iamVjdC4KPiAKPiBBaCwgSSBndWVzcyB0 aGF0IGV4cGxhaW5zIHRoZSBhYm92ZS4gV2hhdCBpcyB0aGUgY29tcGxpY2F0aW9uIHdpdGgKPiBy ZXN0b3JpbmcgdGhlIGN1cnJlbnQgc3RhdGUgYXMgb3Bwb3NlZCB0byB0aGUgbmV4dCBzdGF0ZT8K CldlbGwgdGhlIG1haW4gdGhpbmcgaXMganVzdCB0cmFja2luZyB3aGljaCBpcyB0aGUgY3VycmVu dCBzdGF0ZS4gVGhhdApqdXN0IG5lZWRzIHJlZmFjdG9yaW5nIHRoZSAuYXRvbWljX2R1cGxpY2F0 ZV9zdGF0ZSgpIGNhbGxpbmcgY29udmVudGlvbgphY3Jvc3MgdGhlIHdob2xlIHRyZWUgc28gdGhh dCB3ZSBjYW4gdGhlbiBkdXBsaWNhdGUgdGhlIGNvbW1pdHRlZCBzdGF0ZQpyYXRoZXIgdGhhbiB0 aGUgbGF0ZXN0IHN0YXRlLgoKQWxzbyBkdWUgdG8gdGhlIGNvbW1pdF9od19kb25lKCkgYmVpbmcg cG90ZW50aWFsbHkgZG9uZSBhZnRlciB0aGUKbW9kZXNldCBsb2NrcyBoYXZlIGJlZW4gZHJvcHBl ZCwgSSBkb24ndCB0aGluayB3ZSBjYW4gYmUgY2VydGFpbgpvZiBpdCBnZXR0aW5nIGNhbGxlZCBp biB0aGUgc2FtZSBvcmRlciBhcyBzd2FwX3N0YXRlKCksIGhlbmNlCndoZW4gd2UgdHJhY2sgdGhl IGNvbW1pdHRlZCBzdGF0ZSBpbiBjb21taXRfaHdfZG9uZSgpIHdlJ2xsIGhhdmUKdG8gaGF2ZSBz b21lIHdheSB0byBmaWd1cmUgb3V0IGlmIG91ciBuZXcgc3RhdGUgaXMgaW4gZmFjdCB0aGUKbGF0 ZXN0IGNvbW1pdHRlZCBzdGF0ZSBmb3IgZWFjaCBvYmplY3Qgb3IgaWYgdGhlIGNhbGxzIGdvdApy ZW9yZGVyZWQuIFdlIGRvbid0IGluc2VydCBhbnkgZGVwZW5kZW5jaWVzIGJldHdlZW4gdHdvIGNv bW1pdHMKdW5sZXNzIHRoZXkgdG91Y2ggdGhlIHNhbWUgYWN0aXZlIGNydGMsIHRodXMgdGhpcyBy ZW9yZGVyaW5nCnNlZW1zIHZlcnkgbXVjaCBwb3NzaWJsZS4gRHVubm8gaWYgd2Ugc2hvdWxkIGFk ZCBzb21lIHdheSB0byBhZGQKc3VjaCBkZXBlbmRlZW5jaWVzIHdoZW5ldmVyIHRoZSBzYW1lIG9i amVjdCBpcyBwYXJ0IG9mIHR3byBvdGhlcndpc2UKaW5kZXBlbmRlbnQgY29tbWl0cywgb3IgaWYg d2UganVzdCB3YW50IHRvIHRyeSBhbmQgd29yayB3aXRoIHRoZQpyZW9yZGVyZWQgY2FsbHMuIE15 IGlkZWEgZm9yIHRoZSBsYXR0ZXIgd2FzIHNvbWUga2luZCBvZiBzZXFuby9hZ2UKY291bnRlciBv biB0aGUgb2JqZWN0IHN0YXRlcyB0aGF0IGFsbG93cyBtZSB0byByZWNvbmduaXplIHdoaWNoCnN0 YXRlIGlzIG1vcmUgcmVjZW50LiBUaGUgb2JqZWN0IHN0YXRlcyBhcmVuJ3QgcmVmY291bnRlZCBz byBoYW5naW5nCm9uIHRvIHRoZSB3cm9uZyBwb2ludGVyIGNvdWxkIGNhdXNlIGFuIG9vcHMgdGhl IG5leHQgdGltZSB3ZSBoYXZlIHRvCnBlcmZvcm0gYSBHUFUgcmVzZXQuCgotLSAKVmlsbGUgU3ly asOkbMOkCkludGVsIE9UQwpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fXwpkcmktZGV2ZWwgbWFpbGluZyBsaXN0CmRyaS1kZXZlbEBsaXN0cy5mcmVlZGVza3Rv cC5vcmcKaHR0cHM6Ly9saXN0cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9kcmkt ZGV2ZWwK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com ([134.134.136.31]:58744 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751612AbdF2T0S (ORCPT ); Thu, 29 Jun 2017 15:26:18 -0400 Date: Thu, 29 Jun 2017 22:26:08 +0300 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, stable@vger.kernel.org, Daniel Vetter Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore Message-ID: <20170629192608.GF12629@intel.com> References: <20170629134948.5614-6-ville.syrjala@linux.intel.com> <20170629143642.28221-1-ville.syrjala@linux.intel.com> <149875905040.25198.11560401593791605431@mail.alporthouse.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <149875905040.25198.11560401593791605431@mail.alporthouse.com> Sender: stable-owner@vger.kernel.org List-ID: On Thu, Jun 29, 2017 at 06:57:30PM +0100, Chris Wilson wrote: > Quoting ville.syrjala@linux.intel.com (2017-06-29 15:36:42) > > From: Ville Syrj�l� > > > > Introduce an rw_semaphore to protect the display commits. All normal > > commits use down_read() and hence can proceed in parallel, but GPU reset > > will use down_write() making sure no other commits are in progress when > > we have to pull the plug on the display engine on pre-g4x platforms. > > There are no modeset/gem locks taken inside __intel_atomic_commit_tail() > > itself, and we wait for all dependencies before the down_read(), and > > thus there is no chance of deadlocks with this scheme. > > This matches what I thought should be done (I didn't think of using > rwsem just a mutex, nice touch). The point I got stuck on was what > should be done after the reset? I expected another modeset to return the > state back or otherwise the inflight would get confused? I guess that can happen. For instance, if we have a crtc_enable() in flight, and then we do a reset before it gets committed we would end up doing crtc_enable() twice in a row without a crtc_disable in between. For page flips and such this shouldn't be a big deal in general. > > > During reset we should be recommiting the state that was committed last. > > But for now we'll settle for recommiting the last state for each object. > > Ah, I guess that explains the above. What is the complication with > restoring the current state as opposed to the next state? Well the main thing is just tracking which is the current state. That just needs refactoring the .atomic_duplicate_state() calling convention across the whole tree so that we can then duplicate the committed state rather than the latest state. Also due to the commit_hw_done() being potentially done after the modeset locks have been dropped, I don't think we can be certain of it getting called in the same order as swap_state(), hence when we track the committed state in commit_hw_done() we'll have to have some way to figure out if our new state is in fact the latest committed state for each object or if the calls got reordered. We don't insert any dependencies between two commits unless they touch the same active crtc, thus this reordering seems very much possible. Dunno if we should add some way to add such dependeencies whenever the same object is part of two otherwise independent commits, or if we just want to try and work with the reordered calls. My idea for the latter was some kind of seqno/age counter on the object states that allows me to recongnize which state is more recent. The object states aren't refcounted so hanging on to the wrong pointer could cause an oops the next time we have to perform a GPU reset. -- Ville Syrj�l� Intel OTC