From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore Date: Fri, 30 Jun 2017 16:53:19 +0300 Message-ID: <20170630135319.GX12629@intel.com> References: <20170629134948.5614-6-ville.syrjala@linux.intel.com> <20170629143642.28221-1-ville.syrjala@linux.intel.com> <149875905040.25198.11560401593791605431@mail.alporthouse.com> <20170629192608.GF12629@intel.com> <20170630133503.4ykvlvtj4wd4nacf@phenom.ffwll.local> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Content-Disposition: inline In-Reply-To: <20170630133503.4ykvlvtj4wd4nacf@phenom.ffwll.local> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Daniel Vetter Cc: Daniel Vetter , intel-gfx@lists.freedesktop.org, stable@vger.kernel.org, dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org T24gRnJpLCBKdW4gMzAsIDIwMTcgYXQgMDM6MzU6MDNQTSArMDIwMCwgRGFuaWVsIFZldHRlciB3 cm90ZToKPiBPbiBUaHUsIEp1biAyOSwgMjAxNyBhdCAxMDoyNjowOFBNICswMzAwLCBWaWxsZSBT eXJqw6Rsw6Qgd3JvdGU6Cj4gPiBPbiBUaHUsIEp1biAyOSwgMjAxNyBhdCAwNjo1NzozMFBNICsw MTAwLCBDaHJpcyBXaWxzb24gd3JvdGU6Cj4gPiA+IFF1b3RpbmcgdmlsbGUuc3lyamFsYUBsaW51 eC5pbnRlbC5jb20gKDIwMTctMDYtMjkgMTU6MzY6NDIpCj4gPiA+ID4gRnJvbTogVmlsbGUgU3ly asOkbMOkIDx2aWxsZS5zeXJqYWxhQGxpbnV4LmludGVsLmNvbT4KPiA+ID4gPiAKPiA+ID4gPiBJ bnRyb2R1Y2UgYW4gcndfc2VtYXBob3JlIHRvIHByb3RlY3QgdGhlIGRpc3BsYXkgY29tbWl0cy4g QWxsIG5vcm1hbAo+ID4gPiA+IGNvbW1pdHMgdXNlIGRvd25fcmVhZCgpIGFuZCBoZW5jZSBjYW4g cHJvY2VlZCBpbiBwYXJhbGxlbCwgYnV0IEdQVSByZXNldAo+ID4gPiA+IHdpbGwgdXNlIGRvd25f d3JpdGUoKSBtYWtpbmcgc3VyZSBubyBvdGhlciBjb21taXRzIGFyZSBpbiBwcm9ncmVzcyB3aGVu Cj4gPiA+ID4gd2UgaGF2ZSB0byBwdWxsIHRoZSBwbHVnIG9uIHRoZSBkaXNwbGF5IGVuZ2luZSBv biBwcmUtZzR4IHBsYXRmb3Jtcy4KPiA+ID4gPiBUaGVyZSBhcmUgbm8gbW9kZXNldC9nZW0gbG9j a3MgdGFrZW4gaW5zaWRlIF9faW50ZWxfYXRvbWljX2NvbW1pdF90YWlsKCkKPiA+ID4gPiBpdHNl bGYsIGFuZCB3ZSB3YWl0IGZvciBhbGwgZGVwZW5kZW5jaWVzIGJlZm9yZSB0aGUgZG93bl9yZWFk KCksIGFuZAo+ID4gPiA+IHRodXMgdGhlcmUgaXMgbm8gY2hhbmNlIG9mIGRlYWRsb2NrcyB3aXRo IHRoaXMgc2NoZW1lLgo+ID4gPiAKPiA+ID4gVGhpcyBtYXRjaGVzIHdoYXQgSSB0aG91Z2h0IHNo b3VsZCBiZSBkb25lIChJIGRpZG4ndCB0aGluayBvZiB1c2luZwo+ID4gPiByd3NlbSBqdXN0IGEg bXV0ZXgsIG5pY2UgdG91Y2gpLiBUaGUgcG9pbnQgSSBnb3Qgc3R1Y2sgb24gd2FzIHdoYXQKPiA+ ID4gc2hvdWxkIGJlIGRvbmUgYWZ0ZXIgdGhlIHJlc2V0PyBJIGV4cGVjdGVkIGFub3RoZXIgbW9k ZXNldCB0byByZXR1cm4gdGhlCj4gPiA+IHN0YXRlIGJhY2sgb3Igb3RoZXJ3aXNlIHRoZSBpbmZs aWdodCB3b3VsZCBnZXQgY29uZnVzZWQ/Cj4gPiAKPiA+IEkgZ3Vlc3MgdGhhdCBjYW4gaGFwcGVu LiBGb3IgaW5zdGFuY2UsIGlmIHdlIGhhdmUgYSBjcnRjX2VuYWJsZSgpIGluIGZsaWdodCwKPiA+ IGFuZCB0aGVuIHdlIGRvIGEgcmVzZXQgYmVmb3JlIGl0IGdldHMgY29tbWl0dGVkIHdlIHdvdWxk IGVuZCB1cCBkb2luZwo+ID4gY3J0Y19lbmFibGUoKSB0d2ljZSBpbiBhIHJvdyB3aXRob3V0IGEg Y3J0Y19kaXNhYmxlIGluIGJldHdlZW4uIEZvciBwYWdlCj4gPiBmbGlwcyBhbmQgc3VjaCB0aGlz IHNob3VsZG4ndCBiZSBhIGJpZyBkZWFsIGluIGdlbmVyYWwuCj4gCj4gYXRvbWljIGNvbW1pdHMg YXJlIG9yZGVyZWQuIFlvdSBoYXZlIHRvIHdhaXQgZm9yIHRoZSBwcmV2aW91cyBvbmVzIHRvCj4g Y29tcGxldGUgYmVmb3JlIHlvdSBkbyBhIG5ldyBvbmUuIElmIHlvdSBkb24ndCBkbyB0aGF0LCB0 aGVuIGFsbCBoZWxsCj4gYnJlYWtzIGxvb3NlLgoKV2hhdCB3ZSdyZSBlZmZlY3RpdmVseSBkb2lu ZyBoZXJlIGlzIGluc2VydGluZyB0d28gbmV3IGNvbW1pdHMgaW4KdGhlIG1pZGRsZSBvZiB0aGUg dGltZWxpbmUsIG9uZSB0byBkaXNhYmxlIGV2ZXJ5dGhpbmcsIGFuZCBhbm90aGVyCm9uZSB0byBy ZS1lbmFibGUgZXZlcnl0aGluZy4gQXQgdGhlIGVuZCBvZiB0aGUgdGhlIHJlLWVuYWJsZSB3ZSB3 b3VsZAp3YW50IHRoZSBoYXJkd2FyZSBzdGF0ZSBzaG91bGQgbWF0Y2ggZXhhY3RseSB3aGF0IHdh cyB0aGVyZSBiZWZvcmUKdGhlIGRpc2FibGUsIGhlbmNlIGFueSBjb21taXRzIHN0aWxsIGluIHRo ZSB0aW1lbGluZSBzaG91bGQgd29yawpjb3JyZWN0bHkuIFRoYXQgaXMsIHRoZWlyIG9sZF9zdGF0 ZSB3aWxsIG1hdGNoIHRoZSBoYXJkd2FyZSBzdGF0ZQp3aGVuIGl0IGNvbWVzIHRpbWUgdG8gY29t bWl0IHRoZW0uCgpCdXQgd2UgY2FuIG9ubHkgZG8gdGhhdCBwcm9wZXJseSBhZnRlciB3ZSBzdGFy dCB0byB0cmFjayB0aGUgY29tbWl0dGVkCnN0YXRlLiBXaXRob3V0IHRoYXQgdHJhY2tpbmcgd2Ug Y2FuIGdldCBpbnRvIHRyb3VibGUgd3J0LiB0aGUKaGFyZHdhcmUgc3RhdGUgbm90IG1hdGNoaW5n IHRoZSBvbGQgc3RhdGUgd2hlbiBpdCBjb21lcyB0aW1lIHRvCmNvbW1pdCB0aGUgbmV3IHN0YXRl LgoKPiAKPiBXaGF0IHlvdSByZWFsbHkgY2FuJ3QgZG8gd2l0aCBhdG9taWMgKHdpdGhvdXQgcmV3 cml0aW5nIGV2ZXJ5dGhpbmcgb25jZQo+IG1vcmUpIGlzIGNhbmNlbCBhIGNvbW1pdC4gUHJlLWF0 b21pYyB3ZSBjb3VsZCBkbyB0aGF0IG9uIGdlbjQgc2luY2UgdGhlCj4gbW1pbyBmbGlwcyBkaWVk IHdpdGggdGhlIGdwdSwgYnV0IHRoYXQncyB0aGUgb25lIGRlc2lnbiBjaGFuZ2Ugd2UgbmVlZCB0 bwo+IGNvcGUgd2l0aCAocGx1cyBURFIgaW5zaXN0aW5nIGl0IGNhbid0IGZvcmNlLWNvbXBsZXRl IHJlcXVlc3RzIGFueW1vcmUpLgo+IAo+ID4gPiA+IER1cmluZyByZXNldCB3ZSBzaG91bGQgYmUg cmVjb21taXRpbmcgdGhlIHN0YXRlIHRoYXQgd2FzIGNvbW1pdHRlZCBsYXN0Lgo+ID4gPiA+IEJ1 dCBmb3Igbm93IHdlJ2xsIHNldHRsZSBmb3IgcmVjb21taXRpbmcgdGhlIGxhc3Qgc3RhdGUgZm9y IGVhY2ggb2JqZWN0Lgo+ID4gPiAKPiA+ID4gQWgsIEkgZ3Vlc3MgdGhhdCBleHBsYWlucyB0aGUg YWJvdmUuIFdoYXQgaXMgdGhlIGNvbXBsaWNhdGlvbiB3aXRoCj4gPiA+IHJlc3RvcmluZyB0aGUg Y3VycmVudCBzdGF0ZSBhcyBvcHBvc2VkIHRvIHRoZSBuZXh0IHN0YXRlPwo+ID4gCj4gPiBXZWxs IHRoZSBtYWluIHRoaW5nIGlzIGp1c3QgdHJhY2tpbmcgd2hpY2ggaXMgdGhlIGN1cnJlbnQgc3Rh dGUuIFRoYXQKPiA+IGp1c3QgbmVlZHMgcmVmYWN0b3JpbmcgdGhlIC5hdG9taWNfZHVwbGljYXRl X3N0YXRlKCkgY2FsbGluZyBjb252ZW50aW9uCj4gPiBhY3Jvc3MgdGhlIHdob2xlIHRyZWUgc28g dGhhdCB3ZSBjYW4gdGhlbiBkdXBsaWNhdGUgdGhlIGNvbW1pdHRlZCBzdGF0ZQo+ID4gcmF0aGVy IHRoYW4gdGhlIGxhdGVzdCBzdGF0ZS4KPiA+IAo+ID4gQWxzbyBkdWUgdG8gdGhlIGNvbW1pdF9o d19kb25lKCkgYmVpbmcgcG90ZW50aWFsbHkgZG9uZSBhZnRlciB0aGUKPiA+IG1vZGVzZXQgbG9j a3MgaGF2ZSBiZWVuIGRyb3BwZWQsIEkgZG9uJ3QgdGhpbmsgd2UgY2FuIGJlIGNlcnRhaW4KPiA+ IG9mIGl0IGdldHRpbmcgY2FsbGVkIGluIHRoZSBzYW1lIG9yZGVyIGFzIHN3YXBfc3RhdGUoKSwg aGVuY2UKPiA+IHdoZW4gd2UgdHJhY2sgdGhlIGNvbW1pdHRlZCBzdGF0ZSBpbiBjb21taXRfaHdf ZG9uZSgpIHdlJ2xsIGhhdmUKPiA+IHRvIGhhdmUgc29tZSB3YXkgdG8gZmlndXJlIG91dCBpZiBv dXIgbmV3IHN0YXRlIGlzIGluIGZhY3QgdGhlCj4gPiBsYXRlc3QgY29tbWl0dGVkIHN0YXRlIGZv ciBlYWNoIG9iamVjdCBvciBpZiB0aGUgY2FsbHMgZ290Cj4gPiByZW9yZGVyZWQuIFdlIGRvbid0 IGluc2VydCBhbnkgZGVwZW5kZW5jaWVzIGJldHdlZW4gdHdvIGNvbW1pdHMKPiA+IHVubGVzcyB0 aGV5IHRvdWNoIHRoZSBzYW1lIGFjdGl2ZSBjcnRjLCB0aHVzIHRoaXMgcmVvcmRlcmluZwo+ID4g c2VlbXMgdmVyeSBtdWNoIHBvc3NpYmxlLiBEdW5ubyBpZiB3ZSBzaG91bGQgYWRkIHNvbWUgd2F5 IHRvIGFkZAo+ID4gc3VjaCBkZXBlbmRlZW5jaWVzIHdoZW5ldmVyIHRoZSBzYW1lIG9iamVjdCBp cyBwYXJ0IG9mIHR3byBvdGhlcndpc2UKPiA+IGluZGVwZW5kZW50IGNvbW1pdHMsIG9yIGlmIHdl IGp1c3Qgd2FudCB0byB0cnkgYW5kIHdvcmsgd2l0aCB0aGUKPiA+IHJlb3JkZXJlZCBjYWxscy4g TXkgaWRlYSBmb3IgdGhlIGxhdHRlciB3YXMgc29tZSBraW5kIG9mIHNlcW5vL2FnZQo+ID4gY291 bnRlciBvbiB0aGUgb2JqZWN0IHN0YXRlcyB0aGF0IGFsbG93cyBtZSB0byByZWNvbmduaXplIHdo aWNoCj4gPiBzdGF0ZSBpcyBtb3JlIHJlY2VudC4gVGhlIG9iamVjdCBzdGF0ZXMgYXJlbid0IHJl ZmNvdW50ZWQgc28gaGFuZ2luZwo+ID4gb24gdG8gdGhlIHdyb25nIHBvaW50ZXIgY291bGQgY2F1 c2UgYW4gb29wcyB0aGUgbmV4dCB0aW1lIHdlIGhhdmUgdG8KPiA+IHBlcmZvcm0gYSBHUFUgcmVz ZXQuCj4gCj4gQXRvbWljIGNvbW1pdHMgYXJlIHN0cm9uZ2x5IG9yZGVyZWQgb24gYSBnaXZlbiBD UlRDLCBzbyBzdHVmZiBjYW4ndCBiZQo+IG91dC1vZi1vcmRlciB3aXRoaW4gb25lLiBBY3Jvc3Mg dGhlbSB0aGUgaWRlYSB3YXMgdG8ganVzdCBhZGQgbW9yZSBDUlRDCj4gc3RhdGVzIGludG8gdGhl IGF0b21pYyBjb21taXQgdG8gbWFrZSBzdXJlIHN0dWZmIGlzIG9yZGVyZWQgY29ycmVjdGx5LgoK QW5kIGF0b21pYyBjb21taXRzIG5vdCB0b3VjaGluZyB0aGUgc2FtZSBjcnRjIHdpbGwgbm90IGJl IG9yZGVyZWQgaW4gYW55CndheS4gVGh1cyBpZiB0aGV5IHRvdWNoIHRoZSBzYW1lIG9iamVjdCAo ZWcuIGRpc2FibGVkIHBsYW5lIG9yIGNvbm5lY3RvcikKd2UgY2FuJ3QgY3VycmVudGx5IHRlbGwg aWYgdGhlIGNvbW1pdF9od19kb25lKCkgY2FsbHMgaGFwcGVuZWQgaW4gdGhlIHNhbWUKb3JkZXIg YXMgdGhlIHN3YXBfc3RhdGUoKSBjYWxscyBmb3IgdGhhdCBwYXJ0aWN1bGFyIG9iamVjdC4KCi0t IApWaWxsZSBTeXJqw6Rsw6QKSW50ZWwgT1RDCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fCmRyaS1kZXZlbCBtYWlsaW5nIGxpc3QKZHJpLWRldmVsQGxpc3Rz LmZyZWVkZXNrdG9wLm9yZwpodHRwczovL2xpc3RzLmZyZWVkZXNrdG9wLm9yZy9tYWlsbWFuL2xp c3RpbmZvL2RyaS1kZXZlbAo= From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com ([192.55.52.88]:22175 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751933AbdF3Nx3 (ORCPT ); Fri, 30 Jun 2017 09:53:29 -0400 Date: Fri, 30 Jun 2017 16:53:19 +0300 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= To: Daniel Vetter Cc: Chris Wilson , intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, stable@vger.kernel.org, Daniel Vetter Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore Message-ID: <20170630135319.GX12629@intel.com> References: <20170629134948.5614-6-ville.syrjala@linux.intel.com> <20170629143642.28221-1-ville.syrjala@linux.intel.com> <149875905040.25198.11560401593791605431@mail.alporthouse.com> <20170629192608.GF12629@intel.com> <20170630133503.4ykvlvtj4wd4nacf@phenom.ffwll.local> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20170630133503.4ykvlvtj4wd4nacf@phenom.ffwll.local> Sender: stable-owner@vger.kernel.org List-ID: On Fri, Jun 30, 2017 at 03:35:03PM +0200, Daniel Vetter wrote: > On Thu, Jun 29, 2017 at 10:26:08PM +0300, Ville Syrj�l� wrote: > > On Thu, Jun 29, 2017 at 06:57:30PM +0100, Chris Wilson wrote: > > > Quoting ville.syrjala@linux.intel.com (2017-06-29 15:36:42) > > > > From: Ville Syrj�l� > > > > > > > > Introduce an rw_semaphore to protect the display commits. All normal > > > > commits use down_read() and hence can proceed in parallel, but GPU reset > > > > will use down_write() making sure no other commits are in progress when > > > > we have to pull the plug on the display engine on pre-g4x platforms. > > > > There are no modeset/gem locks taken inside __intel_atomic_commit_tail() > > > > itself, and we wait for all dependencies before the down_read(), and > > > > thus there is no chance of deadlocks with this scheme. > > > > > > This matches what I thought should be done (I didn't think of using > > > rwsem just a mutex, nice touch). The point I got stuck on was what > > > should be done after the reset? I expected another modeset to return the > > > state back or otherwise the inflight would get confused? > > > > I guess that can happen. For instance, if we have a crtc_enable() in flight, > > and then we do a reset before it gets committed we would end up doing > > crtc_enable() twice in a row without a crtc_disable in between. For page > > flips and such this shouldn't be a big deal in general. > > atomic commits are ordered. You have to wait for the previous ones to > complete before you do a new one. If you don't do that, then all hell > breaks loose. What we're effectively doing here is inserting two new commits in the middle of the timeline, one to disable everything, and another one to re-enable everything. At the end of the the re-enable we would want the hardware state should match exactly what was there before the disable, hence any commits still in the timeline should work correctly. That is, their old_state will match the hardware state when it comes time to commit them. But we can only do that properly after we start to track the committed state. Without that tracking we can get into trouble wrt. the hardware state not matching the old state when it comes time to commit the new state. > > What you really can't do with atomic (without rewriting everything once > more) is cancel a commit. Pre-atomic we could do that on gen4 since the > mmio flips died with the gpu, but that's the one design change we need to > cope with (plus TDR insisting it can't force-complete requests anymore). > > > > > During reset we should be recommiting the state that was committed last. > > > > But for now we'll settle for recommiting the last state for each object. > > > > > > Ah, I guess that explains the above. What is the complication with > > > restoring the current state as opposed to the next state? > > > > Well the main thing is just tracking which is the current state. That > > just needs refactoring the .atomic_duplicate_state() calling convention > > across the whole tree so that we can then duplicate the committed state > > rather than the latest state. > > > > Also due to the commit_hw_done() being potentially done after the > > modeset locks have been dropped, I don't think we can be certain > > of it getting called in the same order as swap_state(), hence > > when we track the committed state in commit_hw_done() we'll have > > to have some way to figure out if our new state is in fact the > > latest committed state for each object or if the calls got > > reordered. We don't insert any dependencies between two commits > > unless they touch the same active crtc, thus this reordering > > seems very much possible. Dunno if we should add some way to add > > such dependeencies whenever the same object is part of two otherwise > > independent commits, or if we just want to try and work with the > > reordered calls. My idea for the latter was some kind of seqno/age > > counter on the object states that allows me to recongnize which > > state is more recent. The object states aren't refcounted so hanging > > on to the wrong pointer could cause an oops the next time we have to > > perform a GPU reset. > > Atomic commits are strongly ordered on a given CRTC, so stuff can't be > out-of-order within one. Across them the idea was to just add more CRTC > states into the atomic commit to make sure stuff is ordered correctly. And atomic commits not touching the same crtc will not be ordered in any way. Thus if they touch the same object (eg. disabled plane or connector) we can't currently tell if the commit_hw_done() calls happened in the same order as the swap_state() calls for that particular object. -- Ville Syrj�l� Intel OTC