From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maarten Lankhorst
Subject: Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects
Date: Thu, 20 Nov 2014 09:53:42 +0100
Message-ID: <546DAC16.80704@canonical.com>
References: <546C5085.1020300@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Received: from youngberry.canonical.com (youngberry.canonical.com [91.189.89.112]) by gabe.freedesktop.org (Postfix) with ESMTP id B8BC06ED4B; Thu, 20 Nov 2014 00:53:45 -0800 (PST)
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: Michael Marineau
Cc: linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Ben Skeggs
List-Id: dri-devel@lists.freedesktop.org

On 20-11-14 at 05:06, Michael Marineau wrote:
> On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 19-11-14 07:43, Michael Marineau wrote:
>>> On 3.18-rc kernels I have been intermittently experiencing GPU
>>> lockups shortly after startup, accompanied by one or both of the
>>> following errors:
>>>
>>> nouveau E[   PFIFO][0000:01:00.0] read fault at 0x000734a000 [PTE]
>>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>>> nouveau E[     DRM] GPU lockup - switching to software fbcon
>>>
>>> I was able to trace the issue with bisect to commit
>>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>>> fences for readable objects". The lockups appear to have cleared up
>>> since reverting that and a few related followup commits:
>>>
>>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
>>> nouveau_fence_sync"
>>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>> Weird. I'm not sure yet what causes it.
>>
>> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
> Building a kernel from that commit gives me an entirely new behavior:
> X hangs for at least 10-20 seconds at a time with brief moments of
> responsiveness before hanging again while gitk on the kernel repo
> loads. Otherwise the system is responsive. The head of that
> fixed-fences-for-bisect branch (1c6aafb5), which is the "use shared
> fences for readable objects" commit I originally bisected to, does
> feature the complete lockups I was seeing before.

OK, for the sake of argument let's just assume they're separate bugs and
look at the Xorg hang first.

Is there anything in dmesg when the hang happens?

And each hang is probably 15 seconds, if it's called through nouveau_fence_wait.

Try changing else if (!ret) to else if (WARN_ON(!ret)) in that function,
and see if you get some dmesg spam. :)

>> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>>
>> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>>
>> fctx->base.sequence = nv84_fence_read(chan);
>>
>> and add back
>>
>> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x00000000);
> Making your suggested change on top of each of 86be4f21 and 1c6aafb5 made
> no noticeable difference in either of the two behaviors.
>
>> If that fails you should compile your kernel with trace events, to get
>> some debugging info from the fences. I'll post debugging info if this
>> does not fix it.
> Happy to gather whatever debug log or tracing data you need :)
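
For illustration, here is a rough user-space sketch of what the WARN_ON
change suggested above is meant to achieve: a timed-out wait leaves a
trace in the log, where before it only returned an error. It is not the
driver code. The names warn_on, warn_count, fake_wait_timeout and
fence_wait_sketch are hypothetical, and the return convention (<0 error,
0 timeout, >0 time left) and the -EBUSY mapping are assumptions made for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings fired, so the effect is easy to observe. */
static int warn_count;

/* User-space stand-in for the kernel's WARN_ON(): report when the
 * condition is true and return its truth value so it can sit in an if (). */
static bool warn_on(bool cond, const char *expr, const char *func)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s in %s()\n", expr, func);
	}
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond, __func__)

/* Hypothetical wait primitive (assumed convention): returns <0 on error,
 * 0 on timeout, otherwise the time that was left when the fence signalled. */
static long fake_wait_timeout(long signal_after, long timeout)
{
	assert(timeout > 0);
	if (signal_after < 0)
		return -EINTR;		/* pretend the wait was interrupted */
	if (signal_after >= timeout)
		return 0;		/* fence did not signal in time */
	return timeout - signal_after;
}

/* Assumed shape of the caller: translate the wait result into an errno. */
static int fence_wait_sketch(long signal_after, long timeout)
{
	long ret = fake_wait_timeout(signal_after, timeout);

	if (ret < 0)
		return (int)ret;
	else if (WARN_ON(!ret))		/* a timeout now shows up in the log */
		return -EBUSY;
	else
		return 0;
}
```

With a 15-unit timeout, fence_wait_sketch(20, 15) prints one warning and
returns -EBUSY, while fence_wait_sketch(5, 15) returns 0 silently. In the
real kernel, WARN_ON also dumps a backtrace, which should show which
caller hit the timeout.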