From mboxrd@z Thu Jan  1 00:00:00 1970
From: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
Subject: Re: [PATCH] iommu: arm-smmu: Set SCTLR.HUPCF bit
Date: Mon, 26 Nov 2018 19:31:48 +0000
Message-ID: <20181126193147.GB534@arm.com>
References: <20180927224609.19515-1-robdclark@gmail.com>
 <20181029191000.GD16739@arm.com>
 <CAF6AEGtgC0V6ii2=p2HGmqvHFKNJhaOLi8873SGPDsrg70xGRg@mail.gmail.com>
 <20181113063225.GA3109@brain-police>
 <CAF6AEGuV2zh97iq+TgkRw0bK3VNSxJieD1N2KMW3N28h07M+0w@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Return-path: <freedreno-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Content-Disposition: inline
In-Reply-To: <CAF6AEGuV2zh97iq+TgkRw0bK3VNSxJieD1N2KMW3N28h07M+0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/freedreno>,
 <mailto:freedreno-request-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/freedreno>
List-Post: <mailto:freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
List-Help: <mailto:freedreno-request-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/freedreno>,
 <mailto:freedreno-request-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org?subject=subscribe>
Errors-To: freedreno-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Sender: "Freedreno" <freedreno-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
To: Rob Clark <robdclark-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>, "list-Y9sIeH5OGRo@public.gmane.org:IOMMU DRIVERS <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
 Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>, " <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>, Linux Kernel Mailing List <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>"list-Y9sIeH5OGRo@public.gmane.org:IOMMU DRIVERS <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
 Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>, " <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>, linux-arm-msm <linux-arm-msm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, freedreno <freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>, "moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE" <linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>
List-Id: linux-arm-msm@vger.kernel.org

SGkgUm9iLAoKT24gVHVlLCBOb3YgMTMsIDIwMTggYXQgMDg6MTI6MzVBTSAtMDUwMCwgUm9iIENs
YXJrIHdyb3RlOgo+IE9uIFR1ZSwgTm92IDEzLCAyMDE4IGF0IDE6MzIgQU0gV2lsbCBEZWFjb24g
PHdpbGwuZGVhY29uQGFybS5jb20+IHdyb3RlOgo+ID4gT24gRnJpLCBOb3YgMDksIDIwMTggYXQg
MDE6MDE6NTVQTSAtMDUwMCwgUm9iIENsYXJrIHdyb3RlOgo+ID4gPiBPbiBNb24sIE9jdCAyOSwg
MjAxOCBhdCAzOjA5IFBNIFdpbGwgRGVhY29uIDx3aWxsLmRlYWNvbkBhcm0uY29tPiB3cm90ZToK
PiA+ID4gPiBPbiBUaHUsIFNlcCAyNywgMjAxOCBhdCAwNjo0NjowN1BNIC0wNDAwLCBSb2IgQ2xh
cmsgd3JvdGU6Cj4gPiA+ID4gPiBXZSBzZWVtIHRvIG5lZWQgdG8gc2V0IGVpdGhlciB0aGlzIG9y
IENGQ0ZHIChzdGFsbCksIG90aGVyd2lzZSBncHUKPiA+ID4gPiA+IGZhdWx0cyB0cmlnZ2VyIHBy
b2JsZW1zIHdpdGggb3RoZXIgaW4tZmxpZ2h0IHRyYW5zYWN0aW9ucyBmcm9tIHRoZQo+ID4gPiA+
ID4gR1BVIGNhdXNpbmcgQ1AgZXJyb3JzLCBldGMuCj4gPiA+ID4gPgo+ID4gPiA+ID4gSW4gdGhl
IEFSTSBTTU1VIHNwZWMsIHRoZSAnSGl0IHVuZGVyIHByZXZpb3VzIGNvbnRleHQgZmF1bHQnIGJp
dCBpcwo+ID4gPiA+ID4gZGVzY3JpYmVkIGFzOgo+ID4gPiA+ID4KPiA+ID4gPiA+ICAnMCcgLSBT
dGFsbCBvciB0ZXJtaW5hdGUgc3Vic2VxdWVudCB0cmFuc2FjdGlvbnMgaW4gdGhlIHByZXNlbmNl
Cj4gPiA+ID4gPiAgICAgICAgb2YgYW4gb3V0c3RhbmRpbmcgY29udGV4dCBmYXVsdAo+ID4gPiA+
ID4gICcxJyAtIFByb2Nlc3MgYWxsIHN1YnNlcXVlbnQgdHJhbnNhY3Rpb25zIGluZGVwZW5kZW50
bHkgb2YgYW55Cj4gPiA+ID4gPiAgICAgICAgb3V0c3RhbmRpbmcgY29udGV4dCBmYXVsdC4KPiA+
ID4gPiA+Cj4gPiA+ID4gPiBTaW5jZSB3ZSBkb24ndCBlbmFibGUgQ0ZDRkcgKHN0YWxsKSB0aGUg
YmVoYXZpb3Igb2YgdGVybWluYXRpbmcKPiA+ID4gPiA+IG90aGVyIHRyYW5zYWN0aW9ucyBtYWtl
cyBzZW5zZS4gIEFuZCBpcyBwcm9iYWJseSBub3Qgd2hhdCB3ZSB3YW50Cj4gPiA+ID4gPiAoYW5k
IGRlZmluYXRlbHkgbm90IHdoYXQgd2Ugd2FudCBmb3IgR1BVKS4KPiA+ID4gPiA+Cj4gPiA+ID4g
PiBTaWduZWQtb2ZmLWJ5OiBSb2IgQ2xhcmsgPHJvYmRjbGFya0BnbWFpbC5jb20+Cj4gPiA+ID4g
PiAtLS0KPiA+ID4gPiA+IFNvIEkgaGl0IHRoaXMgaXNzdWUgYSBsb25nIHRpbWUgYmFjayBvbiA4
MjAgKG1zbTg5OTYpIGFuZCBhdCB0aGUKPiA+ID4gPiA+IHRpbWUgSSBzb2x2ZWQgaXQgd2l0aCBh
IHBhdGNoIHRoYXQgZW5hYmxlZCBDRkNGRy4gIEFuZCBpdCByZXN1cmZhY2VkCj4gPiA+ID4gPiBt
b3JlIHJlY2VudGx5IG9uIHNkbTg0NS4gIEJ1dCBhdCB0aGUgdGltZSBDRkNGRyB3YXMgcmVqZWN0
ZWQsIGlpcmMKPiA+ID4gPiA+IGJlY2F1c2Ugb2YgY29uY2VybiB0aGF0IGl0IHdvdWxkIGNhdXNl
IHByb2JsZW1zIG9uIG90aGVyIG5vbi1xY29tCj4gPiA+ID4gPiBhcm0gc21tdSBpbXBsZW1lbnRh
dGlvbnMuICBBbmQgSSB0aGluayBJIGZvcmdvdCB0byBzZW5kIHRoaXMgdmVyc2lvbgo+ID4gPiA+
ID4gb2YgdGhlIHNvbHV0aW9uLgo+ID4gPiA+ID4KPiA+ID4gPiA+IElmIGVuYWJsaW5nIEhVUENG
IGlzIGFudGljaXBhdGVkIHRvIGNhdXNlIHByb2JsZW1zIG9uIG90aGVyIEFSTQo+ID4gPiA+ID4g
U01NVSBpbXBsZW1lbnRhdGlvbnMsIEkgdGhpbmsgSSBjYW4gY29tZSB1cCB3aXRoIGEgdmFyaWFu
dCBvZiB0aGlzCj4gPiA+ID4gPiBwYXRjaCB3aGljaCBjb25kaXRpb25hbGx5IGVuYWJsZXMgaXQg
Zm9yIHNuYXBkcmFnb24uCj4gPiA+ID4gPgo+ID4gPiA+ID4gRWl0aGVyIHdheSwgSSdkIHJlYWxs
eSBsaWtlIHRvIGdldCBzb21lIHZhcmlhbnQgb2YgdGhpcyBmaXggbWVyZ2VkCj4gPiA+ID4gPiAo
YW5kIHByb2JhYmx5IGl0IHdvdWxkIGJlIGEgZ29vZCBpZGVhIGZvciBzdGFibGUga2VybmVsIGJy
YW5jaGVzCj4gPiA+ID4gPiB0b28pLCBzaW5jZSBjdXJyZW50IGJlaGF2aW91ciB3aXRoIHRoZSBH
UFUgbWVhbnMgZmF1bHRzIHR1cm4gaW50bwo+ID4gPiA+ID4gYSBmYW50YXN0aWMgY2FzY2FkZSBv
ZiBmYWlsLgo+ID4gPiA+Cj4gPiA+ID4gQ2FuIHlvdSBkZXNjcmliZSBob3cgdGhpcyBmYW50YXN0
aWMgY2FzY2FkZSBvZiBmYWlsIGltcHJvdmVzIHdpdGggdGhpcwo+ID4gPiA+IHBhdGNoLCBwbGVh
c2U/IElmIHlvdSdyZSBnZXR0aW5nIGNvbnRleHQgZmF1bHRzIHRoZW4gc29tZXRoaW5nIGhhcyBh
bHJlYWR5Cj4gPiA+ID4gZ29uZSBob3JyaWJseSB3cm9uZywgc28gSSdtIHRyeWluZyB0byB3b3Jr
IG91dCBob3cgdGhpcyBpbXByb3ZlcyB0aGluZ3MuCj4gPiA+ID4KPiA+ID4KPiA+ID4gVGhlcmUg
YXJlIHBsZW50eSBvZiBjYXNlcyB3aGVyZSBnZXR0aW5nIGlvbW11IGZhdWx0cyB3aXRoIGEgR1BV
IGlzCj4gPiA+ICJub3JtYWwiLCBvciBhdCBsZWFzdCBub3Qgc29tZXRoaW5nIHRoZSBrZXJuZWwg
b3IgZXZlbiBHTCBkcml2ZXIgY2FuCj4gPiA+IGNvbnRyb2wuCj4gPgo+ID4gU3VjaCBhcz8gQWxs
IHRoZSBtYWlubGluZSBkcml2ZXIgZG9lcyBpcyBwcmludCBhIGRpYWdub3N0aWMgYW5kIGNsZWFy
IHRoZQo+ID4gZmF1bHQsIHdoaWNoIGRvZXNuJ3Qgc2VlbSBnZW5lcmFsbHkgdXNlZnVsLgo+IAo+
IGl0IGlzIHVzZWZ1bCB0byBkZWJ1ZyB0aGUgZmF1bHQgOy0pCj4gCj4gQWx0aG91Z2ggZXZlbnR1
YWxseSB3ZSdsbCB3YW50IHRvIGJlIGFibGUgdG8gZG8gbW9yZSB0aGFuIHRoYXQsIGxpa2UKPiBo
YXZlIHRoZSBmYXVsdCB0cmlnZ2VyIGJyaW5naW5nIGluIHBhZ2VzIG9mIGEgbW1hcCdkIGZpbGUg
YW5kIHRoYXQKPiBzb3J0IG9mIHRoaW5nLgoKUmlnaHQsIGFuZCBmZWVscyB2ZXJ5IHN0cmFuZ2Ug
dG8gbWUgaWYgd2UgaGF2ZSB0aGlzIGJpdCBzZXQgYmVjYXVzZSBpdAptZWFucyB0aGF0IHlvdXIg
ZmF1bHQgaXMgbm8gbG9uZ2VyIHN5bmNocm9ub3VzIGFuZCB0aGVyZWZvcmUgZGl2ZXJnZXMKZnJv
bSB0aGUgZmF1bHQgbW9kZWwgdGhhdCB5b3UgZ2V0IGZyb20gdGhlIENQVSwgd2hlcmUgeW91IGNl
cnRhaW5seQp3b3VsZG4ndCBleHBlY3Qgc3RvcmVzIGFwcGVhcmluZyBpbiB0aGUgcHJvZ3JhbSBh
ZnRlciBhIGZhdWx0aW5nIGxvYWQgdG8KYmUgdmlzaWJsZSBpbiBtZW1vcnkuIEhvd2V2ZXIsIHRo
aW5raW5nIGhhcmRlciBhYm91dCBpdCwgSSBzdXBwb3NlIHdlJ3JlCmFscmVhZHkgaW4gYSBzaXR1
YXRpb24gd2hlcmUgdGhlIHRyYW5zbGF0aW9ucyBhcmUgaGFuZGxlZCBvdXQgb2Ygb3JkZXIKaW4g
dGhlIGFic2VuY2Ugb2YgYmFycmllcnMsIHNvIG1heWJlIGl0J3Mgbm90IHRoZSBlbmQgb2YgdGhl
IHdvcmxkLgoKQ291bGQgeW91IGR1bXAgdGhlIEZTUiB2YWx1ZSB0aGF0IHlvdSBzZWUgaW4gdGhl
IGZhdWx0IGhhbmRsZXIsIHBsZWFzZT8KRnJvbSBteSByZWFkaW5nIG9mIHRoZSBhcmNoaXRlY3R1
cmUgc3BlYywgYXMgbG9uZyBhcyBjbGVhciBhbGwgb2YgdGhlCmZhdWx0IGJpdHMgaW4gdGhlIGly
cSBoYW5kbGVyLCB0aGVuIHlvdXIgbWFjaGluZSBzaG91bGRuJ3QgZGllIGxpa2UgaXQKZG9lcyB3
aXRoIEhVUENGRz1DRkNGRz0wLi4KCj4gPiA+IFdpdGggdGhpcyBwYXRjaCwgeW91IHN0aWxsIGdl
dCB0aGUgaW9tbXUgZmF1bHQsIGJ1dCBpdCBkb2Vzbid0IGNhdXNlCj4gPiA+IHRoZSBncHUgdG8g
Y3Jhc2guICBCdXQgd2l0aG91dCBpdCwgb3RoZXIgbWVtb3J5IGFjY2Vzc2VzIGluIGZsaWdodAo+
ID4gPiB3aGlsZSB0aGUgZmF1bHQgb2NjdXJzLCBsaWtlIHRoZSBHUFUgY29tbWFuZC1wcm9jZXNz
b3IgcmVhZGluZyBmdXJ0aGVyCj4gPiA+IGFoZWFkIGluIHRoZSBjbWRzdHJlYW0gdG8gc2V0dXAg
bmV4dCBkcmF3LCB3b3VsZCByZXR1cm4gemVybydzLAo+ID4gPiBjYXVzaW5nIHRoZSBHUFUgdG8g
Y3Jhc2ggb3IgZ2V0IGludG8gYSBiYWQgc3RhdGUuCj4gPgo+ID4gSSBnZXQgdGhhdCBwYXJ0LCBi
dXQgSSBkb24ndCB1bmRlcnN0YW5kIHdoeSB3ZSdyZSBzZWVpbmcgZmF1bHRzIGluIHRoZSBmaXJz
dAo+ID4gcGxhY2UgYW5kIEkgd29ycnkgdGhhdCB0aGlzIHBhdGNoIGlzIGp1c3QgdGhlIHRpcCBv
ZiB0aGUgaWNlYmVyZy4gSXQncyBhbHNvCj4gPiBub3QgY2xlYXIgdGhhdCBwcm9jZXNzaW5nIHN1
YnNlcXVlbnQgdHJhbnNhY3Rpb25zIGlzIGFsd2F5cyB0aGUgcmlnaHQgdGhpbmcKPiA+IHRvIGRv
IGluIGEgd29ybGQgd2hlcmUgd2UgYWN0dWFsbHkgd2FudCB0byByZXBvcnQgKGFuZCBoYW5kbGUp
IHN5bmNocm9ub3VzCj4gPiBmYXVsdHMgZnJvbSBkZXZpY2VzLgo+IAo+IFN1cmUsIGl0IGlzIGEg
YnVnLi4gYnV0IGl0IGNhbiBiZSBhbiBhcHBsaWNhdGlvbiBidWcgdGhhdCBpcyBub3QKPiBzb21l
dGhpbmcgdGhlIHVzZXJzcGFjZSBHTCBkcml2ZXIgb3Iga2VybmVsIGNvdWxkIGRvIGFueXRoaW5n
IGFib3V0Lgo+IFdlIHNob3VsZG4ndCBsZXQgdGhpcyBraWxsIHRoZSBHUFUuICBJZiB0aGUgYXBw
bGljYXRpb24gZGlkbid0IGhhdmUKPiB0aGlzIG11Y2ggY29udHJvbCwgd2Ugd291bGRuJ3QgbmVl
ZCBhbiBJT01NVSBpbiB0aGUgZmlyc3QgcGxhY2VbMV0uCj4gV2l0aCBvcGVuY2wgY29tcHV0ZSwg
dGhlIHVzZXJzcGFjZSBjb250cm9sbGVkIHNoYWRlciBoYXMgZnVsbCBibG93bgo+IHBvaW50ZXJz
IHRvIEdQVSBtZW1vcnkuCj4gCj4gQW5kIGV2ZW4gaW4gY2FzZXMgd2hlcmUgaXQgaXMgYSB1c2Vy
c3BhY2UgR0wgZHJpdmVyIGJ1ZywgaGF2aW5nIHNvbWUKPiByb2J1c3RuZXNzIHRvIG5vdCBjb21w
bGV0ZWx5IGtpbGwgdGhlIEdQVSBtYWtlcyBkZWJ1Z2dpbmcgbXVjaCBlYXNpZXIuCj4gU29tZXRo
aW5nIEkgZG8gYSBsb3Qgd2hlbiBicmluZ2luZyB1cCBzdXBwb3J0IGZvciBhIG5ldyBnZW5lcmF0
aW9uIG9mCj4gR1BVLgo+IAo+IEknbSBoYXZpbmcgYSBoYXJkIHRpbWUgdW5kZXJzdGFuZGluZyB5
b3VyIG9iamVjdGlvbiB0byB0aGlzLgo+IFJldHVybmluZyB6ZXJvJ3MgZm9yIG5vbi1mYXVsdGlu
ZyB0cmFuc2FjdGlvbnMgaXMgYSAqcmVhbGx5IGJhZCBpZGVhKi4KCldhaXQgLS0gd2hvIHNhaWQg
YW55dGhpbmcgYWJvdXQgcmV0dXJuaW5nIHplcm9lcz8gV2hlcmUgZG9lcyB0aGF0IGJlaGF2aW91
cgphcHBlYXIgaW4gdGhlIGFyY2hpdGVjdHVyZT8KCldpbGwKX19fX19fX19fX19fX19fX19fX19f
X19fX19fX19fX19fX19fX19fX19fX19fX18KRnJlZWRyZW5vIG1haWxpbmcgbGlzdApGcmVlZHJl
bm9AbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlzdHMuZnJlZWRlc2t0b3Aub3JnL21h
aWxtYW4vbGlzdGluZm8vZnJlZWRyZW5vCg==

From mboxrd@z Thu Jan  1 00:00:00 1970
From: will.deacon@arm.com (Will Deacon)
Date: Mon, 26 Nov 2018 19:31:48 +0000
Subject: [PATCH] iommu: arm-smmu: Set SCTLR.HUPCF bit
In-Reply-To: <CAF6AEGuV2zh97iq+TgkRw0bK3VNSxJieD1N2KMW3N28h07M+0w@mail.gmail.com>
References: <20180927224609.19515-1-robdclark@gmail.com>
 <20181029191000.GD16739@arm.com>
 <CAF6AEGtgC0V6ii2=p2HGmqvHFKNJhaOLi8873SGPDsrg70xGRg@mail.gmail.com>
 <20181113063225.GA3109@brain-police>
 <CAF6AEGuV2zh97iq+TgkRw0bK3VNSxJieD1N2KMW3N28h07M+0w@mail.gmail.com>
Message-ID: <20181126193147.GB534@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hi Rob,

On Tue, Nov 13, 2018 at 08:12:35AM -0500, Rob Clark wrote:
> On Tue, Nov 13, 2018 at 1:32 AM Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Nov 09, 2018 at 01:01:55PM -0500, Rob Clark wrote:
> > > On Mon, Oct 29, 2018 at 3:09 PM Will Deacon <will.deacon@arm.com> wrote:
> > > > On Thu, Sep 27, 2018 at 06:46:07PM -0400, Rob Clark wrote:
> > > > > We seem to need to set either this or CFCFG (stall), otherwise gpu
> > > > > faults trigger problems with other in-flight transactions from the
> > > > > GPU causing CP errors, etc.
> > > > >
> > > > > In the ARM SMMU spec, the 'Hit under previous context fault' bit is
> > > > > described as:
> > > > >
> > > > >  '0' - Stall or terminate subsequent transactions in the presence
> > > > >        of an outstanding context fault
> > > > >  '1' - Process all subsequent transactions independently of any
> > > > >        outstanding context fault.
> > > > >
> > > > > Since we don't enable CFCFG (stall) the behavior of terminating
> > > > > other transactions makes sense.  And is probably not what we want
> > > > > (and definately not what we want for GPU).
> > > > >
> > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
> > > > > ---
> > > > > So I hit this issue a long time back on 820 (msm8996) and at the
> > > > > time I solved it with a patch that enabled CFCFG.  And it resurfaced
> > > > > more recently on sdm845.  But at the time CFCFG was rejected, iirc
> > > > > because of concern that it would cause problems on other non-qcom
> > > > > arm smmu implementations.  And I think I forgot to send this version
> > > > > of the solution.
> > > > >
> > > > > If enabling HUPCF is anticipated to cause problems on other ARM
> > > > > SMMU implementations, I think I can come up with a variant of this
> > > > > patch which conditionally enables it for snapdragon.
> > > > >
> > > > > Either way, I'd really like to get some variant of this fix merged
> > > > > (and probably it would be a good idea for stable kernel branches
> > > > > too), since current behaviour with the GPU means faults turn into
> > > > > a fantastic cascade of fail.
> > > >
> > > > Can you describe how this fantastic cascade of fail improves with this
> > > > patch, please? If you're getting context faults then something has already
> > > > gone horribly wrong, so I'm trying to work out how this improves things.
> > > >
> > >
> > > There are plenty of cases where getting iommu faults with a GPU is
> > > "normal", or at least not something the kernel or even GL driver can
> > > control.
> >
> > Such as? All the mainline driver does is print a diagnostic and clear the
> > fault, which doesn't seem generally useful.
> 
> it is useful to debug the fault ;-)
> 
> Although eventually we'll want to be able to do more than that, like
> have the fault trigger bringing in pages of a mmap'd file and that
> sort of thing.

Right, and feels very strange to me if we have this bit set because it
means that your fault is no longer synchronous and therefore diverges
from the fault model that you get from the CPU, where you certainly
wouldn't expect stores appearing in the program after a faulting load to
be visible in memory. However, thinking harder about it, I suppose we're
already in a situation where the translations are handled out of order
in the absence of barriers, so maybe it's not the end of the world.

Could you dump the FSR value that you see in the fault handler, please?
>>From my reading of the architecture spec, as long as clear all of the
fault bits in the irq handler, then your machine shouldn't die like it
does with HUPCFG=CFCFG=0..

> > > With this patch, you still get the iommu fault, but it doesn't cause
> > > the gpu to crash.  But without it, other memory accesses in flight
> > > while the fault occurs, like the GPU command-processor reading further
> > > ahead in the cmdstream to setup next draw, would return zero's,
> > > causing the GPU to crash or get into a bad state.
> >
> > I get that part, but I don't understand why we're seeing faults in the first
> > place and I worry that this patch is just the tip of the iceberg. It's also
> > not clear that processing subsequent transactions is always the right thing
> > to do in a world where we actually want to report (and handle) synchronous
> > faults from devices.
> 
> Sure, it is a bug.. but it can be an application bug that is not
> something the userspace GL driver or kernel could do anything about.
> We shouldn't let this kill the GPU.  If the application didn't have
> this much control, we wouldn't need an IOMMU in the first place[1].
> With opencl compute, the userspace controlled shader has full blown
> pointers to GPU memory.
> 
> And even in cases where it is a userspace GL driver bug, having some
> robustness to not completely kill the GPU makes debugging much easier.
> Something I do a lot when bringing up support for a new generation of
> GPU.
> 
> I'm having a hard time understanding your objection to this.
> Returning zero's for non-faulting transactions is a *really bad idea*.

Wait -- who said anything about returning zeroes? Where does that behaviour
appear in the architecture?

Will

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Mdfq=OF=vger.kernel.org=linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-5.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED,USER_AGENT_MUTT
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7743FC43441
	for <linux-kernel@archiver.kernel.org>; Mon, 26 Nov 2018 19:31:34 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id D6267205C9
	for <linux-kernel@archiver.kernel.org>; Mon, 26 Nov 2018 19:31:33 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6267205C9
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726985AbeK0G0j (ORCPT
        <rfc822;linux-kernel@archiver.kernel.org>);
        Tue, 27 Nov 2018 01:26:39 -0500
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:45946 "EHLO
        foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1726630AbeK0G0i (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 27 Nov 2018 01:26:38 -0500
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
        by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 381AF1B55;
        Mon, 26 Nov 2018 11:31:31 -0800 (PST)
Received: from edgewater-inn.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.72.51.249])
        by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 085B33F5AF;
        Mon, 26 Nov 2018 11:31:31 -0800 (PST)
Received: by edgewater-inn.cambridge.arm.com (Postfix, from userid 1000)
        id 81AB61AE0839; Mon, 26 Nov 2018 19:31:48 +0000 (GMT)
Date:   Mon, 26 Nov 2018 19:31:48 +0000
From:   Will Deacon <will.deacon@arm.com>
To:     Rob Clark <robdclark@gmail.com>
Cc:     "list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg
        Roedel <joro@8bytes.org>," <iommu@lists.linux-foundation.org>,
        Robin Murphy <robin.murphy@arm.com>,
        "moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE" 
        <linux-arm-kernel@lists.infradead.org>,
        freedreno <freedreno@lists.freedesktop.org>,
        linux-arm-msm <linux-arm-msm@vger.kernel.org>,
        "list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg
        Roedel <joro@8bytes.org>," <joro@8bytes.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] iommu: arm-smmu: Set SCTLR.HUPCF bit
Message-ID: <20181126193147.GB534@arm.com>
References: <20180927224609.19515-1-robdclark@gmail.com>
 <20181029191000.GD16739@arm.com>
 <CAF6AEGtgC0V6ii2=p2HGmqvHFKNJhaOLi8873SGPDsrg70xGRg@mail.gmail.com>
 <20181113063225.GA3109@brain-police>
 <CAF6AEGuV2zh97iq+TgkRw0bK3VNSxJieD1N2KMW3N28h07M+0w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAF6AEGuV2zh97iq+TgkRw0bK3VNSxJieD1N2KMW3N28h07M+0w@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Rob,

On Tue, Nov 13, 2018 at 08:12:35AM -0500, Rob Clark wrote:
> On Tue, Nov 13, 2018 at 1:32 AM Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Nov 09, 2018 at 01:01:55PM -0500, Rob Clark wrote:
> > > On Mon, Oct 29, 2018 at 3:09 PM Will Deacon <will.deacon@arm.com> wrote:
> > > > On Thu, Sep 27, 2018 at 06:46:07PM -0400, Rob Clark wrote:
> > > > > We seem to need to set either this or CFCFG (stall), otherwise gpu
> > > > > faults trigger problems with other in-flight transactions from the
> > > > > GPU causing CP errors, etc.
> > > > >
> > > > > In the ARM SMMU spec, the 'Hit under previous context fault' bit is
> > > > > described as:
> > > > >
> > > > >  '0' - Stall or terminate subsequent transactions in the presence
> > > > >        of an outstanding context fault
> > > > >  '1' - Process all subsequent transactions independently of any
> > > > >        outstanding context fault.
> > > > >
> > > > > Since we don't enable CFCFG (stall) the behavior of terminating
> > > > > other transactions makes sense.  And is probably not what we want
> > > > > (and definately not what we want for GPU).
> > > > >
> > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
> > > > > ---
> > > > > So I hit this issue a long time back on 820 (msm8996) and at the
> > > > > time I solved it with a patch that enabled CFCFG.  And it resurfaced
> > > > > more recently on sdm845.  But at the time CFCFG was rejected, iirc
> > > > > because of concern that it would cause problems on other non-qcom
> > > > > arm smmu implementations.  And I think I forgot to send this version
> > > > > of the solution.
> > > > >
> > > > > If enabling HUPCF is anticipated to cause problems on other ARM
> > > > > SMMU implementations, I think I can come up with a variant of this
> > > > > patch which conditionally enables it for snapdragon.
> > > > >
> > > > > Either way, I'd really like to get some variant of this fix merged
> > > > > (and probably it would be a good idea for stable kernel branches
> > > > > too), since current behaviour with the GPU means faults turn into
> > > > > a fantastic cascade of fail.
> > > >
> > > > Can you describe how this fantastic cascade of fail improves with this
> > > > patch, please? If you're getting context faults then something has already
> > > > gone horribly wrong, so I'm trying to work out how this improves things.
> > > >
> > >
> > > There are plenty of cases where getting iommu faults with a GPU is
> > > "normal", or at least not something the kernel or even GL driver can
> > > control.
> >
> > Such as? All the mainline driver does is print a diagnostic and clear the
> > fault, which doesn't seem generally useful.
> 
> it is useful to debug the fault ;-)
> 
> Although eventually we'll want to be able to do more than that, like
> have the fault trigger bringing in pages of a mmap'd file and that
> sort of thing.

Right, and feels very strange to me if we have this bit set because it
means that your fault is no longer synchronous and therefore diverges
from the fault model that you get from the CPU, where you certainly
wouldn't expect stores appearing in the program after a faulting load to
be visible in memory. However, thinking harder about it, I suppose we're
already in a situation where the translations are handled out of order
in the absence of barriers, so maybe it's not the end of the world.

Could you dump the FSR value that you see in the fault handler, please?
>From my reading of the architecture spec, as long as clear all of the
fault bits in the irq handler, then your machine shouldn't die like it
does with HUPCFG=CFCFG=0..

> > > With this patch, you still get the iommu fault, but it doesn't cause
> > > the gpu to crash.  But without it, other memory accesses in flight
> > > while the fault occurs, like the GPU command-processor reading further
> > > ahead in the cmdstream to setup next draw, would return zero's,
> > > causing the GPU to crash or get into a bad state.
> >
> > I get that part, but I don't understand why we're seeing faults in the first
> > place and I worry that this patch is just the tip of the iceberg. It's also
> > not clear that processing subsequent transactions is always the right thing
> > to do in a world where we actually want to report (and handle) synchronous
> > faults from devices.
> 
> Sure, it is a bug.. but it can be an application bug that is not
> something the userspace GL driver or kernel could do anything about.
> We shouldn't let this kill the GPU.  If the application didn't have
> this much control, we wouldn't need an IOMMU in the first place[1].
> With opencl compute, the userspace controlled shader has full blown
> pointers to GPU memory.
> 
> And even in cases where it is a userspace GL driver bug, having some
> robustness to not completely kill the GPU makes debugging much easier.
> Something I do a lot when bringing up support for a new generation of
> GPU.
> 
> I'm having a hard time understanding your objection to this.
> Returning zero's for non-faulting transactions is a *really bad idea*.

Wait -- who said anything about returning zeroes? Where does that behaviour
appear in the architecture?

Will