From: Kenneth Graunke
Subject: Re: [Mesa-dev] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.
Date: Sun, 11 Jan 2015 16:05:25 -0800
Message-ID: <3086167.N74lbW7sKX@vakarian>
References: <1420944289-832-1-git-send-email-kenneth@whitecape.org> <20150111214941.GA2376@bwidawsk.net>
In-Reply-To: <20150111214941.GA2376@bwidawsk.net>
To: Ben Widawsky
Cc: mesa-dev@lists.freedesktop.org, intel-gfx@lists.freedesktop.org

On Sunday, January 11, 2015 01:49:41 PM Ben Widawsky wrote:
> On Sat, Jan 10, 2015 at 06:44:49PM -0800, Kenneth Graunke wrote:
> > This is an important optimization for avoiding read-after-write (RAW)
> > stalls in the HiZ buffer. Certain workloads would run very slowly with
> > HiZ enabled, but run much faster with the "hiz=false" driconf option.
> > With this patch, they run at full speed even with HiZ.
> >
> > Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
> > (Iris Pro 6200).
> >
> > Thanks to Jesse Barnes for finding this missing bit!
> > Thanks to Chris Wilson for helping me find where to set it.
> >
> > Signed-off-by: Kenneth Graunke
> > Cc: Jesse Barnes
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > Here's an alternate patch which implements the workaround in the kernel
> > instead of Mesa.
> > It's probably better to do it there, since the kernel
> > does it on Haswell already.
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index dabc1d8..23020d6 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
> >  			  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
> >  			  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
> >  
> > +	/* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
> > +	 * "The Hierarchical Z RAW Stall Optimization allows non-overlapping
> > +	 *  polygons in the same 8x4 pixel/sample area to be processed without
> > +	 *  stalling waiting for the earlier ones to write to Hierarchical Z
> > +	 *  buffer."
> > +	 *
> > +	 * This optimization is off by default for Broadwell; turn it on.
> > +	 */
> > +	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
> > +
> >  	/* Wa4x4STCOptimizationDisable:bdw */
> >  	WA_SET_BIT_MASKED(CACHE_MODE_1,
> >  			  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
> > @@ -836,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
> >  			  HDC_FORCE_NON_COHERENT |
> >  			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
> >  
> > +	/* According to the CACHE_MODE_0 default value documentation, some
> > +	 * CHV platforms disable this optimization by default. Turn it on.
> > +	 */
> > +	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
> > +
> >  	/* Improve HiZ throughput on CHV. */
> >  	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
> >  
>
> I think you should do this as two separate patches, 1 per platform.
F= or the BSW > patch (given that I had the same functionality in the kernel patch I = asked you > to look at ;-) and FWIW, Jordan has numbers on BSW B-step with my ker= nel patch > which we can use for the commit): > Signed-off-by: Ben Widawsky Huh, I don't recall seeing that kernel patch. Sorry. I guess I'll spl= it it and resubmit... > I haven't looked at Broadwell docs, so I'll let someone else take car= e of that. >=20 > I don't know if I agree with Chris that we should call these in the w= orkaround > section, but whatever. init_clock_gating is equally sucky. init_clock_gating doesn't work. The register writes don't stick and th= ey have no effect at all. Setting them here makes them actually take effect in= the context. =2D-Ken --nextPart3021001.dOWbumBm1F Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJUsw/FAAoJEFtb2gcdScw4r8IP/jsofd5qdiMf+b7xumLLgZfe 6jh9RSJU9lT8SypOV+OPrrmULY0dXvws2ORLXrNC293/8ghFOE/bm2ziTDGjzY23 /pc3ksY9Typl+7l1qrMq9WKTOQofHZGe/oiWmAYqwVQD4tygoPBNUrFEMd+MwS/z 8tPQukONMaS6TiIVrHgnVyiXbJJogoJuipb5olDQsAYjCv/zqP3gSB9l6XNuhQlj 13PzA0wQ7QD3N3ZjEMET35Y01o21T0ta6aU7bmrd14cWSy35fvXdXn1xcso0CoTd HnHtVBRxv4No+2rNKkDt4/9HyigGqweBfEBvMmP51Hv1f3KdhFl/jPuAwjwSYbwP +beK35mVAGYYmGtx+in+ueALq/UKz8XYvqM+dCNoE8uAFcq82CmoUvv38OE/KCy6 myrVMp/AjGRIZrT/xKgcA92mAXi5YuJCGuXp2IlEoamHeTtwOnGf56Rcd+Ym91rj oXFzj29WMGYBL13gCUcjhfL18bCPhcBLYY3r/xgEkr+jYbosukWDhlJMOrb8v2Qv DIKjK3wG4fmz1GT9ZEA7L7GTuhFdFMYlw01SUXDmxVeMQFKijydQIJmNlwewtFrb FOeL+oM0QT4TjT/9/H+mTxazysqrx9ZAelrfF+4TthuLUkTKUzQUPyUCjAckPbb/ bCvY2aMVNvoG8kFWiYTY =Xgxl -----END PGP SIGNATURE----- --nextPart3021001.dOWbumBm1F-- --===============2113053376== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline 
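Ken's last point is that register writes made from init_clock_gating do not stick, while writes registered in the *_init_workarounds functions take effect in the context. One plausible reading is that these registers are part of the saved context state, so the values have to be loaded into the context itself. A way to picture that is a table filled at init time and replayed with a single MI_LOAD_REGISTER_IMM. This is a hypothetical sketch loosely modeled on that approach: struct wa_table, wa_record() and wa_emit_lri() are invented names, and the LRI header encoding (MI opcode 0x22 in bits 28:23, length field 2n-1) and the example register offset are assumptions to check against i915_reg.h and the PRM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define MASKED_BIT_DISABLE(a) (((uint32_t)(a)) << 16)
#define MI_NOOP 0u
#define MI_LOAD_REGISTER_IMM(n) ((0x22u << 23) | (2u * (n) - 1u))

#define MAX_WA_REGS 16

struct wa_reg {
    uint32_t addr;  /* MMIO offset of the register */
    uint32_t mask;  /* bits this workaround cares about */
    uint32_t value; /* value to load, already in masked-write form */
};

struct wa_table {
    struct wa_reg reg[MAX_WA_REGS];
    size_t count;
};

/* Record a workaround at init time instead of writing the register now. */
static void wa_record(struct wa_table *t, uint32_t addr, uint32_t mask,
                      uint32_t value)
{
    assert(t->count < MAX_WA_REGS);
    t->reg[t->count++] = (struct wa_reg){ addr, mask, value };
}

/* Emit one LRI that loads every recorded (addr, value) pair into the
 * command stream `cs`. Returns the number of dwords written. */
static size_t wa_emit_lri(const struct wa_table *t, uint32_t *cs, size_t cap)
{
    size_t n = 0;

    if (t->count == 0)
        return 0;
    assert(cap >= 2 + 2 * t->count);

    cs[n++] = MI_LOAD_REGISTER_IMM((uint32_t)t->count);
    for (size_t i = 0; i < t->count; i++) {
        cs[n++] = t->reg[i].addr;
        cs[n++] = t->reg[i].value;
    }
    cs[n++] = MI_NOOP; /* header + pairs is odd; pad to an even count */
    return n;
}
```

For example, recording MASKED_BIT_DISABLE(1u << 2) for a register at an assumed offset of 0x7000 and emitting the table yields four dwords: the LRI header, the offset, the masked value, and the padding MI_NOOP. The padding is a choice made in this sketch; the thread does not say how the driver pads its command stream.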