From mboxrd@z Thu Jan 1 00:00:00 1970 From: Francisco Jerez Subject: Re: [PATCH] drm/i915/hsw: enable atomic in L3 for some steppings. Date: Tue, 06 Jan 2015 16:19:08 +0200 Message-ID: <874ms4gffn.fsf@riseup.net> References: <1420419950-3135-1-git-send-email-zhigang.gong@intel.com> <87k312gc97.fsf@riseup.net> <20150105023637.GA28045@ivb-gt2-rev4> <87h9w5gmtd.fsf@riseup.net> <20150106080101.GF28045@ivb-gt2-rev4> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1550022966==" Return-path: Received: from mx1.riseup.net (mx1.riseup.net [198.252.153.129]) by gabe.freedesktop.org (Postfix) with ESMTP id CCAD589B98 for ; Tue, 6 Jan 2015 06:22:16 -0800 (PST) In-Reply-To: <20150106080101.GF28045@ivb-gt2-rev4> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Zhigang Gong Cc: intel-gfx@lists.freedesktop.org, Zhigang Gong List-Id: intel-gfx@lists.freedesktop.org --===============1550022966== Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" --==-=-= Content-Type: multipart/mixed; boundary="=-=-=" --=-=-= Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Zhigang Gong writes: > On Mon, Jan 05, 2015 at 07:27:26PM +0200, Francisco Jerez wrote: >> Zhigang Gong writes: >>=20 >> > On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote: >> >> Zhigang Gong writes: >> >>=20 >> >> > According to bspec, ROW_CHICKEN3's bit 6 which is to disable >> >> > move of cacheable global atomics to L3 is needed for GT3 D >> >> > stepping. >> >> > >> >> > I enabled it and tested it with HSW GT2 D stepping and GT3 E steppi= ng. >> >> > The atomics works fine in beignet. And it could get more than 10x p= erformance >> >> > improvement with some workload, for an example, the "splat" kernel = in darktable, >> >> > without this patch, it consumes 50 seconds in one large raw picture= processing. >> >> > But with this patch, the same process only takes less than 1 second. >> >> > >> >>=20 >> >> I tried this already (on HSW GT2 D as well) and I don't think it's >> >> enough to get L3 atomics working reliably. Even though they did seem= to >> >> work OK at first glance I observed some corruption issues (e.g. atomic >> >> writes not landing in system memory) when doing atomic writes to >> >> contiguous (as in within the same cache-line) locations in memory. T= he >> >> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit >> >> mailing list some time ago exposes this IIRC, and probably a couple of >> >> other tests too. >> > Ok, I will find that case and have a try on my systems. I just tested = all >> > the atomic related OpenCL conformance test cases without any issues. > > I just found the patchset hasn't been accepted by piglit. I took a look at > the source code. And realize that that shader will not work correctly if > mesa doesn't allocate some DC in L3 space. Don't know whether you already > allocated some DC when you did that test. > Of course, and it still fails no matter how you allocate the L3. >> > >> >>=20 >> >> Also this change is going to cause an instant lock-up anytime Mesa us= es >> >> atomics because Mesa doesn't change the default L3 way allocation for >> >> the DC, which turns out to be 0 on HSW. >> > >> > This is another issue, IMHO, if the application wants to use atomics, >> > it's better to allocate some L3 space for DC. Otherwise, it could >> > never leverage the "atomics in L3 feature". Based on my test, >> > the performance impact as huge as more than 10x for some workloads. >> > >> Sure, but this change alone will cause a regression (an irrecoverable >> system hang) with current releases of Mesa.=20 >>=20 >> I agree that Mesa should be fixed eventually to assign L3 space to the >> DC for some workloads, but it doesn't seem like we have a satisfactory >> API to do that right now -- The current mechanism used by Beignet >> (poking the L3 control registers from the batch buffer) is slightly >> concerning from a security point of view because it allows an arbitrary >> userspace application to cause misrendering or impact the performance of > I agree with you that the L3 configuration is very dangerous. It's better > to do more checking in the kernel space and make sure the user space > application will not crash the system via L3 configuration. > And, if the kernel provide new API we will use it in beignet ASAP in beig= net. > >> other clients (since the L3 config registers are not being saved and >> restored as part of the context) and even crash the whole system. It > I found the L3 config registers are part of the IVB context, but not part > of HSW's. So if two user applications want to use different L3 config, > the user application need to do the config registers' store/restore manua= lly. > Is there a way to do this type of thing gracefully in KMD? The new API > may need to consider the safely context switching regards to L3 config ch= ange. > Yeah, relying on userspace saving and restoring the L3 config values sounds rather fragile to me, and may not work if e.g. a context hangs and gets kicked out. IMHO it would be preferrable to keep track of the L3 config and save/restore it from the kernel on context switch. >> also doesn't look like it could be made to work with the hardware >> command checker. > > As to the hardware command checker: > I just found with current nightly kernel, if we boot the kernel with > "i915.enable_ppgtt=3D2", then we can configure the L3 related registers > in user space. But the command checker still don't allow user space to > chagne L3 config by default. > Ugh, apparently that's because the hardware command checker is not enabled at all in that case... > Thanks, > Zhigang Gong. > >>=20 >> > Thanks, >> > Zhigang Gong.=20 >> > >> >>=20 >> >> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571= .html >> >>=20 >> >> > Signed-off-by: Zhigang Gong >> >> > --- >> >> > drivers/gpu/drm/i915/intel_pm.c | 10 ++++++---- >> >> > 1 file changed, 6 insertions(+), 4 deletions(-) >> >> > >> >> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915= /intel_pm.c >> >> > index 7d99a9c..8a27802 100644 >> >> > --- a/drivers/gpu/drm/i915/intel_pm.c >> >> > +++ b/drivers/gpu/drm/i915/intel_pm.c >> >> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struc= t drm_device *dev) >> >> >=20=20 >> >> > ilk_init_lp_watermarks(dev); >> >> >=20=20 >> >> > - /* L3 caching of data atomics doesn't work -- disable it. */ >> >> > - I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); >> >> > - I915_WRITE(HSW_ROW_CHICKEN3, >> >> > - _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE= )); >> >> > + if (IS_HSW_GT3(dev) && dev->pdev->revision <=3D 6) { >> >> > + /* L3 caching of data atomics doesn't work -- disable it. */ >> >> > + I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); >> >> > + I915_WRITE(HSW_ROW_CHICKEN3, >> >> > + _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABL= E)); >> >> > + } >> >> >=20=20 >> >> > /* This is required by WaCatErrorRejectionIssue:hsw */ >> >> > I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, >> >> > --=20 >> >> > 1.8.3.2 >> > >> > >> > >> > >> >> _______________________________________________ >> >> Intel-gfx mailing list >> >> Intel-gfx@lists.freedesktop.org >> >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx --=-=-=-- --==-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iF4EAREIAAYFAlSr7twACgkQg5k4nX1Sv1uynAD/e4JG3EB78CmLu+w6ciYsTOKk VfUUh9X3y/TI9V7n4q4A/RSJwW4Z1GwZa5mECEgyoZoHihcEA3wZ79GBHeMLReJK =Lx2V -----END PGP SIGNATURE----- --==-=-=-- --===============1550022966== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSW50ZWwtZ2Z4 IG1haWxpbmcgbGlzdApJbnRlbC1nZnhAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHA6Ly9saXN0 cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9pbnRlbC1nZngK --===============1550022966==--