Re: [PATCH] drm/i915/hsw: enable atomic in L3 for some steppings.

From: Francisco Jerez <currojerez@riseup.net>
To: Zhigang Gong <zhigang.gong@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, Zhigang Gong <zhigang.gong@intel.com>
Subject: Re: [PATCH] drm/i915/hsw: enable atomic in L3 for some steppings.
Date: Tue, 06 Jan 2015 16:19:08 +0200	[thread overview]
Message-ID: <874ms4gffn.fsf@riseup.net> (raw)
In-Reply-To: <20150106080101.GF28045@ivb-gt2-rev4>

[-- Attachment #1.1.1: Type: text/plain, Size: 6426 bytes --]

Zhigang Gong <zhigang.gong@linux.intel.com> writes:

> On Mon, Jan 05, 2015 at 07:27:26PM +0200, Francisco Jerez wrote:
>> Zhigang Gong <zhigang.gong@linux.intel.com> writes:
>> 
>> > On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote:
>> >> Zhigang Gong <zhigang.gong@intel.com> writes:
>> >> 
>> >> > According to bspec, ROW_CHICKEN3's bit 6 which is to disable
>> >> > move of cacheable global atomics to L3 is needed for GT3 D
>> >> > stepping.
>> >> >
>> >> > I enabled it and tested it with HSW GT2 D stepping and GT3 E stepping.
>> >> > The atomics works fine in beignet. And it could get more than 10x performance
>> >> > improvement with some workload, for an example, the "splat" kernel in darktable,
>> >> > without this patch, it consumes 50 seconds in one large raw picture processing.
>> >> > But with this patch, the same process only takes less than 1 second.
>> >> >
>> >> 
>> >> I tried this already (on HSW GT2 D as well) and I don't think it's
>> >> enough to get L3 atomics working reliably.  Even though they did seem to
>> >> work OK at first glance I observed some corruption issues (e.g. atomic
>> >> writes not landing in system memory) when doing atomic writes to
>> >> contiguous (as in within the same cache-line) locations in memory.  The
>> >> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit
>> >> mailing list some time ago exposes this IIRC, and probably a couple of
>> >> other tests too.
>> > Ok, I will find that case and have a try on my systems. I just tested all
>> > the atomic related OpenCL conformance test cases without any issues.
>
> I just found the patchset hasn't been accepted by piglit. I took a look at
> the source code. And realize that that shader will not work correctly if
> mesa doesn't allocate some DC in L3 space. Don't know whether you already
> allocated some DC when you did that test.
>

Of course, and it still fails no matter how you allocate the L3.

>> >
>> >> 
>> >> Also this change is going to cause an instant lock-up anytime Mesa uses
>> >> atomics because Mesa doesn't change the default L3 way allocation for
>> >> the DC, which turns out to be 0 on HSW.
>> >
>> > This is another issue, IMHO, if the application wants to use atomics,
>> > it's better to allocate some L3 space for DC. Otherwise, it could
>> > never leverage the "atomics in L3 feature". Based on my test,
>> > the performance impact as huge as more than 10x for some workloads.
>> >
>> Sure, but this change alone will cause a regression (an irrecoverable
>> system hang) with current releases of Mesa. 
>> 
>> I agree that Mesa should be fixed eventually to assign L3 space to the
>> DC for some workloads, but it doesn't seem like we have a satisfactory
>> API to do that right now -- The current mechanism used by Beignet
>> (poking the L3 control registers from the batch buffer) is slightly
>> concerning from a security point of view because it allows an arbitrary
>> userspace application to cause misrendering or impact the performance of
> I agree with you that the L3 configuration is very dangerous. It's better
> to do more checking in the kernel space and make sure the user space
> application will not crash the system via L3 configuration.
> And, if the kernel provide new API we will use it in beignet ASAP in beignet.
>
>> other clients (since the L3 config registers are not being saved and
>> restored as part of the context) and even crash the whole system.  It
> I found the L3 config registers are part of the IVB context, but not part
> of HSW's. So if two user applications want to use different L3 config,
> the user application need to do the config registers' store/restore manually.
> Is there a way to do this type of thing gracefully in KMD? The new API
> may need to consider the safely context switching regards to L3 config change.
>
Yeah, relying on userspace saving and restoring the L3 config values
sounds rather fragile to me, and may not work if e.g. a context hangs
and gets kicked out.  IMHO it would be preferrable to keep track of the
L3 config and save/restore it from the kernel on context switch.

>> also doesn't look like it could be made to work with the hardware
>> command checker.
>
> As to the hardware command checker:
> I just found with current nightly kernel, if we boot the kernel with
> "i915.enable_ppgtt=2", then we can configure the L3 related registers
> in user space. But the command checker still don't allow user space to
> chagne L3 config by default.
>

Ugh, apparently that's because the hardware command checker is not
enabled at all in that case...

> Thanks,
> Zhigang Gong.
>
>> 
>> > Thanks,
>> > Zhigang Gong. 
>> >
>> >> 
>> >> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html
>> >> 
>> >> > Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
>> >> > ---
>> >> >  drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
>> >> >  1 file changed, 6 insertions(+), 4 deletions(-)
>> >> >
>> >> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> >> > index 7d99a9c..8a27802 100644
>> >> > --- a/drivers/gpu/drm/i915/intel_pm.c
>> >> > +++ b/drivers/gpu/drm/i915/intel_pm.c
>> >> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
>> >> >  
>> >> >  	ilk_init_lp_watermarks(dev);
>> >> >  
>> >> > -	/* L3 caching of data atomics doesn't work -- disable it. */
>> >> > -	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> >> > -	I915_WRITE(HSW_ROW_CHICKEN3,
>> >> > -		   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> >> > +	if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
>> >> > +		/* L3 caching of data atomics doesn't work -- disable it. */
>> >> > +		I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> >> > +		I915_WRITE(HSW_ROW_CHICKEN3,
>> >> > +			   _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> >> > +	}
>> >> >  
>> >> >  	/* This is required by WaCatErrorRejectionIssue:hsw */
>> >> >  	I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
>> >> > -- 
>> >> > 1.8.3.2
>> >
>> >
>> >
>> >
>> >> _______________________________________________
>> >> Intel-gfx mailing list
>> >> Intel-gfx@lists.freedesktop.org
>> >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[-- Attachment #1.2: Type: application/pgp-signature, Size: 212 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx