From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>, Francisco Jerez <currojerez@riseup.net>
Cc: Jani Nikula <jani.nikula@intel.com>,
intel-gfx@lists.freedesktop.org,
Eero Tamminen <eero.t.tamminen@intel.com>,
beignet@lists.freedesktop.org
Subject: Re: [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
Date: Wed, 11 Jan 2017 14:07:37 +0200
Message-ID: <877f61er5i.fsf@gaia.fi.intel.com>
In-Reply-To: <20170111081734.77p2iq6wmt7nekza@phenom.ffwll.local>
Daniel Vetter <daniel@ffwll.ch> writes:
> On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> disabling an L3SQ optimization that has huge performance implications
>> and is unlikely to be necessary for the correct functioning of usual
>> graphic workloads. Userspace is free to re-enable the workaround on
>> demand, and is generally in a better position to determine whether the
>> workaround is necessary than the DRM is (e.g. only during the
>> execution of compute kernels that rely on both L3 fences and HDC R/W
>> requests).
>>
>> The same workaround seems to apply to BDW (at least to production
>> stepping G1) and SKL as well (the internal workaround database claims
>> that it does for all steppings, while the BSpec workaround table only
>> mentions pre-production steppings), but the DRM doesn't do anything
>> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> when it sees fit. Do the same on KBL platforms.
>>
>> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> This is followed by a regression of 35% and 10% respectively for the
>> same benchmarks and platform caused by my recent patch series
>> switching userspace to use the dataport constant cache instead of the
>> sampler to implement uniform pull constant loads, which caused us to
>> hit more heavily the L3 cache (and on platforms other than KBL had the
>> opposite effect of improving performance of the same two benchmarks).
>> The overall effect on KBL of this change combined with the recent
>> userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf
>> was affected by the constant cache changes (though it improved as it
>> did on other platforms rather than regressing), but is not
>> significantly affected by this patch (with statistical significance of
>> 5% and sample size 20).
>>
>> v2: Drop some more code to avoid unused variable warning.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> Cc: Jani Nikula <jani.nikula@intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> Cc: beignet@lists.freedesktop.org
>
> Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
> for compute kernels? Are the patches for mesa compute/beignet
> ready&reviewed?
This is an explicit setting on kbl/E0 only. So one could argue
that unless they filter based on PCI IDs, things would already
blow up across the skl/kbl population if they forgot
to set it. The whitelisting is in place and looks sane,
so this E0 exception is a wart that got in because I read the wa
database slavishly without thinking.
-Mika
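
Editorial sketch of the argument above, not part of the original message: before the patch the kernel forced the bit only on KBL revisions 0 through E0, so userspace that needs the workaround on SKL or on later KBL steppings already had to set it through the whitelisted register. The enum, the function name and the numeric value of REVID_E0 below are hypothetical stand-ins, not the i915 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the i915 definitions. */
enum platform { PLATFORM_BDW, PLATFORM_SKL, PLATFORM_KBL };

#define REVID_E0 0x4u /* assumed numeric value of the E0 stepping */

/*
 * Returns true when the kernel itself forces GEN8_LQSC_RO_PERF_DIS.
 * Models the removed check IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0),
 * treated here as an inclusive revision range [0, E0] on KBL only.
 * With the patch applied the kernel never forces the bit.
 */
static bool kernel_forces_ro_perf_dis(enum platform p, uint8_t revid,
                                      bool patched)
{
    if (patched)
        return false;
    return p == PLATFORM_KBL && revid <= REVID_E0;
}
```

Under these assumptions, the function returns false for every SKL revision and for KBL revisions past E0 even without the patch, which is the case where userspace that forgot to set the bit would already have been affected.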
> -Daniel
>
>> ---
>> drivers/gpu/drm/i915/intel_lrc.c | 10 ----------
>> drivers/gpu/drm/i915/intel_ringbuffer.c | 8 --------
>> 2 files changed, 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 6db246a..656e0a3 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
>> uint32_t *batch,
>> uint32_t index)
>> {
>> - struct drm_i915_private *dev_priv = engine->i915;
>> uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
>>
>> - /*
>> - * WaDisableLSQCROPERFforOCL:kbl
>> - * This WA is implemented in skl_init_clock_gating() but since
>> - * this batch updates GEN8_L3SQCREG4 with default value we need to
>> - * set this bit here to retain the WA during flush.
>> - */
>> - if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> - l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
>> -
>> wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
>> MI_SRM_LRM_GLOBAL_GTT));
>> wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 0971ac3..7cb2ab4 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
>> WA_SET_BIT_MASKED(HDC_CHICKEN0,
>> HDC_FENCE_DEST_SLM_DISABLE);
>>
>> - /* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
>> - * involving this register should also be added to WA batch as required.
>> - */
>> - if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> - /* WaDisableLSQCROPERFforOCL:kbl */
>> - I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
>> - GEN8_LQSC_RO_PERF_DIS);
>> -
>> /* WaToEnableHwFixForPushConstHWBug:kbl */
>> if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
>> WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
>> --
>> 2.10.2
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
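
Editorial sketch, not part of the original thread: the first hunk of the quoted patch changes the value that the WA batch writes to GEN8_L3SQCREG4 around the coherent L3 flush. The minimal model below shows how that value is composed with and without the removed KBL branch. The base constant 0x40400000 is taken from the quoted hunk; the two bit positions and all names are assumptions chosen for illustration and should be checked against i915_reg.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L3SQC4_FLUSH_BASE         0x40400000u /* constant from the quoted hunk */
#define LQSC_FLUSH_COHERENT_LINES (1u << 21)  /* assumed bit position */
#define LQSC_RO_PERF_DIS          (1u << 27)  /* assumed bit position */

/*
 * Composes the register value used during the flush.
 * keep_ro_perf_dis == true models the removed KBL branch, which ORed
 * in the RO_PERF_DIS bit so the workaround survived the flush.
 * keep_ro_perf_dis == false models the code after the patch.
 */
static uint32_t l3sqc4_flush_value(bool keep_ro_perf_dis)
{
    uint32_t value = L3SQC4_FLUSH_BASE | LQSC_FLUSH_COHERENT_LINES;

    /* The two flags must not alias each other for the model to hold. */
    assert((LQSC_FLUSH_COHERENT_LINES & LQSC_RO_PERF_DIS) == 0);

    if (keep_ro_perf_dis)
        value |= LQSC_RO_PERF_DIS;
    return value;
}
```

With these assumed bit positions, the two results differ only in the RO_PERF_DIS bit, which is why removing the branch leaves the flush behaviour itself unchanged.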
Thread overview: 11+ messages
2017-01-08 7:31 [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround Francisco Jerez
2017-01-08 11:57 ` kbuild test robot
2017-01-09 21:07 ` [PATCHv2] " Francisco Jerez
2017-01-11 8:17 ` [Intel-gfx] " Daniel Vetter
2017-01-11 12:07 ` Mika Kuoppala [this message]
2017-01-11 12:24 ` Chris Wilson
2017-01-11 12:40 ` Mika Kuoppala
2017-01-11 13:24 ` [Intel-gfx] " Daniel Vetter
2017-01-12 0:05 ` Francisco Jerez
2017-01-12 13:58 ` Mika Kuoppala
2017-01-12 0:03 ` [Intel-gfx] " Francisco Jerez