* [PATCH v2 1/3] drm/i915: Improve HiZ throughput on Cherryview.
From: Kenneth Graunke @ 2015-01-13 20:46 UTC
To: intel-gfx

Found by reading the HIZ_CHICKEN documentation.

Improves performance in a HiZ microbenchmark by around 50%.
Improves performance in OglZBuffer by around 18%.

Thanks to Chris Wilson for helping me figure out where to put this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h         | 3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 2 files changed, 6 insertions(+)

The same as v1 but resent with Ville's R-b, mostly since it's in a series
with the next two patches, which did change.

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0f32fd1a..a39bb03 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5202,6 +5202,9 @@ enum punit_power_well {
 #define COMMON_SLICE_CHICKEN2			0x7014
 # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE	(1<<0)
 
+#define HIZ_CHICKEN				0x7018
+# define CHV_HZ_8X8_MODE_IN_1X			(1<<15)
+
 #define GEN7_L3SQCREG1				0xB010
 #define  VLV_B0_WA_L3SQCREG1_VALUE		0x00D30000
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 12a36f0..dabc1d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -836,6 +836,9 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_FORCE_NON_COHERENT |
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+	/* Improve HiZ throughput on CHV. */
+	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
+
 	return 0;
 }
 
-- 
2.2.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v2 2/3] drm/i915: Enable the HiZ RAW Stall Optimization on Broadwell.
From: Kenneth Graunke @ 2015-01-13 20:46 UTC
To: intel-gfx

This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.

Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).

Thanks to Jesse Barnes and Ben Widawsky for their help in tracking this
down.  Thanks to Chris Wilson for showing me the new workarounds system.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Split, as requested by Ben.  Fix the thankyous.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dabc1d8..0df15a4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
 			  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
 
+	/* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
+	 * "The Hierarchical Z RAW Stall Optimization allows non-overlapping
+	 *  polygons in the same 8x4 pixel/sample area to be processed without
+	 *  stalling waiting for the earlier ones to write to Hierarchical Z
+	 *  buffer."
+	 *
+	 * This optimization is off by default for Broadwell; turn it on.
+	 */
+	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Wa4x4STCOptimizationDisable:bdw */
 	WA_SET_BIT_MASKED(CACHE_MODE_1,
 			  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
-- 
2.2.1

* [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
From: Kenneth Graunke @ 2015-01-13 20:46 UTC
To: intel-gfx

This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.

Increases performance in OglVSInstancing by about 2.7x on Braswell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++++
 1 file changed, 5 insertions(+)

Split, as requested by Ben.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0df15a4..23020d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -846,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_FORCE_NON_COHERENT |
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+	/* According to the CACHE_MODE_0 default value documentation, some
+	 * CHV platforms disable this optimization by default.  Turn it on.
+	 */
+	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Improve HiZ throughput on CHV. */
 	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
 
-- 
2.2.1
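
For readers unfamiliar with the "masked" registers these patches touch, here is a minimal, standalone sketch of the write encoding that WA_SET_BIT_MASKED and WA_CLR_BIT_MASKED are understood to rely on: the upper 16 bits of the written value select which of the lower 16 bits the hardware updates. This is a toy model, not driver code. The helper names (masked_bit_enable, masked_bit_disable, apply_masked_write, hiz_wa_self_check) are hypothetical, the bit position used for HIZ_RAW_STALL_OPT_DISABLE is an assumption because the thread does not show its definition, and the starting register values are made up.

```c
#include <assert.h>
#include <stdint.h>

#define CHV_HZ_8X8_MODE_IN_1X      (1u << 15) /* bit value from patch 1/3 */
#define HIZ_RAW_STALL_OPT_DISABLE  (1u << 2)  /* ASSUMED value, not shown in this thread */

/* Hypothetical helper: value to write so that `bits` end up set. */
static uint32_t masked_bit_enable(uint32_t bits)
{
	return (bits << 16) | bits; /* mask selects the bits, data sets them */
}

/* Hypothetical helper: value to write so that `bits` end up cleared. */
static uint32_t masked_bit_disable(uint32_t bits)
{
	return bits << 16; /* mask selects the bits, data leaves them zero */
}

/* Toy model of how a masked register reacts to a 32-bit write. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;     /* which low bits may change */
	uint32_t data = write & 0xffffu; /* new values for those bits */

	return (reg & ~mask) | (data & mask);
}

/* Walk through the two register updates made for Cherryview in this series. */
static void hiz_wa_self_check(void)
{
	uint32_t hiz_chicken = 0x0000;  /* made-up starting value */
	uint32_t cache_mode_0 = 0x0005; /* made-up value with the disable bit set */

	/* Patch 1/3: set the 8x8 mode bit, leaving other bits alone. */
	hiz_chicken = apply_masked_write(hiz_chicken,
					 masked_bit_enable(CHV_HZ_8X8_MODE_IN_1X));
	assert(hiz_chicken == 0x8000u);

	/* Patch 3/3: clear the "optimization disable" bit, i.e. turn it on. */
	cache_mode_0 = apply_masked_write(cache_mode_0,
					  masked_bit_disable(HIZ_RAW_STALL_OPT_DISABLE));
	assert(cache_mode_0 == 0x0001u);
}
```

The point of the encoding is that a single write can change one bit without a read-modify-write cycle, which is why the workaround code only needs to name the register and the bit.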

* Re: [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
From: shuang.he @ 2015-01-15 6:07 UTC
To: shuang.he, ethan.gao, intel-gfx, kenneth

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5578
-------------------------------------Summary-------------------------------------
Platform   Delta   drm-intel-nightly   Series Applied
PNV        -1      353/353             352/353
ILK                355/355             355/355
SNB                400/422             400/422
IVB                487/487             487/487
BYT                296/296             296/296
HSW        +22     486/508             508/508
BDW        -1      402/402             401/402
-------------------------------------Detailed-------------------------------------
Platform  Test                                                      drm-intel-nightly          Series Applied
*PNV      igt_gen3_render_linear_blits                              PASS(3, M25M23)            CRASH(1, M23)
 HSW      igt_kms_cursor_crc_cursor-size-change                     NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_kms_fence_pin_leak                                    NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_kms_flip_event_leak                                   NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_kms_mmio_vs_cs_flip_setcrtc_vs_cs_flip                NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_kms_mmio_vs_cs_flip_setplane_vs_cs_flip               NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_lpsp_non-edp                                       NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_cursor                                         NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_cursor-dpms                                    NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_dpms-mode-unset-non-lpsp                       NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_dpms-non-lpsp                                  NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_drm-resources-equal                            NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_fences                                         NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_fences-dpms                                    NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_gem-execbuf                                    NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_gem-mmap-cpu                                   NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_gem-mmap-gtt                                   NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_gem-pread                                      NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_i2c                                            NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_modeset-non-lpsp                               NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_modeset-non-lpsp-stress-no-wait                NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_pci-d3-state                                   NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
 HSW      igt_pm_rpm_rte                                            NSPT(1, M40)PASS(2, M20)   PASS(1, M20)
*BDW      igt_gem_concurrent_blit_gtt-rcs-early-read-interruptible  PASS(5, M30M28)            DMESG_WARN(1, M28)
Note: You need to pay more attention to line start with '*'

* Re: [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
From: Ville Syrjälä @ 2015-01-16 11:35 UTC
To: Kenneth Graunke; +Cc: intel-gfx

On Tue, Jan 13, 2015 at 12:46:53PM -0800, Kenneth Graunke wrote:
> This is an important optimization for avoiding read-after-write (RAW)
> stalls in the HiZ buffer.  Certain workloads would run very slowly with
> HiZ enabled, but run much faster with the "hiz=false" driconf option.
> With this patch, they run at full speed even with HiZ.
>
> Increases performance in OglVSInstancing by about 2.7x on Braswell.
>
> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

also for the remaining two patches.

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> Split, as requested by Ben.
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0df15a4..23020d6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -846,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
>  			  HDC_FORCE_NON_COHERENT |
>  			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
>
> +	/* According to the CACHE_MODE_0 default value documentation, some
> +	 * CHV platforms disable this optimization by default.  Turn it on.
> +	 */
> +	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
> +
>  	/* Improve HiZ throughput on CHV. */
>  	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
>
> --
> 2.2.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC