public inbox for intel-gfx@lists.freedesktop.org
* [PATCH v2 1/3] drm/i915: Improve HiZ throughput on Cherryview.
@ 2015-01-13 20:46 Kenneth Graunke
  2015-01-13 20:46 ` [PATCH v2 2/3] drm/i915: Enable the HiZ RAW Stall Optimization on Broadwell Kenneth Graunke
  2015-01-13 20:46 ` [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview Kenneth Graunke
  0 siblings, 2 replies; 5+ messages in thread
From: Kenneth Graunke @ 2015-01-13 20:46 UTC
  To: intel-gfx

Found by reading the HIZ_CHICKEN documentation.

Improves performance in a HiZ microbenchmark by around 50%.
Improves performance in OglZBuffer by around 18%.

Thanks to Chris Wilson for helping me figure out where to put this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h         | 3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 2 files changed, 6 insertions(+)

The same as v1 but resent with Ville's R-b, mostly since it's in a series
with the next two patches, which did change.

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0f32fd1a..a39bb03 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5202,6 +5202,9 @@ enum punit_power_well {
 #define COMMON_SLICE_CHICKEN2			0x7014
 # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE	(1<<0)
 
+#define HIZ_CHICKEN				0x7018
+# define CHV_HZ_8X8_MODE_IN_1X			(1<<15)
+
 #define GEN7_L3SQCREG1				0xB010
 #define  VLV_B0_WA_L3SQCREG1_VALUE		0x00D30000
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 12a36f0..dabc1d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -836,6 +836,9 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_FORCE_NON_COHERENT |
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+	/* Improve HiZ throughput on CHV. */
+	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
+
 	return 0;
 }
 
-- 
2.2.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
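
For readers unfamiliar with the WA_SET_BIT_MASKED and WA_CLR_BIT_MASKED
calls used in this series: as far as I understand it, registers such as
HIZ_CHICKEN and CACHE_MODE_0 are "masked" registers, where the upper 16
bits of a write select which of the lower 16 bits actually change.  The
sketch below is a standalone model of that convention, not the driver's
code.  The helper names masked_bit_enable, masked_bit_disable and
apply_masked_write are hypothetical, and the hardware model is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value to write in order to set `bit`: the write-enable mask goes in
 * bits 31:16 and the new bit value in bits 15:0. */
static inline uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

/* Value to write in order to clear `bit`: only the write-enable mask
 * is set, so the selected bit is written as zero. */
static inline uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/* Hypothetical software model of how the hardware applies a masked
 * write to the 16-bit register contents `reg`.  Bits not selected by
 * the mask keep their old value. */
static inline uint32_t apply_masked_write(uint32_t reg, uint32_t value)
{
	uint32_t mask = value >> 16;

	return (reg & ~mask & 0xffffu) | (value & mask);
}
```

Under this model, masked_bit_enable(1 << 15) yields 0x80008000, which is
the value one would expect a masked set of CHV_HZ_8X8_MODE_IN_1X to
write.  Because unselected bits are left alone, no read-modify-write
cycle is needed to change a single bit.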

* [PATCH v2 2/3] drm/i915: Enable the HiZ RAW Stall Optimization on Broadwell.
  2015-01-13 20:46 [PATCH v2 1/3] drm/i915: Improve HiZ throughput on Cherryview Kenneth Graunke
@ 2015-01-13 20:46 ` Kenneth Graunke
  2015-01-13 20:46 ` [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview Kenneth Graunke
  1 sibling, 0 replies; 5+ messages in thread
From: Kenneth Graunke @ 2015-01-13 20:46 UTC
  To: intel-gfx

This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.

Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).

Thanks to Jesse Barnes and Ben Widawsky for their help in tracking this
down.  Thanks to Chris Wilson for showing me the new workarounds system.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Split, as requested by Ben.  Fix the thank-yous.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dabc1d8..0df15a4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
 			  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
 
+	/* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
+	 * "The Hierarchical Z RAW Stall Optimization allows non-overlapping
+	 *  polygons in the same 8x4 pixel/sample area to be processed without
+	 *  stalling waiting for the earlier ones to write to Hierarchical Z
+	 *  buffer."
+	 *
+	 * This optimization is off by default for Broadwell; turn it on.
+	 */
+	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Wa4x4STCOptimizationDisable:bdw */
 	WA_SET_BIT_MASKED(CACHE_MODE_1,
 			  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
-- 
2.2.1

* [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
  2015-01-13 20:46 [PATCH v2 1/3] drm/i915: Improve HiZ throughput on Cherryview Kenneth Graunke
  2015-01-13 20:46 ` [PATCH v2 2/3] drm/i915: Enable the HiZ RAW Stall Optimization on Broadwell Kenneth Graunke
@ 2015-01-13 20:46 ` Kenneth Graunke
  2015-01-15  6:07   ` shuang.he
  2015-01-16 11:35   ` Ville Syrjälä
  1 sibling, 2 replies; 5+ messages in thread
From: Kenneth Graunke @ 2015-01-13 20:46 UTC
  To: intel-gfx

This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.

Increases performance in OglVSInstancing by about 2.7x on Braswell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++++
 1 file changed, 5 insertions(+)

Split, as requested by Ben.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0df15a4..23020d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -846,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_FORCE_NON_COHERENT |
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+	/* According to the CACHE_MODE_0 default value documentation, some
+	 * CHV platforms disable this optimization by default.  Turn it on.
+	 */
+	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Improve HiZ throughput on CHV. */
 	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
 
-- 
2.2.1

* Re: [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
  2015-01-13 20:46 ` [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview Kenneth Graunke
@ 2015-01-15  6:07   ` shuang.he
  2015-01-16 11:35   ` Ville Syrjälä
  1 sibling, 0 replies; 5+ messages in thread
From: shuang.he @ 2015-01-15  6:07 UTC
  To: shuang.he, ethan.gao, intel-gfx, kenneth

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5578
-------------------------------------Summary-------------------------------------
Platform    Delta    drm-intel-nightly    Series Applied
PNV         -1       353/353              352/353
ILK                  355/355              355/355
SNB                  400/422              400/422
IVB                  487/487              487/487
BYT                  296/296              296/296
HSW         +22      486/508              508/508
BDW         -1       402/402              401/402
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gen3_render_linear_blits      PASS(3, M25M23)      CRASH(1, M23)
 HSW  igt_kms_cursor_crc_cursor-size-change      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_kms_fence_pin_leak      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_kms_flip_event_leak      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_kms_mmio_vs_cs_flip_setcrtc_vs_cs_flip      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_kms_mmio_vs_cs_flip_setplane_vs_cs_flip      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_lpsp_non-edp      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_cursor      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_cursor-dpms      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_dpms-mode-unset-non-lpsp      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_dpms-non-lpsp      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_drm-resources-equal      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_fences      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_fences-dpms      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-execbuf      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-mmap-cpu      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-mmap-gtt      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-pread      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_i2c      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_modeset-non-lpsp      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_modeset-non-lpsp-stress-no-wait      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_pci-d3-state      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_rte      NSPT(1, M40)PASS(2, M20)      PASS(1, M20)
*BDW  igt_gem_concurrent_blit_gtt-rcs-early-read-interruptible      PASS(5, M30M28)      DMESG_WARN(1, M28)
Note: Pay particular attention to lines starting with '*'.

* Re: [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
  2015-01-13 20:46 ` [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview Kenneth Graunke
  2015-01-15  6:07   ` shuang.he
@ 2015-01-16 11:35   ` Ville Syrjälä
  1 sibling, 0 replies; 5+ messages in thread
From: Ville Syrjälä @ 2015-01-16 11:35 UTC
  To: Kenneth Graunke; +Cc: intel-gfx

On Tue, Jan 13, 2015 at 12:46:53PM -0800, Kenneth Graunke wrote:
> This is an important optimization for avoiding read-after-write (RAW)
> stalls in the HiZ buffer.  Certain workloads would run very slowly with
> HiZ enabled, but run much faster with the "hiz=false" driconf option.
> With this patch, they run at full speed even with HiZ.
> 
> Increases performance in OglVSInstancing by about 2.7x on Braswell.
> 
> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
also for the remaining two patches.

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> Split, as requested by Ben.
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0df15a4..23020d6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -846,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
>  			  HDC_FORCE_NON_COHERENT |
>  			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
>  
> +	/* According to the CACHE_MODE_0 default value documentation, some
> +	 * CHV platforms disable this optimization by default.  Turn it on.
> +	 */
> +	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
> +
>  	/* Improve HiZ throughput on CHV. */
>  	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
>  
> -- 
> 2.2.1

-- 
Ville Syrjälä
Intel OTC