* [PATCH v2 1/3] drm/i915: Improve HiZ throughput on Cherryview.
From: Kenneth Graunke @ 2015-01-13 20:46 UTC (permalink / raw)
To: intel-gfx
Found by reading the HIZ_CHICKEN documentation.
Improves performance in a HiZ microbenchmark by around 50%.
Improves performance in OglZBuffer by around 18%.
Thanks to Chris Wilson for helping me figure out where to put this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_reg.h | 3 +++
drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
2 files changed, 6 insertions(+)
Same as v1, but resent with Ville's R-b, mainly because it is part of a
series with the next two patches, which did change.
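For readers unfamiliar with masked registers, here is a minimal sketch of
the value a masked bit set would write. It assumes the usual i915
convention that the upper 16 bits of the written dword select which of the
lower 16 bits get updated. masked_bit_enable() is a hypothetical stand-in
for illustration, not the driver's actual macro.

```c
#include <stdint.h>

/* Bit position taken from this patch: CHV_HZ_8X8_MODE_IN_1X is (1<<15). */
#define CHV_HZ_8X8_MODE_IN_1X (1u << 15)

/*
 * Hypothetical helper (name is an assumption): build the dword for a
 * masked register so that only `bit` is touched and it ends up set.
 * Upper half = write-enable mask, lower half = new value.
 */
static inline uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}
```

Under these assumptions, setting CHV_HZ_8X8_MODE_IN_1X writes 0x80008000,
and every other bit in HIZ_CHICKEN keeps its current value.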
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0f32fd1a..a39bb03 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5202,6 +5202,9 @@ enum punit_power_well {
#define COMMON_SLICE_CHICKEN2 0x7014
# define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE (1<<0)
+#define HIZ_CHICKEN 0x7018
+# define CHV_HZ_8X8_MODE_IN_1X (1<<15)
+
#define GEN7_L3SQCREG1 0xB010
#define VLV_B0_WA_L3SQCREG1_VALUE 0x00D30000
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 12a36f0..dabc1d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -836,6 +836,9 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
HDC_FORCE_NON_COHERENT |
HDC_DONOT_FETCH_MEM_WHEN_MASKED);
+ /* Improve HiZ throughput on CHV. */
+ WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
+
return 0;
}
--
2.2.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH v2 2/3] drm/i915: Enable the HiZ RAW Stall Optimization on Broadwell.
From: Kenneth Graunke @ 2015-01-13 20:46 UTC (permalink / raw)
To: intel-gfx
This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer. Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.
Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).
Thanks to Jesse Barnes and Ben Widawsky for their help in tracking this
down. Thanks to Chris Wilson for showing me the new workarounds system.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 10 ++++++++++
1 file changed, 10 insertions(+)
Split, as requested by Ben. Fixed the thank-yous.
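As a rough illustration of what a masked bit clear does, the sketch below
models a masked register in plain C. It is a simplified model under stated
assumptions, not driver code: the function names are hypothetical, and the
bit position used for HIZ_RAW_STALL_OPT_DISABLE is an assumption, since
this patch does not show its definition.

```c
#include <stdint.h>

/* Assumed bit position, for illustration only; not shown in this patch. */
#define HIZ_RAW_STALL_OPT_DISABLE (1u << 2)

/* Hypothetical helper: mask selects `bit`, value half leaves it cleared. */
static inline uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/*
 * Simplified model of a masked register: a low bit changes only if the
 * matching bit in the upper 16 bits of the written dword is set.
 */
static uint32_t masked_reg_apply(uint32_t old, uint32_t write)
{
	uint32_t mask = write >> 16;
	uint32_t value = write & 0xffffu;

	return (old & ~mask & 0xffffu) | (value & mask);
}
```

In this model, clearing the disable bit leaves all unrelated bits of the
register alone, and applying the same write twice changes nothing further.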
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dabc1d8..0df15a4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
HDC_DONOT_FETCH_MEM_WHEN_MASKED |
(IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
+ /* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
+ * "The Hierarchical Z RAW Stall Optimization allows non-overlapping
+ * polygons in the same 8x4 pixel/sample area to be processed without
+ * stalling waiting for the earlier ones to write to Hierarchical Z
+ * buffer."
+ *
+ * This optimization is off by default for Broadwell; turn it on.
+ */
+ WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
/* Wa4x4STCOptimizationDisable:bdw */
WA_SET_BIT_MASKED(CACHE_MODE_1,
GEN8_4x4_STC_OPTIMIZATION_DISABLE);
--
2.2.1
* [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
From: Kenneth Graunke @ 2015-01-13 20:46 UTC (permalink / raw)
To: intel-gfx
This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer. Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.
Increases performance in OglVSInstancing by about 2.7x on Braswell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++++
1 file changed, 5 insertions(+)
Split, as requested by Ben.
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0df15a4..23020d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -846,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
HDC_FORCE_NON_COHERENT |
HDC_DONOT_FETCH_MEM_WHEN_MASKED);
+ /* According to the CACHE_MODE_0 default value documentation, some
+ * CHV platforms disable this optimization by default. Turn it on.
+ */
+ WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
/* Improve HiZ throughput on CHV. */
WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
--
2.2.1
* Re: [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
From: shuang.he @ 2015-01-15 6:07 UTC (permalink / raw)
To: shuang.he, ethan.gao, intel-gfx, kenneth
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5578
-------------------------------------Summary-------------------------------------
Platform  Delta  drm-intel-nightly  Series Applied
PNV       -1     353/353            352/353
ILK              355/355            355/355
SNB              400/422            400/422
IVB              487/487            487/487
BYT              296/296            296/296
HSW       +22    486/508            508/508
BDW       -1     402/402            401/402
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
*PNV igt_gen3_render_linear_blits PASS(3, M25M23) CRASH(1, M23)
HSW igt_kms_cursor_crc_cursor-size-change NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_kms_fence_pin_leak NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_kms_flip_event_leak NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_kms_mmio_vs_cs_flip_setcrtc_vs_cs_flip NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_kms_mmio_vs_cs_flip_setplane_vs_cs_flip NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_lpsp_non-edp NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_cursor NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_cursor-dpms NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_dpms-mode-unset-non-lpsp NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_dpms-non-lpsp NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_drm-resources-equal NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_fences NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_fences-dpms NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_gem-execbuf NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_gem-mmap-cpu NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_gem-mmap-gtt NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_gem-pread NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_i2c NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_modeset-non-lpsp NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_modeset-non-lpsp-stress-no-wait NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_pci-d3-state NSPT(1, M40)PASS(2, M20) PASS(1, M20)
HSW igt_pm_rpm_rte NSPT(1, M40)PASS(2, M20) PASS(1, M20)
*BDW igt_gem_concurrent_blit_gtt-rcs-early-read-interruptible PASS(5, M30M28) DMESG_WARN(1, M28)
Note: Pay particular attention to lines that start with '*'.
* Re: [PATCH v2 3/3] drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
From: Ville Syrjälä @ 2015-01-16 11:35 UTC (permalink / raw)
To: Kenneth Graunke; +Cc: intel-gfx
On Tue, Jan 13, 2015 at 12:46:53PM -0800, Kenneth Graunke wrote:
> This is an important optimization for avoiding read-after-write (RAW)
> stalls in the HiZ buffer. Certain workloads would run very slowly with
> HiZ enabled, but run much faster with the "hiz=false" driconf option.
> With this patch, they run at full speed even with HiZ.
>
> Increases performance in OglVSInstancing by about 2.7x on Braswell.
>
> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
also for the remaining two patches.
> ---
> drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> Split, as requested by Ben.
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0df15a4..23020d6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -846,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
> HDC_FORCE_NON_COHERENT |
> HDC_DONOT_FETCH_MEM_WHEN_MASKED);
>
> + /* According to the CACHE_MODE_0 default value documentation, some
> + * CHV platforms disable this optimization by default. Turn it on.
> + */
> + WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
> +
> /* Improve HiZ throughput on CHV. */
> WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
>
> --
> 2.2.1
>
--
Ville Syrjälä
Intel OTC