* [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn @ 2014-08-26 13:44 Arun Siluvery 2014-08-26 13:44 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery 2014-08-26 13:44 ` [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs Arun Siluvery 0 siblings, 2 replies; 25+ messages in thread From: Arun Siluvery @ 2014-08-26 13:44 UTC (permalink / raw) To: intel-gfx Workarounds for BDW are applied using LRIs during render ring initialization. Only those WA registers that are part of register state are initialized in this fn, remaining are still in its current place init_clock_gating() which are not affected by a gpu reset. I can send another patch where they can be moved to render ring init function but during testing I found their state doesn't change after reset. Arun Siluvery (2): drm/i915/bdw: Apply workarounds in render ring init function drm/i915/bdw: Export workaround data to debugfs drivers/gpu/drm/i915/i915_debugfs.c | 40 +++++++++++++ drivers/gpu/drm/i915/i915_drv.h | 14 +++++ drivers/gpu/drm/i915/i915_gem_context.c | 6 ++ drivers/gpu/drm/i915/intel_pm.c | 48 --------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 102 ++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 2 + 6 files changed, 164 insertions(+), 48 deletions(-) -- 2.0.4 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 13:44 [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn Arun Siluvery @ 2014-08-26 13:44 ` Arun Siluvery 2014-08-26 14:37 ` Ville Syrjälä 2014-08-27 14:33 ` [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() ville.syrjala 2014-08-26 13:44 ` [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs Arun Siluvery 1 sibling, 2 replies; 25+ messages in thread From: Arun Siluvery @ 2014-08-26 13:44 UTC (permalink / raw) To: intel-gfx For BDW workarounds are currently initialized in init_clock_gating() but they are lost during reset, suspend/resume etc; this patch moves the WAs that are part of register state context to render ring init fn otherwise default context ends up with incorrect values as they don't get initialized until init_clock_gating fn. v2: Add workarounds to golden render state This method has its own issues, first of all this is different for each gen and it is generated using a tool so adding new workaround and mainitaining them across gens is not a straightforward process. v3: Use LRIs to emit these workarounds (Ville) Instead of modifying the golden render state the same LRIs are emitted from within the driver. v4: Use abstract name when exporting gen specific routines (Chris) For: VIZ-4092 Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> --- drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 79 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 2 + 4 files changed, 87 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 9683e62..0a9bb0e 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, } uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; to->legacy_hw_ctx.initialized = true; done: i915_gem_context_reference(to); ring->last_context = to; if (uninitialized) { + if (ring->init_context) { + ret = ring->init_context(ring); + if (ret) + DRM_ERROR("ring init context: %d\n", ret); + } + ret = i915_gem_render_state_init(ring); if (ret) DRM_ERROR("init render state: %d\n", ret); } return 0; unpin_out: if (ring->id == RCS) i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state); diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index c8f744c..668acd9 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) struct drm_i915_private *dev_priv = dev->dev_private; enum pipe pipe; I915_WRITE(WM3_LP_ILK, 0); I915_WRITE(WM2_LP_ILK, 0); I915_WRITE(WM1_LP_ILK, 0); /* FIXME(BDW): Check all the w/a, some might only apply to * pre-production hw. */ - /* WaDisablePartialInstShootdown:bdw */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); - - /* WaDisableThreadStallDopClockGating:bdw */ - /* FIXME: Unclear whether we really need this on production bdw. */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); - /* - * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for - * pre-production hardware - */ - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); I915_WRITE(_3D_CHICKEN3, _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); - I915_WRITE(COMMON_SLICE_CHICKEN2, - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); - - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); - - /* WaDisableDopClockGating:bdw May not be needed for production */ - I915_WRITE(GEN7_ROW_CHICKEN2, - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); /* WaSwitchSolVfFArbitrationPriority:bdw */ I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); /* WaPsrDPAMaskVBlankInSRD:bdw */ I915_WRITE(CHICKEN_PAR1_1, I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ for_each_pipe(pipe) { I915_WRITE(CHICKEN_PIPESL_1(pipe), I915_READ(CHICKEN_PIPESL_1(pipe)) | BDW_DPRS_MASK_VBLANK_SRD); } - /* Use Force Non-Coherent whenever executing a 3D context. This is a - * workaround for for a possible hang in the unlikely event a TLB - * invalidation occurs during a PSD flush. - */ - I915_WRITE(HDC_CHICKEN0, - I915_READ(HDC_CHICKEN0) | - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); - /* WaVSRefCountFullforceMissDisable:bdw */ /* WaDSRefCountFullforceMissDisable:bdw */ I915_WRITE(GEN7_FF_THREAD_MODE, I915_READ(GEN7_FF_THREAD_MODE) & ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); - /* - * BSpec recommends 8x4 when MSAA is used, - * however in practice 16x4 seems fastest. - * - * Note that PS/WM thread counts depend on the WIZ hashing - * disable bit, which we don't touch here, but it's good - * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). - */ - I915_WRITE(GEN7_GT_MODE, - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); - I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); /* WaDisableSDEUnitClockGating:bdw */ I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | GEN8_SDEUNIT_CLOCK_GATE_DISABLE); - - /* Wa4x4STCOptimizationDisable:bdw */ - I915_WRITE(CACHE_MODE_1, - _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); } static void haswell_init_clock_gating(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; ilk_init_lp_watermarks(dev); /* L3 caching of data atomics doesn't work -- disable it. */ I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 13543f8..4146582 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -643,20 +643,98 @@ intel_init_pipe_control(struct intel_engine_cs *ring) return 0; err_unpin: i915_gem_object_ggtt_unpin(ring->scratch.obj); err_unref: drm_gem_object_unreference(&ring->scratch.obj->base); err: return ret; } +static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, + u32 addr, u32 value) +{ + intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); + intel_ring_emit(ring, addr); + intel_ring_emit(ring, value); +} + +static int gen8_init_workarounds(struct intel_engine_cs *ring) +{ + int ret; + + /* + * workarounds applied in this fn are part of register state context, + * they need to be re-initialized followed by gpu reset, suspend/resume, + * module reload. + */ + + /* + * update the number of dwords required based on the + * actual number of workarounds applied + */ + ret = intel_ring_begin(ring, 24); + if (ret) + return ret; + + /* WaDisablePartialInstShootdown:bdw */ + /* WaDisableThreadStallDopClockGating:bdw */ + /* FIXME: Unclear whether we really need this on production bdw. */ + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE + | STALL_DOP_GATING_DISABLE)); + + /* WaDisableDopClockGating:bdw May not be needed for production */ + intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); + + /* + * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for + * pre-production hardware + */ + intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3, + _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS + | GEN8_SAMPLER_POWER_BYPASS_DIS)); + + intel_ring_emit_wa(ring, GEN7_HALF_SLICE_CHICKEN1, + _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); + + intel_ring_emit_wa(ring, COMMON_SLICE_CHICKEN2, + _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); + + /* Use Force Non-Coherent whenever executing a 3D context. This is a + * workaround for for a possible hang in the unlikely event a TLB + * invalidation occurs during a PSD flush. + */ + intel_ring_emit_wa(ring, HDC_CHICKEN0, + _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); + + /* Wa4x4STCOptimizationDisable:bdw */ + intel_ring_emit_wa(ring, CACHE_MODE_1, + _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); + + /* + * BSpec recommends 8x4 when MSAA is used, + * however in practice 16x4 seems fastest. + * + * Note that PS/WM thread counts depend on the WIZ hashing + * disable bit, which we don't touch here, but it's good + * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). + */ + intel_ring_emit_wa(ring, GEN7_GT_MODE, + GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); + + intel_ring_advance(ring); + + return 0; +} + static int init_render_ring(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; int ret = init_ring_common(ring); if (ret) return ret; /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) @@ -2128,20 +2206,21 @@ int intel_init_render_ring_buffer(struct drm_device *dev) i915_gem_object_set_cache_level(obj, I915_CACHE_LLC); ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_NONBLOCK); if (ret != 0) { drm_gem_object_unreference(&obj->base); DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n"); i915.semaphores = 0; } else dev_priv->semaphore_obj = obj; } } + ring->init_context = gen8_init_workarounds; ring->add_request = gen6_add_request; ring->flush = gen8_render_ring_flush; ring->irq_get = gen8_ring_get_irq; ring->irq_put = gen8_ring_put_irq; ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT; ring->get_seqno = gen6_ring_get_seqno; ring->set_seqno = ring_set_seqno; if (i915_semaphore_is_enabled(dev)) { WARN_ON(!dev_priv->semaphore_obj); ring->semaphore.sync_to = gen8_ring_sync; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 9cbf7b0..96479c8 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -141,20 +141,22 @@ struct intel_engine_cs { struct intel_hw_status_page status_page; unsigned irq_refcount; /* protected by dev_priv->irq_lock */ u32 irq_enable_mask; /* bitmask to enable ring interrupt */ u32 trace_irq_seqno; bool __must_check (*irq_get)(struct intel_engine_cs *ring); void (*irq_put)(struct intel_engine_cs *ring); int (*init)(struct intel_engine_cs *ring); + int (*init_context)(struct intel_engine_cs *ring); + void (*write_tail)(struct intel_engine_cs *ring, u32 value); int __must_check (*flush)(struct intel_engine_cs *ring, u32 invalidate_domains, u32 flush_domains); int (*add_request)(struct intel_engine_cs *ring); /* Some chipsets are not quite as coherent as advertised and need * an expensive kick to force a true read of the up-to-date seqno. * However, the up-to-date seqno is not always required and the last * seen value is good enough. Note that the seqno will always be -- 2.0.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 13:44 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery @ 2014-08-26 14:37 ` Ville Syrjälä 2014-08-26 14:47 ` Siluvery, Arun 2014-08-26 16:06 ` Chris Wilson 2014-08-27 14:33 ` [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() ville.syrjala 1 sibling, 2 replies; 25+ messages in thread From: Ville Syrjälä @ 2014-08-26 14:37 UTC (permalink / raw) To: Arun Siluvery; +Cc: intel-gfx On Tue, Aug 26, 2014 at 02:44:50PM +0100, Arun Siluvery wrote: > For BDW workarounds are currently initialized in init_clock_gating() but > they are lost during reset, suspend/resume etc; this patch moves the WAs > that are part of register state context to render ring init fn otherwise > default context ends up with incorrect values as they don't get initialized > until init_clock_gating fn. > > v2: Add workarounds to golden render state > This method has its own issues, first of all this is different for > each gen and it is generated using a tool so adding new workaround > and mainitaining them across gens is not a straightforward process. > > v3: Use LRIs to emit these workarounds (Ville) > Instead of modifying the golden render state the same LRIs are > emitted from within the driver. > > v4: Use abstract name when exporting gen specific routines (Chris) > > For: VIZ-4092 > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> This one looks good as far as I'm concerned. Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Do you plan to give other platforms the same treatment? We need at least CHV converted ASAP. But if you don't have a test machine I can take care of that myself. > --- > drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ > drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 79 +++++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 2 + > 4 files changed, 87 insertions(+), 48 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 9683e62..0a9bb0e 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, > } > > uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; > to->legacy_hw_ctx.initialized = true; > > done: > i915_gem_context_reference(to); > ring->last_context = to; > > if (uninitialized) { > + if (ring->init_context) { > + ret = ring->init_context(ring); > + if (ret) > + DRM_ERROR("ring init context: %d\n", ret); > + } > + > ret = i915_gem_render_state_init(ring); > if (ret) > DRM_ERROR("init render state: %d\n", ret); > } > > return 0; > > unpin_out: > if (ring->id == RCS) > i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state); > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c > index c8f744c..668acd9 100644 > --- a/drivers/gpu/drm/i915/intel_pm.c > +++ b/drivers/gpu/drm/i915/intel_pm.c > @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) > struct drm_i915_private *dev_priv = dev->dev_private; > enum pipe pipe; > > I915_WRITE(WM3_LP_ILK, 0); > I915_WRITE(WM2_LP_ILK, 0); > I915_WRITE(WM1_LP_ILK, 0); > > /* FIXME(BDW): Check all the w/a, some might only apply to > * pre-production hw. */ > > - /* WaDisablePartialInstShootdown:bdw */ > - I915_WRITE(GEN8_ROW_CHICKEN, > - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); > - > - /* WaDisableThreadStallDopClockGating:bdw */ > - /* FIXME: Unclear whether we really need this on production bdw. */ > - I915_WRITE(GEN8_ROW_CHICKEN, > - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); > > - /* > - * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for > - * pre-production hardware > - */ > - I915_WRITE(HALF_SLICE_CHICKEN3, > - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); > - I915_WRITE(HALF_SLICE_CHICKEN3, > - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); > > I915_WRITE(_3D_CHICKEN3, > _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); > > - I915_WRITE(COMMON_SLICE_CHICKEN2, > - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); > - > - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, > - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); > - > - /* WaDisableDopClockGating:bdw May not be needed for production */ > - I915_WRITE(GEN7_ROW_CHICKEN2, > - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > > /* WaSwitchSolVfFArbitrationPriority:bdw */ > I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); > > /* WaPsrDPAMaskVBlankInSRD:bdw */ > I915_WRITE(CHICKEN_PAR1_1, > I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); > > /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ > for_each_pipe(pipe) { > I915_WRITE(CHICKEN_PIPESL_1(pipe), > I915_READ(CHICKEN_PIPESL_1(pipe)) | > BDW_DPRS_MASK_VBLANK_SRD); > } > > - /* Use Force Non-Coherent whenever executing a 3D context. This is a > - * workaround for for a possible hang in the unlikely event a TLB > - * invalidation occurs during a PSD flush. > - */ > - I915_WRITE(HDC_CHICKEN0, > - I915_READ(HDC_CHICKEN0) | > - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); > - > /* WaVSRefCountFullforceMissDisable:bdw */ > /* WaDSRefCountFullforceMissDisable:bdw */ > I915_WRITE(GEN7_FF_THREAD_MODE, > I915_READ(GEN7_FF_THREAD_MODE) & > ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); > > - /* > - * BSpec recommends 8x4 when MSAA is used, > - * however in practice 16x4 seems fastest. > - * > - * Note that PS/WM thread counts depend on the WIZ hashing > - * disable bit, which we don't touch here, but it's good > - * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). > - */ > - I915_WRITE(GEN7_GT_MODE, > - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); > - > I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, > _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); > > /* WaDisableSDEUnitClockGating:bdw */ > I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | > GEN8_SDEUNIT_CLOCK_GATE_DISABLE); > - > - /* Wa4x4STCOptimizationDisable:bdw */ > - I915_WRITE(CACHE_MODE_1, > - _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); > } > > static void haswell_init_clock_gating(struct drm_device *dev) > { > struct drm_i915_private *dev_priv = dev->dev_private; > > ilk_init_lp_watermarks(dev); > > /* L3 caching of data atomics doesn't work -- disable it. */ > I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index 13543f8..4146582 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -643,20 +643,98 @@ intel_init_pipe_control(struct intel_engine_cs *ring) > return 0; > > err_unpin: > i915_gem_object_ggtt_unpin(ring->scratch.obj); > err_unref: > drm_gem_object_unreference(&ring->scratch.obj->base); > err: > return ret; > } > > +static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, > + u32 addr, u32 value) > +{ > + intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); > + intel_ring_emit(ring, addr); > + intel_ring_emit(ring, value); > +} > + > +static int gen8_init_workarounds(struct intel_engine_cs *ring) > +{ > + int ret; > + > + /* > + * workarounds applied in this fn are part of register state context, > + * they need to be re-initialized followed by gpu reset, suspend/resume, > + * module reload. > + */ > + > + /* > + * update the number of dwords required based on the > + * actual number of workarounds applied > + */ > + ret = intel_ring_begin(ring, 24); > + if (ret) > + return ret; > + > + /* WaDisablePartialInstShootdown:bdw */ > + /* WaDisableThreadStallDopClockGating:bdw */ > + /* FIXME: Unclear whether we really need this on production bdw. */ > + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, > + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE > + | STALL_DOP_GATING_DISABLE)); > + > + /* WaDisableDopClockGating:bdw May not be needed for production */ > + intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, > + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > + > + /* > + * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for > + * pre-production hardware > + */ > + intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3, > + _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS > + | GEN8_SAMPLER_POWER_BYPASS_DIS)); > + > + intel_ring_emit_wa(ring, GEN7_HALF_SLICE_CHICKEN1, > + _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); > + > + intel_ring_emit_wa(ring, COMMON_SLICE_CHICKEN2, > + _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); > + > + /* Use Force Non-Coherent whenever executing a 3D context. This is a > + * workaround for for a possible hang in the unlikely event a TLB > + * invalidation occurs during a PSD flush. > + */ > + intel_ring_emit_wa(ring, HDC_CHICKEN0, > + _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); > + > + /* Wa4x4STCOptimizationDisable:bdw */ > + intel_ring_emit_wa(ring, CACHE_MODE_1, > + _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); > + > + /* > + * BSpec recommends 8x4 when MSAA is used, > + * however in practice 16x4 seems fastest. > + * > + * Note that PS/WM thread counts depend on the WIZ hashing > + * disable bit, which we don't touch here, but it's good > + * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). > + */ > + intel_ring_emit_wa(ring, GEN7_GT_MODE, > + GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); > + > + intel_ring_advance(ring); > + > + return 0; > +} > + > static int init_render_ring(struct intel_engine_cs *ring) > { > struct drm_device *dev = ring->dev; > struct drm_i915_private *dev_priv = dev->dev_private; > int ret = init_ring_common(ring); > if (ret) > return ret; > > /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ > if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) > @@ -2128,20 +2206,21 @@ int intel_init_render_ring_buffer(struct drm_device *dev) > i915_gem_object_set_cache_level(obj, I915_CACHE_LLC); > ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_NONBLOCK); > if (ret != 0) { > drm_gem_object_unreference(&obj->base); > DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n"); > i915.semaphores = 0; > } else > dev_priv->semaphore_obj = obj; > } > } > + ring->init_context = gen8_init_workarounds; > ring->add_request = gen6_add_request; > ring->flush = gen8_render_ring_flush; > ring->irq_get = gen8_ring_get_irq; > ring->irq_put = gen8_ring_put_irq; > ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT; > ring->get_seqno = gen6_ring_get_seqno; > ring->set_seqno = ring_set_seqno; > if (i915_semaphore_is_enabled(dev)) { > WARN_ON(!dev_priv->semaphore_obj); > ring->semaphore.sync_to = gen8_ring_sync; > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > index 9cbf7b0..96479c8 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > @@ -141,20 +141,22 @@ struct intel_engine_cs { > struct intel_hw_status_page status_page; > > unsigned irq_refcount; /* protected by dev_priv->irq_lock */ > u32 irq_enable_mask; /* bitmask to enable ring interrupt */ > u32 trace_irq_seqno; > bool __must_check (*irq_get)(struct intel_engine_cs *ring); > void (*irq_put)(struct intel_engine_cs *ring); > > int (*init)(struct intel_engine_cs *ring); > > + int (*init_context)(struct intel_engine_cs *ring); > + > void (*write_tail)(struct intel_engine_cs *ring, > u32 value); > int __must_check (*flush)(struct intel_engine_cs *ring, > u32 invalidate_domains, > u32 flush_domains); > int (*add_request)(struct intel_engine_cs *ring); > /* Some chipsets are not quite as coherent as advertised and need > * an expensive kick to force a true read of the up-to-date seqno. > * However, the up-to-date seqno is not always required and the last > * seen value is good enough. Note that the seqno will always be > -- > 2.0.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 14:37 ` Ville Syrjälä @ 2014-08-26 14:47 ` Siluvery, Arun 2014-08-26 15:32 ` Ville Syrjälä 2014-08-26 16:06 ` Chris Wilson 1 sibling, 1 reply; 25+ messages in thread From: Siluvery, Arun @ 2014-08-26 14:47 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On 26/08/2014 15:37, Ville Syrjälä wrote: > On Tue, Aug 26, 2014 at 02:44:50PM +0100, Arun Siluvery wrote: >> For BDW workarounds are currently initialized in init_clock_gating() but >> they are lost during reset, suspend/resume etc; this patch moves the WAs >> that are part of register state context to render ring init fn otherwise >> default context ends up with incorrect values as they don't get initialized >> until init_clock_gating fn. >> >> v2: Add workarounds to golden render state >> This method has its own issues, first of all this is different for >> each gen and it is generated using a tool so adding new workaround >> and mainitaining them across gens is not a straightforward process. >> >> v3: Use LRIs to emit these workarounds (Ville) >> Instead of modifying the golden render state the same LRIs are >> emitted from within the driver. >> >> v4: Use abstract name when exporting gen specific routines (Chris) >> >> For: VIZ-4092 >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > > This one looks good as far as I'm concerned. > Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > Do you plan to give other platforms the same treatment? We need at least > CHV converted ASAP. But if you don't have a test machine I can take care > of that myself. > I don't have hardware for CHV, I can borrow and try to do but since it is required at the earliest could you please modify it for CHV? regards Arun >> --- >> drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ >> drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- >> drivers/gpu/drm/i915/intel_ringbuffer.c | 79 +++++++++++++++++++++++++++++++++ >> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 + >> 4 files changed, 87 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c >> index 9683e62..0a9bb0e 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_context.c >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c >> @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, >> } >> >> uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; >> to->legacy_hw_ctx.initialized = true; >> >> done: >> i915_gem_context_reference(to); >> ring->last_context = to; >> >> if (uninitialized) { >> + if (ring->init_context) { >> + ret = ring->init_context(ring); >> + if (ret) >> + DRM_ERROR("ring init context: %d\n", ret); >> + } >> + >> ret = i915_gem_render_state_init(ring); >> if (ret) >> DRM_ERROR("init render state: %d\n", ret); >> } >> >> return 0; >> >> unpin_out: >> if (ring->id == RCS) >> i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state); >> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c >> index c8f744c..668acd9 100644 >> --- a/drivers/gpu/drm/i915/intel_pm.c >> +++ b/drivers/gpu/drm/i915/intel_pm.c >> @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) >> struct drm_i915_private *dev_priv = dev->dev_private; >> enum pipe pipe; >> >> I915_WRITE(WM3_LP_ILK, 0); >> I915_WRITE(WM2_LP_ILK, 0); >> I915_WRITE(WM1_LP_ILK, 0); >> >> /* FIXME(BDW): Check all the w/a, some might only apply to >> * pre-production hw. */ >> >> - /* WaDisablePartialInstShootdown:bdw */ >> - I915_WRITE(GEN8_ROW_CHICKEN, >> - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); >> - >> - /* WaDisableThreadStallDopClockGating:bdw */ >> - /* FIXME: Unclear whether we really need this on production bdw. */ >> - I915_WRITE(GEN8_ROW_CHICKEN, >> - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); >> >> - /* >> - * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for >> - * pre-production hardware >> - */ >> - I915_WRITE(HALF_SLICE_CHICKEN3, >> - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); >> - I915_WRITE(HALF_SLICE_CHICKEN3, >> - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); >> I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); >> >> I915_WRITE(_3D_CHICKEN3, >> _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); >> >> - I915_WRITE(COMMON_SLICE_CHICKEN2, >> - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); >> - >> - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, >> - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); >> - >> - /* WaDisableDopClockGating:bdw May not be needed for production */ >> - I915_WRITE(GEN7_ROW_CHICKEN2, >> - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); >> >> /* WaSwitchSolVfFArbitrationPriority:bdw */ >> I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); >> >> /* WaPsrDPAMaskVBlankInSRD:bdw */ >> I915_WRITE(CHICKEN_PAR1_1, >> I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); >> >> /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ >> for_each_pipe(pipe) { >> I915_WRITE(CHICKEN_PIPESL_1(pipe), >> I915_READ(CHICKEN_PIPESL_1(pipe)) | >> BDW_DPRS_MASK_VBLANK_SRD); >> } >> >> - /* Use Force Non-Coherent whenever executing a 3D context. This is a >> - * workaround for for a possible hang in the unlikely event a TLB >> - * invalidation occurs during a PSD flush. >> - */ >> - I915_WRITE(HDC_CHICKEN0, >> - I915_READ(HDC_CHICKEN0) | >> - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); >> - >> /* WaVSRefCountFullforceMissDisable:bdw */ >> /* WaDSRefCountFullforceMissDisable:bdw */ >> I915_WRITE(GEN7_FF_THREAD_MODE, >> I915_READ(GEN7_FF_THREAD_MODE) & >> ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); >> >> - /* >> - * BSpec recommends 8x4 when MSAA is used, >> - * however in practice 16x4 seems fastest. >> - * >> - * Note that PS/WM thread counts depend on the WIZ hashing >> - * disable bit, which we don't touch here, but it's good >> - * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). >> - */ >> - I915_WRITE(GEN7_GT_MODE, >> - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); >> - >> I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, >> _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); >> >> /* WaDisableSDEUnitClockGating:bdw */ >> I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | >> GEN8_SDEUNIT_CLOCK_GATE_DISABLE); >> - >> - /* Wa4x4STCOptimizationDisable:bdw */ >> - I915_WRITE(CACHE_MODE_1, >> - _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); >> } >> >> static void haswell_init_clock_gating(struct drm_device *dev) >> { >> struct drm_i915_private *dev_priv = dev->dev_private; >> >> ilk_init_lp_watermarks(dev); >> >> /* L3 caching of data atomics doesn't work -- disable it. */ >> I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c >> index 13543f8..4146582 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c >> @@ -643,20 +643,98 @@ intel_init_pipe_control(struct intel_engine_cs *ring) >> return 0; >> >> err_unpin: >> i915_gem_object_ggtt_unpin(ring->scratch.obj); >> err_unref: >> drm_gem_object_unreference(&ring->scratch.obj->base); >> err: >> return ret; >> } >> >> +static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, >> + u32 addr, u32 value) >> +{ >> + intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); >> + intel_ring_emit(ring, addr); >> + intel_ring_emit(ring, value); >> +} >> + >> +static int gen8_init_workarounds(struct intel_engine_cs *ring) >> +{ >> + int ret; >> + >> + /* >> + * workarounds applied in this fn are part of register state context, >> + * they need to be re-initialized followed by gpu reset, suspend/resume, >> + * module reload. >> + */ >> + >> + /* >> + * update the number of dwords required based on the >> + * actual number of workarounds applied >> + */ >> + ret = intel_ring_begin(ring, 24); >> + if (ret) >> + return ret; >> + >> + /* WaDisablePartialInstShootdown:bdw */ >> + /* WaDisableThreadStallDopClockGating:bdw */ >> + /* FIXME: Unclear whether we really need this on production bdw. */ >> + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, >> + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE >> + | STALL_DOP_GATING_DISABLE)); >> + >> + /* WaDisableDopClockGating:bdw May not be needed for production */ >> + intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, >> + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); >> + >> + /* >> + * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for >> + * pre-production hardware >> + */ >> + intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3, >> + _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS >> + | GEN8_SAMPLER_POWER_BYPASS_DIS)); >> + >> + intel_ring_emit_wa(ring, GEN7_HALF_SLICE_CHICKEN1, >> + _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); >> + >> + intel_ring_emit_wa(ring, COMMON_SLICE_CHICKEN2, >> + _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); >> + >> + /* Use Force Non-Coherent whenever executing a 3D context. This is a >> + * workaround for for a possible hang in the unlikely event a TLB >> + * invalidation occurs during a PSD flush. >> + */ >> + intel_ring_emit_wa(ring, HDC_CHICKEN0, >> + _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); >> + >> + /* Wa4x4STCOptimizationDisable:bdw */ >> + intel_ring_emit_wa(ring, CACHE_MODE_1, >> + _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); >> + >> + /* >> + * BSpec recommends 8x4 when MSAA is used, >> + * however in practice 16x4 seems fastest. >> + * >> + * Note that PS/WM thread counts depend on the WIZ hashing >> + * disable bit, which we don't touch here, but it's good >> + * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). >> + */ >> + intel_ring_emit_wa(ring, GEN7_GT_MODE, >> + GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); >> + >> + intel_ring_advance(ring); >> + >> + return 0; >> +} >> + >> static int init_render_ring(struct intel_engine_cs *ring) >> { >> struct drm_device *dev = ring->dev; >> struct drm_i915_private *dev_priv = dev->dev_private; >> int ret = init_ring_common(ring); >> if (ret) >> return ret; >> >> /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ >> if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) >> @@ -2128,20 +2206,21 @@ int intel_init_render_ring_buffer(struct drm_device *dev) >> i915_gem_object_set_cache_level(obj, I915_CACHE_LLC); >> ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_NONBLOCK); >> if (ret != 0) { >> drm_gem_object_unreference(&obj->base); >> DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n"); >> i915.semaphores = 0; >> } else >> dev_priv->semaphore_obj = obj; >> } >> } >> + ring->init_context = gen8_init_workarounds; >> ring->add_request = gen6_add_request; >> ring->flush = gen8_render_ring_flush; >> ring->irq_get = gen8_ring_get_irq; >> ring->irq_put = gen8_ring_put_irq; >> ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT; >> ring->get_seqno = gen6_ring_get_seqno; >> ring->set_seqno = ring_set_seqno; >> if (i915_semaphore_is_enabled(dev)) { >> WARN_ON(!dev_priv->semaphore_obj); >> ring->semaphore.sync_to = gen8_ring_sync; >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h >> index 9cbf7b0..96479c8 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h >> @@ -141,20 +141,22 @@ struct intel_engine_cs { >> struct intel_hw_status_page status_page; >> >> unsigned irq_refcount; /* protected by dev_priv->irq_lock */ >> u32 irq_enable_mask; /* bitmask to enable ring interrupt */ >> u32 trace_irq_seqno; >> bool __must_check (*irq_get)(struct intel_engine_cs *ring); >> void (*irq_put)(struct intel_engine_cs *ring); >> >> int (*init)(struct intel_engine_cs *ring); >> >> + int (*init_context)(struct intel_engine_cs *ring); >> + >> void (*write_tail)(struct intel_engine_cs *ring, >> u32 value); >> int __must_check (*flush)(struct intel_engine_cs *ring, >> u32 invalidate_domains, >> u32 flush_domains); >> int (*add_request)(struct intel_engine_cs *ring); >> /* Some chipsets are not quite as coherent as advertised and need >> * an expensive kick to force a true read of the up-to-date seqno. >> * However, the up-to-date seqno is not always required and the last >> * seen value is good enough. Note that the seqno will always be >> -- >> 2.0.4 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 14:47 ` Siluvery, Arun @ 2014-08-26 15:32 ` Ville Syrjälä 0 siblings, 0 replies; 25+ messages in thread From: Ville Syrjälä @ 2014-08-26 15:32 UTC (permalink / raw) To: Siluvery, Arun; +Cc: intel-gfx On Tue, Aug 26, 2014 at 03:47:52PM +0100, Siluvery, Arun wrote: > On 26/08/2014 15:37, Ville Syrjälä wrote: > > On Tue, Aug 26, 2014 at 02:44:50PM +0100, Arun Siluvery wrote: > >> For BDW workarounds are currently initialized in init_clock_gating() but > >> they are lost during reset, suspend/resume etc; this patch moves the WAs > >> that are part of register state context to render ring init fn otherwise > >> default context ends up with incorrect values as they don't get initialized > >> until init_clock_gating fn. > >> > >> v2: Add workarounds to golden render state > >> This method has its own issues, first of all this is different for > >> each gen and it is generated using a tool so adding new workaround > >> and mainitaining them across gens is not a straightforward process. > >> > >> v3: Use LRIs to emit these workarounds (Ville) > >> Instead of modifying the golden render state the same LRIs are > >> emitted from within the driver. > >> > >> v4: Use abstract name when exporting gen specific routines (Chris) > >> > >> For: VIZ-4092 > >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > > > > This one looks good as far as I'm concerned. > > Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > Do you plan to give other platforms the same treatment? We need at least > > CHV converted ASAP. But if you don't have a test machine I can take care > > of that myself. > > > I don't have hardware for CHV, I can borrow and try to do but since it > is required at the earliest could you please modify it for CHV? Sure, I can take care of it. -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 14:37 ` Ville Syrjälä 2014-08-26 14:47 ` Siluvery, Arun @ 2014-08-26 16:06 ` Chris Wilson 1 sibling, 0 replies; 25+ messages in thread From: Chris Wilson @ 2014-08-26 16:06 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On Tue, Aug 26, 2014 at 05:37:05PM +0300, Ville Syrjälä wrote: > On Tue, Aug 26, 2014 at 02:44:50PM +0100, Arun Siluvery wrote: > > For BDW workarounds are currently initialized in init_clock_gating() but > > they are lost during reset, suspend/resume etc; this patch moves the WAs > > that are part of register state context to render ring init fn otherwise > > default context ends up with incorrect values as they don't get initialized > > until init_clock_gating fn. I am dubious here. How does the default context end up with incorrect values if the registers are loaded afterwards and the default context is *never* restored from memory? Could you explain how this exactly fails? Could you explain why only gen8 is affected? -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() 2014-08-26 13:44 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery 2014-08-26 14:37 ` Ville Syrjälä @ 2014-08-27 14:33 ` ville.syrjala 2014-08-27 15:02 ` Chris Wilson 1 sibling, 1 reply; 25+ messages in thread From: ville.syrjala @ 2014-08-27 14:33 UTC (permalink / raw) To: intel-gfx From: Ville Syrjälä <ville.syrjala@linux.intel.com> Follow the BDW example and apply the workarounds touching registers which are saved in the context image through LRIs in the new ring->init_context() hook. This makes Mesa much happier and eg. glxgears doesn't hang after the first frame. Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> --- drivers/gpu/drm/i915/intel_pm.c | 14 ------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 36 +++++++++++++++++++++++++++++++-- 2 files changed, 34 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index bbe65d5..437f25a 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5841,14 +5841,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev) I915_WRITE(MI_ARB_VLV, MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE); - /* WaDisablePartialInstShootdown:chv */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); - - /* WaDisableThreadStallDopClockGating:chv */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); - /* WaVSRefCountFullforceMissDisable:chv */ /* WaDSRefCountFullforceMissDisable:chv */ I915_WRITE(GEN7_FF_THREAD_MODE, @@ -5867,10 +5859,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev) I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | GEN8_SDEUNIT_CLOCK_GATE_DISABLE); - /* WaDisableSamplerPowerBypass:chv (pre-production hw) */ - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); - /* WaDisableGunitClockGating:chv (pre-production hw) */ I915_WRITE(VLV_GUNIT_CLOCK_GATE, I915_READ(VLV_GUNIT_CLOCK_GATE) | GINT_DIS); @@ -5880,8 +5868,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev) _MASKED_BIT_ENABLE(GEN8_FF_DOP_CLOCK_GATE_DISABLE)); /* WaDisableDopClockGating:chv (pre-production hw) */ - I915_WRITE(GEN7_ROW_CHICKEN2, - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); I915_WRITE(GEN6_UCGCTL1, I915_READ(GEN6_UCGCTL1) | GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE); } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index cef8465..42d9b43 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -658,7 +658,7 @@ static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, intel_ring_emit(ring, value); } -static int gen8_init_workarounds(struct intel_engine_cs *ring) +static int bdw_init_workarounds(struct intel_engine_cs *ring) { int ret; @@ -728,6 +728,35 @@ static int gen8_init_workarounds(struct intel_engine_cs *ring) return 0; } +static int chv_init_workarounds(struct intel_engine_cs *ring) +{ + int ret; + + ret = intel_ring_begin(ring, 12); + if (ret) + return ret; + + /* WaDisablePartialInstShootdown:chv */ + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); + + /* WaDisableThreadStallDopClockGating:chv */ + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, + _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); + + /* WaDisableDopClockGating:chv (pre-production hw) */ + intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); + + /* WaDisableSamplerPowerBypass:chv (pre-production hw) */ + intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3, + _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); + + intel_ring_advance(ring); + + return 0; +} + static int init_render_ring(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev; @@ -2214,7 +2243,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev) dev_priv->semaphore_obj = obj; } } - ring->init_context = gen8_init_workarounds; + if (IS_CHERRYVIEW(dev)) + ring->init_context = chv_init_workarounds; + else + ring->init_context = bdw_init_workarounds; ring->add_request = gen6_add_request; ring->flush = gen8_render_ring_flush; ring->irq_get = gen8_ring_get_irq; -- 1.8.5.5 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() 2014-08-27 14:33 ` [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() ville.syrjala @ 2014-08-27 15:02 ` Chris Wilson 2014-08-29 15:43 ` Barbalho, Rafael 0 siblings, 1 reply; 25+ messages in thread From: Chris Wilson @ 2014-08-27 15:02 UTC (permalink / raw) To: ville.syrjala; +Cc: intel-gfx On Wed, Aug 27, 2014 at 05:33:12PM +0300, ville.syrjala@linux.intel.com wrote: > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > Follow the BDW example and apply the workarounds touching registers > which are saved in the context image through LRIs in the new > ring->init_context() hook. > > This makes Mesa much happier and eg. glxgears doesn't hang after > the first frame. > > Cc: Arun Siluvery <arun.siluvery@linux.intel.com> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > --- > drivers/gpu/drm/i915/intel_pm.c | 14 ------------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 36 +++++++++++++++++++++++++++++++-- > 2 files changed, 34 insertions(+), 16 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c > index bbe65d5..437f25a 100644 > --- a/drivers/gpu/drm/i915/intel_pm.c > +++ b/drivers/gpu/drm/i915/intel_pm.c > @@ -5841,14 +5841,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev) > > I915_WRITE(MI_ARB_VLV, MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE); > > - /* WaDisablePartialInstShootdown:chv */ > - I915_WRITE(GEN8_ROW_CHICKEN, > - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); > - > - /* WaDisableThreadStallDopClockGating:chv */ > - I915_WRITE(GEN8_ROW_CHICKEN, > - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); > - > /* WaVSRefCountFullforceMissDisable:chv */ > /* WaDSRefCountFullforceMissDisable:chv */ > I915_WRITE(GEN7_FF_THREAD_MODE, > @@ -5867,10 +5859,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev) > I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | > GEN8_SDEUNIT_CLOCK_GATE_DISABLE); > > - /* WaDisableSamplerPowerBypass:chv (pre-production hw) */ > - I915_WRITE(HALF_SLICE_CHICKEN3, > - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > - > /* WaDisableGunitClockGating:chv (pre-production hw) */ > I915_WRITE(VLV_GUNIT_CLOCK_GATE, I915_READ(VLV_GUNIT_CLOCK_GATE) | > GINT_DIS); > @@ -5880,8 +5868,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev) > _MASKED_BIT_ENABLE(GEN8_FF_DOP_CLOCK_GATE_DISABLE)); > > /* WaDisableDopClockGating:chv (pre-production hw) */ > - I915_WRITE(GEN7_ROW_CHICKEN2, > - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > I915_WRITE(GEN6_UCGCTL1, I915_READ(GEN6_UCGCTL1) | > GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE); > } > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index cef8465..42d9b43 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -658,7 +658,7 @@ static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, > intel_ring_emit(ring, value); > } > > -static int gen8_init_workarounds(struct intel_engine_cs *ring) > +static int bdw_init_workarounds(struct intel_engine_cs *ring) Sorry, these are continuing naming gripes. *_ring_init_context so that there isn't that much disconnect between vfunc and function. > { > int ret; > > @@ -728,6 +728,35 @@ static int gen8_init_workarounds(struct intel_engine_cs *ring) > return 0; > } > > +static int chv_init_workarounds(struct intel_engine_cs *ring) > +{ > + int ret; > + > + ret = intel_ring_begin(ring, 12); > + if (ret) > + return ret; > + > + /* WaDisablePartialInstShootdown:chv */ > + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, > + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); intel_ring_emit_lri(); However, if we get fancy, you can do these 4 in a single command intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(3)); intel_ring_emit(ring, GEN8_ROW_CHICKEN); intel_ring_emit(ring, _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE) | _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); (perhaps even intel_ring_emit_pair(ring, reg, value)) intel_ring_emit(ring, GEN7_ROW_CHICKEN2); intel_ring_emit(ring, _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); intel_ring_emit(ring, HALF_SLICE_CHICKEN3, intel_ring_emit(ring, _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); hmm, actually if you do int i915_request_emit_lri(rq, int num_registers, ...) { struct intel_ringbuffer *ring; va_list ap; ring = intel_ring_begin(rq, 2*num_registers + 2); if (IS_ERR(ring)) return PTR_ERR(ring); va_start(ap, num_registers); intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_registers)); while (num_registers--) { intel_ring_emit(ring, va_arg(ap, u32)); intel_ring_emit(ring, va_arg(ap, u32)); } intel_ring_emit(ring, MI_NOOP); intel_ring_advance(ring); return 0; } then static int chv_ring_init_context(struct i915_request *rq) { return i915_request_emit_lri(rq, 3, /* WaDisablePartialInstShootdown:chv */ /* WaDisableThreadStallDopClockGating:chv */ GEN8_ROW_CHICKEN, _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE) | _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE) /* WaDisableDopClockGating:chv (pre-production hw) */ GEN7_ROW_CHICKEN2, _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE), /* WaDisableSamplerPowerBypass:chv (pre-production hw) */ HALF_SLICE_CHICKEN3, _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); } Just not fond of intel_ring_emit_lri() as we the current intel_ring_emit* style should not be calling intel_ring_begin() itself. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() 2014-08-27 15:02 ` Chris Wilson @ 2014-08-29 15:43 ` Barbalho, Rafael 2014-08-29 16:01 ` Daniel Vetter 0 siblings, 1 reply; 25+ messages in thread From: Barbalho, Rafael @ 2014-08-29 15:43 UTC (permalink / raw) To: Chris Wilson, ville.syrjala@linux.intel.com Cc: intel-gfx@lists.freedesktop.org > -----Original Message----- > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf > Of Chris Wilson > Sent: Wednesday, August 27, 2014 4:03 PM > To: ville.syrjala@linux.intel.com > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Init some CHV workarounds via > LRIs in ring->init_context() > > On Wed, Aug 27, 2014 at 05:33:12PM +0300, ville.syrjala@linux.intel.com > wrote: > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > Follow the BDW example and apply the workarounds touching registers > > which are saved in the context image through LRIs in the new > > ring->init_context() hook. > > > > This makes Mesa much happier and eg. glxgears doesn't hang after > > the first frame. > > > > Cc: Arun Siluvery <arun.siluvery@linux.intel.com> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > --- > > drivers/gpu/drm/i915/intel_pm.c | 14 ------------- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 36 > +++++++++++++++++++++++++++++++-- > > 2 files changed, 34 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c > b/drivers/gpu/drm/i915/intel_pm.c > > index bbe65d5..437f25a 100644 > > --- a/drivers/gpu/drm/i915/intel_pm.c > > +++ b/drivers/gpu/drm/i915/intel_pm.c > > @@ -5841,14 +5841,6 @@ static void cherryview_init_clock_gating(struct > drm_device *dev) > > > > I915_WRITE(MI_ARB_VLV, > MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE); > > > > - /* WaDisablePartialInstShootdown:chv */ > > - I915_WRITE(GEN8_ROW_CHICKEN, > > - > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); > > - > > - /* WaDisableThreadStallDopClockGating:chv */ > > - I915_WRITE(GEN8_ROW_CHICKEN, > > - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); > > - > > /* WaVSRefCountFullforceMissDisable:chv */ > > /* WaDSRefCountFullforceMissDisable:chv */ > > I915_WRITE(GEN7_FF_THREAD_MODE, > > @@ -5867,10 +5859,6 @@ static void cherryview_init_clock_gating(struct > drm_device *dev) > > I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | > > GEN8_SDEUNIT_CLOCK_GATE_DISABLE); > > > > - /* WaDisableSamplerPowerBypass:chv (pre-production hw) */ > > - I915_WRITE(HALF_SLICE_CHICKEN3, > > - > _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > > - > > /* WaDisableGunitClockGating:chv (pre-production hw) */ > > I915_WRITE(VLV_GUNIT_CLOCK_GATE, > I915_READ(VLV_GUNIT_CLOCK_GATE) | > > GINT_DIS); > > @@ -5880,8 +5868,6 @@ static void cherryview_init_clock_gating(struct > drm_device *dev) > > > _MASKED_BIT_ENABLE(GEN8_FF_DOP_CLOCK_GATE_DISABLE)); > > > > /* WaDisableDopClockGating:chv (pre-production hw) */ > > - I915_WRITE(GEN7_ROW_CHICKEN2, > > - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > > I915_WRITE(GEN6_UCGCTL1, I915_READ(GEN6_UCGCTL1) | > > GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE); > > } > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > b/drivers/gpu/drm/i915/intel_ringbuffer.c > > index cef8465..42d9b43 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -658,7 +658,7 @@ static inline void intel_ring_emit_wa(struct > intel_engine_cs *ring, > > intel_ring_emit(ring, value); > > } > > > > -static int gen8_init_workarounds(struct intel_engine_cs *ring) > > +static int bdw_init_workarounds(struct intel_engine_cs *ring) > > Sorry, these are continuing naming gripes. > > *_ring_init_context so that there isn't that much disconnect between > vfunc and function. > > > { > > int ret; > > > > @@ -728,6 +728,35 @@ static int gen8_init_workarounds(struct > intel_engine_cs *ring) > > return 0; > > } > > > > +static int chv_init_workarounds(struct intel_engine_cs *ring) > > +{ > > + int ret; > > + > > + ret = intel_ring_begin(ring, 12); > > + if (ret) > > + return ret; > > + This is missing: dev_priv->num_wa_regs = 0; memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs)); The code needs to be added so that the debugfs export doesn't crash. > > + /* WaDisablePartialInstShootdown:chv */ > > + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, > > + > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); > > intel_ring_emit_lri(); > > However, if we get fancy, you can do these 4 in a single command > > intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(3)); > intel_ring_emit(ring, GEN8_ROW_CHICKEN); > intel_ring_emit(ring, > > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISA > BLE) | > _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); > > (perhaps even intel_ring_emit_pair(ring, reg, value)) > > intel_ring_emit(ring, GEN7_ROW_CHICKEN2); > intel_ring_emit(ring, > _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > intel_ring_emit(ring, HALF_SLICE_CHICKEN3, > intel_ring_emit(ring, > > _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > > hmm, actually if you do > > int > i915_request_emit_lri(rq, int num_registers, ...) > { > struct intel_ringbuffer *ring; > va_list ap; > > ring = intel_ring_begin(rq, 2*num_registers + 2); > if (IS_ERR(ring)) > return PTR_ERR(ring); > > va_start(ap, num_registers); > intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_registers)); > while (num_registers--) { > intel_ring_emit(ring, va_arg(ap, u32)); > intel_ring_emit(ring, va_arg(ap, u32)); > } > intel_ring_emit(ring, MI_NOOP); > intel_ring_advance(ring); > > return 0; > } > > then > > static int chv_ring_init_context(struct i915_request *rq) > { > return i915_request_emit_lri(rq, 3, > > /* WaDisablePartialInstShootdown:chv */ > /* WaDisableThreadStallDopClockGating:chv */ > GEN8_ROW_CHICKEN, > > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE) | > _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE) > > /* WaDisableDopClockGating:chv (pre-production hw) */ > GEN7_ROW_CHICKEN2, > _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE), > > /* WaDisableSamplerPowerBypass:chv (pre-production > hw) */ > HALF_SLICE_CHICKEN3, > > _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > } > > Just not fond of intel_ring_emit_lri() as we the current > intel_ring_emit* style should not be calling intel_ring_begin() itself. > -Chris I agree with Chris and Arun is taking his suggestion on board with a few follow up clean up patches to do as you suggested. However, in the mean time we want this stuff in to enable so if ville sends a v2 with the that fix I'll be able to send an r-b. > > -- > Chris Wilson, Intel Open Source Technology Centre > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() 2014-08-29 15:43 ` Barbalho, Rafael @ 2014-08-29 16:01 ` Daniel Vetter 0 siblings, 0 replies; 25+ messages in thread From: Daniel Vetter @ 2014-08-29 16:01 UTC (permalink / raw) To: Barbalho, Rafael; +Cc: intel-gfx@lists.freedesktop.org On Fri, Aug 29, 2014 at 03:43:25PM +0000, Barbalho, Rafael wrote: > > > > -----Original Message----- > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf > > Of Chris Wilson > > Sent: Wednesday, August 27, 2014 4:03 PM > > To: ville.syrjala@linux.intel.com > > Cc: intel-gfx@lists.freedesktop.org > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Init some CHV workarounds via > > LRIs in ring->init_context() > > > > On Wed, Aug 27, 2014 at 05:33:12PM +0300, ville.syrjala@linux.intel.com > > wrote: > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > > > Follow the BDW example and apply the workarounds touching registers > > > which are saved in the context image through LRIs in the new > > > ring->init_context() hook. > > > > > > This makes Mesa much happier and eg. glxgears doesn't hang after > > > the first frame. > > > > > > Cc: Arun Siluvery <arun.siluvery@linux.intel.com> > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > --- > > > drivers/gpu/drm/i915/intel_pm.c | 14 ------------- > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 36 > > +++++++++++++++++++++++++++++++-- > > > 2 files changed, 34 insertions(+), 16 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c > > b/drivers/gpu/drm/i915/intel_pm.c > > > index bbe65d5..437f25a 100644 > > > --- a/drivers/gpu/drm/i915/intel_pm.c > > > +++ b/drivers/gpu/drm/i915/intel_pm.c > > > @@ -5841,14 +5841,6 @@ static void cherryview_init_clock_gating(struct > > drm_device *dev) > > > > > > I915_WRITE(MI_ARB_VLV, > > MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE); > > > > > > - /* WaDisablePartialInstShootdown:chv */ > > > - I915_WRITE(GEN8_ROW_CHICKEN, > > > - > > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); > > > - > > > - /* WaDisableThreadStallDopClockGating:chv */ > > > - I915_WRITE(GEN8_ROW_CHICKEN, > > > - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); > > > - > > > /* WaVSRefCountFullforceMissDisable:chv */ > > > /* WaDSRefCountFullforceMissDisable:chv */ > > > I915_WRITE(GEN7_FF_THREAD_MODE, > > > @@ -5867,10 +5859,6 @@ static void cherryview_init_clock_gating(struct > > drm_device *dev) > > > I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | > > > GEN8_SDEUNIT_CLOCK_GATE_DISABLE); > > > > > > - /* WaDisableSamplerPowerBypass:chv (pre-production hw) */ > > > - I915_WRITE(HALF_SLICE_CHICKEN3, > > > - > > _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > > > - > > > /* WaDisableGunitClockGating:chv (pre-production hw) */ > > > I915_WRITE(VLV_GUNIT_CLOCK_GATE, > > I915_READ(VLV_GUNIT_CLOCK_GATE) | > > > GINT_DIS); > > > @@ -5880,8 +5868,6 @@ static void cherryview_init_clock_gating(struct > > drm_device *dev) > > > > > _MASKED_BIT_ENABLE(GEN8_FF_DOP_CLOCK_GATE_DISABLE)); > > > > > > /* WaDisableDopClockGating:chv (pre-production hw) */ > > > - I915_WRITE(GEN7_ROW_CHICKEN2, > > > - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > > > I915_WRITE(GEN6_UCGCTL1, I915_READ(GEN6_UCGCTL1) | > > > GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE); > > > } > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > index cef8465..42d9b43 100644 > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > @@ -658,7 +658,7 @@ static inline void intel_ring_emit_wa(struct > > intel_engine_cs *ring, > > > intel_ring_emit(ring, value); > > > } > > > > > > -static int gen8_init_workarounds(struct intel_engine_cs *ring) > > > +static int bdw_init_workarounds(struct intel_engine_cs *ring) > > > > Sorry, these are continuing naming gripes. > > > > *_ring_init_context so that there isn't that much disconnect between > > vfunc and function. > > > > > { > > > int ret; > > > > > > @@ -728,6 +728,35 @@ static int gen8_init_workarounds(struct > > intel_engine_cs *ring) > > > return 0; > > > } > > > > > > +static int chv_init_workarounds(struct intel_engine_cs *ring) > > > +{ > > > + int ret; > > > + > > > + ret = intel_ring_begin(ring, 12); > > > + if (ret) > > > + return ret; > > > + > > This is missing: > > dev_priv->num_wa_regs = 0; > memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs)); > > The code needs to be added so that the debugfs export doesn't crash. > > > > + /* WaDisablePartialInstShootdown:chv */ > > > + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, > > > + > > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); > > > > intel_ring_emit_lri(); > > > > However, if we get fancy, you can do these 4 in a single command > > > > intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(3)); > > intel_ring_emit(ring, GEN8_ROW_CHICKEN); > > intel_ring_emit(ring, > > > > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISA > > BLE) | > > _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); > > > > (perhaps even intel_ring_emit_pair(ring, reg, value)) > > > > intel_ring_emit(ring, GEN7_ROW_CHICKEN2); > > intel_ring_emit(ring, > > _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > > intel_ring_emit(ring, HALF_SLICE_CHICKEN3, > > intel_ring_emit(ring, > > > > _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > > > > hmm, actually if you do > > > > int > > i915_request_emit_lri(rq, int num_registers, ...) > > { > > struct intel_ringbuffer *ring; > > va_list ap; > > > > ring = intel_ring_begin(rq, 2*num_registers + 2); > > if (IS_ERR(ring)) > > return PTR_ERR(ring); > > > > va_start(ap, num_registers); > > intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_registers)); > > while (num_registers--) { > > intel_ring_emit(ring, va_arg(ap, u32)); > > intel_ring_emit(ring, va_arg(ap, u32)); > > } > > intel_ring_emit(ring, MI_NOOP); > > intel_ring_advance(ring); > > > > return 0; > > } > > > > then > > > > static int chv_ring_init_context(struct i915_request *rq) > > { > > return i915_request_emit_lri(rq, 3, > > > > /* WaDisablePartialInstShootdown:chv */ > > /* WaDisableThreadStallDopClockGating:chv */ > > GEN8_ROW_CHICKEN, > > > > _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE) | > > _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE) > > > > /* WaDisableDopClockGating:chv (pre-production hw) */ > > GEN7_ROW_CHICKEN2, > > _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE), > > > > /* WaDisableSamplerPowerBypass:chv (pre-production > > hw) */ > > HALF_SLICE_CHICKEN3, > > > > _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > > } > > > > Just not fond of intel_ring_emit_lri() as we the current > > intel_ring_emit* style should not be calling intel_ring_begin() itself. > > -Chris > > I agree with Chris and Arun is taking his suggestion on board with a few > follow up clean up patches to do as you suggested. > > However, in the mean time we want this stuff in to enable so if ville > sends a v2 with the that fix I'll be able to send an r-b. Fixed locally and merged with Rafael's irc r-b tag. Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs 2014-08-26 13:44 [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn Arun Siluvery 2014-08-26 13:44 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery @ 2014-08-26 13:44 ` Arun Siluvery 2014-08-27 15:44 ` Daniel Vetter 2014-08-30 15:10 ` Damien Lespiau 1 sibling, 2 replies; 25+ messages in thread From: Arun Siluvery @ 2014-08-26 13:44 UTC (permalink / raw) To: intel-gfx The workarounds that are applied are exported to a debugfs file; this is used to verify their state after the test case (reset or suspend/resume etc). This patch is only required to support i-g-t. Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> --- drivers/gpu/drm/i915/i915_debugfs.c | 40 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++ 3 files changed, 77 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index d42db6b..f0d63f6 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -2451,20 +2451,59 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused) seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md); seq_printf(m, " fp0: 0x%08x\n", pll->hw_state.fp0); seq_printf(m, " fp1: 0x%08x\n", pll->hw_state.fp1); seq_printf(m, " wrpll: 0x%08x\n", pll->hw_state.wrpll); } drm_modeset_unlock_all(dev); return 0; } +static int intel_wa_registers(struct seq_file *m, void *unused) +{ + int i; + int ret; + struct drm_info_node *node = (struct drm_info_node *) m->private; + struct drm_device *dev = node->minor->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + + if (!IS_BROADWELL(dev)) { + DRM_DEBUG_DRIVER("Workaround table not available !!\n"); + return -EINVAL; + } + + ret = mutex_lock_interruptible(&dev->struct_mutex); + if (ret) + return ret; + + intel_runtime_pm_get(dev_priv); + + seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs); + for (i = 0; i < dev_priv->num_wa_regs; ++i) { + u32 addr, mask; + + addr = dev_priv->intel_wa_regs[i].addr; + mask = dev_priv->intel_wa_regs[i].mask; + dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask; + if (dev_priv->intel_wa_regs[i].addr) + seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n", + dev_priv->intel_wa_regs[i].addr, + dev_priv->intel_wa_regs[i].value, + dev_priv->intel_wa_regs[i].mask); + } + + intel_runtime_pm_put(dev_priv); + mutex_unlock(&dev->struct_mutex); + + return 0; +} + struct pipe_crc_info { const char *name; struct drm_device *dev; enum pipe pipe; }; static int i915_dp_mst_info(struct seq_file *m, void *unused) { struct drm_info_node *node = (struct drm_info_node *) m->private; struct drm_device *dev = node->minor->dev; @@ -3980,20 +4019,21 @@ static const struct drm_info_list i915_debugfs_list[] = { {"i915_llc", i915_llc, 0}, {"i915_edp_psr_status", i915_edp_psr_status, 0}, {"i915_sink_crc_eDP1", i915_sink_crc, 0}, {"i915_energy_uJ", i915_energy_uJ, 0}, {"i915_pc8_status", i915_pc8_status, 0}, {"i915_power_domain_info", i915_power_domain_info, 0}, {"i915_display_info", i915_display_info, 0}, {"i915_semaphore_status", i915_semaphore_status, 0}, {"i915_shared_dplls_info", i915_shared_dplls_info, 0}, {"i915_dp_mst_info", i915_dp_mst_info, 0}, + {"intel_wa_registers", intel_wa_registers, 0} }; #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list) static const struct i915_debugfs_files { const char *name; const struct file_operations *fops; } i915_debugfs_files[] = { {"i915_wedged", &i915_wedged_fops}, {"i915_max_freq", &i915_max_freq_fops}, {"i915_min_freq", &i915_min_freq_fops}, diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index bcf79f0..49b7be7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1546,20 +1546,34 @@ struct drm_i915_private { wait_queue_head_t pending_flip_queue; #ifdef CONFIG_DEBUG_FS struct intel_pipe_crc pipe_crc[I915_MAX_PIPES]; #endif int num_shared_dpll; struct intel_shared_dpll shared_dplls[I915_NUM_PLLS]; int dpio_phy_iosf_port[I915_NUM_PHYS_VLV]; + /* + * workarounds are currently applied at different places and + * changes are being done to consolidate them so exact count is + * not clear at this point, use a max value for now. + */ +#define I915_MAX_WA_REGS 16 + struct { + u32 addr; + u32 value; + /* bitmask representing WA bits */ + u32 mask; + } intel_wa_regs[I915_MAX_WA_REGS]; + u32 num_wa_regs; + /* Reclocking support */ bool render_reclock_avail; bool lvds_downclock_avail; /* indicates the reduced downclock for LVDS*/ int lvds_downclock; struct i915_frontbuffer_tracking fb_tracking; u16 orig_clock; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 4146582..e48e12a 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -646,34 +646,54 @@ err_unpin: i915_gem_object_ggtt_unpin(ring->scratch.obj); err_unref: drm_gem_object_unreference(&ring->scratch.obj->base); err: return ret; } static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, u32 addr, u32 value) { + struct drm_device *dev = ring->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + + if (dev_priv->num_wa_regs > I915_MAX_WA_REGS) + return; + intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); intel_ring_emit(ring, addr); intel_ring_emit(ring, value); + + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr; + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].mask = (value) & 0xFFFF; + /* value is updated with the status of remaining bits of this + * register when it is read from debugfs file + */ + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value; + dev_priv->num_wa_regs++; + + return; } static int gen8_init_workarounds(struct intel_engine_cs *ring) { int ret; + struct drm_device *dev = ring->dev; + struct drm_i915_private *dev_priv = dev->dev_private; /* * workarounds applied in this fn are part of register state context, * they need to be re-initialized followed by gpu reset, suspend/resume, * module reload. */ + dev_priv->num_wa_regs = 0; + memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs)); /* * update the number of dwords required based on the * actual number of workarounds applied */ ret = intel_ring_begin(ring, 24); if (ret) return ret; /* WaDisablePartialInstShootdown:bdw */ @@ -718,20 +738,23 @@ static int gen8_init_workarounds(struct intel_engine_cs *ring) * * Note that PS/WM thread counts depend on the WIZ hashing * disable bit, which we don't touch here, but it's good * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). */ intel_ring_emit_wa(ring, GEN7_GT_MODE, GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); intel_ring_advance(ring); + DRM_DEBUG_DRIVER("Number of Workarounds applied: %d\n", + dev_priv->num_wa_regs); + return 0; } static int init_render_ring(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; int ret = init_ring_common(ring); if (ret) return ret; -- 2.0.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs 2014-08-26 13:44 ` [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs Arun Siluvery @ 2014-08-27 15:44 ` Daniel Vetter 2014-08-27 15:54 ` Chris Wilson ` (2 more replies) 2014-08-30 15:10 ` Damien Lespiau 1 sibling, 3 replies; 25+ messages in thread From: Daniel Vetter @ 2014-08-27 15:44 UTC (permalink / raw) To: Arun Siluvery; +Cc: intel-gfx On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote: > The workarounds that are applied are exported to a debugfs file; > this is used to verify their state after the test case (reset or > suspend/resume etc). This patch is only required to support i-g-t. > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_debugfs.c | 40 +++++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++++ > drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++ > 3 files changed, 77 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c > index d42db6b..f0d63f6 100644 > --- a/drivers/gpu/drm/i915/i915_debugfs.c > +++ b/drivers/gpu/drm/i915/i915_debugfs.c > @@ -2451,20 +2451,59 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused) > seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md); > seq_printf(m, " fp0: 0x%08x\n", pll->hw_state.fp0); > seq_printf(m, " fp1: 0x%08x\n", pll->hw_state.fp1); > seq_printf(m, " wrpll: 0x%08x\n", pll->hw_state.wrpll); > } > drm_modeset_unlock_all(dev); > > return 0; > } > > +static int intel_wa_registers(struct seq_file *m, void *unused) > +{ > + int i; > + int ret; > + struct drm_info_node *node = (struct drm_info_node *) m->private; > + struct drm_device *dev = node->minor->dev; > + struct drm_i915_private *dev_priv = dev->dev_private; > + > + if (!IS_BROADWELL(dev)) { > + DRM_DEBUG_DRIVER("Workaround table not available !!\n"); > + return -EINVAL; > + } > + > + ret = mutex_lock_interruptible(&dev->struct_mutex); > + if (ret) > + return ret; > + > + intel_runtime_pm_get(dev_priv); > + > + seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs); > + for (i = 0; i < dev_priv->num_wa_regs; ++i) { > + u32 addr, mask; > + > + addr = dev_priv->intel_wa_regs[i].addr; > + mask = dev_priv->intel_wa_regs[i].mask; > + dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask; > + if (dev_priv->intel_wa_regs[i].addr) > + seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n", > + dev_priv->intel_wa_regs[i].addr, > + dev_priv->intel_wa_regs[i].value, > + dev_priv->intel_wa_regs[i].mask); > + } > + > + intel_runtime_pm_put(dev_priv); > + mutex_unlock(&dev->struct_mutex); > + > + return 0; > +} > + > struct pipe_crc_info { > const char *name; > struct drm_device *dev; > enum pipe pipe; > }; > > static int i915_dp_mst_info(struct seq_file *m, void *unused) > { > struct drm_info_node *node = (struct drm_info_node *) m->private; > struct drm_device *dev = node->minor->dev; > @@ -3980,20 +4019,21 @@ static const struct drm_info_list i915_debugfs_list[] = { > {"i915_llc", i915_llc, 0}, > {"i915_edp_psr_status", i915_edp_psr_status, 0}, > {"i915_sink_crc_eDP1", i915_sink_crc, 0}, > {"i915_energy_uJ", i915_energy_uJ, 0}, > {"i915_pc8_status", i915_pc8_status, 0}, > {"i915_power_domain_info", i915_power_domain_info, 0}, > {"i915_display_info", i915_display_info, 0}, > {"i915_semaphore_status", i915_semaphore_status, 0}, > {"i915_shared_dplls_info", i915_shared_dplls_info, 0}, > {"i915_dp_mst_info", i915_dp_mst_info, 0}, > + {"intel_wa_registers", intel_wa_registers, 0} > }; > #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list) > > static const struct i915_debugfs_files { > const char *name; > const struct file_operations *fops; > } i915_debugfs_files[] = { > {"i915_wedged", &i915_wedged_fops}, > {"i915_max_freq", &i915_max_freq_fops}, > {"i915_min_freq", &i915_min_freq_fops}, > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index bcf79f0..49b7be7 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1546,20 +1546,34 @@ struct drm_i915_private { > wait_queue_head_t pending_flip_queue; > > #ifdef CONFIG_DEBUG_FS > struct intel_pipe_crc pipe_crc[I915_MAX_PIPES]; > #endif > > int num_shared_dpll; > struct intel_shared_dpll shared_dplls[I915_NUM_PLLS]; > int dpio_phy_iosf_port[I915_NUM_PHYS_VLV]; > > + /* > + * workarounds are currently applied at different places and > + * changes are being done to consolidate them so exact count is > + * not clear at this point, use a max value for now. > + */ > +#define I915_MAX_WA_REGS 16 > + struct { > + u32 addr; > + u32 value; > + /* bitmask representing WA bits */ > + u32 mask; > + } intel_wa_regs[I915_MAX_WA_REGS]; > + u32 num_wa_regs; > + > /* Reclocking support */ > bool render_reclock_avail; > bool lvds_downclock_avail; > /* indicates the reduced downclock for LVDS*/ > int lvds_downclock; > > struct i915_frontbuffer_tracking fb_tracking; > > u16 orig_clock; > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index 4146582..e48e12a 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -646,34 +646,54 @@ err_unpin: > i915_gem_object_ggtt_unpin(ring->scratch.obj); > err_unref: > drm_gem_object_unreference(&ring->scratch.obj->base); > err: > return ret; > } > > static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, > u32 addr, u32 value) > { > + struct drm_device *dev = ring->dev; > + struct drm_i915_private *dev_priv = dev->dev_private; > + > + if (dev_priv->num_wa_regs > I915_MAX_WA_REGS) > + return; > + > intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); > intel_ring_emit(ring, addr); > intel_ring_emit(ring, value); > + > + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr; > + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].mask = (value) & 0xFFFF; > + /* value is updated with the status of remaining bits of this > + * register when it is read from debugfs file > + */ > + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value; > + dev_priv->num_wa_regs++; > + > + return; > } > > static int gen8_init_workarounds(struct intel_engine_cs *ring) > { > int ret; > + struct drm_device *dev = ring->dev; > + struct drm_i915_private *dev_priv = dev->dev_private; > > /* > * workarounds applied in this fn are part of register state context, > * they need to be re-initialized followed by gpu reset, suspend/resume, > * module reload. > */ > + dev_priv->num_wa_regs = 0; > + memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs)); I've dropped this part - we allocate dev_priv already zeroed, so this is redundant. And I expect that we'll use this w/a table from other places (so not just for render w/a, but also other stuff that can get lost e.g. over runtime pm), and then the clearing here could be harmful. Both patches applied to dinq, thanks. -Daniel > > /* > * update the number of dwords required based on the > * actual number of workarounds applied > */ > ret = intel_ring_begin(ring, 24); > if (ret) > return ret; > > /* WaDisablePartialInstShootdown:bdw */ > @@ -718,20 +738,23 @@ static int gen8_init_workarounds(struct intel_engine_cs *ring) > * > * Note that PS/WM thread counts depend on the WIZ hashing > * disable bit, which we don't touch here, but it's good > * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). > */ > intel_ring_emit_wa(ring, GEN7_GT_MODE, > GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); > > intel_ring_advance(ring); > > + DRM_DEBUG_DRIVER("Number of Workarounds applied: %d\n", > + dev_priv->num_wa_regs); > + > return 0; > } > > static int init_render_ring(struct intel_engine_cs *ring) > { > struct drm_device *dev = ring->dev; > struct drm_i915_private *dev_priv = dev->dev_private; > int ret = init_ring_common(ring); > if (ret) > return ret; > -- > 2.0.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs 2014-08-27 15:44 ` Daniel Vetter @ 2014-08-27 15:54 ` Chris Wilson 2014-08-27 15:59 ` Siluvery, Arun 2014-08-27 16:19 ` Chris Wilson 2 siblings, 0 replies; 25+ messages in thread From: Chris Wilson @ 2014-08-27 15:54 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On Wed, Aug 27, 2014 at 05:44:55PM +0200, Daniel Vetter wrote: > On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote: > > The workarounds that are applied are exported to a debugfs file; > > this is used to verify their state after the test case (reset or > > suspend/resume etc). This patch is only required to support i-g-t. > > > > static int gen8_init_workarounds(struct intel_engine_cs *ring) > > { > > int ret; > > + struct drm_device *dev = ring->dev; > > + struct drm_i915_private *dev_priv = dev->dev_private; > > > > /* > > * workarounds applied in this fn are part of register state context, > > * they need to be re-initialized followed by gpu reset, suspend/resume, > > * module reload. > > */ > > + dev_priv->num_wa_regs = 0; > > + memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs)); > > I've dropped this part - we allocate dev_priv already zeroed, so this is > redundant. And I expect that we'll use this w/a table from other places > (so not just for render w/a, but also other stuff that can get lost e.g. > over runtime pm), and then the clearing here could be harmful. But this is just a debugfs readback, something that userspace can already do, and does. The w/a table here isn't particularly useful as is, plus the changes in this patch wrt to writing the registers! still could really do with some tlc. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs 2014-08-27 15:44 ` Daniel Vetter 2014-08-27 15:54 ` Chris Wilson @ 2014-08-27 15:59 ` Siluvery, Arun 2014-08-27 16:19 ` Chris Wilson 2 siblings, 0 replies; 25+ messages in thread From: Siluvery, Arun @ 2014-08-27 15:59 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On 27/08/2014 16:44, Daniel Vetter wrote: > On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote: >> The workarounds that are applied are exported to a debugfs file; >> this is used to verify their state after the test case (reset or >> suspend/resume etc). This patch is only required to support i-g-t. >> >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> >> --- >> drivers/gpu/drm/i915/i915_debugfs.c | 40 +++++++++++++++++++++++++++++++++ >> drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++++ >> drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++ >> 3 files changed, 77 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c >> index d42db6b..f0d63f6 100644 >> --- a/drivers/gpu/drm/i915/i915_debugfs.c >> +++ b/drivers/gpu/drm/i915/i915_debugfs.c >> @@ -2451,20 +2451,59 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused) >> seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md); >> seq_printf(m, " fp0: 0x%08x\n", pll->hw_state.fp0); >> seq_printf(m, " fp1: 0x%08x\n", pll->hw_state.fp1); >> seq_printf(m, " wrpll: 0x%08x\n", pll->hw_state.wrpll); >> } >> drm_modeset_unlock_all(dev); >> >> return 0; >> } >> >> +static int intel_wa_registers(struct seq_file *m, void *unused) >> +{ >> + int i; >> + int ret; >> + struct drm_info_node *node = (struct drm_info_node *) m->private; >> + struct drm_device *dev = node->minor->dev; >> + struct drm_i915_private *dev_priv = dev->dev_private; >> + >> + if (!IS_BROADWELL(dev)) { >> + DRM_DEBUG_DRIVER("Workaround table not available !!\n"); >> + return -EINVAL; >> + } >> + >> + ret = mutex_lock_interruptible(&dev->struct_mutex); >> + if (ret) >> + return ret; >> + >> + intel_runtime_pm_get(dev_priv); >> + >> + seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs); >> + for (i = 0; i < dev_priv->num_wa_regs; ++i) { >> + u32 addr, mask; >> + >> + addr = dev_priv->intel_wa_regs[i].addr; >> + mask = dev_priv->intel_wa_regs[i].mask; >> + dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask; >> + if (dev_priv->intel_wa_regs[i].addr) >> + seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n", >> + dev_priv->intel_wa_regs[i].addr, >> + dev_priv->intel_wa_regs[i].value, >> + dev_priv->intel_wa_regs[i].mask); >> + } >> + >> + intel_runtime_pm_put(dev_priv); >> + mutex_unlock(&dev->struct_mutex); >> + >> + return 0; >> +} >> + >> struct pipe_crc_info { >> const char *name; >> struct drm_device *dev; >> enum pipe pipe; >> }; >> >> static int i915_dp_mst_info(struct seq_file *m, void *unused) >> { >> struct drm_info_node *node = (struct drm_info_node *) m->private; >> struct drm_device *dev = node->minor->dev; >> @@ -3980,20 +4019,21 @@ static const struct drm_info_list i915_debugfs_list[] = { >> {"i915_llc", i915_llc, 0}, >> {"i915_edp_psr_status", i915_edp_psr_status, 0}, >> {"i915_sink_crc_eDP1", i915_sink_crc, 0}, >> {"i915_energy_uJ", i915_energy_uJ, 0}, >> {"i915_pc8_status", i915_pc8_status, 0}, >> {"i915_power_domain_info", i915_power_domain_info, 0}, >> {"i915_display_info", i915_display_info, 0}, >> {"i915_semaphore_status", i915_semaphore_status, 0}, >> {"i915_shared_dplls_info", i915_shared_dplls_info, 0}, >> {"i915_dp_mst_info", i915_dp_mst_info, 0}, >> + {"intel_wa_registers", intel_wa_registers, 0} >> }; >> #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list) >> >> static const struct i915_debugfs_files { >> const char *name; >> const struct file_operations *fops; >> } i915_debugfs_files[] = { >> {"i915_wedged", &i915_wedged_fops}, >> {"i915_max_freq", &i915_max_freq_fops}, >> {"i915_min_freq", &i915_min_freq_fops}, >> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h >> index bcf79f0..49b7be7 100644 >> --- a/drivers/gpu/drm/i915/i915_drv.h >> +++ b/drivers/gpu/drm/i915/i915_drv.h >> @@ -1546,20 +1546,34 @@ struct drm_i915_private { >> wait_queue_head_t pending_flip_queue; >> >> #ifdef CONFIG_DEBUG_FS >> struct intel_pipe_crc pipe_crc[I915_MAX_PIPES]; >> #endif >> >> int num_shared_dpll; >> struct intel_shared_dpll shared_dplls[I915_NUM_PLLS]; >> int dpio_phy_iosf_port[I915_NUM_PHYS_VLV]; >> >> + /* >> + * workarounds are currently applied at different places and >> + * changes are being done to consolidate them so exact count is >> + * not clear at this point, use a max value for now. >> + */ >> +#define I915_MAX_WA_REGS 16 >> + struct { >> + u32 addr; >> + u32 value; >> + /* bitmask representing WA bits */ >> + u32 mask; >> + } intel_wa_regs[I915_MAX_WA_REGS]; >> + u32 num_wa_regs; >> + >> /* Reclocking support */ >> bool render_reclock_avail; >> bool lvds_downclock_avail; >> /* indicates the reduced downclock for LVDS*/ >> int lvds_downclock; >> >> struct i915_frontbuffer_tracking fb_tracking; >> >> u16 orig_clock; >> >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c >> index 4146582..e48e12a 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c >> @@ -646,34 +646,54 @@ err_unpin: >> i915_gem_object_ggtt_unpin(ring->scratch.obj); >> err_unref: >> drm_gem_object_unreference(&ring->scratch.obj->base); >> err: >> return ret; >> } >> >> static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, >> u32 addr, u32 value) >> { >> + struct drm_device *dev = ring->dev; >> + struct drm_i915_private *dev_priv = dev->dev_private; >> + >> + if (dev_priv->num_wa_regs > I915_MAX_WA_REGS) >> + return; >> + >> intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); >> intel_ring_emit(ring, addr); >> intel_ring_emit(ring, value); >> + >> + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr; >> + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].mask = (value) & 0xFFFF; >> + /* value is updated with the status of remaining bits of this >> + * register when it is read from debugfs file >> + */ >> + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value; >> + dev_priv->num_wa_regs++; >> + >> + return; >> } >> >> static int gen8_init_workarounds(struct intel_engine_cs *ring) >> { >> int ret; >> + struct drm_device *dev = ring->dev; >> + struct drm_i915_private *dev_priv = dev->dev_private; >> >> /* >> * workarounds applied in this fn are part of register state context, >> * they need to be re-initialized followed by gpu reset, suspend/resume, >> * module reload. >> */ >> + dev_priv->num_wa_regs = 0; >> + memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs)); > > I've dropped this part - we allocate dev_priv already zeroed, so this is > redundant. And I expect that we'll use this w/a table from other places > (so not just for render w/a, but also other stuff that can get lost e.g. > over runtime pm), and then the clearing here could be harmful. > > Both patches applied to dinq, thanks. > -Daniel Chris gave some comments on function naming and also suggested different way of emitting LRIs, should I update and send a new patch? regards Arun >> >> /* >> * update the number of dwords required based on the >> * actual number of workarounds applied >> */ >> ret = intel_ring_begin(ring, 24); >> if (ret) >> return ret; >> >> /* WaDisablePartialInstShootdown:bdw */ >> @@ -718,20 +738,23 @@ static int gen8_init_workarounds(struct intel_engine_cs *ring) >> * >> * Note that PS/WM thread counts depend on the WIZ hashing >> * disable bit, which we don't touch here, but it's good >> * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). >> */ >> intel_ring_emit_wa(ring, GEN7_GT_MODE, >> GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); >> >> intel_ring_advance(ring); >> >> + DRM_DEBUG_DRIVER("Number of Workarounds applied: %d\n", >> + dev_priv->num_wa_regs); >> + >> return 0; >> } >> >> static int init_render_ring(struct intel_engine_cs *ring) >> { >> struct drm_device *dev = ring->dev; >> struct drm_i915_private *dev_priv = dev->dev_private; >> int ret = init_ring_common(ring); >> if (ret) >> return ret; >> -- >> 2.0.4 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs 2014-08-27 15:44 ` Daniel Vetter 2014-08-27 15:54 ` Chris Wilson 2014-08-27 15:59 ` Siluvery, Arun @ 2014-08-27 16:19 ` Chris Wilson 2 siblings, 0 replies; 25+ messages in thread From: Chris Wilson @ 2014-08-27 16:19 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On Wed, Aug 27, 2014 at 05:44:55PM +0200, Daniel Vetter wrote: > > static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, > > u32 addr, u32 value) > > { > > + struct drm_device *dev = ring->dev; > > + struct drm_i915_private *dev_priv = dev->dev_private; > > + > > + if (dev_priv->num_wa_regs > I915_MAX_WA_REGS) > > + return; > > + > > intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); > > intel_ring_emit(ring, addr); > > intel_ring_emit(ring, value); > > + > > + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr; > > + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].mask = (value) & 0xFFFF; > > + /* value is updated with the status of remaining bits of this > > + * register when it is read from debugfs file > > + */ > > + dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value; > > + dev_priv->num_wa_regs++; This is bogus. The idea of this function is be called many times during the system uptime. The idea of the existing tool is to be able to probe whether the workaround has taken effect without the aid of a kernel (except for having to work with the kernel in case of forcewake etc). -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs 2014-08-26 13:44 ` [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs Arun Siluvery 2014-08-27 15:44 ` Daniel Vetter @ 2014-08-30 15:10 ` Damien Lespiau 2014-08-31 19:43 ` Siluvery, Arun 1 sibling, 1 reply; 25+ messages in thread From: Damien Lespiau @ 2014-08-30 15:10 UTC (permalink / raw) To: Arun Siluvery; +Cc: intel-gfx On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote: > The workarounds that are applied are exported to a debugfs file; > this is used to verify their state after the test case (reset or > suspend/resume etc). This patch is only required to support i-g-t. I'm really, really confused. Please bear with me. 1. We only deal with masked registers AFAICS. Those registers have the high 16 bits masking the writes. 2. The values given to intel_ring_emit_wa() are the actual values we're going to write in the register, so they include those mask bits. say: intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); 3. We then record in intel_wa_regs the reg address and two fields named mask and value. 3. a) mask intel_wa_regs[dev_priv->num_wa_regs].mask = value & 0xFFFF; You're selecting the low 16bits and put it in mask. But the masked bits are the upper 16bits? This may work when the W/A is about setting bits, but we have a bug if we ever have a W/A that is about clearing a bit. It would seem better to me to grab the upper bits which are, after all, the bitmask we're interested in. 3. b) value /* value is updated with the status of remaining bits of this * register when it is read from debugfs file */ dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value; I don't understand what the comment explains. The *why* we need to do that is missing and, frankly, having to update the reference values we capture at intel_ring_emit_wa() time sounds like a bug to me. I also take a note that, at this, point, intel_wa_regs.value contains both the value and the mask. Weird. 4. Time to expose that intel_wa_regs array to user space 4. a) mask mask = dev_priv->intel_wa_regs[i].mask Straigthforward enough, except that these are still the lower 16 bits, so the value really. 4. b) value dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask; Hum? This really started my journey to dig futher. So we: - override the reference value from intel_ring_emit_wa() with whatever we have in the register at that moment - Or it with a mask that's not really a mask (but the reference value) 5. igt test So you grab those mask and value fields from the debugs file and read the register through mapped MMIO. and then status = (current_wa[i].value & current_wa[i].mask) != (wa_regs[i].value & wa_regs[i].mask); So that's where I'm starting to put things back together and understand what the intention is. I still think that's not quite right, especially how we get the mask and why we read back the register in the debugfs file. Or am I just missing something? In any case, having to spend that much time trying to understand what's going on is a maintainability problem, we need code that a least looks straightforward. HTH, -- Damien ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs 2014-08-30 15:10 ` Damien Lespiau @ 2014-08-31 19:43 ` Siluvery, Arun 0 siblings, 0 replies; 25+ messages in thread From: Siluvery, Arun @ 2014-08-31 19:43 UTC (permalink / raw) To: Damien Lespiau; +Cc: intel-gfx On 30/08/2014 16:10, Damien Lespiau wrote: > On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote: >> The workarounds that are applied are exported to a debugfs file; >> this is used to verify their state after the test case (reset or >> suspend/resume etc). This patch is only required to support i-g-t. > > I'm really, really confused. Please bear with me. I have reworked all the patches hopefully things will be more clear this time :) > > 1. We only deal with masked registers AFAICS. Those registers have the > high 16 bits masking the writes. > > 2. The values given to intel_ring_emit_wa() are the actual values we're going > to write in the register, so they include those mask bits. say: > > intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, > _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > > 3. We then record in intel_wa_regs the reg address and two fields named mask > and value. > > 3. a) mask > > intel_wa_regs[dev_priv->num_wa_regs].mask = value & 0xFFFF; > > You're selecting the low 16bits and put it in mask. But the masked bits are the > upper 16bits? This may work when the W/A is about setting bits, but we have a > bug if we ever have a W/A that is about clearing a bit. It would seem better to > me to grab the upper bits which are, after all, the bitmask we're interested in. > it is a valid issue, changed to use upper 16-bits as mask. > 3. b) value > > /* value is updated with the status of remaining bits of this > * register when it is read from debugfs file > */ > dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value; > > I don't understand what the comment explains. The *why* we need to do that is > missing and, frankly, having to update the reference values we capture at > intel_ring_emit_wa() time sounds like a bug to me. > > I also take a note that, at this, point, intel_wa_regs.value contains both the > value and the mask. Weird. > I agree the why part is missing. The idea is value represents the status of other bits in this register besides w/a bit; this is actually redundant here, I guess I added because I wanted to initialize all members. This change is not applicable in the new patches. > 4. Time to expose that intel_wa_regs array to user space > > 4. a) mask > > mask = dev_priv->intel_wa_regs[i].mask > > Straigthforward enough, except that these are still the lower 16 bits, so the > value really. > > 4. b) value > > dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask; > > Hum? This really started my journey to dig futher. So we: > > - override the reference value from intel_ring_emit_wa() with whatever we > have in the register at that moment > > - Or it with a mask that's not really a mask (but the reference value) > > 5. igt test > > So you grab those mask and value fields from the debugs file and read the > register through mapped MMIO. and then > > status = (current_wa[i].value & current_wa[i].mask) != > (wa_regs[i].value & wa_regs[i].mask); > > So that's where I'm starting to put things back together and understand what > the intention is. I still think that's not quite right, especially how we get > the mask and why we read back the register in the debugfs file. > We read the register value after the test case (eg reset) and compare it with a known value that is exported to debugfs file. regards Arun > Or am I just missing something? In any case, having to spend that much time > trying to understand what's going on is a maintainability problem, we need code > that a least looks straightforward. > > HTH, > ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn @ 2014-08-26 9:33 Arun Siluvery 2014-08-26 9:33 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery 0 siblings, 1 reply; 25+ messages in thread From: Arun Siluvery @ 2014-08-26 9:33 UTC (permalink / raw) To: intel-gfx Workarounds for BDW are applied using LRIs during render ring initialization. Only those WA registers that are part of register state are initialized in this fn, remaining are still in its current place init_clock_gating() which are not affected by a gpu reset. I can send another patch where they can be moved to render ring init function but during testing I found their state doesn't change after reset. Arun Siluvery (2): drm/i915/bdw: Apply workarounds in render ring init function drm/i915/bdw: Export workaround data to debugfs drivers/gpu/drm/i915/i915_debugfs.c | 40 +++++++++++++ drivers/gpu/drm/i915/i915_drv.h | 14 +++++ drivers/gpu/drm/i915/i915_gem_context.c | 6 ++ drivers/gpu/drm/i915/intel_pm.c | 48 --------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 101 ++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 6 files changed, 162 insertions(+), 48 deletions(-) -- 2.0.4 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 9:33 [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn Arun Siluvery @ 2014-08-26 9:33 ` Arun Siluvery 2014-08-26 10:09 ` Chris Wilson 0 siblings, 1 reply; 25+ messages in thread From: Arun Siluvery @ 2014-08-26 9:33 UTC (permalink / raw) To: intel-gfx For BDW workarounds are currently initialized in init_clock_gating() but they are lost during reset, suspend/resume etc; this patch moves the WAs that are part of register state context to render ring init fn otherwise default context ends up with incorrect values as they don't get initialized until init_clock_gating fn. v2: Add workarounds to golden render state This method has its own issues, first of all this is different for each gen and it is generated using a tool so adding new workaround and mainitaining them across gens is not a straightforward process. v3: Use LRIs to emit these workarounds (Ville) Instead of modifying the golden render state the same LRIs are emitted from within the driver. For: VIZ-4092 Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> --- drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 4 files changed, 85 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 9683e62..2debce4 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, } uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; to->legacy_hw_ctx.initialized = true; done: i915_gem_context_reference(to); ring->last_context = to; if (uninitialized) { + if (IS_BROADWELL(ring->dev)) { + ret = bdw_init_workarounds(ring); + if (ret) + DRM_ERROR("init workarounds: %d\n", ret); + } + ret = i915_gem_render_state_init(ring); if (ret) DRM_ERROR("init render state: %d\n", ret); } return 0; unpin_out: if (ring->id == RCS) i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state); diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index c8f744c..668acd9 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) struct drm_i915_private *dev_priv = dev->dev_private; enum pipe pipe; I915_WRITE(WM3_LP_ILK, 0); I915_WRITE(WM2_LP_ILK, 0); I915_WRITE(WM1_LP_ILK, 0); /* FIXME(BDW): Check all the w/a, some might only apply to * pre-production hw. */ - /* WaDisablePartialInstShootdown:bdw */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); - - /* WaDisableThreadStallDopClockGating:bdw */ - /* FIXME: Unclear whether we really need this on production bdw. */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); - /* - * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for - * pre-production hardware - */ - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); I915_WRITE(_3D_CHICKEN3, _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); - I915_WRITE(COMMON_SLICE_CHICKEN2, - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); - - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); - - /* WaDisableDopClockGating:bdw May not be needed for production */ - I915_WRITE(GEN7_ROW_CHICKEN2, - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); /* WaSwitchSolVfFArbitrationPriority:bdw */ I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); /* WaPsrDPAMaskVBlankInSRD:bdw */ I915_WRITE(CHICKEN_PAR1_1, I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ for_each_pipe(pipe) { I915_WRITE(CHICKEN_PIPESL_1(pipe), I915_READ(CHICKEN_PIPESL_1(pipe)) | BDW_DPRS_MASK_VBLANK_SRD); } - /* Use Force Non-Coherent whenever executing a 3D context. This is a - * workaround for for a possible hang in the unlikely event a TLB - * invalidation occurs during a PSD flush. - */ - I915_WRITE(HDC_CHICKEN0, - I915_READ(HDC_CHICKEN0) | - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); - /* WaVSRefCountFullforceMissDisable:bdw */ /* WaDSRefCountFullforceMissDisable:bdw */ I915_WRITE(GEN7_FF_THREAD_MODE, I915_READ(GEN7_FF_THREAD_MODE) & ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); - /* - * BSpec recommends 8x4 when MSAA is used, - * however in practice 16x4 seems fastest. - * - * Note that PS/WM thread counts depend on the WIZ hashing - * disable bit, which we don't touch here, but it's good - * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). - */ - I915_WRITE(GEN7_GT_MODE, - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); - I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); /* WaDisableSDEUnitClockGating:bdw */ I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | GEN8_SDEUNIT_CLOCK_GATE_DISABLE); - - /* Wa4x4STCOptimizationDisable:bdw */ - I915_WRITE(CACHE_MODE_1, - _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); } static void haswell_init_clock_gating(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; ilk_init_lp_watermarks(dev); /* L3 caching of data atomics doesn't work -- disable it. */ I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 13543f8..c1e7da6 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -643,20 +643,98 @@ intel_init_pipe_control(struct intel_engine_cs *ring) return 0; err_unpin: i915_gem_object_ggtt_unpin(ring->scratch.obj); err_unref: drm_gem_object_unreference(&ring->scratch.obj->base); err: return ret; } +static inline void intel_ring_emit_wa(struct intel_engine_cs *ring, + u32 addr, u32 value) +{ + intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); + intel_ring_emit(ring, addr); + intel_ring_emit(ring, value); +} + +int bdw_init_workarounds(struct intel_engine_cs *ring) +{ + int ret; + + /* + * workarounds applied in this fn are part of register state context, + * they need to be re-initialized followed by gpu reset, suspend/resume, + * module reload. + */ + + /* + * update the number of dwords required based on the + * actual number of workarounds applied + */ + ret = intel_ring_begin(ring, 24); + if (ret) + return ret; + + /* WaDisablePartialInstShootdown:bdw */ + /* WaDisableThreadStallDopClockGating:bdw */ + /* FIXME: Unclear whether we really need this on production bdw. */ + intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN, + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE + | STALL_DOP_GATING_DISABLE)); + + /* WaDisableDopClockGating:bdw May not be needed for production */ + intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); + + /* + * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for + * pre-production hardware + */ + intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3, + _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS + | GEN8_SAMPLER_POWER_BYPASS_DIS)); + + intel_ring_emit_wa(ring, GEN7_HALF_SLICE_CHICKEN1, + _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); + + intel_ring_emit_wa(ring, COMMON_SLICE_CHICKEN2, + _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); + + /* Use Force Non-Coherent whenever executing a 3D context. This is a + * workaround for for a possible hang in the unlikely event a TLB + * invalidation occurs during a PSD flush. + */ + intel_ring_emit_wa(ring, HDC_CHICKEN0, + _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); + + /* Wa4x4STCOptimizationDisable:bdw */ + intel_ring_emit_wa(ring, CACHE_MODE_1, + _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); + + /* + * BSpec recommends 8x4 when MSAA is used, + * however in practice 16x4 seems fastest. + * + * Note that PS/WM thread counts depend on the WIZ hashing + * disable bit, which we don't touch here, but it's good + * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). + */ + intel_ring_emit_wa(ring, GEN7_GT_MODE, + GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); + + intel_ring_advance(ring); + + return 0; +} + static int init_render_ring(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; int ret = init_ring_common(ring); if (ret) return ret; /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 9cbf7b0..9da4107 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -411,20 +411,21 @@ int intel_ring_flush_all_caches(struct intel_engine_cs *ring); int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring); void intel_fini_pipe_control(struct intel_engine_cs *ring); int intel_init_pipe_control(struct intel_engine_cs *ring); int intel_init_render_ring_buffer(struct drm_device *dev); int intel_init_bsd_ring_buffer(struct drm_device *dev); int intel_init_bsd2_ring_buffer(struct drm_device *dev); int intel_init_blt_ring_buffer(struct drm_device *dev); int intel_init_vebox_ring_buffer(struct drm_device *dev); +int bdw_init_workarounds(struct intel_engine_cs *ring); u64 intel_ring_get_active_head(struct intel_engine_cs *ring); void intel_ring_setup_status_page(struct intel_engine_cs *ring); static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf) { return ringbuf->tail; } static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring) -- 2.0.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 9:33 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery @ 2014-08-26 10:09 ` Chris Wilson 2014-08-26 10:16 ` Siluvery, Arun 0 siblings, 1 reply; 25+ messages in thread From: Chris Wilson @ 2014-08-26 10:09 UTC (permalink / raw) To: Arun Siluvery; +Cc: intel-gfx On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote: > For BDW workarounds are currently initialized in init_clock_gating() but > they are lost during reset, suspend/resume etc; this patch moves the WAs > that are part of register state context to render ring init fn otherwise > default context ends up with incorrect values as they don't get initialized > until init_clock_gating fn. > > v2: Add workarounds to golden render state > This method has its own issues, first of all this is different for > each gen and it is generated using a tool so adding new workaround > and mainitaining them across gens is not a straightforward process. > > v3: Use LRIs to emit these workarounds (Ville) > Instead of modifying the golden render state the same LRIs are > emitted from within the driver. > > For: VIZ-4092 > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ > drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +++++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + > 4 files changed, 85 insertions(+), 48 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 9683e62..2debce4 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, > } > > uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; > to->legacy_hw_ctx.initialized = true; > > done: > i915_gem_context_reference(to); > ring->last_context = to; > > if (uninitialized) { > + if (IS_BROADWELL(ring->dev)) { > + ret = bdw_init_workarounds(ring); > + if (ret) > + DRM_ERROR("init workarounds: %d\n", ret); A good rule of thumb is that if you are exporting gen specific routines, the layering and abstraction is fishy. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 10:09 ` Chris Wilson @ 2014-08-26 10:16 ` Siluvery, Arun 2014-08-26 10:34 ` Chris Wilson 0 siblings, 1 reply; 25+ messages in thread From: Siluvery, Arun @ 2014-08-26 10:16 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 26/08/2014 11:09, Chris Wilson wrote: > On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote: >> For BDW workarounds are currently initialized in init_clock_gating() but >> they are lost during reset, suspend/resume etc; this patch moves the WAs >> that are part of register state context to render ring init fn otherwise >> default context ends up with incorrect values as they don't get initialized >> until init_clock_gating fn. >> >> v2: Add workarounds to golden render state >> This method has its own issues, first of all this is different for >> each gen and it is generated using a tool so adding new workaround >> and mainitaining them across gens is not a straightforward process. >> >> v3: Use LRIs to emit these workarounds (Ville) >> Instead of modifying the golden render state the same LRIs are >> emitted from within the driver. >> >> For: VIZ-4092 >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> >> --- >> drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ >> drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- >> drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +++++++++++++++++++++++++++++++++ >> drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + >> 4 files changed, 85 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c >> index 9683e62..2debce4 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_context.c >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c >> @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, >> } >> >> uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; >> to->legacy_hw_ctx.initialized = true; >> >> done: >> i915_gem_context_reference(to); >> ring->last_context = to; >> >> if (uninitialized) { >> + if (IS_BROADWELL(ring->dev)) { >> + ret = bdw_init_workarounds(ring); >> + if (ret) >> + DRM_ERROR("init workarounds: %d\n", ret); > > A good rule of thumb is that if you are exporting gen specific routines, > the layering and abstraction is fishy. > -Chris > ok, so something like i915_init_workarounds() is ok? with a check for bdw/gen8 done inside that function. regards Arun ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 10:16 ` Siluvery, Arun @ 2014-08-26 10:34 ` Chris Wilson 2014-08-26 11:42 ` Siluvery, Arun 0 siblings, 1 reply; 25+ messages in thread From: Chris Wilson @ 2014-08-26 10:34 UTC (permalink / raw) To: Siluvery, Arun; +Cc: intel-gfx On Tue, Aug 26, 2014 at 11:16:29AM +0100, Siluvery, Arun wrote: > On 26/08/2014 11:09, Chris Wilson wrote: > >On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote: > >>For BDW workarounds are currently initialized in init_clock_gating() but > >>they are lost during reset, suspend/resume etc; this patch moves the WAs > >>that are part of register state context to render ring init fn otherwise > >>default context ends up with incorrect values as they don't get initialized > >>until init_clock_gating fn. > >> > >>v2: Add workarounds to golden render state > >>This method has its own issues, first of all this is different for > >>each gen and it is generated using a tool so adding new workaround > >>and mainitaining them across gens is not a straightforward process. > >> > >>v3: Use LRIs to emit these workarounds (Ville) > >>Instead of modifying the golden render state the same LRIs are > >>emitted from within the driver. > >> > >>For: VIZ-4092 > >>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > >>--- > >> drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ > >> drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- > >> drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +++++++++++++++++++++++++++++++++ > >> drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + > >> 4 files changed, 85 insertions(+), 48 deletions(-) > >> > >>diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > >>index 9683e62..2debce4 100644 > >>--- a/drivers/gpu/drm/i915/i915_gem_context.c > >>+++ b/drivers/gpu/drm/i915/i915_gem_context.c > >>@@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, > >> } > >> > >> uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; > >> to->legacy_hw_ctx.initialized = true; > >> > >> done: > >> i915_gem_context_reference(to); > >> ring->last_context = to; > >> > >> if (uninitialized) { > >>+ if (IS_BROADWELL(ring->dev)) { > >>+ ret = bdw_init_workarounds(ring); > >>+ if (ret) > >>+ DRM_ERROR("init workarounds: %d\n", ret); > > > >A good rule of thumb is that if you are exporting gen specific routines, > >the layering and abstraction is fishy. > >-Chris > > > ok, so something like i915_init_workarounds() is ok? with a check > for bdw/gen8 done inside that function. Except for init_workarounds is quite useless as a function name and we already have a structure that is already customised per-engine and per-gen that you could hook into. engine->ring_init_context() ? -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-26 10:34 ` Chris Wilson @ 2014-08-26 11:42 ` Siluvery, Arun 0 siblings, 0 replies; 25+ messages in thread From: Siluvery, Arun @ 2014-08-26 11:42 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 26/08/2014 11:34, Chris Wilson wrote: > On Tue, Aug 26, 2014 at 11:16:29AM +0100, Siluvery, Arun wrote: >> On 26/08/2014 11:09, Chris Wilson wrote: >>> On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote: >>>> For BDW workarounds are currently initialized in init_clock_gating() but >>>> they are lost during reset, suspend/resume etc; this patch moves the WAs >>>> that are part of register state context to render ring init fn otherwise >>>> default context ends up with incorrect values as they don't get initialized >>>> until init_clock_gating fn. >>>> >>>> v2: Add workarounds to golden render state >>>> This method has its own issues, first of all this is different for >>>> each gen and it is generated using a tool so adding new workaround >>>> and mainitaining them across gens is not a straightforward process. >>>> >>>> v3: Use LRIs to emit these workarounds (Ville) >>>> Instead of modifying the golden render state the same LRIs are >>>> emitted from within the driver. >>>> >>>> For: VIZ-4092 >>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> >>>> --- >>>> drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ >>>> drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- >>>> drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +++++++++++++++++++++++++++++++++ >>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + >>>> 4 files changed, 85 insertions(+), 48 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c >>>> index 9683e62..2debce4 100644 >>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c >>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c >>>> @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, >>>> } >>>> >>>> uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; >>>> to->legacy_hw_ctx.initialized = true; >>>> >>>> done: >>>> i915_gem_context_reference(to); >>>> ring->last_context = to; >>>> >>>> if (uninitialized) { >>>> + if (IS_BROADWELL(ring->dev)) { >>>> + ret = bdw_init_workarounds(ring); >>>> + if (ret) >>>> + DRM_ERROR("init workarounds: %d\n", ret); >>> >>> A good rule of thumb is that if you are exporting gen specific routines, >>> the layering and abstraction is fishy. >>> -Chris >>> >> ok, so something like i915_init_workarounds() is ok? with a check >> for bdw/gen8 done inside that function. > > Except for init_workarounds is quite useless as a function name and we > already have a structure that is already customised per-engine and > per-gen that you could hook into. > engine->ring_init_context() ? > -Chris > Ok thanks, I can create a new fn ring_init_context(). regards Arun ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn @ 2014-08-22 19:39 Arun Siluvery 2014-08-22 19:39 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery 0 siblings, 1 reply; 25+ messages in thread From: Arun Siluvery @ 2014-08-22 19:39 UTC (permalink / raw) To: intel-gfx Workarounds for BDW are applied using LRIs during render ring initialization. Only those WA registers that are part of register state are initialized in this fn, remaining are still in its current place init_clock_gating() which are not affected by a gpu reset. I can send another patch where they can be moved to render ring init function but during testing I found their state doesn't change after reset. Arun Siluvery (2): drm/i915/bdw: Apply workarounds in render ring init function drm/i915/bdw: Export workaround data to debugfs drivers/gpu/drm/i915/i915_debugfs.c | 40 +++++++++++++++++ drivers/gpu/drm/i915/i915_drv.h | 14 ++++++ drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ drivers/gpu/drm/i915/intel_pm.c | 48 -------------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 77 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 17 ++++++++ 6 files changed, 154 insertions(+), 48 deletions(-) -- 2.0.4 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-22 19:39 [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn Arun Siluvery @ 2014-08-22 19:39 ` Arun Siluvery 2014-08-25 12:18 ` Ville Syrjälä 0 siblings, 1 reply; 25+ messages in thread From: Arun Siluvery @ 2014-08-22 19:39 UTC (permalink / raw) To: intel-gfx For BDW workarounds are currently initialized in init_clock_gating() but they are lost during reset, suspend/resume etc; this patch moves the WAs that are part of register state context to render ring init fn otherwise default context ends up with incorrect values as they don't get initialized until init_clock_gating fn. v2: Add workarounds to golden render state This method has its own issues, first of all this is different for each gen and it is generated using a tool so adding new workaround and mainitaining them across gens is not a straightforward process. v3: Use LRIs to emit these workarounds (Ville) Instead of modifying the golden render state the same LRIs are emitted from within the driver. For: VIZ-4092 Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> --- drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ drivers/gpu/drm/i915/intel_pm.c | 48 ---------------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 70 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++++ 4 files changed, 83 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 9683e62..2debce4 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, } uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; to->legacy_hw_ctx.initialized = true; done: i915_gem_context_reference(to); ring->last_context = to; if (uninitialized) { + if (IS_BROADWELL(ring->dev)) { + ret = bdw_init_workarounds(ring); + if (ret) + DRM_ERROR("init workarounds: %d\n", ret); + } + ret = i915_gem_render_state_init(ring); if (ret) DRM_ERROR("init render state: %d\n", ret); } return 0; unpin_out: if (ring->id == RCS) i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state); diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index c8f744c..668acd9 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) struct drm_i915_private *dev_priv = dev->dev_private; enum pipe pipe; I915_WRITE(WM3_LP_ILK, 0); I915_WRITE(WM2_LP_ILK, 0); I915_WRITE(WM1_LP_ILK, 0); /* FIXME(BDW): Check all the w/a, some might only apply to * pre-production hw. */ - /* WaDisablePartialInstShootdown:bdw */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); - - /* WaDisableThreadStallDopClockGating:bdw */ - /* FIXME: Unclear whether we really need this on production bdw. */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); - /* - * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for - * pre-production hardware - */ - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); I915_WRITE(_3D_CHICKEN3, _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); - I915_WRITE(COMMON_SLICE_CHICKEN2, - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); - - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); - - /* WaDisableDopClockGating:bdw May not be needed for production */ - I915_WRITE(GEN7_ROW_CHICKEN2, - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); /* WaSwitchSolVfFArbitrationPriority:bdw */ I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); /* WaPsrDPAMaskVBlankInSRD:bdw */ I915_WRITE(CHICKEN_PAR1_1, I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ for_each_pipe(pipe) { I915_WRITE(CHICKEN_PIPESL_1(pipe), I915_READ(CHICKEN_PIPESL_1(pipe)) | BDW_DPRS_MASK_VBLANK_SRD); } - /* Use Force Non-Coherent whenever executing a 3D context. This is a - * workaround for for a possible hang in the unlikely event a TLB - * invalidation occurs during a PSD flush. - */ - I915_WRITE(HDC_CHICKEN0, - I915_READ(HDC_CHICKEN0) | - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); - /* WaVSRefCountFullforceMissDisable:bdw */ /* WaDSRefCountFullforceMissDisable:bdw */ I915_WRITE(GEN7_FF_THREAD_MODE, I915_READ(GEN7_FF_THREAD_MODE) & ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); - /* - * BSpec recommends 8x4 when MSAA is used, - * however in practice 16x4 seems fastest. - * - * Note that PS/WM thread counts depend on the WIZ hashing - * disable bit, which we don't touch here, but it's good - * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). - */ - I915_WRITE(GEN7_GT_MODE, - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); - I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); /* WaDisableSDEUnitClockGating:bdw */ I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | GEN8_SDEUNIT_CLOCK_GATE_DISABLE); - - /* Wa4x4STCOptimizationDisable:bdw */ - I915_WRITE(CACHE_MODE_1, - _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); } static void haswell_init_clock_gating(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; ilk_init_lp_watermarks(dev); /* L3 caching of data atomics doesn't work -- disable it. */ I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 13543f8..9e24073 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -643,20 +643,90 @@ intel_init_pipe_control(struct intel_engine_cs *ring) return 0; err_unpin: i915_gem_object_ggtt_unpin(ring->scratch.obj); err_unref: drm_gem_object_unreference(&ring->scratch.obj->base); err: return ret; } +int bdw_init_workarounds(struct intel_engine_cs *ring) +{ + int ret; + + /* + * workarounds applied in this fn are part of register state context, + * they need to be re-initialized followed by gpu reset, suspend/resume, + * module reload. + */ + + /* + * update the number of dwords required based on the + * actual number of workarounds applied + */ + ret = intel_ring_begin(ring, 24); + if (ret) + return ret; + + /* WaDisablePartialInstShootdown:bdw */ + /* WaDisableThreadStallDopClockGating:bdw */ + /* FIXME: Unclear whether we really need this on production bdw. */ + INTEL_RING_EMIT_WA(ring, GEN8_ROW_CHICKEN, + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE + | STALL_DOP_GATING_DISABLE)); + + /* WaDisableDopClockGating:bdw May not be needed for production */ + INTEL_RING_EMIT_WA(ring, GEN7_ROW_CHICKEN2, + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); + + /* + * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for + * pre-production hardware + */ + INTEL_RING_EMIT_WA(ring, HALF_SLICE_CHICKEN3, + _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS + | GEN8_SAMPLER_POWER_BYPASS_DIS)); + + INTEL_RING_EMIT_WA(ring, GEN7_HALF_SLICE_CHICKEN1, + _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); + + INTEL_RING_EMIT_WA(ring, COMMON_SLICE_CHICKEN2, + _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); + + /* Use Force Non-Coherent whenever executing a 3D context. This is a + * workaround for for a possible hang in the unlikely event a TLB + * invalidation occurs during a PSD flush. + */ + INTEL_RING_EMIT_WA(ring, HDC_CHICKEN0, + _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); + + /* Wa4x4STCOptimizationDisable:bdw */ + INTEL_RING_EMIT_WA(ring, CACHE_MODE_1, + _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); + + /* + * BSpec recommends 8x4 when MSAA is used, + * however in practice 16x4 seems fastest. + * + * Note that PS/WM thread counts depend on the WIZ hashing + * disable bit, which we don't touch here, but it's good + * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). + */ + INTEL_RING_EMIT_WA(ring, GEN7_GT_MODE, + GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); + + intel_ring_advance(ring); + + return 0; +} + static int init_render_ring(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; int ret = init_ring_common(ring); if (ret) return ret; /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 9cbf7b0..77fe667 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -66,20 +66,26 @@ struct intel_hw_status_page { break; \ } \ ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \ ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \ ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \ ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \ ring->semaphore.signal_ggtt[VCS2] = GEN8_SIGNAL_OFFSET(ring, VCS2); \ ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \ } while(0) +#define INTEL_RING_EMIT_WA(_ring, _wa_reg, _wa_val) ({ \ + intel_ring_emit(_ring, MI_LOAD_REGISTER_IMM(1)); \ + intel_ring_emit(_ring, _wa_reg); \ + intel_ring_emit(_ring, _wa_val); \ + }) + enum intel_ring_hangcheck_action { HANGCHECK_IDLE = 0, HANGCHECK_WAIT, HANGCHECK_ACTIVE, HANGCHECK_ACTIVE_LOOP, HANGCHECK_KICK, HANGCHECK_HUNG, }; #define HANGCHECK_SCORE_RING_HUNG 31 @@ -411,20 +417,21 @@ int intel_ring_flush_all_caches(struct intel_engine_cs *ring); int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring); void intel_fini_pipe_control(struct intel_engine_cs *ring); int intel_init_pipe_control(struct intel_engine_cs *ring); int intel_init_render_ring_buffer(struct drm_device *dev); int intel_init_bsd_ring_buffer(struct drm_device *dev); int intel_init_bsd2_ring_buffer(struct drm_device *dev); int intel_init_blt_ring_buffer(struct drm_device *dev); int intel_init_vebox_ring_buffer(struct drm_device *dev); +int bdw_init_workarounds(struct intel_engine_cs *ring); u64 intel_ring_get_active_head(struct intel_engine_cs *ring); void intel_ring_setup_status_page(struct intel_engine_cs *ring); static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf) { return ringbuf->tail; } static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring) -- 2.0.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-22 19:39 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery @ 2014-08-25 12:18 ` Ville Syrjälä 2014-08-26 9:27 ` Siluvery, Arun 0 siblings, 1 reply; 25+ messages in thread From: Ville Syrjälä @ 2014-08-25 12:18 UTC (permalink / raw) To: Arun Siluvery; +Cc: intel-gfx On Fri, Aug 22, 2014 at 08:39:11PM +0100, Arun Siluvery wrote: > For BDW workarounds are currently initialized in init_clock_gating() but > they are lost during reset, suspend/resume etc; this patch moves the WAs > that are part of register state context to render ring init fn otherwise > default context ends up with incorrect values as they don't get initialized > until init_clock_gating fn. > > v2: Add workarounds to golden render state > This method has its own issues, first of all this is different for > each gen and it is generated using a tool so adding new workaround > and mainitaining them across gens is not a straightforward process. > > v3: Use LRIs to emit these workarounds (Ville) > Instead of modifying the golden render state the same LRIs are > emitted from within the driver. > > For: VIZ-4092 > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ > drivers/gpu/drm/i915/intel_pm.c | 48 ---------------------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 70 +++++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++++ > 4 files changed, 83 insertions(+), 48 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 9683e62..2debce4 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, > } > > uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; > to->legacy_hw_ctx.initialized = true; > > done: > i915_gem_context_reference(to); > ring->last_context = to; > > if (uninitialized) { > + if (IS_BROADWELL(ring->dev)) { > + ret = bdw_init_workarounds(ring); > + if (ret) > + DRM_ERROR("init workarounds: %d\n", ret); > + } > + > ret = i915_gem_render_state_init(ring); > if (ret) > DRM_ERROR("init render state: %d\n", ret); > } > > return 0; > > unpin_out: > if (ring->id == RCS) > i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state); > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c > index c8f744c..668acd9 100644 > --- a/drivers/gpu/drm/i915/intel_pm.c > +++ b/drivers/gpu/drm/i915/intel_pm.c > @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) > struct drm_i915_private *dev_priv = dev->dev_private; > enum pipe pipe; > > I915_WRITE(WM3_LP_ILK, 0); > I915_WRITE(WM2_LP_ILK, 0); > I915_WRITE(WM1_LP_ILK, 0); > > /* FIXME(BDW): Check all the w/a, some might only apply to > * pre-production hw. */ > > - /* WaDisablePartialInstShootdown:bdw */ > - I915_WRITE(GEN8_ROW_CHICKEN, > - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); > - > - /* WaDisableThreadStallDopClockGating:bdw */ > - /* FIXME: Unclear whether we really need this on production bdw. */ > - I915_WRITE(GEN8_ROW_CHICKEN, > - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); > > - /* > - * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for > - * pre-production hardware > - */ > - I915_WRITE(HALF_SLICE_CHICKEN3, > - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); > - I915_WRITE(HALF_SLICE_CHICKEN3, > - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); > I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); > > I915_WRITE(_3D_CHICKEN3, > _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); > > - I915_WRITE(COMMON_SLICE_CHICKEN2, > - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); > - > - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, > - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); > - > - /* WaDisableDopClockGating:bdw May not be needed for production */ > - I915_WRITE(GEN7_ROW_CHICKEN2, > - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > > /* WaSwitchSolVfFArbitrationPriority:bdw */ > I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); > > /* WaPsrDPAMaskVBlankInSRD:bdw */ > I915_WRITE(CHICKEN_PAR1_1, > I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); > > /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ > for_each_pipe(pipe) { > I915_WRITE(CHICKEN_PIPESL_1(pipe), > I915_READ(CHICKEN_PIPESL_1(pipe)) | > BDW_DPRS_MASK_VBLANK_SRD); > } > > - /* Use Force Non-Coherent whenever executing a 3D context. This is a > - * workaround for for a possible hang in the unlikely event a TLB > - * invalidation occurs during a PSD flush. > - */ > - I915_WRITE(HDC_CHICKEN0, > - I915_READ(HDC_CHICKEN0) | > - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); > - > /* WaVSRefCountFullforceMissDisable:bdw */ > /* WaDSRefCountFullforceMissDisable:bdw */ > I915_WRITE(GEN7_FF_THREAD_MODE, > I915_READ(GEN7_FF_THREAD_MODE) & > ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); > > - /* > - * BSpec recommends 8x4 when MSAA is used, > - * however in practice 16x4 seems fastest. > - * > - * Note that PS/WM thread counts depend on the WIZ hashing > - * disable bit, which we don't touch here, but it's good > - * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). > - */ > - I915_WRITE(GEN7_GT_MODE, > - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); > - > I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, > _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); > > /* WaDisableSDEUnitClockGating:bdw */ > I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | > GEN8_SDEUNIT_CLOCK_GATE_DISABLE); > - > - /* Wa4x4STCOptimizationDisable:bdw */ > - I915_WRITE(CACHE_MODE_1, > - _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); > } > > static void haswell_init_clock_gating(struct drm_device *dev) > { > struct drm_i915_private *dev_priv = dev->dev_private; > > ilk_init_lp_watermarks(dev); > > /* L3 caching of data atomics doesn't work -- disable it. */ > I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index 13543f8..9e24073 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -643,20 +643,90 @@ intel_init_pipe_control(struct intel_engine_cs *ring) > return 0; > > err_unpin: > i915_gem_object_ggtt_unpin(ring->scratch.obj); > err_unref: > drm_gem_object_unreference(&ring->scratch.obj->base); > err: > return ret; > } > > +int bdw_init_workarounds(struct intel_engine_cs *ring) > +{ > + int ret; > + > + /* > + * workarounds applied in this fn are part of register state context, > + * they need to be re-initialized followed by gpu reset, suspend/resume, > + * module reload. > + */ > + > + /* > + * update the number of dwords required based on the > + * actual number of workarounds applied > + */ > + ret = intel_ring_begin(ring, 24); > + if (ret) > + return ret; > + > + /* WaDisablePartialInstShootdown:bdw */ > + /* WaDisableThreadStallDopClockGating:bdw */ > + /* FIXME: Unclear whether we really need this on production bdw. */ > + INTEL_RING_EMIT_WA(ring, GEN8_ROW_CHICKEN, > + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE > + | STALL_DOP_GATING_DISABLE)); > + > + /* WaDisableDopClockGating:bdw May not be needed for production */ > + INTEL_RING_EMIT_WA(ring, GEN7_ROW_CHICKEN2, > + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); > + > + /* > + * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for > + * pre-production hardware > + */ > + INTEL_RING_EMIT_WA(ring, HALF_SLICE_CHICKEN3, > + _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS > + | GEN8_SAMPLER_POWER_BYPASS_DIS)); > + > + INTEL_RING_EMIT_WA(ring, GEN7_HALF_SLICE_CHICKEN1, > + _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); > + > + INTEL_RING_EMIT_WA(ring, COMMON_SLICE_CHICKEN2, > + _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); > + > + /* Use Force Non-Coherent whenever executing a 3D context. This is a > + * workaround for for a possible hang in the unlikely event a TLB > + * invalidation occurs during a PSD flush. > + */ > + INTEL_RING_EMIT_WA(ring, HDC_CHICKEN0, > + _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); > + > + /* Wa4x4STCOptimizationDisable:bdw */ > + INTEL_RING_EMIT_WA(ring, CACHE_MODE_1, > + _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); > + > + /* > + * BSpec recommends 8x4 when MSAA is used, > + * however in practice 16x4 seems fastest. > + * > + * Note that PS/WM thread counts depend on the WIZ hashing > + * disable bit, which we don't touch here, but it's good > + * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). > + */ > + INTEL_RING_EMIT_WA(ring, GEN7_GT_MODE, > + GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); > + > + intel_ring_advance(ring); > + > + return 0; > +} > + > static int init_render_ring(struct intel_engine_cs *ring) > { > struct drm_device *dev = ring->dev; > struct drm_i915_private *dev_priv = dev->dev_private; > int ret = init_ring_common(ring); > if (ret) > return ret; > > /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ > if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > index 9cbf7b0..77fe667 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > @@ -66,20 +66,26 @@ struct intel_hw_status_page { > break; \ > } \ > ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \ > ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \ > ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \ > ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \ > ring->semaphore.signal_ggtt[VCS2] = GEN8_SIGNAL_OFFSET(ring, VCS2); \ > ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \ > } while(0) > > +#define INTEL_RING_EMIT_WA(_ring, _wa_reg, _wa_val) ({ \ > + intel_ring_emit(_ring, MI_LOAD_REGISTER_IMM(1)); \ > + intel_ring_emit(_ring, _wa_reg); \ > + intel_ring_emit(_ring, _wa_val); \ > + }) do {} while (0) would suffice since the macro doesn't return anything. Not sure why it's a macro actually. A static function in intel_ringbuffer.c would do just as well. If you want to keep it a macro, the argument evaluations should have () around them. I was also going to say that you could just call it ring_emit_lri() or something, but I guess you plan to add the debugfs w/a stuff there which makes the current name more appropriate. I'll let others bikeshed the debugfs stuff since I have no real opinion on it. Apart from that the patch seems good to me. > + > enum intel_ring_hangcheck_action { > HANGCHECK_IDLE = 0, > HANGCHECK_WAIT, > HANGCHECK_ACTIVE, > HANGCHECK_ACTIVE_LOOP, > HANGCHECK_KICK, > HANGCHECK_HUNG, > }; > > #define HANGCHECK_SCORE_RING_HUNG 31 > @@ -411,20 +417,21 @@ int intel_ring_flush_all_caches(struct intel_engine_cs *ring); > int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring); > > void intel_fini_pipe_control(struct intel_engine_cs *ring); > int intel_init_pipe_control(struct intel_engine_cs *ring); > > int intel_init_render_ring_buffer(struct drm_device *dev); > int intel_init_bsd_ring_buffer(struct drm_device *dev); > int intel_init_bsd2_ring_buffer(struct drm_device *dev); > int intel_init_blt_ring_buffer(struct drm_device *dev); > int intel_init_vebox_ring_buffer(struct drm_device *dev); > +int bdw_init_workarounds(struct intel_engine_cs *ring); > > u64 intel_ring_get_active_head(struct intel_engine_cs *ring); > void intel_ring_setup_status_page(struct intel_engine_cs *ring); > > static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf) > { > return ringbuf->tail; > } > > static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring) > -- > 2.0.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function 2014-08-25 12:18 ` Ville Syrjälä @ 2014-08-26 9:27 ` Siluvery, Arun 0 siblings, 0 replies; 25+ messages in thread From: Siluvery, Arun @ 2014-08-26 9:27 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On 25/08/2014 13:18, Ville Syrjälä wrote: > On Fri, Aug 22, 2014 at 08:39:11PM +0100, Arun Siluvery wrote: >> For BDW workarounds are currently initialized in init_clock_gating() but >> they are lost during reset, suspend/resume etc; this patch moves the WAs >> that are part of register state context to render ring init fn otherwise >> default context ends up with incorrect values as they don't get initialized >> until init_clock_gating fn. >> >> v2: Add workarounds to golden render state >> This method has its own issues, first of all this is different for >> each gen and it is generated using a tool so adding new workaround >> and mainitaining them across gens is not a straightforward process. >> >> v3: Use LRIs to emit these workarounds (Ville) >> Instead of modifying the golden render state the same LRIs are >> emitted from within the driver. >> >> For: VIZ-4092 >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> >> --- >> drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ >> drivers/gpu/drm/i915/intel_pm.c | 48 ---------------------- >> drivers/gpu/drm/i915/intel_ringbuffer.c | 70 +++++++++++++++++++++++++++++++++ >> drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++++ >> 4 files changed, 83 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c >> index 9683e62..2debce4 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_context.c >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c >> @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, >> } >> >> uninitialized = !to->legacy_hw_ctx.initialized && from == NULL; >> to->legacy_hw_ctx.initialized = true; >> >> done: >> i915_gem_context_reference(to); >> ring->last_context = to; >> >> if (uninitialized) { >> + if (IS_BROADWELL(ring->dev)) { >> + ret = bdw_init_workarounds(ring); >> + if (ret) >> + DRM_ERROR("init workarounds: %d\n", ret); >> + } >> + >> ret = i915_gem_render_state_init(ring); >> if (ret) >> DRM_ERROR("init render state: %d\n", ret); >> } >> >> return 0; >> >> unpin_out: >> if (ring->id == RCS) >> i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state); >> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c >> index c8f744c..668acd9 100644 >> --- a/drivers/gpu/drm/i915/intel_pm.c >> +++ b/drivers/gpu/drm/i915/intel_pm.c >> @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) >> struct drm_i915_private *dev_priv = dev->dev_private; >> enum pipe pipe; >> >> I915_WRITE(WM3_LP_ILK, 0); >> I915_WRITE(WM2_LP_ILK, 0); >> I915_WRITE(WM1_LP_ILK, 0); >> >> /* FIXME(BDW): Check all the w/a, some might only apply to >> * pre-production hw. */ >> >> - /* WaDisablePartialInstShootdown:bdw */ >> - I915_WRITE(GEN8_ROW_CHICKEN, >> - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); >> - >> - /* WaDisableThreadStallDopClockGating:bdw */ >> - /* FIXME: Unclear whether we really need this on production bdw. */ >> - I915_WRITE(GEN8_ROW_CHICKEN, >> - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); >> >> - /* >> - * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for >> - * pre-production hardware >> - */ >> - I915_WRITE(HALF_SLICE_CHICKEN3, >> - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); >> - I915_WRITE(HALF_SLICE_CHICKEN3, >> - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); >> I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); >> >> I915_WRITE(_3D_CHICKEN3, >> _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); >> >> - I915_WRITE(COMMON_SLICE_CHICKEN2, >> - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); >> - >> - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, >> - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); >> - >> - /* WaDisableDopClockGating:bdw May not be needed for production */ >> - I915_WRITE(GEN7_ROW_CHICKEN2, >> - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); >> >> /* WaSwitchSolVfFArbitrationPriority:bdw */ >> I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); >> >> /* WaPsrDPAMaskVBlankInSRD:bdw */ >> I915_WRITE(CHICKEN_PAR1_1, >> I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); >> >> /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ >> for_each_pipe(pipe) { >> I915_WRITE(CHICKEN_PIPESL_1(pipe), >> I915_READ(CHICKEN_PIPESL_1(pipe)) | >> BDW_DPRS_MASK_VBLANK_SRD); >> } >> >> - /* Use Force Non-Coherent whenever executing a 3D context. This is a >> - * workaround for for a possible hang in the unlikely event a TLB >> - * invalidation occurs during a PSD flush. >> - */ >> - I915_WRITE(HDC_CHICKEN0, >> - I915_READ(HDC_CHICKEN0) | >> - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); >> - >> /* WaVSRefCountFullforceMissDisable:bdw */ >> /* WaDSRefCountFullforceMissDisable:bdw */ >> I915_WRITE(GEN7_FF_THREAD_MODE, >> I915_READ(GEN7_FF_THREAD_MODE) & >> ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); >> >> - /* >> - * BSpec recommends 8x4 when MSAA is used, >> - * however in practice 16x4 seems fastest. >> - * >> - * Note that PS/WM thread counts depend on the WIZ hashing >> - * disable bit, which we don't touch here, but it's good >> - * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). >> - */ >> - I915_WRITE(GEN7_GT_MODE, >> - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); >> - >> I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, >> _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); >> >> /* WaDisableSDEUnitClockGating:bdw */ >> I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | >> GEN8_SDEUNIT_CLOCK_GATE_DISABLE); >> - >> - /* Wa4x4STCOptimizationDisable:bdw */ >> - I915_WRITE(CACHE_MODE_1, >> - _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); >> } >> >> static void haswell_init_clock_gating(struct drm_device *dev) >> { >> struct drm_i915_private *dev_priv = dev->dev_private; >> >> ilk_init_lp_watermarks(dev); >> >> /* L3 caching of data atomics doesn't work -- disable it. */ >> I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c >> index 13543f8..9e24073 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c >> @@ -643,20 +643,90 @@ intel_init_pipe_control(struct intel_engine_cs *ring) >> return 0; >> >> err_unpin: >> i915_gem_object_ggtt_unpin(ring->scratch.obj); >> err_unref: >> drm_gem_object_unreference(&ring->scratch.obj->base); >> err: >> return ret; >> } >> >> +int bdw_init_workarounds(struct intel_engine_cs *ring) >> +{ >> + int ret; >> + >> + /* >> + * workarounds applied in this fn are part of register state context, >> + * they need to be re-initialized followed by gpu reset, suspend/resume, >> + * module reload. >> + */ >> + >> + /* >> + * update the number of dwords required based on the >> + * actual number of workarounds applied >> + */ >> + ret = intel_ring_begin(ring, 24); >> + if (ret) >> + return ret; >> + >> + /* WaDisablePartialInstShootdown:bdw */ >> + /* WaDisableThreadStallDopClockGating:bdw */ >> + /* FIXME: Unclear whether we really need this on production bdw. */ >> + INTEL_RING_EMIT_WA(ring, GEN8_ROW_CHICKEN, >> + _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE >> + | STALL_DOP_GATING_DISABLE)); >> + >> + /* WaDisableDopClockGating:bdw May not be needed for production */ >> + INTEL_RING_EMIT_WA(ring, GEN7_ROW_CHICKEN2, >> + _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); >> + >> + /* >> + * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for >> + * pre-production hardware >> + */ >> + INTEL_RING_EMIT_WA(ring, HALF_SLICE_CHICKEN3, >> + _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS >> + | GEN8_SAMPLER_POWER_BYPASS_DIS)); >> + >> + INTEL_RING_EMIT_WA(ring, GEN7_HALF_SLICE_CHICKEN1, >> + _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); >> + >> + INTEL_RING_EMIT_WA(ring, COMMON_SLICE_CHICKEN2, >> + _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); >> + >> + /* Use Force Non-Coherent whenever executing a 3D context. This is a >> + * workaround for for a possible hang in the unlikely event a TLB >> + * invalidation occurs during a PSD flush. >> + */ >> + INTEL_RING_EMIT_WA(ring, HDC_CHICKEN0, >> + _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); >> + >> + /* Wa4x4STCOptimizationDisable:bdw */ >> + INTEL_RING_EMIT_WA(ring, CACHE_MODE_1, >> + _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE)); >> + >> + /* >> + * BSpec recommends 8x4 when MSAA is used, >> + * however in practice 16x4 seems fastest. >> + * >> + * Note that PS/WM thread counts depend on the WIZ hashing >> + * disable bit, which we don't touch here, but it's good >> + * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). >> + */ >> + INTEL_RING_EMIT_WA(ring, GEN7_GT_MODE, >> + GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); >> + >> + intel_ring_advance(ring); >> + >> + return 0; >> +} >> + >> static int init_render_ring(struct intel_engine_cs *ring) >> { >> struct drm_device *dev = ring->dev; >> struct drm_i915_private *dev_priv = dev->dev_private; >> int ret = init_ring_common(ring); >> if (ret) >> return ret; >> >> /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ >> if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h >> index 9cbf7b0..77fe667 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h >> @@ -66,20 +66,26 @@ struct intel_hw_status_page { >> break; \ >> } \ >> ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \ >> ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \ >> ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \ >> ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \ >> ring->semaphore.signal_ggtt[VCS2] = GEN8_SIGNAL_OFFSET(ring, VCS2); \ >> ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \ >> } while(0) >> >> +#define INTEL_RING_EMIT_WA(_ring, _wa_reg, _wa_val) ({ \ >> + intel_ring_emit(_ring, MI_LOAD_REGISTER_IMM(1)); \ >> + intel_ring_emit(_ring, _wa_reg); \ >> + intel_ring_emit(_ring, _wa_val); \ >> + }) > > do {} while (0) would suffice since the macro doesn't return anything. > Not sure why it's a macro actually. A static function in intel_ringbuffer.c > would do just as well. If you want to keep it a macro, the argument > evaluations should have () around them. > > I was also going to say that you could just call it ring_emit_lri() or > something, but I guess you plan to add the debugfs w/a stuff there which > makes the current name more appropriate. I'll let others bikeshed the > debugfs stuff since I have no real opinion on it. > > Apart from that the patch seems good to me. > It was started as a macro because I wanted to keep it similar as I915_WRITE() as initially these workarounds are applied using register writes, as you mentioned it could be a fn as well. I will change it as a fn and send updated patch. The debugfs part is mainly added to support testing. In i-g-t we need the list of workarounds applied to compare their state before and after test and adding them in i-g-t itself is not so nice; Daniel suggested to generate the list from within the driver and export them to a debugfs file. regards Arun >> + >> enum intel_ring_hangcheck_action { >> HANGCHECK_IDLE = 0, >> HANGCHECK_WAIT, >> HANGCHECK_ACTIVE, >> HANGCHECK_ACTIVE_LOOP, >> HANGCHECK_KICK, >> HANGCHECK_HUNG, >> }; >> >> #define HANGCHECK_SCORE_RING_HUNG 31 >> @@ -411,20 +417,21 @@ int intel_ring_flush_all_caches(struct intel_engine_cs *ring); >> int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring); >> >> void intel_fini_pipe_control(struct intel_engine_cs *ring); >> int intel_init_pipe_control(struct intel_engine_cs *ring); >> >> int intel_init_render_ring_buffer(struct drm_device *dev); >> int intel_init_bsd_ring_buffer(struct drm_device *dev); >> int intel_init_bsd2_ring_buffer(struct drm_device *dev); >> int intel_init_blt_ring_buffer(struct drm_device *dev); >> int intel_init_vebox_ring_buffer(struct drm_device *dev); >> +int bdw_init_workarounds(struct intel_engine_cs *ring); >> >> u64 intel_ring_get_active_head(struct intel_engine_cs *ring); >> void intel_ring_setup_status_page(struct intel_engine_cs *ring); >> >> static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf) >> { >> return ringbuf->tail; >> } >> >> static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring) >> -- >> 2.0.4 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx > ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2014-08-31 19:43 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-08-26 13:44 [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn Arun Siluvery 2014-08-26 13:44 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery 2014-08-26 14:37 ` Ville Syrjälä 2014-08-26 14:47 ` Siluvery, Arun 2014-08-26 15:32 ` Ville Syrjälä 2014-08-26 16:06 ` Chris Wilson 2014-08-27 14:33 ` [PATCH] drm/i915: Init some CHV workarounds via LRIs in ring->init_context() ville.syrjala 2014-08-27 15:02 ` Chris Wilson 2014-08-29 15:43 ` Barbalho, Rafael 2014-08-29 16:01 ` Daniel Vetter 2014-08-26 13:44 ` [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs Arun Siluvery 2014-08-27 15:44 ` Daniel Vetter 2014-08-27 15:54 ` Chris Wilson 2014-08-27 15:59 ` Siluvery, Arun 2014-08-27 16:19 ` Chris Wilson 2014-08-30 15:10 ` Damien Lespiau 2014-08-31 19:43 ` Siluvery, Arun -- strict thread matches above, loose matches on Subject: below -- 2014-08-26 9:33 [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn Arun Siluvery 2014-08-26 9:33 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery 2014-08-26 10:09 ` Chris Wilson 2014-08-26 10:16 ` Siluvery, Arun 2014-08-26 10:34 ` Chris Wilson 2014-08-26 11:42 ` Siluvery, Arun 2014-08-22 19:39 [PATCH 0/2] Apply BDW workarounds using LRIs in render init fn Arun Siluvery 2014-08-22 19:39 ` [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function Arun Siluvery 2014-08-25 12:18 ` Ville Syrjälä 2014-08-26 9:27 ` Siluvery, Arun
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox