* [PATCH 0/3] ring/context initialization patches
@ 2015-03-16 13:46 Mika Kuoppala
2015-03-16 13:46 ` [PATCH 1/3] drm/i915: Wait for render state init Mika Kuoppala
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Mika Kuoppala @ 2015-03-16 13:46 UTC (permalink / raw)
To: intel-gfx
Hi,
Testing dynamic page tables series, there has
been also problems with our codebase interacting badly
with the patchset. These are the essentials I have needed
to testrun/debug the series on my gen7.
Chris Wilson (2):
drm/i915: Detect page faults during hangcheck
drm/i915: Reorder hw init to avoid executing with invalid context/mm
state
Mika Kuoppala (1):
drm/i915: Wait for render state init
drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++++--------
drivers/gpu/drm/i915/i915_gem_render_state.c | 4 ++++
drivers/gpu/drm/i915/i915_irq.c | 5 +++++
drivers/gpu/drm/i915/intel_ringbuffer.c | 26 ++++++++++++++++++++++----
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
drivers/gpu/drm/i915/intel_uncore.c | 2 ++
6 files changed, 45 insertions(+), 12 deletions(-)
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 1/3] drm/i915: Wait for render state init 2015-03-16 13:46 [PATCH 0/3] ring/context initialization patches Mika Kuoppala @ 2015-03-16 13:46 ` Mika Kuoppala 2015-03-16 13:51 ` Chris Wilson 2015-03-16 13:46 ` [PATCH 2/3] drm/i915: Detect page faults during hangcheck Mika Kuoppala 2015-03-16 13:46 ` [PATCH 3/3] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Mika Kuoppala 2 siblings, 1 reply; 14+ messages in thread From: Mika Kuoppala @ 2015-03-16 13:46 UTC (permalink / raw) To: intel-gfx We are freeing the batch that is just pushed to ring. Further, other ring inits should assume that the render context with the workarounds are pushed before their rings execute anything. This fixes (or papers over) a problem where with full ppgtt the blitter ring init sometimes fails, where the blt ring ACTHD runs wild through the address space until eventually hangcheck kills it. With dynamic page table series this problem became more easily reproduced by just normal boot failing to init blt ring, instead of torturing init with gem_reset_stats. Testcase: igt/gem_reset_stats --r ban-blt Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/i915_gem_render_state.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c index 521548a..9484c64 100644 --- a/drivers/gpu/drm/i915/i915_gem_render_state.c +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c @@ -175,6 +175,10 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring) ret = __i915_add_request(ring, NULL, so.obj); /* __i915_add_request moves object to inactive if it fails */ + if (ret) + goto out; + + ret = intel_ring_idle(ring); out: i915_gem_render_state_fini(&so); return ret; -- 1.9.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] drm/i915: Wait for render state init 2015-03-16 13:46 ` [PATCH 1/3] drm/i915: Wait for render state init Mika Kuoppala @ 2015-03-16 13:51 ` Chris Wilson 2015-03-16 14:20 ` Mika Kuoppala 0 siblings, 1 reply; 14+ messages in thread From: Chris Wilson @ 2015-03-16 13:51 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx On Mon, Mar 16, 2015 at 03:46:34PM +0200, Mika Kuoppala wrote: > We are freeing the batch that is just pushed to ring. Kill this. It is not correct. > Further, other ring inits should assume that the render > context with the workarounds are pushed before their rings > execute anything. > > This fixes (or papers over) a problem where with full ppgtt > the blitter ring init sometimes fails, where the blt ring > ACTHD runs wild through the address space until eventually > hangcheck kills it. Interesting. We don't do anything inside the golden render state to be of relevance to the blitter engine. So I think this is pure paper and will not ultimately fix the problem. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] drm/i915: Wait for render state init 2015-03-16 13:51 ` Chris Wilson @ 2015-03-16 14:20 ` Mika Kuoppala 2015-03-16 14:38 ` Chris Wilson 0 siblings, 1 reply; 14+ messages in thread From: Mika Kuoppala @ 2015-03-16 14:20 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx Chris Wilson <chris@chris-wilson.co.uk> writes: > On Mon, Mar 16, 2015 at 03:46:34PM +0200, Mika Kuoppala wrote: >> We are freeing the batch that is just pushed to ring. > > Kill this. It is not correct. > >> Further, other ring inits should assume that the render >> context with the workarounds are pushed before their rings >> execute anything. >> >> This fixes (or papers over) a problem where with full ppgtt >> the blitter ring init sometimes fails, where the blt ring >> ACTHD runs wild through the address space until eventually >> hangcheck kills it. > > Interesting. We don't do anything inside the golden render state to be > of relevance to the blitter engine. So I think this is pure paper and > will not ultimately fix the problem. > -Chris > Perhaps not in the golden state, but do we push any workarounds through render ring init, that are global in scope and required by the other rings? -Mika _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] drm/i915: Wait for render state init 2015-03-16 14:20 ` Mika Kuoppala @ 2015-03-16 14:38 ` Chris Wilson 2015-03-16 15:56 ` Mika Kuoppala 0 siblings, 1 reply; 14+ messages in thread From: Chris Wilson @ 2015-03-16 14:38 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx On Mon, Mar 16, 2015 at 04:20:46PM +0200, Mika Kuoppala wrote: > Perhaps not in the golden state, but do we push any workarounds through > render ring init, that are global in scope and required by the other rings? No. Afaict the w/a only apply to render. Or else we have been horribly misinformed. I still suspect it is something due to the ordering of pd loads between rings. Serialising the render context init should not fix the similar hangs we see at runtime. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] drm/i915: Wait for render state init 2015-03-16 14:38 ` Chris Wilson @ 2015-03-16 15:56 ` Mika Kuoppala 2015-03-16 15:58 ` [PATCH] drm/i915: Push mm switch immediately to ring Mika Kuoppala 0 siblings, 1 reply; 14+ messages in thread From: Mika Kuoppala @ 2015-03-16 15:56 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx Chris Wilson <chris@chris-wilson.co.uk> writes: > On Mon, Mar 16, 2015 at 04:20:46PM +0200, Mika Kuoppala wrote: >> Perhaps not in the golden state, but do we push any workarounds through >> render ring init, that are global in scope and required by the other rings? > > No. Afaict the w/a only apply to render. Or else we have been horribly > misinformed. I still suspect it is something due to the ordering of pd > loads between rings. Serialising the render context init should not fix > the similar hangs we see at runtime. Ok please ignore this patch then. I will send another that does the immediate tail updates on ring->mm_switch. -Mika _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] drm/i915: Push mm switch immediately to ring 2015-03-16 15:56 ` Mika Kuoppala @ 2015-03-16 15:58 ` Mika Kuoppala 2015-03-16 21:35 ` Chris Wilson 0 siblings, 1 reply; 14+ messages in thread From: Mika Kuoppala @ 2015-03-16 15:58 UTC (permalink / raw) To: intel-gfx Sometimes when first batch is run with blitter ring, and even tho its PP_BASE has been properly set, the ring seems to be very confused about its memory view. The ring ACHTD keeps progressing past batch boundary, towards the end off address space. Eventually after quite amount of time, hangcheck will declare ring as hanged. But the proceeding reset can run into the same problem with the ring init. If we update the tail immediately after mm switch, the render ring will have a proper pd loaded, before the blitter ring is initialized. This fixes, or papers over, the blitter 'ACTHD proceeding through address space without progress' problem that has been quite a while in ppgtt=2 ring init. Also with Chris Wilson's patch to report GPU faults, I got page faults with unloaded PD's from address zero when running, igt/gem_ppgtt. This patch fixes also those. Testcase: igt/gem_ppgtt Testcase: igt/gem_reset_stats --r ban-blt Cc: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 2034f7c..bbfcbb6 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -390,7 +390,7 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry, intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry)); intel_ring_emit(ring, (u32)(val)); - intel_ring_advance(ring); + __intel_ring_advance(ring); return 0; } @@ -896,7 +896,7 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt, intel_ring_emit(ring, RING_PP_DIR_BASE(ring)); intel_ring_emit(ring, get_pd_offset(ppgtt)); intel_ring_emit(ring, MI_NOOP); - intel_ring_advance(ring); + __intel_ring_advance(ring); return 0; } @@ -931,7 +931,6 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt, intel_ring_emit(ring, RING_PP_DIR_BASE(ring)); intel_ring_emit(ring, get_pd_offset(ppgtt)); intel_ring_emit(ring, MI_NOOP); - intel_ring_advance(ring); /* XXX: RCS is the only one to auto invalidate the TLBs? */ if (ring->id != RCS) { @@ -940,6 +939,8 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt, return ret; } + __intel_ring_advance(ring); + return 0; } -- 1.9.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: Push mm switch immediately to ring 2015-03-16 15:58 ` [PATCH] drm/i915: Push mm switch immediately to ring Mika Kuoppala @ 2015-03-16 21:35 ` Chris Wilson 2015-03-17 16:10 ` Mika Kuoppala 0 siblings, 1 reply; 14+ messages in thread From: Chris Wilson @ 2015-03-16 21:35 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx On Mon, Mar 16, 2015 at 05:58:12PM +0200, Mika Kuoppala wrote: > Sometimes when first batch is run with blitter ring, > @@ -931,7 +931,6 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt, > intel_ring_emit(ring, RING_PP_DIR_BASE(ring)); > intel_ring_emit(ring, get_pd_offset(ppgtt)); > intel_ring_emit(ring, MI_NOOP); > - intel_ring_advance(ring); > > /* XXX: RCS is the only one to auto invalidate the TLBs? */ > if (ring->id != RCS) { > @@ -940,6 +939,8 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt, > return ret; > } > > + __intel_ring_advance(ring); > + I really would like a comment before each of these, something like /* XXX This papers over a bug loading PD prior to batch execution */ This last chunk is buggy. The earlier unadvanced return will break the ringbuffer state tracking. Do you have equal success just with the plain s/intel_ring_advance/__intel_ring_advance/ ? -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: Push mm switch immediately to ring 2015-03-16 21:35 ` Chris Wilson @ 2015-03-17 16:10 ` Mika Kuoppala 0 siblings, 0 replies; 14+ messages in thread From: Mika Kuoppala @ 2015-03-17 16:10 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx Chris Wilson <chris@chris-wilson.co.uk> writes: > On Mon, Mar 16, 2015 at 05:58:12PM +0200, Mika Kuoppala wrote: >> Sometimes when first batch is run with blitter ring, >> @@ -931,7 +931,6 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt, >> intel_ring_emit(ring, RING_PP_DIR_BASE(ring)); >> intel_ring_emit(ring, get_pd_offset(ppgtt)); >> intel_ring_emit(ring, MI_NOOP); >> - intel_ring_advance(ring); >> >> /* XXX: RCS is the only one to auto invalidate the TLBs? */ >> if (ring->id != RCS) { >> @@ -940,6 +939,8 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt, >> return ret; >> } >> >> + __intel_ring_advance(ring); >> + > > I really would like a comment before each of these, something like > /* XXX This papers over a bug loading PD prior to batch execution */ > > This last chunk is buggy. The earlier unadvanced return will break the > ringbuffer state tracking. Do you have equal success just with the plain > s/intel_ring_advance/__intel_ring_advance/ ? I have managed to create a series that doesn't need this tail update trickery at all. Please ignore this series. -Mika _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] drm/i915: Detect page faults during hangcheck 2015-03-16 13:46 [PATCH 0/3] ring/context initialization patches Mika Kuoppala 2015-03-16 13:46 ` [PATCH 1/3] drm/i915: Wait for render state init Mika Kuoppala @ 2015-03-16 13:46 ` Mika Kuoppala 2015-03-16 17:53 ` Daniel Vetter 2015-03-16 13:46 ` [PATCH 3/3] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Mika Kuoppala 2 siblings, 1 reply; 14+ messages in thread From: Mika Kuoppala @ 2015-03-16 13:46 UTC (permalink / raw) To: intel-gfx From: Chris Wilson <chris@chris-wilson.co.uk> On Sandybridge+, the GPU provides the ERROR register for detecting page faults. Hook this up to our hangcheck so that we can dump the error state soon after such an event occurs. This would be better inside an interrupt handler, but it serves a purpose here as it detects that our initial context setup is invalid... Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_irq.c | 5 +++++ drivers/gpu/drm/i915/intel_uncore.c | 2 ++ 2 files changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 49ad5fb..ea668fc 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2929,6 +2929,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work) if (!i915.enable_hangcheck) return; + if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) { + i915_handle_error(dev, false, "GPU reported a page fault"); + I915_WRITE(ERROR_GEN6, 0); + } + for_each_ring(ring, dev_priv, i) { u64 acthd; u32 seqno; diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index ab5cc94..c58535e 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -359,6 +359,8 @@ static void __intel_uncore_early_sanitize(struct drm_device *dev, if (IS_GEN6(dev) || IS_GEN7(dev)) __raw_i915_write32(dev_priv, GTFIFODBG, __raw_i915_read32(dev_priv, GTFIFODBG)); + if (INTEL_INFO(dev)->gen >= 6) + __raw_i915_write32(dev_priv, ERROR_GEN6, 0); intel_uncore_forcewake_reset(dev, restore_forcewake); } -- 1.9.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] drm/i915: Detect page faults during hangcheck 2015-03-16 13:46 ` [PATCH 2/3] drm/i915: Detect page faults during hangcheck Mika Kuoppala @ 2015-03-16 17:53 ` Daniel Vetter 2015-03-16 21:30 ` Chris Wilson 0 siblings, 1 reply; 14+ messages in thread From: Daniel Vetter @ 2015-03-16 17:53 UTC (permalink / raw) To: Mika Kuoppala; +Cc: intel-gfx On Mon, Mar 16, 2015 at 03:46:35PM +0200, Mika Kuoppala wrote: > From: Chris Wilson <chris@chris-wilson.co.uk> > > On Sandybridge+, the GPU provides the ERROR register for detecting page > faults. Hook this up to our hangcheck so that we can dump the error > state soon after such an event occurs. This would be better inside an > interrupt handler, but it serves a purpose here as it detects that our > initial context setup is invalid... > > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_irq.c | 5 +++++ > drivers/gpu/drm/i915/intel_uncore.c | 2 ++ > 2 files changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index 49ad5fb..ea668fc 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -2929,6 +2929,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work) > if (!i915.enable_hangcheck) > return; > > + if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) { > + i915_handle_error(dev, false, "GPU reported a page fault"); > + I915_WRITE(ERROR_GEN6, 0); Shouldn't we also at least report the bits from the ERROR register somewhere? Or are they supremely useless? -Daniel > + } > + > for_each_ring(ring, dev_priv, i) { > u64 acthd; > u32 seqno; > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index ab5cc94..c58535e 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -359,6 +359,8 @@ static void __intel_uncore_early_sanitize(struct drm_device *dev, > if (IS_GEN6(dev) || IS_GEN7(dev)) > __raw_i915_write32(dev_priv, GTFIFODBG, > __raw_i915_read32(dev_priv, GTFIFODBG)); > + if (INTEL_INFO(dev)->gen >= 6) > + __raw_i915_write32(dev_priv, ERROR_GEN6, 0); > > intel_uncore_forcewake_reset(dev, restore_forcewake); > } > -- > 1.9.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] drm/i915: Detect page faults during hangcheck 2015-03-16 17:53 ` Daniel Vetter @ 2015-03-16 21:30 ` Chris Wilson 2015-03-17 10:31 ` Daniel Vetter 0 siblings, 1 reply; 14+ messages in thread From: Chris Wilson @ 2015-03-16 21:30 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On Mon, Mar 16, 2015 at 06:53:40PM +0100, Daniel Vetter wrote: > On Mon, Mar 16, 2015 at 03:46:35PM +0200, Mika Kuoppala wrote: > > From: Chris Wilson <chris@chris-wilson.co.uk> > > > > On Sandybridge+, the GPU provides the ERROR register for detecting page > > faults. Hook this up to our hangcheck so that we can dump the error > > state soon after such an event occurs. This would be better inside an > > interrupt handler, but it serves a purpose here as it detects that our > > initial context setup is invalid... > > > > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_irq.c | 5 +++++ > > drivers/gpu/drm/i915/intel_uncore.c | 2 ++ > > 2 files changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > > index 49ad5fb..ea668fc 100644 > > --- a/drivers/gpu/drm/i915/i915_irq.c > > +++ b/drivers/gpu/drm/i915/i915_irq.c > > @@ -2929,6 +2929,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work) > > if (!i915.enable_hangcheck) > > return; > > > > + if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) { > > + i915_handle_error(dev, false, "GPU reported a page fault"); > > + I915_WRITE(ERROR_GEN6, 0); > > Shouldn't we also at least report the bits from the ERROR register > somewhere? Or are they supremely useless? It's recorded in the hangcheck already - that's how I knew we were generating pagefaults during module init in the first place. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] drm/i915: Detect page faults during hangcheck 2015-03-16 21:30 ` Chris Wilson @ 2015-03-17 10:31 ` Daniel Vetter 0 siblings, 0 replies; 14+ messages in thread From: Daniel Vetter @ 2015-03-17 10:31 UTC (permalink / raw) To: Chris Wilson, Daniel Vetter, Mika Kuoppala, intel-gfx On Mon, Mar 16, 2015 at 09:30:44PM +0000, Chris Wilson wrote: > On Mon, Mar 16, 2015 at 06:53:40PM +0100, Daniel Vetter wrote: > > On Mon, Mar 16, 2015 at 03:46:35PM +0200, Mika Kuoppala wrote: > > > From: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > On Sandybridge+, the GPU provides the ERROR register for detecting page > > > faults. Hook this up to our hangcheck so that we can dump the error > > > state soon after such an event occurs. This would be better inside an > > > interrupt handler, but it serves a purpose here as it detects that our > > > initial context setup is invalid... > > > > > > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > --- > > > drivers/gpu/drm/i915/i915_irq.c | 5 +++++ > > > drivers/gpu/drm/i915/intel_uncore.c | 2 ++ > > > 2 files changed, 7 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > > > index 49ad5fb..ea668fc 100644 > > > --- a/drivers/gpu/drm/i915/i915_irq.c > > > +++ b/drivers/gpu/drm/i915/i915_irq.c > > > @@ -2929,6 +2929,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work) > > > if (!i915.enable_hangcheck) > > > return; > > > > > > + if (INTEL_INFO(dev_priv)->gen >= 6 && I915_READ(ERROR_GEN6)) { > > > + i915_handle_error(dev, false, "GPU reported a page fault"); > > > + I915_WRITE(ERROR_GEN6, 0); > > > > Shouldn't we also at least report the bits from the ERROR register > > somewhere? Or are they supremely useless? > > It's recorded in the hangcheck already - that's how I knew we were > generating pagefaults during module init in the first place. Ah right missed that, so I think we're covered. Maybe a comment that handle_error captures ERROR_GEN6? Doesn't really need to be though, just in case someone decides to reorder the clearing with the handling. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] drm/i915: Reorder hw init to avoid executing with invalid context/mm state 2015-03-16 13:46 [PATCH 0/3] ring/context initialization patches Mika Kuoppala 2015-03-16 13:46 ` [PATCH 1/3] drm/i915: Wait for render state init Mika Kuoppala 2015-03-16 13:46 ` [PATCH 2/3] drm/i915: Detect page faults during hangcheck Mika Kuoppala @ 2015-03-16 13:46 ` Mika Kuoppala 2 siblings, 0 replies; 14+ messages in thread From: Mika Kuoppala @ 2015-03-16 13:46 UTC (permalink / raw) To: intel-gfx From: Chris Wilson <chris@chris-wilson.co.uk> Currently we initialise the rings, add the first context switch to the ring and execute our golden state then enable (aliasing or full) ppgtt. However, as we enable ppgtt using direct MMIO but load the PD using MI_LRI, we end up executing the context switch and golden render state with an invalid PD generating page faults. To solve this issue, first do the ppgtt PD setup, then set the default context and write the commands to run the render state into the ring, before we activate the ring. This allows us to be sure that the register state is valid before we begin execution. This was spotted when writing the seqno/request conversion, but only with the ERROR capture did I realise that it was a necessity now. RFC: cleanup the error handling in i915_gem_init_hw. v2: added intel_ring_reset Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++++-------- drivers/gpu/drm/i915/intel_ringbuffer.c | 26 ++++++++++++++++++++++---- drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 3 files changed, 34 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0fe313d..e154770 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4817,6 +4817,9 @@ i915_gem_init_hw(struct drm_device *dev) } } + for_each_ring(ring, dev_priv, i) + intel_ring_reset(ring); + i915_gem_init_swizzling(dev); /* @@ -4827,19 +4830,19 @@ i915_gem_init_hw(struct drm_device *dev) */ init_unused_rings(dev); - for_each_ring(ring, dev_priv, i) { - ret = ring->init_hw(ring); - if (ret) - goto out; + ret = i915_ppgtt_init_hw(dev); + if (ret && ret != -EIO) { + DRM_ERROR("PPGTT enable failed %d\n", ret); + i915_gem_cleanup_ringbuffer(dev); } for (i = 0; i < NUM_L3_SLICES(dev); i++) i915_gem_l3_remap(&dev_priv->ring[RCS], i); - ret = i915_ppgtt_init_hw(dev); - if (ret && ret != -EIO) { - DRM_ERROR("PPGTT enable failed %d\n", ret); - i915_gem_cleanup_ringbuffer(dev); + for_each_ring(ring, dev_priv, i) { + ret = ring->init_hw(ring); + if (ret) + goto out; } ret = i915_gem_context_enable(dev_priv); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 441e250..0f97588 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -560,6 +560,22 @@ static bool stop_ring(struct intel_engine_cs *ring) return (I915_READ_HEAD(ring) & HEAD_ADDR) == 0; } +void intel_ring_reset(struct intel_engine_cs *ring) +{ + struct intel_ringbuffer *ringbuf = ring->buffer; + + WARN_ON(!stop_ring(ring)); + + if (WARN_ON(ringbuf == NULL)) + return; + + ringbuf->head = 0; + ringbuf->tail = 0; + + ringbuf->last_retired_head = -1; +} + + static int init_ring_common(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev; @@ -614,6 +630,9 @@ static int init_ring_common(struct intel_engine_cs *ring) I915_WRITE_HEAD(ring, 0); (void)I915_READ_HEAD(ring); + I915_WRITE_HEAD(ring, ringbuf->head); + I915_WRITE_TAIL(ring, ringbuf->head); + I915_WRITE_CTL(ring, ((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID); @@ -621,7 +640,7 @@ static int init_ring_common(struct intel_engine_cs *ring) /* If the head is still not zero, the ring is dead */ if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 && I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) && - (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) { + (I915_READ_HEAD(ring) & HEAD_ADDR) == ringbuf->head, 50)) { DRM_ERROR("%s initialization failed " "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n", ring->name, @@ -632,13 +651,12 @@ static int init_ring_common(struct intel_engine_cs *ring) goto out; } + I915_WRITE_TAIL(ring, ringbuf->tail); + ringbuf->last_retired_head = -1; - ringbuf->head = I915_READ_HEAD(ring); - ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR; intel_ring_update_space(ringbuf); memset(&ring->hangcheck, 0, sizeof(ring->hangcheck)); - out: intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index c761fe0..168a670 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -408,6 +408,7 @@ int __intel_ring_space(int head, int tail, int size); void intel_ring_update_space(struct intel_ringbuffer *ringbuf); int intel_ring_space(struct intel_ringbuffer *ringbuf); bool intel_ring_stopped(struct intel_engine_cs *ring); +void intel_ring_reset(struct intel_engine_cs *ring); void __intel_ring_advance(struct intel_engine_cs *ring); int __must_check intel_ring_idle(struct intel_engine_cs *ring); -- 1.9.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-03-17 16:10 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-03-16 13:46 [PATCH 0/3] ring/context initialization patches Mika Kuoppala 2015-03-16 13:46 ` [PATCH 1/3] drm/i915: Wait for render state init Mika Kuoppala 2015-03-16 13:51 ` Chris Wilson 2015-03-16 14:20 ` Mika Kuoppala 2015-03-16 14:38 ` Chris Wilson 2015-03-16 15:56 ` Mika Kuoppala 2015-03-16 15:58 ` [PATCH] drm/i915: Push mm switch immediately to ring Mika Kuoppala 2015-03-16 21:35 ` Chris Wilson 2015-03-17 16:10 ` Mika Kuoppala 2015-03-16 13:46 ` [PATCH 2/3] drm/i915: Detect page faults during hangcheck Mika Kuoppala 2015-03-16 17:53 ` Daniel Vetter 2015-03-16 21:30 ` Chris Wilson 2015-03-17 10:31 ` Daniel Vetter 2015-03-16 13:46 ` [PATCH 3/3] drm/i915: Reorder hw init to avoid executing with invalid context/mm state Mika Kuoppala
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox