* [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
@ 2016-07-21 6:57 Chris Wilson
2016-07-21 6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Chris Wilson @ 2016-07-21 6:57 UTC (permalink / raw)
To: intel-gfx
During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.
References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 40047eb48826..9e826585edb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
rearm_hangcheck = false;
stuck_engines = intel_kick_waiters(dev_priv);
- if (unlikely(stuck_engines)) {
- DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
- dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
- }
+ if (unlikely(stuck_engines))
+ DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
+ stuck_engines);
if (INTEL_GEN(dev_priv) >= 6)
gen6_rps_idle(dev_priv);
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling 2016-07-21 6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson @ 2016-07-21 6:57 ` Chris Wilson 2016-07-21 6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson ` (2 subsequent siblings) 3 siblings, 0 replies; 12+ messages in thread From: Chris Wilson @ 2016-07-21 6:57 UTC (permalink / raw) To: intel-gfx In order to close a race with a long running hangcheck comparing a stale interrupt counter with a just started waiter, we need to first bump the counter as we start the fresh wait. References: https://bugs.freedesktop.org/show_bug.cgi?id=96974 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> --- drivers/gpu/drm/i915/intel_breadcrumbs.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c index f0b56e3f4abe..d893ccdd62ac 100644 --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c @@ -51,6 +51,13 @@ static void irq_enable(struct intel_engine_cs *engine) */ engine->breadcrumbs.irq_posted = true; + /* Make sure the current hangcheck doesn't falsely accuse a just + * started irq handler from missing an interrupt (because the + * interrupt count still matches the stale value from when + * the irq handler was disabled, many hangchecks ago). + */ + engine->breadcrumbs.irq_wakeups++; + spin_lock_irq(&engine->i915->irq_lock); engine->irq_enable(engine); spin_unlock_irq(&engine->i915->irq_lock); -- 2.8.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt 2016-07-21 6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson 2016-07-21 6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson @ 2016-07-21 6:57 ` Chris Wilson 2016-07-21 18:34 ` Chris Wilson 2016-07-22 7:57 ` Chris Wilson 2016-07-21 7:55 ` ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Patchwork 2016-07-21 9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin 3 siblings, 2 replies; 12+ messages in thread From: Chris Wilson @ 2016-07-21 6:57 UTC (permalink / raw) To: intel-gfx As the interrupt wakeup counter only increments when we have a waiter, before testing to see if that counter is unchanged we have to first check that we do expect it to change (i.e. we have a waiter). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_irq.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 7104dc1463eb..45afcdfe89b1 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -3062,7 +3062,9 @@ static unsigned long kick_waiters(struct intel_engine_cs *engine) struct drm_i915_private *i915 = engine->i915; unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups); - if (engine->hangcheck.user_interrupts == irq_count && + rcu_read_lock(); + if (intel_engine_wakeup(engine) && + engine->hangcheck.user_interrupts == irq_count && !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) { if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings)) DRM_ERROR("Hangcheck timer elapsed... %s idle\n", @@ -3070,6 +3072,7 @@ static unsigned long kick_waiters(struct intel_engine_cs *engine) intel_engine_enable_fake_irq(engine); } + rcu_read_unlock(); return irq_count; } -- 2.8.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt 2016-07-21 6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson @ 2016-07-21 18:34 ` Chris Wilson 2016-07-22 7:57 ` Chris Wilson 1 sibling, 0 replies; 12+ messages in thread From: Chris Wilson @ 2016-07-21 18:34 UTC (permalink / raw) To: intel-gfx On Thu, Jul 21, 2016 at 07:57:39AM +0100, Chris Wilson wrote: > As the interrupt wakeup counter only increments when we have a waiter, > before testing to see if that counter is unchanged we have to first > check that we do expect it to change (i.e. we have a waiter). > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fwiw, this series got a 10/10 pass rate for that bdw-u. It's not much, but maybe an improvement? https://bugs.freedesktop.org/show_bug.cgi?id=96974#c5 Tested-by: Humberto Israel Perez Rodriguez <humberto.i.perez.rodriguez@intel.com> -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt 2016-07-21 6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson 2016-07-21 18:34 ` Chris Wilson @ 2016-07-22 7:57 ` Chris Wilson 1 sibling, 0 replies; 12+ messages in thread From: Chris Wilson @ 2016-07-22 7:57 UTC (permalink / raw) To: intel-gfx On Thu, Jul 21, 2016 at 07:57:39AM +0100, Chris Wilson wrote: > As the interrupt wakeup counter only increments when we have a waiter, > before testing to see if that counter is unchanged we have to first > check that we do expect it to change (i.e. we have a waiter). > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_irq.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index 7104dc1463eb..45afcdfe89b1 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -3062,7 +3062,9 @@ static unsigned long kick_waiters(struct intel_engine_cs *engine) > struct drm_i915_private *i915 = engine->i915; > unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups); > > - if (engine->hangcheck.user_interrupts == irq_count && > + rcu_read_lock(); > + if (intel_engine_wakeup(engine) && > + engine->hangcheck.user_interrupts == irq_count && Sigh. Completely nerfs the detection of stuck waiters. Should be if (engine->hangcheck.user_interrupts == irq_count && intel_engine_wakeup(engine) && The test itself doesn't imply a missed interrupt either, there be a valid long lived batch causing a delay in the waiter. We can do better if we allow ourselves to take a spinlock here. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker 2016-07-21 6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson 2016-07-21 6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson 2016-07-21 6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson @ 2016-07-21 7:55 ` Patchwork 2016-07-21 9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin 3 siblings, 0 replies; 12+ messages in thread From: Patchwork @ 2016-07-21 7:55 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx == Series Details == Series: series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker URL : https://patchwork.freedesktop.org/series/10124/ State : success == Summary == Series 10124v1 Series without cover letter http://patchwork.freedesktop.org/api/1.0/series/10124/revisions/1/mbox fi-hsw-i7-4770k total:244 pass:216 dwarn:0 dfail:0 fail:8 skip:20 fi-kbl-qkkr total:244 pass:180 dwarn:29 dfail:0 fail:8 skip:27 fi-skl-i5-6260u total:244 pass:224 dwarn:0 dfail:0 fail:8 skip:12 fi-skl-i7-6700k total:244 pass:210 dwarn:0 dfail:0 fail:8 skip:26 fi-snb-i7-2600 total:244 pass:196 dwarn:0 dfail:0 fail:8 skip:40 ro-bdw-i5-5250u total:244 pass:219 dwarn:4 dfail:0 fail:8 skip:13 ro-bdw-i7-5557U total:244 pass:220 dwarn:2 dfail:0 fail:8 skip:14 ro-bdw-i7-5600u total:244 pass:204 dwarn:0 dfail:0 fail:8 skip:32 ro-bsw-n3050 total:218 pass:173 dwarn:0 dfail:0 fail:2 skip:42 ro-byt-n2820 total:244 pass:197 dwarn:0 dfail:0 fail:9 skip:38 ro-hsw-i3-4010u total:244 pass:212 dwarn:0 dfail:0 fail:8 skip:24 ro-hsw-i7-4770r total:244 pass:212 dwarn:0 dfail:0 fail:8 skip:24 ro-ilk-i7-620lm total:244 pass:172 dwarn:0 dfail:0 fail:9 skip:63 ro-ilk1-i5-650 total:239 pass:172 dwarn:0 dfail:0 fail:9 skip:58 ro-ivb-i7-3770 total:244 pass:203 dwarn:0 dfail:0 fail:8 skip:33 ro-skl3-i5-6260u total:244 pass:224 dwarn:0 dfail:0 fail:8 skip:12 ro-snb-i7-2620M total:244 pass:193 dwarn:0 dfail:0 fail:9 skip:42 Results at /archive/results/CI_IGT_test/RO_Patchwork_1555/ 621e9dc drm-intel-nightly: 2016y-07m-20d-19h-19m-37s UTC integration manifest baf6dda drm/i915: Check for a stuck waiter before a missed interrupt 9bbfde3 drm/i915: Update the breadcrumb interrupt counter before enabling 9741fde drm/i915: Drop racy markup of missed-irqs from idle-worker _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker 2016-07-21 6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson ` (2 preceding siblings ...) 2016-07-21 7:55 ` ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Patchwork @ 2016-07-21 9:58 ` Tvrtko Ursulin 2016-07-21 10:10 ` Chris Wilson 3 siblings, 1 reply; 12+ messages in thread From: Tvrtko Ursulin @ 2016-07-21 9:58 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 21/07/16 07:57, Chris Wilson wrote: > During the idle-worker we disable the hangcheck and so kick any waiters > that should have been completed (since the GPU is now idle). Unlike the > hangcheck, we do not take any care to avoid the race between the irq > handler and ourselves, and so it is possible for us to declare a missed > interrupt even as the bottom-half is being scheduled to run. Let's > ignore this race to stop a potential false-positive error. If the bottom half is scheduled to run then then.. > References: https://bugs.freedesktop.org/show_bug.cgi?id=96974 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_gem.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 40047eb48826..9e826585edb2 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work) > rearm_hangcheck = false; > > stuck_engines = intel_kick_waiters(dev_priv); ... this will not return a stucked engine since the there is a bh task assigned all until the bh exits. So I don't get it. :) Regards, Tvrtko > - if (unlikely(stuck_engines)) { > - DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n"); > - dev_priv->gpu_error.missed_irq_rings |= stuck_engines; > - } > + if (unlikely(stuck_engines)) > + DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n", > + stuck_engines); > > if (INTEL_GEN(dev_priv) >= 6) > gen6_rps_idle(dev_priv); > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker 2016-07-21 9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin @ 2016-07-21 10:10 ` Chris Wilson 2016-07-21 10:28 ` Tvrtko Ursulin 0 siblings, 1 reply; 12+ messages in thread From: Chris Wilson @ 2016-07-21 10:10 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: intel-gfx On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote: > > On 21/07/16 07:57, Chris Wilson wrote: > >During the idle-worker we disable the hangcheck and so kick any waiters > >that should have been completed (since the GPU is now idle). Unlike the > >hangcheck, we do not take any care to avoid the race between the irq > >handler and ourselves, and so it is possible for us to declare a missed > >interrupt even as the bottom-half is being scheduled to run. Let's > >ignore this race to stop a potential false-positive error. > > If the bottom half is scheduled to run then then.. > > >References: https://bugs.freedesktop.org/show_bug.cgi?id=96974 > >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > >Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > >--- > > drivers/gpu/drm/i915/i915_gem.c | 7 +++---- > > 1 file changed, 3 insertions(+), 4 deletions(-) > > > >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > >index 40047eb48826..9e826585edb2 100644 > >--- a/drivers/gpu/drm/i915/i915_gem.c > >+++ b/drivers/gpu/drm/i915/i915_gem.c > >@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work) > > rearm_hangcheck = false; > > > > stuck_engines = intel_kick_waiters(dev_priv); > > ... this will not return a stucked engine since the there is a bh > task assigned all until the bh exits. It reports if it wakes up a waiter on any engine. If the bh is already running, we cannot know if it has missed the seqno update. If it isn't running yet, we cannot know if it is about to be run. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker 2016-07-21 10:10 ` Chris Wilson @ 2016-07-21 10:28 ` Tvrtko Ursulin 2016-07-21 11:04 ` Chris Wilson 0 siblings, 1 reply; 12+ messages in thread From: Tvrtko Ursulin @ 2016-07-21 10:28 UTC (permalink / raw) To: Chris Wilson, intel-gfx, Joonas Lahtinen On 21/07/16 11:10, Chris Wilson wrote: > On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote: >> >> On 21/07/16 07:57, Chris Wilson wrote: >>> During the idle-worker we disable the hangcheck and so kick any waiters >>> that should have been completed (since the GPU is now idle). Unlike the >>> hangcheck, we do not take any care to avoid the race between the irq >>> handler and ourselves, and so it is possible for us to declare a missed >>> interrupt even as the bottom-half is being scheduled to run. Let's >>> ignore this race to stop a potential false-positive error. >> >> If the bottom half is scheduled to run then then.. >> >>> References: https://bugs.freedesktop.org/show_bug.cgi?id=96974 >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> >>> --- >>> drivers/gpu/drm/i915/i915_gem.c | 7 +++---- >>> 1 file changed, 3 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c >>> index 40047eb48826..9e826585edb2 100644 >>> --- a/drivers/gpu/drm/i915/i915_gem.c >>> +++ b/drivers/gpu/drm/i915/i915_gem.c >>> @@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work) >>> rearm_hangcheck = false; >>> >>> stuck_engines = intel_kick_waiters(dev_priv); >> >> ... this will not return a stucked engine since the there is a bh >> task assigned all until the bh exits. > > It reports if it wakes up a waiter on any engine. If the bh is already > running, we cannot know if it has missed the seqno update. If it isn't > running yet, we cannot know if it is about to be run. Oh I read the logic as completely opposite than what it is. Since the idle worker runs 100ms after last retirement, that would mean a really slow waiter or what? Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker 2016-07-21 10:28 ` Tvrtko Ursulin @ 2016-07-21 11:04 ` Chris Wilson 2016-07-22 10:10 ` Tvrtko Ursulin 0 siblings, 1 reply; 12+ messages in thread From: Chris Wilson @ 2016-07-21 11:04 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: intel-gfx On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote: > > On 21/07/16 11:10, Chris Wilson wrote: > >On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote: > >> > >>On 21/07/16 07:57, Chris Wilson wrote: > >>>During the idle-worker we disable the hangcheck and so kick any waiters > >>>that should have been completed (since the GPU is now idle). Unlike the > >>>hangcheck, we do not take any care to avoid the race between the irq > >>>handler and ourselves, and so it is possible for us to declare a missed > >>>interrupt even as the bottom-half is being scheduled to run. Let's > >>>ignore this race to stop a potential false-positive error. > >> > >>If the bottom half is scheduled to run then then.. > >> > >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974 > >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >>>Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > >>>Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > >>>--- > >>> drivers/gpu/drm/i915/i915_gem.c | 7 +++---- > >>> 1 file changed, 3 insertions(+), 4 deletions(-) > >>> > >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > >>>index 40047eb48826..9e826585edb2 100644 > >>>--- a/drivers/gpu/drm/i915/i915_gem.c > >>>+++ b/drivers/gpu/drm/i915/i915_gem.c > >>>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work) > >>> rearm_hangcheck = false; > >>> > >>> stuck_engines = intel_kick_waiters(dev_priv); > >> > >>... this will not return a stucked engine since the there is a bh > >>task assigned all until the bh exits. > > > >It reports if it wakes up a waiter on any engine. If the bh is already > >running, we cannot know if it has missed the seqno update. If it isn't > >running yet, we cannot know if it is about to be run. > > Oh I read the logic as completely opposite than what it is. > > Since the idle worker runs 100ms after last retirement, that would > mean a really slow waiter or what? It is dubious. But the idle worker runs 100ms after the first time we detect all engines are idle and may be running as we detect all engines are idle again. The only thing for sure is that in some cases that bdw-u is reaching the idle-worker with an unwoken engine (and that there is a race here in declaring it as a missed interrupt). I wasn't that concerned about the race because of that 100ms delay where eveything should have been idle, but on reflection that 100ms is not guarranteed. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker 2016-07-21 11:04 ` Chris Wilson @ 2016-07-22 10:10 ` Tvrtko Ursulin 2016-07-22 10:22 ` Chris Wilson 0 siblings, 1 reply; 12+ messages in thread From: Tvrtko Ursulin @ 2016-07-22 10:10 UTC (permalink / raw) To: Chris Wilson, intel-gfx, Joonas Lahtinen On 21/07/16 12:04, Chris Wilson wrote: > On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote: >> On 21/07/16 11:10, Chris Wilson wrote: >>> On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote: >>>> >>>> On 21/07/16 07:57, Chris Wilson wrote: >>>>> During the idle-worker we disable the hangcheck and so kick any waiters >>>>> that should have been completed (since the GPU is now idle). Unlike the >>>>> hangcheck, we do not take any care to avoid the race between the irq >>>>> handler and ourselves, and so it is possible for us to declare a missed >>>>> interrupt even as the bottom-half is being scheduled to run. Let's >>>>> ignore this race to stop a potential false-positive error. >>>> >>>> If the bottom half is scheduled to run then then.. >>>> >>>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=96974 >>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> >>>>> --- >>>>> drivers/gpu/drm/i915/i915_gem.c | 7 +++---- >>>>> 1 file changed, 3 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c >>>>> index 40047eb48826..9e826585edb2 100644 >>>>> --- a/drivers/gpu/drm/i915/i915_gem.c >>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c >>>>> @@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work) >>>>> rearm_hangcheck = false; >>>>> >>>>> stuck_engines = intel_kick_waiters(dev_priv); >>>> >>>> ... this will not return a stucked engine since the there is a bh >>>> task assigned all until the bh exits. >>> >>> It reports if it wakes up a waiter on any engine. If the bh is already >>> running, we cannot know if it has missed the seqno update. If it isn't >>> running yet, we cannot know if it is about to be run. >> >> Oh I read the logic as completely opposite than what it is. >> >> Since the idle worker runs 100ms after last retirement, that would >> mean a really slow waiter or what? > > It is dubious. But the idle worker runs 100ms after the first time we > detect all engines are idle and may be running as we detect all engines > are idle again. The only thing for sure is that in some cases that bdw-u > is reaching the idle-worker with an unwoken engine (and that there is > a race here in declaring it as a missed interrupt). I wasn't that > concerned about the race because of that 100ms delay where eveything > should have been idle, but on reflection that 100ms is not guarranteed. Would canceling the idle worker be to expensive? Either way, looks OK to me. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker 2016-07-22 10:10 ` Tvrtko Ursulin @ 2016-07-22 10:22 ` Chris Wilson 0 siblings, 0 replies; 12+ messages in thread From: Chris Wilson @ 2016-07-22 10:22 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: intel-gfx On Fri, Jul 22, 2016 at 11:10:28AM +0100, Tvrtko Ursulin wrote: > > Would canceling the idle worker be to expensive? It wasn't as much as that, I was trying to keep runtime suspend simple. In that the GT takes the wakelock to prevent suspend as required and not have the knowledge about all the users of the device inside runtime management callbacks. (It means the users then have to be concious that if they don't hold an explicit wakelock, they should check rpm first.) -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-07-22 10:22 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-07-21 6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson 2016-07-21 6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson 2016-07-21 6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson 2016-07-21 18:34 ` Chris Wilson 2016-07-22 7:57 ` Chris Wilson 2016-07-21 7:55 ` ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Patchwork 2016-07-21 9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin 2016-07-21 10:10 ` Chris Wilson 2016-07-21 10:28 ` Tvrtko Ursulin 2016-07-21 11:04 ` Chris Wilson 2016-07-22 10:10 ` Tvrtko Ursulin 2016-07-22 10:22 ` Chris Wilson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox