From: Ramalingam C <ramalingam.c@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [CI 1/5] drm/i915/gt: Try to more gracefully quiesce the system before resets
Date: Thu, 24 Oct 2019 12:07:20 +0530
Message-ID: <20191024063720.GA24164@intel.com>
In-Reply-To: <157189769673.18724.14469138471695392502@skylake-alporthouse-com>

On 2019-10-24 at 07:14:56 +0100, Chris Wilson wrote:
> Quoting Ramalingam C (2019-10-24 02:32:01)
> > On 2019-10-23 at 14:31:04 +0100, Chris Wilson wrote:
> > > If we are doing a normal GPU reset triggered after detecting a long
> > > period of stalled work, we can take our time and allow the engines to
> > > quiesce. Since we've stopped submission to the engine, if we wait
> > > long enough an innocent context should complete, leaving the engine idle.
> > > So by waiting a short amount of time, we should prevent clobbering other
> > > users when resetting a stuck context.
> > > 
> > > Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Suggested-by: Jon Bloomfield <jon.bloomfield@intel.com>
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
> > >  drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
> > >  drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
> > >  3 files changed, 34 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > > index 48df8889a88a..3a3881d5e44b 100644
> > > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > > @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
> > >         May be 0 to disable the initial spin. In practice, we estimate
> > >         the cost of enabling the interrupt (if currently disabled) to be
> > >         a few microseconds.
> > > +
> > > +config DRM_I915_STOP_TIMEOUT
> > > +     int "How long to wait for an engine to quiesce gracefully before reset (ms)"
> > > +     default 100 # milliseconds
> > > +     help
> > > +       By stopping submission and sleeping for a short time before resetting
> > > +       the GPU, we allow the innocent contexts also on the system to quiesce.
> > > +       It is then less likely for a hanging context to cause collateral
> > > +       damage as the system is reset in order to recover. The corollary is
> > > +       that the reset itself may take longer and so be more disruptive to
> > > +       interactive or low latency workloads.
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > index 0e20713603ec..e4203eb44139 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > @@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> > >       engine->instance = info->instance;
> > >       __sprint_engine_name(engine);
> > >  
> > > +     engine->props.stop_timeout_ms =
> > > +             CONFIG_DRM_I915_STOP_TIMEOUT;
> > Compared to the previous version, where you used the CONFIG variable directly,
> > what is the benefit of using it through a variable? So that we could
> > alter it at runtime?
> 
> That is what the next series proposes. Also it turns out to be more
> easily testable if one is able to alter the timeouts for testing.
Thanks for explaining, Chris!

-Ram.
> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
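
The point discussed above (copying the Kconfig default into a per-engine
property so it can be changed later, which also makes the timeout easier to
test) can be illustrated with a small userspace sketch. This is not the i915
code: struct toy_engine, toy_engine_setup(), toy_engine_quiesce() and the
busy_ms field are hypothetical names made up for illustration, and the
fallback #define only stands in for the value the kernel build would
generate from Kconfig. Time is simulated with a counter rather than a real
sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the Kconfig value; an assumption for this userspace sketch. */
#ifndef CONFIG_DRM_I915_STOP_TIMEOUT
#define CONFIG_DRM_I915_STOP_TIMEOUT 100 /* milliseconds */
#endif

/* Hypothetical, simplified engine; not the real struct intel_engine_cs. */
struct toy_engine {
	struct {
		unsigned long stop_timeout_ms; /* per-engine copy, adjustable at runtime */
	} props;
	unsigned long busy_ms; /* simulated time until in-flight work completes */
};

/* Copy the build-time default into the per-engine property once, at setup. */
static void toy_engine_setup(struct toy_engine *engine)
{
	assert(engine);
	engine->props.stop_timeout_ms = CONFIG_DRM_I915_STOP_TIMEOUT;
	engine->busy_ms = 0;
}

/*
 * Simulate waiting for the engine to quiesce after submission has stopped.
 * Returns true if the engine went idle within the timeout, false if the
 * wait timed out and a reset would interrupt work still in flight.
 */
static bool toy_engine_quiesce(struct toy_engine *engine)
{
	unsigned long waited_ms = 0;

	assert(engine);
	while (engine->busy_ms > 0) {
		if (waited_ms >= engine->props.stop_timeout_ms)
			return false;
		engine->busy_ms--; /* one simulated millisecond passes */
		waited_ms++;
	}
	return true;
}
```

Because the wait reads engine->props.stop_timeout_ms rather than the macro,
a test (or a later runtime interface) can shorten or lengthen the timeout
for one engine without rebuilding.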

Thread overview: 18+ messages
2019-10-23 13:31 [CI 1/5] drm/i915/gt: Try to more gracefully quiesce the system before resets Chris Wilson
2019-10-23 13:31 ` [Intel-gfx] " Chris Wilson
2019-10-23 13:31 ` [CI 2/5] drm/i915/execlists: Force preemption Chris Wilson
2019-10-23 13:31   ` [Intel-gfx] " Chris Wilson
2019-10-23 13:31 ` [CI 3/5] drm/i915/execlists: Cancel banned contexts on schedule-out Chris Wilson
2019-10-23 13:31   ` [Intel-gfx] " Chris Wilson
2019-10-23 13:31 ` [CI 4/5] drm/i915/gem: Cancel contexts when hangchecking is disabled Chris Wilson
2019-10-23 13:31   ` [Intel-gfx] " Chris Wilson
2019-10-23 13:31 ` [CI 5/5] drm/i915/gt: Replace hangcheck by heartbeats Chris Wilson
2019-10-23 13:31   ` [Intel-gfx] " Chris Wilson
2019-10-24  0:09 ` ✗ Fi.CI.BUILD: failure for series starting with [CI,1/5] drm/i915/gt: Try to more gracefully quiesce the system before resets Patchwork
2019-10-24  0:09   ` [Intel-gfx] " Patchwork
2019-10-24  1:32 ` [CI 1/5] " Ramalingam C
2019-10-24  1:32   ` [Intel-gfx] " Ramalingam C
2019-10-24  6:14   ` Chris Wilson
2019-10-24  6:14     ` [Intel-gfx] " Chris Wilson
2019-10-24  6:37     ` Ramalingam C [this message]
2019-10-24  6:37       ` Ramalingam C
