From: Ramalingam C <ramalingam.c@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [CI 1/5] drm/i915/gt: Try to more gracefully quiesce the system before resets
Date: Thu, 24 Oct 2019 12:07:20 +0530
Message-ID: <20191024063720.GA24164@intel.com>
In-Reply-To: <157189769673.18724.14469138471695392502@skylake-alporthouse-com>
On 2019-10-24 at 07:14:56 +0100, Chris Wilson wrote:
> Quoting Ramalingam C (2019-10-24 02:32:01)
> > On 2019-10-23 at 14:31:04 +0100, Chris Wilson wrote:
> > > If we are doing a normal GPU reset triggered after detecting a long
> > > period of stalled work, we can take our time and allow the engines to
> > > quiesce. Since we've stopped submission to the engine, if we wait long
> > > enough an innocent context should complete, leaving the engine idle.
> > > So by waiting a short amount of time, we should prevent clobbering other
> > > users when resetting a stuck context.
> > >
> > > Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Suggested-by: Jon Bloomfield <jon.bloomfield@intel.com>
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > ---
> > > drivers/gpu/drm/i915/Kconfig.profile | 11 +++++++++++
> > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 +++++++++++++++++++-
> > > drivers/gpu/drm/i915/gt/intel_engine_types.h | 4 ++++
> > > 3 files changed, 34 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > > index 48df8889a88a..3a3881d5e44b 100644
> > > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > > @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
> > > May be 0 to disable the initial spin. In practice, we estimate
> > > the cost of enabling the interrupt (if currently disabled) to be
> > > a few microseconds.
> > > +
> > > +config DRM_I915_STOP_TIMEOUT
> > > + int "How long to wait for an engine to quiesce gracefully before reset (ms)"
> > > + default 100 # milliseconds
> > > + help
> > > + By stopping submission and sleeping for a short time before resetting
> > > + the GPU, we allow the innocent contexts also on the system to quiesce.
> > > + It is then less likely for a hanging context to cause collateral
> > > + damage as the system is reset in order to recover. The corollary is
> > > + that the reset itself may take longer and so be more disruptive to
> > > + interactive or low latency workloads.
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > index 0e20713603ec..e4203eb44139 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > @@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> > > engine->instance = info->instance;
> > > __sprint_engine_name(engine);
> > >
> > > + engine->props.stop_timeout_ms =
> > > + CONFIG_DRM_I915_STOP_TIMEOUT;
> > Compared to the previous version, where you used the CONFIG variable
> > directly, what is the benefit of going through a variable? Is it so that
> > we could alter it at runtime?
>
> That is what the next series proposes. Also it turns out to be more
> easily testable if one is able to alter the timeouts for testing.
Thanks for explaining, Chris!
-Ram.
> -Chris
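
For readers following the exchange above, here is a minimal userspace
sketch of the idea being discussed: a per-engine property seeded from a
compile-time default, which can then be overridden at runtime (for
example by a test). It is not the driver code. STOP_TIMEOUT_MS_DEFAULT
stands in for CONFIG_DRM_I915_STOP_TIMEOUT, and struct engine, busy_ms,
engine_setup() and engine_quiesce() are hypothetical names that simulate
the wait with a counter instead of a real sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stand-in for CONFIG_DRM_I915_STOP_TIMEOUT (default 100 ms). */
#ifndef STOP_TIMEOUT_MS_DEFAULT
#define STOP_TIMEOUT_MS_DEFAULT 100
#endif

struct engine_props {
	unsigned long stop_timeout_ms;
};

/* Hypothetical, simplified engine: only what the sketch needs. */
struct engine {
	struct engine_props props;
	unsigned long busy_ms;	/* simulated time until the engine goes idle */
};

/* Copy the build-time default into a per-engine field at setup. */
static void engine_setup(struct engine *e, unsigned long busy_ms)
{
	e->props.stop_timeout_ms = STOP_TIMEOUT_MS_DEFAULT;
	e->busy_ms = busy_ms;
}

/*
 * Wait for the engine to go idle, giving up once stop_timeout_ms has
 * elapsed. Returns true if the engine quiesced in time, false if a
 * reset would have to interrupt work that is still running.
 */
static bool engine_quiesce(const struct engine *e)
{
	unsigned long elapsed_ms = 0;

	while (elapsed_ms < e->busy_ms) {	/* engine still busy */
		if (elapsed_ms >= e->props.stop_timeout_ms)
			return false;		/* timeout expired */
		elapsed_ms++;			/* simulated 1 ms sleep */
	}
	return true;
}
```

Because the timeout lives in e->props rather than being read from the
macro at the point of use, a caller can set e->props.stop_timeout_ms to
a larger or smaller value after engine_setup() and observe a different
outcome from engine_quiesce() without rebuilding.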
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx