From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
intel-gfx@lists.freedesktop.org,
dri-devel@lists.freedesktop.org
Cc: john.c.harrison@intel.com
Subject: Re: [Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest
Date: Thu, 21 Oct 2021 08:15:49 +0200
Message-ID: <f8f1ae021e8cabc2c6d76996b5e74912cb0913db.camel@linux.intel.com>
In-Reply-To: <20211011234705.30853-1-matthew.brost@intel.com>
Hi, Matthew,
On Mon, 2021-10-11 at 16:47 -0700, Matthew Brost wrote:
> The hangcheck selftest blocks per-engine resets by setting magic bits in
> the reset flags. This is incorrect for GuC submission because if the GuC
> fails to reset an engine we would like to do a full GT reset. Do not set
> these magic bits when using GuC submission.
>
> Side note: this lockless algorithm with magic bits to block resets really
> should be ripped out.
>
Lockless algorithm aside, from a quick look at the code in
intel_reset.c it appears to me that the interface that falls back to a
full GT reset is intel_gt_handle_error(), whereas intel_engine_reset()
is explicitly intended not to do that. Is there a discrepancy
between GuC and non-GuC here?
/Thomas
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 7e2d99dd012d..90a03c60c80c 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -734,7 +734,8 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  		reset_engine_count = i915_reset_engine_count(global, engine);
>
>  		st_engine_heartbeat_disable(engine);
> -		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
> +		if (!using_guc)
> +			set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		count = 0;
>  		do {
>  			struct i915_request *rq = NULL;
> @@ -824,7 +825,8 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  			if (err)
>  				break;
>  		} while (time_before(jiffies, end_time));
> -		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
> +		if (!using_guc)
> +			clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		st_engine_heartbeat_enable(engine);
>  		pr_info("%s: Completed %lu %s resets\n",
>  			engine->name, count, active ? "active" : "idle");
> @@ -1042,7 +1044,8 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  		yield(); /* start all threads before we begin */
>
>  		st_engine_heartbeat_disable_no_pm(engine);
> -		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
> +		if (!using_guc)
> +			set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		do {
>  			struct i915_request *rq = NULL;
>  			struct intel_selftest_saved_policy saved;
> @@ -1165,7 +1168,8 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			if (err)
>  				break;
>  		} while (time_before(jiffies, end_time));
> -		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
> +		if (!using_guc)
> +			clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		st_engine_heartbeat_enable_no_pm(engine);
>
>  		pr_info("i915_reset_engine(%s:%s): %lu resets\n",