From: Matthew Brost <matthew.brost@intel.com>
To: John.C.Harrison@Intel.com
Cc: Intel-GFX@Lists.FreeDesktop.Org, DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc: Don't error on reset of banned context
Date: Tue, 11 Jan 2022 10:45:41 -0800 [thread overview]
Message-ID: <20220111184541.GA14954@jons-linux-dev-box> (raw)
In-Reply-To: <20220107003143.326046-1-John.C.Harrison@Intel.com>
On Thu, Jan 06, 2022 at 04:31:43PM -0800, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> There is a race (already documented in the code) whereby a context can
> be (re-)queued for submission at the same time as it is being banned
> due to a hang and reset. That leads to a hang/reset report from GuC
> for a context which i915 thinks is already banned.
>
I think there are 2 issues here.
1. Banning of context (e.g. user closes a non-persistent context)
results in an context reset. In this case we will receive a G2H
indicating a context reset and we want to convert the context reset to a
nop.
2. A GT reset races with a context reset result in the context getting
banned before the G2H is processed. Again we want to convert the context
reset to a nop. This race should be sealed when we can flush the G2H
handler in the reset path. Flushing G2H handler depends on the error
capture not allocating memory in non-sleeping contexts. Thomas H had a
patch for this.
In both cases we shouldn't print an error.
> While the race is indented to be fixed in a future GuC update, there
> is no actual harm beyond the wasted execution time of that new hang
> detection period. The context has already been banned for bad
> behaviour so a fresh hang is hardly surprising and certainly isn't
> going to be losing any work that wouldn't already have been lost if
> there was no race.
>
See above, I think you are confusing the issues here. This won't be
fixed by an updated GuC firmware.
> So don't treat this situation as an error. The error message is seen
> by the CI system as something fatal and causes test failures. Instead,
> just print an informational so the user at least knows a context reset
> occurred (given that the error capture is being skipped).
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9989d121127d..e8a32a7e7daf 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -3978,6 +3978,10 @@ static void guc_handle_context_reset(struct intel_guc *guc,
> !context_blocked(ce))) {
> capture_error_state(guc, ce);
> guc_context_replay(ce);
> + } else if (intel_context_is_banned(ce)) {
> + drm_info(&guc_to_gt(guc)->i915->drm,
> + "Reset notificaion for banned context 0x%04X on %s",
> + ce->guc_id.id, ce->engine->name);
The context being blocking isn't an error either. I think real fix is
changing the below drm_err to drm_info and call it a day.
Matt
> } else {
> drm_err(&guc_to_gt(guc)->i915->drm,
> "Invalid GuC engine reset notificaion for 0x%04X on %s: banned = %d, blocked = %d",
> --
> 2.25.1
>
prev parent reply other threads:[~2022-01-11 18:51 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-07 0:31 [Intel-gfx] [PATCH] drm/i915/guc: Don't error on reset of banned context John.C.Harrison
2022-01-07 0:35 ` John Harrison
2022-01-07 1:49 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2022-01-07 17:06 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-01-11 18:45 ` Matthew Brost [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220111184541.GA14954@jons-linux-dev-box \
--to=matthew.brost@intel.com \
--cc=DRI-Devel@Lists.FreeDesktop.Org \
--cc=Intel-GFX@Lists.FreeDesktop.Org \
--cc=John.C.Harrison@Intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox