From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org,
Chris Wilson <chris.p.wilson@intel.com>,
Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
Date: Tue, 28 Jun 2022 22:35:13 -0700
Message-ID: <87o7ycowvi.wl-ashutosh.dixit@intel.com>
In-Reply-To: <20220628191741.28866-1-ashutosh.dixit@intel.com>
On Tue, 28 Jun 2022 12:17:41 -0700, Ashutosh Dixit wrote:
>
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> When resuming after hibernate sometimes we see hangs in unrelated kernel
> subsystems. These hangs often result in the following i915 trace:
>
> i915 0000:00:02.0: [drm] \
> *ERROR* intel_gt_reset_global timed out, cancelling all in-flight rendering.
>
> implying our reset task has been starved by the hanging kernel subsystem,
> causing us to inappropriately declare the system as wedged beyond recovery.
>
> The trace would be caused by our synchronize_srcu_expedited() taking more
> than the allowed 5s due to the unrelated kernel hang. But we neither need
> to perform that synchronisation inside the reset watchdog, nor do we need
> such a short timeout before declaring the device as unrecoverable.
>
> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/3575
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_reset.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index a5338c3fde7a0..e72744f6faedc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1259,12 +1259,9 @@ static void intel_gt_reset_global(struct intel_gt *gt,
> kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
>
> /* Use a watchdog to ensure that our reset completes */
> - intel_wedge_on_timeout(&w, gt, 5 * HZ) {
> + intel_wedge_on_timeout(&w, gt, 60 * HZ) {
How about we take one step at a time? If we are moving
synchronize_srcu_expedited() out of the reset watchdog, let's leave the
timeout at the previous 5s. With the original timeout restored, this patch
is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> intel_display_prepare_reset(gt->i915);
>
> - /* Flush everyone using a resource about to be clobbered */
> - synchronize_srcu_expedited(&gt->reset.backoff_srcu);
> -
> intel_gt_reset(gt, engine_mask, reason);
>
> intel_display_finish_reset(gt->i915);
> @@ -1373,6 +1370,9 @@ void intel_gt_handle_error(struct intel_gt *gt,
> }
> }
>
> + /* Flush everyone using a resource about to be clobbered */
> + synchronize_srcu_expedited(&gt->reset.backoff_srcu);
> +
> intel_gt_reset_global(gt, engine_mask, msg);
>
> if (!intel_uc_uses_guc_submission(&gt->uc)) {
> --
> 2.36.1
>
Thread overview: 5+ messages
2022-06-28 19:17 [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs Ashutosh Dixit
2022-06-28 22:58 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2022-06-28 23:21 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-06-29 5:35 ` Dixit, Ashutosh [this message]
-- strict thread matches above, loose matches on Subject: below --
2022-06-30 4:39 [Intel-gfx] [PATCH] " Ashutosh Dixit