Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
@ 2022-06-28 19:17 Ashutosh Dixit
  2022-06-28 22:58 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ashutosh Dixit @ 2022-06-28 19:17 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, dri-devel, Chris Wilson

From: Chris Wilson <chris@chris-wilson.co.uk>

When resuming after hibernate sometimes we see hangs in unrelated kernel
subsystems. These hangs often result in the following i915 trace:

i915 0000:00:02.0: [drm] \
	*ERROR* intel_gt_reset_global timed out, cancelling all in-flight rendering.

implying our reset task has been starved by the hanging kernel subsystem,
causing us to inappropiately declare the system as wedged beyond recovery.

The trace would be caused by our synchronize_srcu_expedited() taking more
than the allowed 5s due to the unrelated kernel hang. But we neither need
to perform that synchronisation inside the reset watchdog, nor do we need
such a short timeout before declaring the device as unrecoverable.

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/3575
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index a5338c3fde7a0..e72744f6faedc 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1259,12 +1259,9 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 	kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
 
 	/* Use a watchdog to ensure that our reset completes */
-	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
+	intel_wedge_on_timeout(&w, gt, 60 * HZ) {
 		intel_display_prepare_reset(gt->i915);
 
-		/* Flush everyone using a resource about to be clobbered */
-		synchronize_srcu_expedited(&gt->reset.backoff_srcu);
-
 		intel_gt_reset(gt, engine_mask, reason);
 
 		intel_display_finish_reset(gt->i915);
@@ -1373,6 +1370,9 @@ void intel_gt_handle_error(struct intel_gt *gt,
 		}
 	}
 
+	/* Flush everyone using a resource about to be clobbered */
+	synchronize_srcu_expedited(&gt->reset.backoff_srcu);
+
 	intel_gt_reset_global(gt, engine_mask, msg);
 
 	if (!intel_uc_uses_guc_submission(&gt->uc)) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
  2022-06-28 19:17 [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs Ashutosh Dixit
@ 2022-06-28 22:58 ` Patchwork
  2022-06-28 23:21 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  2022-06-29  5:35 ` [Intel-gfx] [PATCH] " Dixit, Ashutosh
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2022-06-28 22:58 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
URL   : https://patchwork.freedesktop.org/series/105748/
State : warning

== Summary ==

Error: dim checkpatch failed
ac39982eb930 drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
-:11: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#11: 
	*ERROR* intel_gt_reset_global timed out, cancelling all in-flight rendering.

total: 0 errors, 1 warnings, 0 checks, 22 lines checked



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
  2022-06-28 19:17 [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs Ashutosh Dixit
  2022-06-28 22:58 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
@ 2022-06-28 23:21 ` Patchwork
  2022-06-29  5:35 ` [Intel-gfx] [PATCH] " Dixit, Ashutosh
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2022-06-28 23:21 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 11712 bytes --]

== Series Details ==

Series: drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
URL   : https://patchwork.freedesktop.org/series/105748/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11820 -> Patchwork_105748v1
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_105748v1 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_105748v1, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/index.html

Participating hosts (38 -> 38)
------------------------------

  Additional (2): fi-icl-u2 bat-dg2-9 
  Missing    (2): bat-dg2-8 bat-adlp-4 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_105748v1:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_exec_parallel@engines@fds:
    - fi-bsw-nick:        [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-bsw-nick/igt@gem_exec_parallel@engines@fds.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-bsw-nick/igt@gem_exec_parallel@engines@fds.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-cfl-8109u:       [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-cfl-8109u/igt@i915_suspend@basic-s3-without-i915.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-cfl-8109u/igt@i915_suspend@basic-s3-without-i915.html

  
Known issues
------------

  Here are the changes found in Patchwork_105748v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-icl-u2:          NOTRUN -> [SKIP][5] ([i915#2190])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@random-engines:
    - fi-icl-u2:          NOTRUN -> [SKIP][6] ([i915#4613]) +3 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@gem_lmem_swapping@random-engines.html

  * igt@i915_pm_rpm@module-reload:
    - fi-cfl-8109u:       [PASS][7] -> [DMESG-FAIL][8] ([i915#62])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-cfl-8109u/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@gem:
    - fi-pnv-d510:        NOTRUN -> [DMESG-FAIL][9] ([i915#4528])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-pnv-d510/igt@i915_selftest@live@gem.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-icl-u2:          NOTRUN -> [SKIP][10] ([i915#5903])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-blb-e6850:       NOTRUN -> [SKIP][11] ([fdo#109271])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-blb-e6850/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          NOTRUN -> [SKIP][12] ([fdo#111827]) +8 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
    - fi-icl-u2:          NOTRUN -> [SKIP][13] ([i915#4103])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html

  * igt@kms_flip@basic-flip-vs-dpms@a-edp1:
    - fi-tgl-u2:          [PASS][14] -> [DMESG-WARN][15] ([i915#402])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-tgl-u2/igt@kms_flip@basic-flip-vs-dpms@a-edp1.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-tgl-u2/igt@kms_flip@basic-flip-vs-dpms@a-edp1.html

  * igt@kms_force_connector_basic@force-connector-state:
    - fi-icl-u2:          NOTRUN -> [WARN][16] ([i915#6008])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@kms_force_connector_basic@force-connector-state.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-icl-u2:          NOTRUN -> [SKIP][17] ([fdo#109285])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cfl-8109u:       [PASS][18] -> [DMESG-WARN][19] ([i915#62]) +12 similar issues
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - fi-icl-u2:          NOTRUN -> [SKIP][20] ([i915#3555])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-userptr:
    - fi-icl-u2:          NOTRUN -> [SKIP][21] ([fdo#109295] / [i915#3301])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-icl-u2/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-cfl-8109u:       NOTRUN -> [FAIL][22] ([i915#4312])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-cfl-8109u/igt@runner@aborted.html
    - fi-bsw-nick:        NOTRUN -> [FAIL][23] ([i915#4312])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-bsw-nick/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_pm_rps@basic-api:
    - fi-hsw-4770:        [FAIL][24] -> [PASS][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-hsw-4770/igt@i915_pm_rps@basic-api.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-hsw-4770/igt@i915_pm_rps@basic-api.html

  * igt@i915_selftest@live@gtt:
    - fi-bdw-5557u:       [DMESG-FAIL][26] ([i915#3674]) -> [PASS][27]
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-bdw-5557u/igt@i915_selftest@live@gtt.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-bdw-5557u/igt@i915_selftest@live@gtt.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-6:          [DMESG-FAIL][28] ([i915#4494] / [i915#4957]) -> [PASS][29]
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [DMESG-FAIL][30] ([i915#4528]) -> [PASS][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-blb-e6850/igt@i915_selftest@live@requests.html
    - fi-pnv-d510:        [DMESG-FAIL][32] ([i915#4528]) -> [PASS][33]
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/fi-pnv-d510/igt@i915_selftest@live@requests.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/fi-pnv-d510/igt@i915_selftest@live@requests.html

  * igt@kms_busy@basic@modeset:
    - {bat-adln-1}:       [DMESG-WARN][34] ([i915#3576]) -> [PASS][35]
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/bat-adln-1/igt@kms_busy@basic@modeset.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/bat-adln-1/igt@kms_busy@basic@modeset.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - {bat-adlp-6}:       [DMESG-WARN][36] ([i915#3576]) -> [PASS][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11820/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/bat-adlp-6/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3595]: https://gitlab.freedesktop.org/drm/intel/issues/3595
  [i915#3674]: https://gitlab.freedesktop.org/drm/intel/issues/3674
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#5153]: https://gitlab.freedesktop.org/drm/intel/issues/5153
  [i915#5174]: https://gitlab.freedesktop.org/drm/intel/issues/5174
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#6008]: https://gitlab.freedesktop.org/drm/intel/issues/6008
  [i915#6106]: https://gitlab.freedesktop.org/drm/intel/issues/6106
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#6299]: https://gitlab.freedesktop.org/drm/intel/issues/6299


Build changes
-------------

  * Linux: CI_DRM_11820 -> Patchwork_105748v1

  CI-20190529: 20190529
  CI_DRM_11820: 8f4a9176de36698b5b3ba72c4f68f1cf7a15c0c9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6549: 9b9371c8da32533022ad700a7c023b4c3a085fbc @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105748v1: 8f4a9176de36698b5b3ba72c4f68f1cf7a15c0c9 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

313852d6d8af drm/i915/reset: Handle reset timeouts under unrelated kernel hangs

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105748v1/index.html

[-- Attachment #2: Type: text/html, Size: 11883 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
  2022-06-28 19:17 [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs Ashutosh Dixit
  2022-06-28 22:58 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
  2022-06-28 23:21 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
@ 2022-06-29  5:35 ` Dixit, Ashutosh
  2 siblings, 0 replies; 5+ messages in thread
From: Dixit, Ashutosh @ 2022-06-29  5:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, Chris Wilson, Chris Wilson

On Tue, 28 Jun 2022 12:17:41 -0700, Ashutosh Dixit wrote:
>
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> When resuming after hibernate sometimes we see hangs in unrelated kernel
> subsystems. These hangs often result in the following i915 trace:
>
> i915 0000:00:02.0: [drm] \
>	*ERROR* intel_gt_reset_global timed out, cancelling all in-flight rendering.
>
> implying our reset task has been starved by the hanging kernel subsystem,
> causing us to inappropiately declare the system as wedged beyond recovery.
>
> The trace would be caused by our synchronize_srcu_expedited() taking more
> than the allowed 5s due to the unrelated kernel hang. But we neither need
> to perform that synchronisation inside the reset watchdog, nor do we need
> such a short timeout before declaring the device as unrecoverable.
>
> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/3575
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index a5338c3fde7a0..e72744f6faedc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1259,12 +1259,9 @@ static void intel_gt_reset_global(struct intel_gt *gt,
>	kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
>
>	/* Use a watchdog to ensure that our reset completes */
> -	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
> +	intel_wedge_on_timeout(&w, gt, 60 * HZ) {

How about we take one step at a time so if we are moving
synchronize_srcu_expedited() out of the reset watchdog, we leave the
timeout to the previous 5s? With the original timeout restored this patch
is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>		intel_display_prepare_reset(gt->i915);
>
> -		/* Flush everyone using a resource about to be clobbered */
> -		synchronize_srcu_expedited(&gt->reset.backoff_srcu);
> -
>		intel_gt_reset(gt, engine_mask, reason);
>
>		intel_display_finish_reset(gt->i915);
> @@ -1373,6 +1370,9 @@ void intel_gt_handle_error(struct intel_gt *gt,
>		}
>	}
>
> +	/* Flush everyone using a resource about to be clobbered */
> +	synchronize_srcu_expedited(&gt->reset.backoff_srcu);
> +
>	intel_gt_reset_global(gt, engine_mask, msg);
>
>	if (!intel_uc_uses_guc_submission(&gt->uc)) {
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
@ 2022-06-30  4:39 Ashutosh Dixit
  0 siblings, 0 replies; 5+ messages in thread
From: Ashutosh Dixit @ 2022-06-30  4:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, dri-devel

From: Chris Wilson <chris@chris-wilson.co.uk>

When resuming after hibernate sometimes we see hangs in unrelated kernel
subsystems. These hangs often result in the following i915 trace:

i915 0000:00:02.0: [drm] *ERROR* \
	intel_gt_reset_global timed out, cancelling all in-flight rendering

implying our reset task has been starved by the hanging kernel subsystem,
causing us to inappropiately declare the system as wedged beyond recovery.

The trace would be caused by our synchronize_srcu_expedited() taking more
than the allowed 5s due to the unrelated kernel hang. But we neither need
to perform that synchronisation inside the reset watchdog, nor do we need
such a short timeout before declaring the device as unrecoverable.

v2: Restore watchdog timeout to the previous 5 seconds (Ashutosh)

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/3575
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index a5338c3fde7a..1cbe65a5b0fd 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1262,9 +1262,6 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
 		intel_display_prepare_reset(gt->i915);
 
-		/* Flush everyone using a resource about to be clobbered */
-		synchronize_srcu_expedited(&gt->reset.backoff_srcu);
-
 		intel_gt_reset(gt, engine_mask, reason);
 
 		intel_display_finish_reset(gt->i915);
@@ -1373,6 +1370,9 @@ void intel_gt_handle_error(struct intel_gt *gt,
 		}
 	}
 
+	/* Flush everyone using a resource about to be clobbered */
+	synchronize_srcu_expedited(&gt->reset.backoff_srcu);
+
 	intel_gt_reset_global(gt, engine_mask, msg);
 
 	if (!intel_uc_uses_guc_submission(&gt->uc)) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-06-30  4:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-28 19:17 [Intel-gfx] [PATCH] drm/i915/reset: Handle reset timeouts under unrelated kernel hangs Ashutosh Dixit
2022-06-28 22:58 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2022-06-28 23:21 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-06-29  5:35 ` [Intel-gfx] [PATCH] " Dixit, Ashutosh
  -- strict thread matches above, loose matches on Subject: below --
2022-06-30  4:39 Ashutosh Dixit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox