* [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter
@ 2020-08-18 11:36 Marcin Ślusarz
2020-08-18 11:37 ` [Intel-gfx] [PATCH 2/2] drm/i915/gt: raise logging level of "gpu reset disabled" message Marcin Ślusarz
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Marcin Ślusarz @ 2020-08-18 11:36 UTC (permalink / raw)
To: intel-gfx
From: Marcin Ślusarz <marcin.slusarz@intel.com>
For some reason intel_gt_reset attempts to reset the GPU twice.
On one code path (do_reset) "reset" parameter is obeyed, but is
not on the other one (__intel_gt_set_wedged).
Fix this.
I noticed this because I stumbled on a bug which completely locks
up a machine on reset (preventing me from saving the error state)
and i915.reset=0 wasn't working as expected.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
---
drivers/gpu/drm/i915/gt/intel_reset.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c
b/drivers/gpu/drm/i915/gt/intel_reset.c
index 39070b514e65..f4823ca2d71f 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -816,7 +816,8 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
awake = reset_prepare(gt);
/* Even if the GPU reset fails, it should still stop the engines */
- if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+ if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display &&
+ i915_modparams.reset)
__intel_gt_reset(gt, ALL_ENGINES);
for_each_engine(engine, gt, id)
--
2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [Intel-gfx] [PATCH 2/2] drm/i915/gt: raise logging level of "gpu reset disabled" message
2020-08-18 11:36 [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter Marcin Ślusarz
@ 2020-08-18 11:37 ` Marcin Ślusarz
2020-08-18 11:44 ` Chris Wilson
2020-08-18 11:56 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/gt: obey "reset" module parameter Patchwork
2020-08-18 11:58 ` [Intel-gfx] [PATCH 1/2] " Chris Wilson
2 siblings, 1 reply; 7+ messages in thread
From: Marcin Ślusarz @ 2020-08-18 11:37 UTC (permalink / raw)
To: intel-gfx
From: Marcin Ślusarz <marcin.slusarz@intel.com>
Few lines above there's drm_notice saying that the gpu will be reset.
Printing "gpu reset disabled" using lower logging level makes it
harder to figure out what happened.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
---
drivers/gpu/drm/i915/gt/intel_reset.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c
b/drivers/gpu/drm/i915/gt/intel_reset.c
index f4823ca2d71f..0384315c190d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1042,7 +1042,7 @@ void intel_gt_reset(struct intel_gt *gt,
if (i915_modparams.reset)
drm_err(>->i915->drm, "GPU reset not supported\n");
else
- drm_dbg(>->i915->drm, "GPU reset disabled\n");
+ drm_notice(>->i915->drm, "GPU reset disabled\n");
goto error;
}
--
2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Intel-gfx] [PATCH 2/2] drm/i915/gt: raise logging level of "gpu reset disabled" message
2020-08-18 11:37 ` [Intel-gfx] [PATCH 2/2] drm/i915/gt: raise logging level of "gpu reset disabled" message Marcin Ślusarz
@ 2020-08-18 11:44 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2020-08-18 11:44 UTC (permalink / raw)
To: Marcin Ślusarz, intel-gfx
Quoting Marcin Ślusarz (2020-08-18 12:37:20)
> From: Marcin Ślusarz <marcin.slusarz@intel.com>
>
> Few lines above there's drm_notice saying that the gpu will be reset.
> Printing "gpu reset disabled" using lower logging level makes it
> harder to figure out what happened.
It's disabled by user request, and we may do this very very frequently.
If you are wise enough to touch an unsafe module parameter, you get to
keep all the pieces.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/gt: obey "reset" module parameter
2020-08-18 11:36 [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter Marcin Ślusarz
2020-08-18 11:37 ` [Intel-gfx] [PATCH 2/2] drm/i915/gt: raise logging level of "gpu reset disabled" message Marcin Ślusarz
@ 2020-08-18 11:56 ` Patchwork
2020-08-18 11:58 ` [Intel-gfx] [PATCH 1/2] " Chris Wilson
2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2020-08-18 11:56 UTC (permalink / raw)
To: Marcin Ślusarz; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/2] drm/i915/gt: obey "reset" module parameter
URL : https://patchwork.freedesktop.org/series/80734/
State : failure
== Summary ==
Applying: drm/i915/gt: obey "reset" module parameter
error: git diff header lacks filename information when removing 1 leading pathname component (line 6)
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 drm/i915/gt: obey "reset" module parameter
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter
2020-08-18 11:36 [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter Marcin Ślusarz
2020-08-18 11:37 ` [Intel-gfx] [PATCH 2/2] drm/i915/gt: raise logging level of "gpu reset disabled" message Marcin Ślusarz
2020-08-18 11:56 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/gt: obey "reset" module parameter Patchwork
@ 2020-08-18 11:58 ` Chris Wilson
2020-08-18 17:49 ` Rodrigo Vivi
2 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2020-08-18 11:58 UTC (permalink / raw)
To: Marcin Ślusarz, intel-gfx
Quoting Marcin Ślusarz (2020-08-18 12:36:07)
> From: Marcin Ślusarz <marcin.slusarz@intel.com>
>
> For some reason intel_gt_reset attempts to reset the GPU twice.
> On one code path (do_reset) "reset" parameter is obeyed, but is
> not on the other one (__intel_gt_set_wedged).
It's not that simple, we do want to force __intel_gt_set_wedged() to
cancel whatever is running on the GPU as it is used for more than just
failing resets (e.g. around control boundaries) regardless of what the
user may want.
I'm loathe to add a parameter just to enable unsafe behaviour, but that
may be the compromise.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter
2020-08-18 11:58 ` [Intel-gfx] [PATCH 1/2] " Chris Wilson
@ 2020-08-18 17:49 ` Rodrigo Vivi
2020-08-18 18:12 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Rodrigo Vivi @ 2020-08-18 17:49 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx, Marcin Ślusarz
On Tue, Aug 18, 2020 at 12:58:00PM +0100, Chris Wilson wrote:
> Quoting Marcin Ślusarz (2020-08-18 12:36:07)
> > From: Marcin Ślusarz <marcin.slusarz@intel.com>
> >
> > For some reason intel_gt_reset attempts to reset the GPU twice.
> > On one code path (do_reset) "reset" parameter is obeyed, but is
> > not on the other one (__intel_gt_set_wedged).
>
> It's not that simple, we do want to force __intel_gt_set_wedged() to
> cancel whatever is running on the GPU as it is used for more than just
> failing resets (e.g. around control boundaries) regardless of what the
> user may want.
>
> I'm loathe to add a parameter just to enable unsafe behaviour, but that
> may be the compromise.
we probably need this compromise for these cases Marcin faced...
what about moving this to intel_get_gpu_reset()?
@bool intel_has_gpu_reset(const struct intel_gt *gt)
- if (!gt->i915->params.reset)
- return NULL;
@ static reset_func intel_get_gpu_reset(const struct intel_gt *gt)
+ if (!gt->i915->params.reset)
+ return NULL;
> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter
2020-08-18 17:49 ` Rodrigo Vivi
@ 2020-08-18 18:12 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2020-08-18 18:12 UTC (permalink / raw)
To: Rodrigo Vivi; +Cc: intel-gfx, Marcin Ślusarz
Quoting Rodrigo Vivi (2020-08-18 18:49:19)
> On Tue, Aug 18, 2020 at 12:58:00PM +0100, Chris Wilson wrote:
> > Quoting Marcin Ślusarz (2020-08-18 12:36:07)
> > > From: Marcin Ślusarz <marcin.slusarz@intel.com>
> > >
> > > For some reason intel_gt_reset attempts to reset the GPU twice.
> > > On one code path (do_reset) "reset" parameter is obeyed, but is
> > > not on the other one (__intel_gt_set_wedged).
> >
> > It's not that simple, we do want to force __intel_gt_set_wedged() to
> > cancel whatever is running on the GPU as it is used for more than just
> > failing resets (e.g. around control boundaries) regardless of what the
> > user may want.
> >
> > I'm loathe to add a parameter just to enable unsafe behaviour, but that
> > may be the compromise.
>
> we probably need this compromise for these cases Marcin faced...
You can always say those who risk unsafe parameters are always capable
of patching the kernel to break it.
> what about moving this to intel_get_gpu_reset()?
When it was there, we didn't have the reason why, so we ended up
duplicating the tests anyway to suppress the error messages for CI.
And it breaks the control boundary cases where we have to reset the GPU,
or when we need the wedge to undeadlock modesetting which will also
lockup the machine. In short, we should remove the parameter; we'll
still end up having to bisect through the GPU features [atomic ops, it's
always atomic ops] to find which one is killing the machine.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-08-18 18:12 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-18 11:36 [Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter Marcin Ślusarz
2020-08-18 11:37 ` [Intel-gfx] [PATCH 2/2] drm/i915/gt: raise logging level of "gpu reset disabled" message Marcin Ślusarz
2020-08-18 11:44 ` Chris Wilson
2020-08-18 11:56 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/gt: obey "reset" module parameter Patchwork
2020-08-18 11:58 ` [Intel-gfx] [PATCH 1/2] " Chris Wilson
2020-08-18 17:49 ` Rodrigo Vivi
2020-08-18 18:12 ` Chris Wilson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.