From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: maarten.lankhorst@linux.intel.com, matthew.auld@intel.com,
Matthew Brost <matthew.brost@intel.com>,
John Harrison <John.C.Harrison@Intel.com>
Subject: Re: [Intel-gfx] [PATCH v6 3/9] drm/i915/gt: Increase suspend timeout
Date: Thu, 23 Sep 2021 13:47:46 +0200
Message-ID: <0f1050c9-b9fe-b587-2aac-cceae4032638@linux.intel.com>
In-Reply-To: <f276fe3d-5ed8-7ac9-440d-3703f6f0e5e5@linux.intel.com>
Hi, Tvrtko,
On 9/23/21 12:13 PM, Tvrtko Ursulin wrote:
>
> On 22/09/2021 07:25, Thomas Hellström wrote:
>> With GuC submission on DG1, the execution of the requests times out
>> for the gem_exec_suspend igt test case after executing around 800-900
>> of 1000 submitted requests.
>>
>> Given the time we allow elsewhere for fences to signal (in the order of
>> seconds), increase the timeout before we mark the gt wedged and proceed.
>
> I suspect it is not about requests not retiring in time but about the
> intel_guc_wait_for_idle part of intel_gt_wait_for_idle. Although I
> don't know which G2H message is the code waiting for at suspend time
> so perhaps something to run past the GuC experts.
So what's happening here is that the test submits 1000 requests, each
writing a value to an object, and then that object's content is checked
after resume. With GuC it turns out that only 800-900 or so values are
actually written before we time out, and the test (basic-S3) fails, but
not on every run.
This is a bit interesting in itself, because I never saw the hang-S3
test fail, which from what I can tell basically is an identical test but
with a spinner submitted after the 1000th request. Could be that the
suspend backup code ends up waiting for something before we end up in
intel_gt_wait_for_idle, giving more requests time to execute.
>
> Anyway, if that turns out to be correct then perhaps it would be
> better to split the two timeouts (like if required GuC timeout is
> perhaps fundamentally independent) so it's clear who needs how much
> time. Adding Matt and John to comment.
You mean we have separate timeouts depending on whether we're using GuC
or execlists submission?
>
> To be clear, as timeout is AFAIK an arbitrary value, I don't have
> fundamental objections here. Just think it would be good to have
> accurate story in the commit message.
OK, yes. I wonder whether we should actually increase this timeout even
more, since the watchdog now times out after 10+ seconds? I guess those
long-running requests could also be executing at suspend time.
/Thomas
>
> Regards,
>
> Tvrtko
>
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/intel_gt_pm.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>> b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>> index dea8e2479897..f84f2bfe2de0 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>> @@ -19,6 +19,8 @@
>> #include "intel_rps.h"
>> #include "intel_wakeref.h"
>> +#define I915_GT_SUSPEND_IDLE_TIMEOUT (HZ / 2)
>> +
>> static void user_forcewake(struct intel_gt *gt, bool suspend)
>> {
>> int count = atomic_read(&gt->user_wakeref);
>> @@ -279,7 +281,7 @@ static void wait_for_suspend(struct intel_gt *gt)
>> if (!intel_gt_pm_is_awake(gt))
>> return;
>> - if (intel_gt_wait_for_idle(gt, I915_GEM_IDLE_TIMEOUT) ==
>> -ETIME) {
>> + if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) ==
>> -ETIME) {
>> /*
>> * Forcibly cancel outstanding work and leave
>> * the gpu quiet.
>>
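
For reference, the timeouts in the quoted patch are expressed in jiffies, so
their wall-clock length depends on HZ. The sketch below converts both values
to milliseconds. It is user-space code, not kernel code, and rests on two
assumptions that are not stated in this thread: HZ is taken as 250 (one
common configuration), and I915_GEM_IDLE_TIMEOUT is taken to be HZ / 5. The
helper ticks_to_ms() is hypothetical and exists only for this illustration.

```c
#include <stdio.h>

/* Assumption: HZ is a kernel build-time constant; 250 is only an example. */
#define HZ 250

/* New timeout, as defined in the quoted patch. */
#define I915_GT_SUSPEND_IDLE_TIMEOUT (HZ / 2)

/* Assumption: the old timeout is believed to be HZ / 5; not shown in this thread. */
#define I915_GEM_IDLE_TIMEOUT (HZ / 5)

/* Hypothetical helper: convert a tick count to milliseconds for the HZ above. */
unsigned int ticks_to_ms(unsigned int ticks)
{
	return ticks * 1000u / HZ;
}

/* Print both timeouts side by side. */
void print_timeouts(void)
{
	printf("old: %u ticks = %u ms\n",
	       (unsigned int)I915_GEM_IDLE_TIMEOUT,
	       ticks_to_ms(I915_GEM_IDLE_TIMEOUT));
	printf("new: %u ticks = %u ms\n",
	       (unsigned int)I915_GT_SUSPEND_IDLE_TIMEOUT,
	       ticks_to_ms(I915_GT_SUSPEND_IDLE_TIMEOUT));
}
```

Under these assumptions the patch raises the suspend idle wait from roughly
200 ms to 500 ms, which is still well below the 10+ second watchdog mentioned
in the reply.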