From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>
Cc: "Jana, Mousumi" <mousumi.jana@intel.com>,
"intel.com@freedesktop.org" <intel.com@freedesktop.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"Vivi, Rodrigo" <rodrigo.vivi@intel.com>
Subject: Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending
Date: Thu, 16 Nov 2023 10:19:20 +0000 [thread overview]
Message-ID: <0a15860d-6fa0-4f1c-b12b-2f9b65911fbc@linux.intel.com> (raw)
In-Reply-To: <b075538a522adbbb3b83934279b93bde9913b9e8.camel@intel.com>
On 14/11/2023 19:48, Teres Alexis, Alan Previn wrote:
> On Tue, 2023-11-14 at 17:52 +0000, Tvrtko Ursulin wrote:
>> On 14/11/2023 17:37, Teres Alexis, Alan Previn wrote:
>>> On Tue, 2023-11-14 at 17:27 +0000, Tvrtko Ursulin wrote:
>>>> On 13/11/2023 17:57, Teres Alexis, Alan Previn wrote:
>>>>> On Wed, 2023-10-25 at 13:58 +0100, Tvrtko Ursulin wrote:
>>>>>> On 04/10/2023 18:59, Teres Alexis, Alan Previn wrote:
>>>>>>> On Thu, 2023-09-28 at 13:46 +0100, Tvrtko Ursulin wrote:
>>>>>>>> On 27/09/2023 17:36, Teres Alexis, Alan Previn wrote:
>>
> alan:snip
>
>>> alan: So i did trace back the gt->wakeref before i posted these patches and
>>> see that within these runtime get/put calls, i believe the first 'get' leads
>>> to __intel_wakeref_get_first which calls intel_runtime_pm_get via rpm_get
>>> helper and eventually executes a pm_runtime_get_sync(rpm->kdev); (hanging off
>>> i915_device). (naturally there is a corresponding mirros for the '_put_last').
>>> So this means the first-get and last-put lets the kernel know. Thats why when
>>> i tested this patch, it did actually cause the suspend to abort from kernel side
>>> and the kernel would print a message indicating i915 was the one that didnt
>>> release all refs.
>>
>> Ah that would be much better then.
>>
>> Do you know if everything gets resumed/restored correctly in that case
>> or we would need some additional work to maybe early exit from callers
>> of wait_for_suspend()?
> alan: So assuming we are still discussing about a "potentially new
> future leaked-wakeref bug" (i.e. putting aside the fact that
> Patch #1 + #2 resolves this specific series' bug), based on the
> previous testing we did, after this timeout-bail trigger,
> the suspend flow bails and gt/guc operation does actually continue
> as normal. However, its been a long time since we tested this so
> i am not sure of how accidental-new-future bugs might play to this
> assumption especially if some other subsystem that leaked the rpm
> wakref but that subsystem did NOT get reset like how GuC is reset
> at the end of suspend.
>
>>
>> What I would also ask is to see if something like injecting a probing
>> failure is feasible, so we can have this new timeout exit path
>> constantly/regularly tested in CI.
> alan: Thats a good idea. In line with this, i would like to point out that
> rev6 of this series has been posted but i removed this Patch #3. However i did
> post this Patch #3 as a standalone patch here: https://patchwork.freedesktop.org/series/126414/
> as i anticipate this patch will truly help with future issue debuggability.
>
> That said, i shall post a review on that patch with your suggestion to add
> an injected probe error for the suspend-resume flow and follow up on that one
> separately.
Cool! I don't know exactly how to do it but if we come up with way and
so gain IGT coverage then I am okay with the patch in principle.
Like perhaps some new debugfs api needs to be added to provoke the
timeout error path on suspend, or something.
Regards,
Tvrtko
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> alan: Anyways, i have pulled this patch out of rev6 of this series and created a
>>> separate standalone patch for this patch #3 that we review independently.
>>>
>
next prev parent reply other threads:[~2023-11-16 10:19 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-26 19:05 [Intel-gfx] [PATCH v4 0/3] Resolve suspend-resume racing with GuC destroy-context-worker Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 1/3] drm/i915/guc: Flush context destruction worker at suspend Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 2/3] drm/i915/guc: Close deregister-context race against CT-loss Alan Previn
2023-10-04 6:34 ` Gupta, Anshuman
2023-10-04 16:46 ` Teres Alexis, Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending Alan Previn
2023-09-27 9:02 ` Tvrtko Ursulin
2023-09-27 16:36 ` Teres Alexis, Alan Previn
2023-09-28 12:46 ` Tvrtko Ursulin
2023-10-04 17:59 ` Teres Alexis, Alan Previn
2023-10-25 12:58 ` Tvrtko Ursulin
2023-11-13 17:57 ` Teres Alexis, Alan Previn
2023-11-14 17:27 ` Tvrtko Ursulin
2023-11-14 17:36 ` Rodrigo Vivi
2023-11-14 19:34 ` Teres Alexis, Alan Previn
2023-11-14 17:37 ` Teres Alexis, Alan Previn
2023-11-14 17:52 ` Tvrtko Ursulin
2023-11-14 19:48 ` Teres Alexis, Alan Previn
2023-11-16 10:19 ` Tvrtko Ursulin [this message]
2023-09-27 1:23 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Resolve suspend-resume racing with GuC destroy-context-worker (rev4) Patchwork
2023-09-27 1:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-09-27 13:17 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0a15860d-6fa0-4f1c-b12b-2f9b65911fbc@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=alan.previn.teres.alexis@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel.com@freedesktop.org \
--cc=mousumi.jana@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox