From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>
Cc: "Jana, Mousumi" <mousumi.jana@intel.com>,
"intel.com@freedesktop.org" <intel.com@freedesktop.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"Vivi, Rodrigo" <rodrigo.vivi@intel.com>
Subject: Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending
Date: Tue, 14 Nov 2023 17:52:12 +0000 [thread overview]
Message-ID: <dc97f378-c8c6-4841-95da-740f3ecca187@linux.intel.com> (raw)
In-Reply-To: <246ecbb0dcbbc96d9e48f8de6798bd9d16961c8f.camel@intel.com>
On 14/11/2023 17:37, Teres Alexis, Alan Previn wrote:
> On Tue, 2023-11-14 at 17:27 +0000, Tvrtko Ursulin wrote:
>> On 13/11/2023 17:57, Teres Alexis, Alan Previn wrote:
>>> On Wed, 2023-10-25 at 13:58 +0100, Tvrtko Ursulin wrote:
>>>> On 04/10/2023 18:59, Teres Alexis, Alan Previn wrote:
>>>>> On Thu, 2023-09-28 at 13:46 +0100, Tvrtko Ursulin wrote:
>>>>>> On 27/09/2023 17:36, Teres Alexis, Alan Previn wrote:
> alan: snip
>>
>>> alan: I won't say its NOT fixing anything - i am saying it's not fixing
>>> this specific bug where we have the outstanding G2H from a context destruction
>>> operation that got dropped. I am okay with dropping this patch in the next rev
>>> but shall post a separate stand alone version of Patch3 - because I believe
>>> all other i915 subsystems that take runtime-pm's DO NOT wait forever when entering
>>> suspend - but GT is doing this. This means if there ever was a bug introduced,
>>> it would require serial port or ramoops collection to debug. So i think such a
>>> patch, despite not fixing this specific bug will be very helpful for debuggability
>>> of future issues. After all, its better to fail our suspend when we have a bug
>>> rather than to hang the kernel forever.
>>
>> Issue I have is that I am not seeing how it fails the suspend when
>> nothing is passed out from *void* wait_suspend(gt). To me it looks the
>> suspend just carries on. And if so, it sounds dangerous to allow it to
>> do that with any future/unknown bugs in the suspend sequence. Existing
>> timeout wedges the GPU (and all that entails). New code just says "meh
>> I'll just carry on regardless".
>
> alan: So i did trace back the gt->wakeref before i posted these patches and
> see that within these runtime get/put calls, i believe the first 'get' leads
> to __intel_wakeref_get_first which calls intel_runtime_pm_get via rpm_get
> helper and eventually executes a pm_runtime_get_sync(rpm->kdev); (hanging off
> i915_device). (naturally there is a corresponding mirros for the '_put_last').
> So this means the first-get and last-put lets the kernel know. Thats why when
> i tested this patch, it did actually cause the suspend to abort from kernel side
> and the kernel would print a message indicating i915 was the one that didnt
> release all refs.
Ah that would be much better then.
Do you know if everything gets resumed/restored correctly in that case
or we would need some additional work to maybe early exit from callers
of wait_for_suspend()?
What I would also ask is to see if something like injecting a probing
failure is feasible, so we can have this new timeout exit path
constantly/regularly tested in CI.
Regards,
Tvrtko
> alan: Anyways, i have pulled this patch out of rev6 of this series and created a
> separate standalone patch for this patch #3 that we review independently.
>
next prev parent reply other threads:[~2023-11-14 17:54 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-26 19:05 [Intel-gfx] [PATCH v4 0/3] Resolve suspend-resume racing with GuC destroy-context-worker Alan Previn
2023-09-26 19:05 ` Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 1/3] drm/i915/guc: Flush context destruction worker at suspend Alan Previn
2023-09-26 19:05 ` Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 2/3] drm/i915/guc: Close deregister-context race against CT-loss Alan Previn
2023-09-26 19:05 ` Alan Previn
2023-10-04 6:34 ` [Intel-gfx] " Gupta, Anshuman
2023-10-04 6:34 ` Gupta, Anshuman
2023-10-04 16:46 ` [Intel-gfx] " Teres Alexis, Alan Previn
2023-10-04 16:46 ` Teres Alexis, Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending Alan Previn
2023-09-26 19:05 ` Alan Previn
2023-09-27 9:02 ` [Intel-gfx] " Tvrtko Ursulin
2023-09-27 16:36 ` Teres Alexis, Alan Previn
2023-09-28 12:46 ` Tvrtko Ursulin
2023-10-04 17:59 ` Teres Alexis, Alan Previn
2023-10-25 12:58 ` Tvrtko Ursulin
2023-11-13 17:57 ` Teres Alexis, Alan Previn
2023-11-14 17:27 ` Tvrtko Ursulin
2023-11-14 17:36 ` Rodrigo Vivi
2023-11-14 19:34 ` Teres Alexis, Alan Previn
2023-11-14 17:37 ` Teres Alexis, Alan Previn
2023-11-14 17:52 ` Tvrtko Ursulin [this message]
2023-11-14 19:48 ` Teres Alexis, Alan Previn
2023-11-16 10:19 ` Tvrtko Ursulin
2023-09-27 1:23 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Resolve suspend-resume racing with GuC destroy-context-worker (rev4) Patchwork
2023-09-27 1:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-09-27 13:17 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dc97f378-c8c6-4841-95da-740f3ecca187@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=alan.previn.teres.alexis@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel.com@freedesktop.org \
--cc=mousumi.jana@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.