From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>,
	"Brost, Matthew" <matthew.brost@intel.com>
Cc: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [RFC 7/7] drm/i915/guc: Print the GuC error capture output register list.
Date: Fri, 24 Dec 2021 12:09:38 +0000
Message-ID: <c0fa7466-ecdc-4768-0584-6937e7f0d71a@linux.intel.com>
In-Reply-To: <fb0f6eace4bd1c243544a0804ffa9fa5b16159a6.camel@intel.com>


Hi,

Somehow I stumbled on this while browsing through the mailing list.

On 23/12/2021 18:54, Teres Alexis, Alan Previn wrote:
> Revisiting below hunk of patch-7 comment, as per offline discussion with Matt,
> there is little benefit to even making that guc-id lookup because:
> 
> 1. the delay between the context reset notification (when the vmas are copied
>     and when we verify we had received a guc err capture dump) may be subjectively
>     large enough and not tethered that the guc-id may have already been re-assigned.
> 
> 2. I was really looking for some kind of unique context handle to print out that could
>     be correlated (by user inspecting the dump) back to a unique app or process or
>     context-id but can't find such a param in struct intel_context.
> 
> As part of further reviewing the end to end flows and possible error scenarios, there
> also may potentially be a mismatch between "which context was reset by guc at time-n"
> vs "which context's vma buffers is being printed out at time-n+x" if
> we are experiencing back-to-back resets and the user dumped the debugfs x-time later.

What does this all actually mean? It sounds rather alarming: will it
just not be possible to know which context, belonging to which process,
was reset? And because the guc_id may have been re-assigned, might even
the captured VMAs not be the correct ones?
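
To make the guc_id concern concrete, here is a minimal user-space model
of it. Everything in it is hypothetical: the struct names, the table and
in particular the never-reused unique_id field, which is exactly the kind
of handle that reportedly does not exist in struct intel_context today.
It is a sketch of the failure mode, not i915 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_GUC_IDS 4

/* Hypothetical stand-in for a context; NOT the real struct intel_context. */
struct model_ctx {
	uint64_t unique_id;	/* assumed never-reused handle (hypothetical) */
	const char *owner;	/* e.g. the owning process name */
};

/* guc_id -> context table; slots get recycled when a context goes away. */
static struct model_ctx *model_id_table[MODEL_MAX_GUC_IDS];

static void model_assign_id(unsigned int guc_id, struct model_ctx *ctx)
{
	assert(guc_id < MODEL_MAX_GUC_IDS);
	model_id_table[guc_id] = ctx;
}

static struct model_ctx *model_lookup_id(unsigned int guc_id)
{
	return guc_id < MODEL_MAX_GUC_IDS ? model_id_table[guc_id] : NULL;
}

/* What a capture record would have to carry for a late lookup to be checkable. */
struct model_capture {
	unsigned int guc_id;
	uint64_t unique_id;
};

/* Return the context only if the slot still holds the one that was reset. */
static struct model_ctx *model_resolve_capture(const struct model_capture *cap)
{
	struct model_ctx *ctx = model_lookup_id(cap->guc_id);

	if (!ctx || ctx->unique_id != cap->unique_id)
		return NULL;	/* guc_id was released or re-assigned meanwhile */
	return ctx;
}
```

With only the guc_id to go on, a lookup done after the slot was recycled
silently returns the new owner. The extra identity check is what turns
that into a detectable miss.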

Regards,

Tvrtko

> 
> (Recap: First, guc notifies capture event, second, guc notifies context reset during
> which we trigger i915_gpu_coredump. In this second step, the vma's are dumped and we
> verify that the guc capture happened but don't parse the guc-err-capture-logs yet.
> Third step is when user triggers the debugfs to dump which is when we parse the error
> capture logs.)
> 
> As a fix, what we can do in the guc_error_capture report out is to ensure that
> we don't re-print the previously dumped vmas if we end up finding multiple
> guc-error-capture dumps since the i915_gpu_coredump would have only captured the vma's
> for the very first context that was reset. And with guc-submission, that would always
> correlate to the "next-yet-to-be-parsed" guc-err-capture dump (since the guc-error-capture
> logs are large enough to hold data for multiple dumps).
> 
> The changes (removal of below-hunk and adding of "only-print-the-first-vma") are trivial
> but I felt it warranted a good explanation. Apologies for the inbox noise.
> 
> ...alan
> 
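The "only print the VMAs for the first dump" rule proposed above can be
sketched as below. This is a user-space model under stated assumptions:
struct model_dump, model_report() and the printed strings are all
hypothetical, and the only behaviour taken from the description is that
the coredump's VMAs belong to the first not-yet-parsed capture dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical parsed capture dump; the real log layout is not modelled. */
struct model_dump {
	unsigned int guc_id;
};

/*
 * Print every capture dump found in the log, but attach the coredump's VMAs
 * only to the first one. Returns how many VMA sections were printed.
 */
static size_t model_report(FILE *out, const struct model_dump *dumps,
			   size_t n, bool coredump_has_vmas)
{
	size_t vma_sections = 0;

	assert(out != NULL);

	for (size_t i = 0; i < n; i++) {
		fprintf(out, "capture %zu: guc_id=%u\n", i, dumps[i].guc_id);
		if (i == 0 && coredump_has_vmas) {
			fprintf(out, "  [VMAs from coredump]\n");
			vma_sections++;
		} else {
			fprintf(out, "  [no VMAs captured for this dump]\n");
		}
	}

	return vma_sections;
}
```

Calling model_report(stdout, dumps, 3, true) on three back-to-back dumps
prints one VMA section and marks the other two as having none, so a
reader is not led to pair stale VMAs with a later reset.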
> On Tue, 2021-12-07 at 22:32 -0800, Alan Previn Teres Alexis wrote:
>> Thanks again for the detailed review here.
>> Will fix all the rest on next rev.
>> One special response for this one:
>>
>>
>> On Tue, 2021-12-07 at 16:22 -0800, Matthew Brost wrote:
>>> On Mon, Nov 22, 2021 at 03:04:02PM -0800, Alan Previn wrote:
>>>> +			if (datatype == GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE) {
>>>> +				GCAP_PRINT_GUC_INST_INFO(i915, ebuf, data);
>>>> +				eng_inst = FIELD_GET(GUC_CAPTURE_DATAHDR_SRC_INSTANCE, data.info);
>>>> +				eng = guc_lookup_engine(guc, engineclass, eng_inst);
>>>> +				if (eng) {
>>>> +					GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng);
>>>> +				} else {
>>>> +					PRINT(&i915->drm, ebuf, "    i915-Eng-Lookup Fail!\n");
>>>> +				}
>>>> +				ce = guc_context_lookup(guc, data.guc_ctx_id);
>>>
>>> You are going to need to reference count the 'ce' here. See
>>> intel_guc_context_reset_process_msg for an example.
>>>
>>
>> Oh crap - I missed this one - which you had explicitly mentioned offline when I was doing the
>> development. Sorry about that, I just totally missed it from my todo-notes.
>>
>> ...alan
> 
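For the reference counting remark on the 'ce' lookup, a minimal
user-space model of "look up, then take a reference only if the object is
still alive" is below. It uses C11 atomics in place of the kernel's kref
helpers, and every name in it (struct model_ce, model_lookup_and_get(),
the table) is hypothetical. It is not how intel_guc_context_reset_process_msg
is written, only an illustration of the pattern being asked for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted context. */
struct model_ce {
	atomic_int ref;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool model_ce_get_unless_zero(struct model_ce *ce)
{
	int old = atomic_load(&ce->ref);

	while (old > 0) {
		/* On failure 'old' is reloaded and the liveness check repeats. */
		if (atomic_compare_exchange_weak(&ce->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool model_ce_put(struct model_ce *ce)
{
	int prev = atomic_fetch_sub(&ce->ref, 1);

	assert(prev > 0);
	return prev == 1;
}

#define MODEL_MAX_IDS 8
static struct model_ce *model_lookup_table[MODEL_MAX_IDS];

/*
 * Look up by id and return the object with a reference held, or NULL.
 * Real code would also need a lock or RCU around the table read itself;
 * that part is assumed here and not modelled.
 */
static struct model_ce *model_lookup_and_get(unsigned int id)
{
	struct model_ce *ce;

	if (id >= MODEL_MAX_IDS)
		return NULL;

	ce = model_lookup_table[id];
	if (ce && !model_ce_get_unless_zero(ce))
		return NULL;
	return ce;
}
```

The caller prints what it needs from the returned object and then calls
model_ce_put(), so the object cannot be released underneath the printing
code.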

