From: John Harrison <john.c.harrison@intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>,
"Intel-GFX@Lists.FreeDesktop.Org"
<Intel-GFX@Lists.FreeDesktop.Org>
Cc: "DRI-Devel@Lists.FreeDesktop.Org" <DRI-Devel@Lists.FreeDesktop.Org>
Subject: Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Record CTB info in error logs
Date: Tue, 2 Aug 2022 17:20:53 -0700
Message-ID: <bbb4cb7f-8158-1d3b-7adf-39e628e0b06c@intel.com>
In-Reply-To: <d9f6c68a1795ffd207bcaec3c7482241c1dce1ce.camel@intel.com>
On 8/2/2022 11:27, Teres Alexis, Alan Previn wrote:
> One minor NIT (though I hope it could be fixed on the way in, as it adds a bit of ease-of-log-readability).
> That said, everything else looks good.
>
> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
>
> On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> When debugging GuC communication issues, it is useful to have the CTB
>> info available. So add the state and buffer contents to the error
>> capture log.
>>
>> Also, add a sub-structure for the GuC specific error capture info as
>> it is now becoming numerous.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_gpu_error.c | 59 +++++++++++++++++++++++----
>> drivers/gpu/drm/i915/i915_gpu_error.h | 20 +++++++--
>> 2 files changed, 67 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index addba75252343..543ba63f958ea 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -671,6 +671,18 @@ static void err_print_pciid(struct drm_i915_error_state_buf *m,
>> pdev->subsystem_device);
>> }
>>
>> +static void err_print_guc_ctb(struct drm_i915_error_state_buf *m,
>> + const char *name,
>> + const struct intel_ctb_coredump *ctb)
>> +{
>> + if (!ctb->size)
>> + return;
>> +
>> + err_printf(m, "GuC %s CTB: raw: 0x%08X, 0x%08X/%08X, cached: 0x%08X/%08X, desc = 0x%08X, buf = 0x%08X x 0x%08X\n",
>> + name, ctb->raw_status, ctb->raw_head, ctb->raw_tail,
>> + ctb->head, ctb->tail, ctb->desc_offset, ctb->cmds_offset, ctb->size);
>>
> NIT: to make it more readable at first glance, it would be nice to add more descriptive text like "raw: Sts:0x%08X,
> Hd:0x%08X,Tl:0x%08X...". Also, not sure why cmds_offset is presented with an "x size" as opposed to just "desc-off =
> foo1, cmd-off = foo2, size = foo3"?
The line is long enough as it is. I'd rather not make it even longer.
Same for '<name>: <address> x <size>' rather than '<name>_addr =
<address>, <name>_size = <size>'. It's useful for readability to keep a
single CTB channel on a single line but not if that line is excessively
long.
John.
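For a sense of scale, here is a minimal user-space sketch that renders one CTB line. It is not driver code. It uses the same format string as err_print_guc_ctb(), minus the trailing newline. struct ctb_dump and format_ctb() are hypothetical names. The struct mirrors struct intel_ctb_coredump, with unsigned int standing in for u32, and snprintf() stands in for err_printf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space mirror of struct intel_ctb_coredump (u32 -> unsigned int). */
struct ctb_dump {
	unsigned int raw_head, head;
	unsigned int raw_tail, tail;
	unsigned int raw_status;
	unsigned int desc_offset;
	unsigned int cmds_offset;
	unsigned int size;
};

/*
 * Format one CTB channel on a single line, using the format string from
 * err_print_guc_ctb() without the trailing newline. Returns the snprintf()
 * result, or 0 with an empty string when size == 0 (mirrors the early return).
 */
static int format_ctb(char *buf, size_t len, const char *name,
		      const struct ctb_dump *ctb)
{
	assert(buf != NULL && name != NULL && ctb != NULL);

	/* Nothing recorded for this channel: print nothing. */
	if (!ctb->size) {
		if (len)
			buf[0] = '\0';
		return 0;
	}

	return snprintf(buf, len,
			"GuC %s CTB: raw: 0x%08X, 0x%08X/%08X, cached: 0x%08X/%08X, desc = 0x%08X, buf = 0x%08X x 0x%08X",
			name, ctb->raw_status, ctb->raw_head, ctb->raw_tail,
			ctb->head, ctb->tail, ctb->desc_offset, ctb->cmds_offset,
			ctb->size);
}
```

Take a Send channel with the made-up values head = tail - 8 = 0x40, cmds_offset = 0x1000 and size = 0x1000. format_ctb() then produces a 129-character line: "GuC Send CTB: raw: 0x00000000, 0x00000040/00000048, cached: 0x00000040/00000048, desc = 0x00000000, buf = 0x00001000 x 0x00001000". That length is before adding any of the extra labels suggested in the review.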
>> +}
>> +
>> static void err_print_uc(struct drm_i915_error_state_buf *m,
>> const struct intel_uc_coredump *error_uc)
>> {
>> @@ -678,8 +690,12 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
>>
>> intel_uc_fw_dump(&error_uc->guc_fw, &p);
>> intel_uc_fw_dump(&error_uc->huc_fw, &p);
>> - err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->timestamp);
>> - intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
>> + err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->guc.timestamp);
>> + intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_log);
>> + err_printf(m, "GuC CTB fence: %d\n", error_uc->guc.last_fence);
>> + err_print_guc_ctb(m, "Send", error_uc->guc.ctb + 0);
>> + err_print_guc_ctb(m, "Recv", error_uc->guc.ctb + 1);
>> + intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_ctb);
>> }
>>
>> static void err_free_sgl(struct scatterlist *sgl)
>> @@ -854,7 +870,7 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>> if (error->gt) {
>> bool print_guc_capture = false;
>>
>> - if (error->gt->uc && error->gt->uc->is_guc_capture)
>> + if (error->gt->uc && error->gt->uc->guc.is_guc_capture)
>> print_guc_capture = true;
>>
>> err_print_gt_display(m, error->gt);
>> @@ -1009,7 +1025,8 @@ static void cleanup_uc(struct intel_uc_coredump *uc)
>> {
>> kfree(uc->guc_fw.path);
>> kfree(uc->huc_fw.path);
>> - i915_vma_coredump_free(uc->guc_log);
>> + i915_vma_coredump_free(uc->guc.vma_log);
>> + i915_vma_coredump_free(uc->guc.vma_ctb);
>>
>> kfree(uc);
>> }
>> @@ -1658,6 +1675,23 @@ gt_record_engines(struct intel_gt_coredump *gt,
>> }
>> }
>>
>> +static void gt_record_guc_ctb(struct intel_ctb_coredump *saved,
>> + const struct intel_guc_ct_buffer *ctb,
>> + const void *blob_ptr, struct intel_guc *guc)
>> +{
>> + if (!ctb || !ctb->desc)
>> + return;
>> +
>> + saved->raw_status = ctb->desc->status;
>> + saved->raw_head = ctb->desc->head;
>> + saved->raw_tail = ctb->desc->tail;
>> + saved->head = ctb->head;
>> + saved->tail = ctb->tail;
>> + saved->size = ctb->size;
>> + saved->desc_offset = ((void *)ctb->desc) - blob_ptr;
>> + saved->cmds_offset = ((void *)ctb->cmds) - blob_ptr;
>> +}
>> +
>> static struct intel_uc_coredump *
>> gt_record_uc(struct intel_gt_coredump *gt,
>> struct i915_vma_compress *compress)
>> @@ -1684,9 +1718,16 @@ gt_record_uc(struct intel_gt_coredump *gt,
>> * log times to system times (in conjunction with the error->boottime and
>> * gt->clock_frequency fields saved elsewhere).
>> */
>> - error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
>> - error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>> - "GuC log buffer", compress);
>> + error_uc->guc.timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
>> + error_uc->guc.vma_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>> + "GuC log buffer", compress);
>> + error_uc->guc.vma_ctb = create_vma_coredump(gt->_gt, uc->guc.ct.vma,
>> + "GuC CT buffer", compress);
>> + error_uc->guc.last_fence = uc->guc.ct.requests.last_fence;
>> + gt_record_guc_ctb(error_uc->guc.ctb + 0, &uc->guc.ct.ctbs.send,
>> + uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
>> + gt_record_guc_ctb(error_uc->guc.ctb + 1, &uc->guc.ct.ctbs.recv,
>> + uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
>>
>> return error_uc;
>> }
>> @@ -2039,9 +2080,9 @@ __i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask, u32 du
>> error->gt->uc = gt_record_uc(error->gt, compress);
>> if (error->gt->uc) {
>> if (dump_flags & CORE_DUMP_FLAG_IS_GUC_CAPTURE)
>> - error->gt->uc->is_guc_capture = true;
>> + error->gt->uc->guc.is_guc_capture = true;
>> else
>> - GEM_BUG_ON(error->gt->uc->is_guc_capture);
>> + GEM_BUG_ON(error->gt->uc->guc.is_guc_capture);
>> }
>> }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
>> index d8a8b3d529e09..efc75cc2ffdb9 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
>> @@ -125,6 +125,15 @@ struct intel_engine_coredump {
>> struct intel_engine_coredump *next;
>> };
>>
>> +struct intel_ctb_coredump {
>> + u32 raw_head, head;
>> + u32 raw_tail, tail;
>> + u32 raw_status;
>> + u32 desc_offset;
>> + u32 cmds_offset;
>> + u32 size;
>> +};
>> +
>> struct intel_gt_coredump {
>> const struct intel_gt *_gt;
>> bool awake;
>> @@ -165,9 +174,14 @@ struct intel_gt_coredump {
>> struct intel_uc_coredump {
>> struct intel_uc_fw guc_fw;
>> struct intel_uc_fw huc_fw;
>> - struct i915_vma_coredump *guc_log;
>> - u32 timestamp;
>> - bool is_guc_capture;
>> + struct guc_info {
>> + struct intel_ctb_coredump ctb[2];
>> + struct i915_vma_coredump *vma_ctb;
>> + struct i915_vma_coredump *vma_log;
>> + u32 timestamp;
>> + u16 last_fence;
>> + bool is_guc_capture;
>> + } guc;
>> } *uc;
>>
>> struct intel_gt_coredump *next;
>> --
>> 2.37.1
>>
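gt_record_guc_ctb() does not save kernel pointers. It saves the descriptor and command-buffer locations as byte offsets from blob_ptr, so they can be matched against the dumped "GuC CT buffer" VMA. Both calls in gt_record_uc() pass uc->guc.ct.ctbs.send.desc as blob_ptr, including the call for the Recv channel. That yields blob-relative offsets only if the send descriptor sits at the very start of the CT blob.

The sketch below assumes that layout: the two descriptors first, then the send and receive command buffers. The sizes, struct ct_buffer, setup_blob() and blob_offset() are all hypothetical. It casts to char * because the patch's subtraction on void * is a GCC extension (which the kernel relies on) rather than standard C.

```c
#include <assert.h>

/* Made-up sizes, chosen only for illustration; not the driver's values. */
#define DESC_SIZE 64u
#define H2G_SIZE  4096u
#define G2H_SIZE  8192u
#define BLOB_SIZE (2 * DESC_SIZE + H2G_SIZE + G2H_SIZE)

/* Hypothetical stand-in for the desc/cmds pointers of one CT buffer. */
struct ct_buffer {
	void *desc;
	void *cmds;
};

/*
 * Byte offset of p from the blob base. Casting to char * keeps this in
 * standard C; the patch subtracts void * pointers, a GCC extension.
 */
static unsigned int blob_offset(const void *p, const void *base)
{
	assert((const char *)p >= (const char *)base);
	return (unsigned int)((const char *)p - (const char *)base);
}

/* Assumed ordering: send desc, recv desc, send cmds, recv cmds. */
static void setup_blob(char *blob, struct ct_buffer *send,
		       struct ct_buffer *recv)
{
	send->desc = blob;
	recv->desc = blob + DESC_SIZE;
	send->cmds = blob + 2 * DESC_SIZE;
	recv->cmds = blob + 2 * DESC_SIZE + H2G_SIZE;
}
```

Under this assumed layout, using the send descriptor as the base gives both channels' offsets the same origin as the start of the blob. The recorded desc/buf values should then locate each descriptor and command buffer directly within the VMA dump.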