From: "Dong, Zhanjun" <zhanjun.dong@intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v9 4/4] drm/xe/guc: Extract GuC capture lists to register snapshot
Date: Wed, 19 Jun 2024 16:17:53 -0400 [thread overview]
Message-ID: <348bcc7d-7c9b-42fd-b8f3-ef9309a1ac23@intel.com> (raw)
In-Reply-To: <42eec22527828fe82a5edf60ed02ba7a5e85b6d8.camel@intel.com>
See my comments below.
Regards,
Zhanjun
On 2024-06-13 7:26 p.m., Teres Alexis, Alan Previn wrote:
> alan: Hi there Zhanjun, as per offline conversation, i was surprised to
> see you squash the "Guc-error-capture-extraction" together with the
> "Plumb GuC-capture into dev coredump" patch but now realize i made a
> regrettable typo in my rev8 review comments - when i mentioned about
> posisbly squashing pre-allocate-nodes with node-extraction i said
> "patch 4 and patch 6" but what i meant was "patch 4 and patch 5".
> I'm terribly sorry about this mistake. Since we know that you
> have to redo this patch anyways based on offline conversation to
> resolve race between drm-tdr vs guc-context-reset, thanks for agreeing
> to keep extraction separate again. And ofc we don't need pre-allocating
> of nodes now.
will put patch 4 back
>
> On Thu, 2024-06-06 at 17:07 -0700, Zhanjun Dong wrote:
>> Upon the G2H Notify-Err-Capture event, parse through the
>> GuC Log Buffer (error-capture-subregion) and generate one or
>> more capture-nodes. A single node represents a single "engine-
>> instance-capture-dump" and contains at least 3 register lists:
>> global, engine-class and engine-instance. An internal link
>> list is maintained to store one or more nodes.
>> Because the link-list node generation happen before the call
>> to devcoredump, duplicate global and engine-class register
>> lists for each engine-instance register dump if we find
>> dependent-engine resets in a engine-capture-group.
>> When xe_devcoredump calls into snapshot_from_capture_engine,
...
>> +
>> +static inline u16 xe_engine_class_to_guc_capture_class(enum
>> xe_engine_class class)
>> +{
>> + return
>> xe_guc_class_to_capture_class(xe_guc_class_to_capture_class(class));
>> +}
>> +
>> static inline struct xe_gt *guc_to_gt(struct xe_guc *guc)
>> {
>> return container_of(guc, struct xe_gt, uc.guc);
>>
>
> alan:snip.
>
>> +static void
>> +__guc_capture_create_prealloc_nodes(struct xe_guc *guc)
>> +{
>> + struct __guc_capture_parsed_output *node = NULL;
>> + int i;
>> +
>> + for (i = 0; i < PREALLOC_NODES_MAX_COUNT; ++i) {
>> + node = guc_capture_alloc_one_node(guc);
>> + if (!node) {
>> + xe_gt_warn(guc_to_gt(guc), "Register capture
>> pre-alloc-cache failure\n");
>> + /* dont free the priors, use what we got and
>> cleanup at shutdown */
>> + return;
>> + }
>> + guc_capture_add_node_to_cachelist(guc->capture,
>> node);
>> + }
>> +}
>> +
> btw, why are we still using this pre-allocated-nodes framework and
> functions? i thought we decided we didn't need it for the case of Xe?
> (since we'll serialize the extraction-flow vs any reset-flow vs the
> devcoredump-printing by putting them on the same workqueue). In that
> case we can remove the prealocated nodes' cachelist extraction code
> requiring new ndes can dynamically by calling
> guc_capture_alloc_one_node directly
Current code is still using the pre-alloced node, I will try dynamic
alloc later. Another concern is the GFP_ATOMIC may alloc from emergency
pools, it might not be the better choice for debug purposed feature like
this.
>
> alan:snip. same comment above for below hnk.
>> +static void
>> +guc_capture_create_prealloc_nodes(struct xe_guc *guc)
>> +{
>> + /* skip if we've already done the pre-alloc */
>
...
>> /**
>> * xe_hw_engine_snapshot_capture - Take a quick snapshot of the HW
>> Engine.
>> * @hwe: Xe HW Engine.
>> @@ -839,8 +899,12 @@ struct xe_hw_engine_snapshot *
>> xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe)
>> {
>> struct xe_hw_engine_snapshot *snapshot;
>> + struct xe_gt *gt = hwe->gt;
>> + struct xe_device *xe = gt_to_xe(gt);
>> + struct xe_guc *guc = >->uc.guc;
>>
> alan:snip
>
>> + /* Check GuC settings, job is set and capture outlist not
>> empty,
>> + * otherwise take it from engine
>> + */
>> + if (xe_device_uc_enabled(xe) && xe->wedged.mode >= 1 &&
>> + !list_empty(&guc->capture->outlist) && xe-
>>> devcoredump.job)
>> + xe_hw_engine_find_and_copy_guc_capture_snapshot(hwe,
>> snapshot);
>> + else
>> + xe_hw_engine_snapshot_from_hw(hwe, snapshot);
>
> alan: in the above if-else check, we are assuming that guc's capture
> list not being empty means we have the captured node for this
> devcoredump triger... however, in larger scale platforms we could have
> multiple captured nodes from multiple engines so we should not
> be relying on just whether its empty or not, rather the check should
> call guc capture to find matching node based on lrc, guc-id, engnine
> etc (i.e. basically call
> xe_hw_engine_find_and_copy_guc_capture_snapshot directly to see if we
> have a dump.)
Good point, I will add the capture ready for job functions in next rev.
>
> alan:snip.
>
> I'll review the rest of this patch after rev9 is out since there are
> other changes incoming as per my first review comments on top.
next prev parent reply other threads:[~2024-06-19 20:18 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-07 0:07 [PATCH v9 0/4] drm/xe/guc: Add GuC based register capture for error capture Zhanjun Dong
2024-06-07 0:07 ` [PATCH v9 1/4] drm/xe/guc: Prepare GuC register list and update ADS size " Zhanjun Dong
2024-06-14 11:50 ` Michal Wajdeczko
2024-06-19 19:36 ` Dong, Zhanjun
2024-06-07 0:07 ` [PATCH v9 2/4] drm/xe/guc: Add XE_LP steered register lists Zhanjun Dong
2024-06-13 19:02 ` Teres Alexis, Alan Previn
2024-06-07 0:07 ` [PATCH v9 3/4] drm/xe/guc: Add capture size check in GuC log buffer Zhanjun Dong
2024-06-14 12:13 ` Michal Wajdeczko
2024-06-19 19:44 ` Dong, Zhanjun
2024-06-19 22:28 ` Michal Wajdeczko
2024-06-19 22:56 ` Dong, Zhanjun
2024-06-07 0:07 ` [PATCH v9 4/4] drm/xe/guc: Extract GuC capture lists to register snapshot Zhanjun Dong
2024-06-13 23:26 ` Teres Alexis, Alan Previn
2024-06-19 20:17 ` Dong, Zhanjun [this message]
2024-06-14 12:31 ` Michal Wajdeczko
2024-06-19 20:04 ` Dong, Zhanjun
2024-06-07 0:12 ` ✓ CI.Patch_applied: success for drm/xe/guc: Add GuC based register capture for error capture (rev9) Patchwork
2024-06-07 0:12 ` ✗ CI.checkpatch: warning " Patchwork
2024-06-07 0:13 ` ✓ CI.KUnit: success " Patchwork
2024-06-07 0:25 ` ✓ CI.Build: " Patchwork
2024-06-07 0:27 ` ✗ CI.Hooks: failure " Patchwork
2024-06-07 0:28 ` ✓ CI.checksparse: success " Patchwork
2024-06-07 1:11 ` ✓ CI.BAT: " Patchwork
2024-06-07 10:57 ` ✗ CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=348bcc7d-7c9b-42fd-b8f3-ef9309a1ac23@intel.com \
--to=zhanjun.dong@intel.com \
--cc=alan.previn.teres.alexis@intel.com \
--cc=intel-xe@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox