Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Shekhar Chauhan <shekhar.chauhan@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <rodrigo.vivi@intel.com>
Subject: Re: [PATCH] Decouple devcoredump from accessing core driver structures
Date: Thu, 12 Feb 2026 13:46:26 +0530	[thread overview]
Message-ID: <fde05d02-4eb8-4caa-a871-b3ee9205d747@intel.com> (raw)
In-Reply-To: <aYvKt64AeBC6Nvj6@lstrano-desk.jf.intel.com>


On 2/11/2026 5:47, Matthew Brost wrote:
> On Tue, Feb 10, 2026 at 10:48:09PM +0530, Shekhar Chauhan wrote:
>> The patch should have a 'drm/xe' prefix, missed it somehow. Will add it in
>> the next revision other feedback for the patch is provided!
>>
>> -shekhar
>>
>> On 2/10/2026 12:59, Shekhar Chauhan wrote:
>>> During devcoredump, print should not access any live gt pointers.
> Can you explain 'why'?
My thought process was, if we're accessing the live gt pointer while 
printing (and if the device has already crashed) we might face into 
issues since the gt pointer would be freed.
>
>>> Decouple the snapshot_print from accessing gt pointer by storing the
>>> captured data in a separate list and then print data from that list.
>>>
>>> Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_guc_capture.c     | 25 +++++++------------------
>>>    drivers/gpu/drm/xe/xe_hw_engine.c       | 20 ++++++++++++++++++++
>>>    drivers/gpu/drm/xe/xe_hw_engine_types.h |  7 +++++++
>>>    3 files changed, 34 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c
>>> index 2f5816c78fba..0555fb07c499 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_capture.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_capture.c
>>> @@ -1695,8 +1695,7 @@ static void
>>>    snapshot_print_by_list_order(struct xe_hw_engine_snapshot *snapshot, struct drm_printer *p,
>>>    			     u32 type, const struct __guc_mmio_reg_descr_group *list)
>>>    {
>>> -	struct xe_gt *gt = snapshot->hwe->gt;
>>> -	struct xe_device *xe = gt_to_xe(gt);
>>> +	struct xe_device *xe = snapshot->xe;
> If an Xe object is available - so is a 'hwe' and 'gt'. All of these
> persistent objects created during driver load and the memory is free'd
> upin driver unload. i.e., It is not possible for a GT or HWE to invalid
> but to have a valid Xe object unless I'm missing something.
Yeah, I just realised that my change doesn't make sense here.
>
>>>    	struct xe_devcoredump *devcoredump = &xe->devcoredump;
>>>    	struct xe_devcoredump_snapshot *devcore_snapshot = &devcoredump->snapshot;
>>>    	struct gcap_reg_list_info *reginfo = NULL;
>>> @@ -1806,7 +1805,6 @@ void xe_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, struct drm
>>>    	};
>>>    	int type;
>>>    	const struct __guc_mmio_reg_descr_group *list;
>>> -	enum guc_capture_list_class_type capture_class;
>>>    	struct xe_gt *gt;
>>>    	struct xe_device *xe;
>>> @@ -1826,8 +1824,6 @@ void xe_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, struct drm
>>>    	xe_gt_assert(gt, snapshot->hwe);
>>> -	capture_class = xe_engine_class_to_guc_capture_class(snapshot->hwe->class);
>>> -
>>>    	drm_printf(p, "%s (physical), logical instance=%d\n",
>>>    		   snapshot->name ? snapshot->name : "",
>>>    		   snapshot->logical_instance);
>>> @@ -1841,23 +1837,16 @@ void xe_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, struct drm
>>>    		   str_yes_no(snapshot->kernel_reserved));
>>>    	for (type = GUC_STATE_CAPTURE_TYPE_GLOBAL; type < GUC_STATE_CAPTURE_TYPE_MAX; type++) {
>>> -		/*
>>> -		 * FIXME: During devcoredump print we should avoid accessing the
>>> -		 * driver pointers for gt or engine. Printing should be done only
>>> -		 * using the snapshot captured. Here we are accessing the gt
>>> -		 * pointer. It should be fixed.
>>> -		 */
>>> -		list = xe_guc_capture_get_reg_desc_list(gt, GUC_CAPTURE_LIST_INDEX_PF, type,
>>> -							capture_class, false);
> Again unless I'm completely missing something this is accessing driver
> persistent data which seems fine to access during
> xe_engine_snapshot_print... What we shouldn't do during print is access
> the hardware which is not what this is doing.
>
> Matt

I was under the impression that xe_guc_capture_get_reg_desc_list is 
accessing the hw, so better to store the data while capturing and then 
print it later with that list. If that is not the case, then I think the 
FIXME which is present above is wrong.

-shekhar

>>> +
>>> +		list = snapshot->list[type];
>>> +
>>>    		snapshot_print_by_list_order(snapshot, p, type, list);
>>>    	}
>>> -	if (capture_class == GUC_CAPTURE_LIST_CLASS_RENDER_COMPUTE) {
>>> -		list = xe_guc_capture_get_reg_desc_list(gt, GUC_CAPTURE_LIST_INDEX_PF,
>>> -							GUC_STATE_CAPTURE_TYPE_ENGINE_CLASS,
>>> -							capture_class, true);
>>> +	if (snapshot->render_compute_list) {
>>> +
>>>    		snapshot_print_by_list_order(snapshot, p, GUC_STATE_CAPTURE_TYPE_ENGINE_CLASS,
>>> -					     list);
>>> +					     snapshot->render_compute_list);
>>>    	}
>>>    	drm_puts(p, "\n");
>>> diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
>>> index 4d3ee5226e3a..a246b4f860e6 100644
>>> --- a/drivers/gpu/drm/xe/xe_hw_engine.c
>>> +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
>>> @@ -923,6 +923,8 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_exec_queue *q)
>>>    {
>>>    	struct xe_hw_engine_snapshot *snapshot;
>>>    	struct __guc_capture_parsed_output *node;
>>> +	enum guc_capture_list_class_type capture_class;
>>> +	int type;
>>>    	if (!xe_hw_engine_is_valid(hwe))
>>>    		return NULL;
>>> @@ -941,6 +943,24 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_exec_queue *q)
>>>    	snapshot->mmio_base = hwe->mmio_base;
>>>    	snapshot->kernel_reserved = xe_hw_engine_is_reserved(hwe);
>>> +	snapshot->xe = gt_to_xe(hwe->gt);
>>> +
>>> +	capture_class = xe_engine_class_to_guc_capture_class(hwe->class);
>>> +
>>> +	/* Capture the register list */
>>> +	for (type = GUC_STATE_CAPTURE_TYPE_GLOBAL; type < GUC_STATE_CAPTURE_TYPE_MAX; type++) {
>>> +		snapshot->list[type] = xe_guc_capture_get_reg_desc_list(hwe->gt,
>>> +				GUC_CAPTURE_LIST_INDEX_PF, type, capture_class, false);
>>> +	}
>>> +
>>> +	/* Capture the register list from RENDER / COMPUTE */
>>> +	if (capture_class == GUC_CAPTURE_LIST_CLASS_RENDER_COMPUTE) {
>>> +		snapshot->render_compute_list = xe_guc_capture_get_reg_desc_list(hwe->gt,
>>> +				GUC_CAPTURE_LIST_INDEX_PF, GUC_STATE_CAPTURE_TYPE_ENGINE_CLASS,
>>> +				capture_class, true);
>>> +	}
>>> +
>>> +
>>>    	/* no more VF accessible data below this point */
>>>    	if (IS_SRIOV_VF(gt_to_xe(hwe->gt)))
>>>    		return snapshot;
>>> diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h
>>> index e4191a7a2c31..87ad1e68fb47 100644
>>> --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h
>>> @@ -6,6 +6,7 @@
>>>    #ifndef _XE_HW_ENGINE_TYPES_H_
>>>    #define _XE_HW_ENGINE_TYPES_H_
>>> +#include "abi/guc_capture_abi.h"
>>>    #include "xe_force_wake_types.h"
>>>    #include "xe_lrc_types.h"
>>>    #include "xe_reg_sr_types.h"
>>> @@ -167,6 +168,12 @@ struct xe_hw_engine_snapshot {
>>>    	char *name;
>>>    	/** @hwe: hw engine */
>>>    	struct xe_hw_engine *hwe;
>>> +	/** @xe: device pointer */
>>> +	struct xe_device *xe;
>>> +	/** @list: register list */
>>> +	const struct __guc_mmio_reg_descr_group *list[GUC_STATE_CAPTURE_TYPE_MAX];
>>> +	/** @render_compute_list: render/compute list */
>>> +	const struct __guc_mmio_reg_descr_group *render_compute_list;
>>>    	/** @logical_instance: logical instance of this hw engine */
>>>    	u16 logical_instance;
>>>    	/** @forcewake: Force Wake information snapshot */

      reply	other threads:[~2026-02-12  8:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10  7:29 [PATCH] Decouple devcoredump from accessing core driver structures Shekhar Chauhan
2026-02-10  7:34 ` ✗ CI.checkpatch: warning for " Patchwork
2026-02-10  7:35 ` ✓ CI.KUnit: success " Patchwork
2026-02-10  8:09 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-10 12:33 ` ✓ Xe.CI.FULL: " Patchwork
2026-02-10 17:18 ` [PATCH] " Shekhar Chauhan
2026-02-11  0:17   ` Matthew Brost
2026-02-12  8:16     ` Shekhar Chauhan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fde05d02-4eb8-4caa-a871-b3ee9205d747@intel.com \
    --to=shekhar.chauhan@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox