From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
<alan.previn.teres.alexis@intel.com>, <zhanjun.dong@intel.com>
Subject: Re: [PATCH 5/7] drm/xe: Add exec queue param to devcoredump
Date: Fri, 8 Nov 2024 17:21:08 -0500
Message-ID: <Zy6O1H7st17n-JoH@intel.com>
In-Reply-To: <20241108174312.272792-6-matthew.brost@intel.com>
On Fri, Nov 08, 2024 at 09:43:10AM -0800, Matthew Brost wrote:
> A job may be unavailable at capture time (e.g., LR mode) while an exec
> queue is still available. Add an exec queue param for such use cases.
Why?! If so, don't we have other problems?
>
> Cc: Zhanjun Dong <zhanjun.dong@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_devcoredump.c | 15 +++++++++------
> drivers/gpu/drm/xe/xe_devcoredump.h | 6 ++++--
> drivers/gpu/drm/xe/xe_guc_submit.c | 2 +-
> 3 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> index d3570d3d573c..c32cbb46ef8c 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> @@ -238,10 +238,10 @@ static void xe_devcoredump_free(void *data)
>  }
>
>  static void devcoredump_snapshot(struct xe_devcoredump *coredump,
> +				 struct xe_exec_queue *q,
>  				 struct xe_sched_job *job)
>  {
>  	struct xe_devcoredump_snapshot *ss = &coredump->snapshot;
> -	struct xe_exec_queue *q = job->q;
>  	struct xe_guc *guc = exec_queue_to_guc(q);
>  	u32 adj_logical_mask = q->logical_mask;
>  	u32 width_mask = (0x1 << q->width) - 1;
> @@ -278,10 +278,12 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
>  	ss->guc.log = xe_guc_log_snapshot_capture(&guc->log, true);
>  	ss->guc.ct = xe_guc_ct_snapshot_capture(&guc->ct);
>  	ss->ge = xe_guc_exec_queue_snapshot_capture(q);
> -	ss->job = xe_sched_job_snapshot_capture(job);
> +	if (job)
> +		ss->job = xe_sched_job_snapshot_capture(job);
>  	ss->vm = xe_vm_snapshot_capture(q->vm);
>
> -	xe_engine_snapshot_capture_for_job(job);
> +	if (job)
> +		xe_engine_snapshot_capture_for_job(job);
>
>  	queue_work(system_unbound_wq, &ss->work);
>
> @@ -291,15 +293,16 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
>
>  /**
>   * xe_devcoredump - Take the required snapshots and initialize coredump device.
> + * @q: The faulty xe_exec_queue, where the issue was detected.
>   * @job: The faulty xe_sched_job, where the issue was detected.
>   *
>   * This function should be called at the crash time within the serialized
>   * gt_reset. It is skipped if we still have the core dump device available
>   * with the information of the 'first' snapshot.
>   */
> -void xe_devcoredump(struct xe_sched_job *job)
> +void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job)
>  {
> -	struct xe_device *xe = gt_to_xe(job->q->gt);
> +	struct xe_device *xe = gt_to_xe(q->gt);
>  	struct xe_devcoredump *coredump = &xe->devcoredump;
>
>  	if (coredump->captured) {
> @@ -308,7 +311,7 @@ void xe_devcoredump(struct xe_sched_job *job)
>  	}
>
>  	coredump->captured = true;
> -	devcoredump_snapshot(coredump, job);
> +	devcoredump_snapshot(coredump, q, job);
>
>  	drm_info(&xe->drm, "Xe device coredump has been created\n");
>  	drm_info(&xe->drm, "Check your /sys/class/drm/card%d/device/devcoredump/data\n",
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
> index a4eebc285fc8..c04a534e3384 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.h
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.h
> @@ -10,13 +10,15 @@
>
>  struct drm_printer;
>  struct xe_device;
> +struct xe_exec_queue;
>  struct xe_sched_job;
>
>  #ifdef CONFIG_DEV_COREDUMP
> -void xe_devcoredump(struct xe_sched_job *job);
> +void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job);
>  int xe_devcoredump_init(struct xe_device *xe);
>  #else
> -static inline void xe_devcoredump(struct xe_sched_job *job)
> +static inline void xe_devcoredump(struct xe_exec_queue *q,
> +				  struct xe_sched_job *job)
>  {
>  }
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 2cf4750bc24d..974c7af7064d 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1162,7 +1162,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	trace_xe_sched_job_timedout(job);
>
>  	if (!exec_queue_killed(q))
> -		xe_devcoredump(job);
> +		xe_devcoredump(q, job);
>
>  	/*
>  	 * Kernel jobs should never fail, nor should VM jobs if they do
> --
> 2.34.1
>
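
For readers following the thread, the calling convention the quoted patch moves to (exec queue always passed, job optional, only the first capture kept) can be illustrated with a small stand-alone sketch. This is not the driver code. The types `struct queue`, `struct job`, `struct snapshot`, `struct coredump` and the function `coredump_capture()` are hypothetical stand-ins invented for illustration, and treating a NULL queue as a programming error is an assumption of the sketch, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver objects; not the real xe structs. */
struct queue { int id; };
struct job { struct queue *q; int seqno; };

struct snapshot {
	int queue_id;	/* always recorded, the queue is mandatory */
	bool has_job;	/* false when no job was available at capture time */
	int job_seqno;	/* meaningful only when has_job is true */
};

struct coredump {
	bool captured;	/* set by the first capture; later calls are skipped */
	struct snapshot ss;
};

/*
 * Take a snapshot keyed on the queue, recording job state only if a job
 * was passed in. Returns true if a new snapshot was taken, false if an
 * earlier one is still held.
 */
static bool coredump_capture(struct coredump *cd, struct queue *q,
			     struct job *job)
{
	assert(q != NULL);	/* assumption: callers always have a queue */

	if (cd->captured)
		return false;	/* keep the 'first' snapshot */

	cd->captured = true;
	cd->ss.queue_id = q->id;
	cd->ss.has_job = job != NULL;
	cd->ss.job_seqno = job ? job->seqno : 0;
	return true;
}
```

In this sketch a caller with no job at hand passes NULL for `job` and still gets the queue-level state, while a caller that does have a job passes both. Whether a job can legitimately be missing in the real driver is exactly the question raised in the reply above.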