From: John Harrison <john.c.harrison@intel.com>
To: <Intel-Xe@Lists.FreeDesktop.Org>
Subject: Re: [PATCH v7 07/10] drm/xe/guc: Dead CT helper
Date: Tue, 10 Sep 2024 17:09:34 -0700 [thread overview]
Message-ID: <86d3b041-3233-4a01-9186-9caf741161bc@intel.com> (raw)
In-Reply-To: <20240905205106.1063091-8-John.C.Harrison@Intel.com>
On 9/5/2024 13:51, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> Add a worker function helper for asynchronously dumping state when an
> internal/fatal error is detected in CT processing. Being asynchronous
> is required to avoid deadlocks and scheduling-while-atomic or
> process-stalled-for-too-long issues. Also check for a bunch more error
> conditions and improve the handling of some existing checks.
>
> v2: Use compile time CONFIG check for new (but not directly CT_DEAD
> related) checks and use unsigned int for a bitmask, rename
> CT_DEAD_RESET to CT_DEAD_REARM and add some explaining comments,
> rename 'hxg' macro parameter to 'ctb' - review feedback from Michal W.
> Drop CT_DEAD_ALIVE as no need for a bitfield define to just set the
> entire mask to zero.
> v3: Fix kerneldoc
> v4: Nullify some floating pointers after free.
> v5: Add section headings and device info to make the state dump look
> more like a devcoredump to allow parsing by the same tools (eventual
> aim is to just call the devcoredump code itself, but that currently
> requires an xe_sched_job, which is not available in the CT code).
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
> .../drm/xe/abi/guc_communication_ctb_abi.h | 1 +
> drivers/gpu/drm/xe/xe_guc.c | 2 +-
> drivers/gpu/drm/xe/xe_guc_ct.c | 280 ++++++++++++++++--
> drivers/gpu/drm/xe/xe_guc_ct.h | 2 +-
> drivers/gpu/drm/xe/xe_guc_ct_types.h | 23 ++
> 5 files changed, 280 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h
> index 8f86a16dc577..f58198cf2cf6 100644
> --- a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h
> @@ -52,6 +52,7 @@ struct guc_ct_buffer_desc {
> #define GUC_CTB_STATUS_OVERFLOW (1 << 0)
> #define GUC_CTB_STATUS_UNDERFLOW (1 << 1)
> #define GUC_CTB_STATUS_MISMATCH (1 << 2)
> +#define GUC_CTB_STATUS_DISABLED (1 << 3)
> u32 reserved[13];
> } __packed;
> static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 34cdb08b6e27..3fef24c965c4 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -1176,7 +1176,7 @@ void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p)
>
> xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>
> - xe_guc_ct_print(&guc->ct, p, false);
> + xe_guc_ct_print(&guc->ct, p);
> xe_guc_submit_print(guc, p);
> }
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index a63fe0a9077a..e31b1f0b855f 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -25,12 +25,57 @@
> #include "xe_gt_sriov_pf_monitor.h"
> #include "xe_gt_tlb_invalidation.h"
> #include "xe_guc.h"
> +#include "xe_guc_log.h"
> #include "xe_guc_relay.h"
> #include "xe_guc_submit.h"
> #include "xe_map.h"
> #include "xe_pm.h"
> #include "xe_trace_guc.h"
>
> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> +enum {
> + CT_DEAD_REARM, /* 0x0001 - not an error condition */
> + CT_DEAD_SETUP, /* 0x0002 */
> + CT_DEAD_H2G_WRITE, /* 0x0004 */
> + CT_DEAD_H2G_HAS_ROOM, /* 0x0008 */
> + CT_DEAD_G2H_READ, /* 0x0010 */
> + CT_DEAD_G2H_RECV, /* 0x0020 */
> + CT_DEAD_G2H_RELEASE, /* 0x0040 */
> + CT_DEAD_DEADLOCK, /* 0x0080 */
> + CT_DEAD_PROCESS_FAILED, /* 0x0100 */
> + CT_DEAD_FAST_G2H, /* 0x0200 */
> + CT_DEAD_PARSE_G2H_RESPONSE, /* 0x0400 */
> + CT_DEAD_PARSE_G2H_UNKNOWN, /* 0x0800 */
> + CT_DEAD_PARSE_G2H_ORIGIN, /* 0x1000 */
> + CT_DEAD_PARSE_G2H_TYPE, /* 0x2000 */
> +};
> +
> +static void ct_dead_worker_func(struct work_struct *w);
> +
> +#define CT_DEAD(ct, ctb, reason_code) \
> + do { \
> + struct guc_ctb *_ctb = (ctb); \
> + if (_ctb) \
> + _ctb->info.broken = true; \
> + if (!(ct)->dead.reported) { \
> + struct xe_guc *guc = ct_to_guc(ct); \
> + spin_lock_irq(&ct->dead.lock); \
This needs to be spin_lock_irqsave because CT_DEAD can be called inside
an ISR. Without the save/restore, it will explicitly enable interrupts
again and, as seen in the CI failure, that causes a WARNING:
irq 210 handler dg1_irq_handler+0x0/0x240 [xe] enabled interrupts
John.
next prev parent reply other threads:[~2024-09-11 0:09 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-05 20:50 [PATCH v7 00/10] drm/xe/guc: Improve GuC log dumping and add to devcoredump John.C.Harrison
2024-09-05 20:50 ` [PATCH v7 01/10] drm/xe/guc: Remove spurious line feed in debug print John.C.Harrison
2024-09-10 19:32 ` Julia Filipchuk
2024-09-05 20:50 ` [PATCH v7 02/10] drm/xe/devcoredump: Add a section heading for the submission backend John.C.Harrison
2024-09-10 19:33 ` Julia Filipchuk
2024-09-05 20:50 ` [PATCH v7 03/10] drm/xe/devcoredump: Add ASCII85 dump helper function John.C.Harrison
2024-09-06 1:54 ` Lucas De Marchi
2024-09-06 2:01 ` John Harrison
2024-09-06 3:04 ` Lucas De Marchi
2024-09-07 2:06 ` John Harrison
2024-09-10 1:31 ` John Harrison
2024-09-10 19:43 ` Lucas De Marchi
2024-09-10 20:17 ` John Harrison
2024-09-11 19:12 ` Lucas De Marchi
2024-09-11 19:30 ` Souza, Jose
2024-09-11 19:35 ` John Harrison
2024-09-11 19:54 ` Souza, Jose
2024-09-11 19:59 ` John Harrison
2024-09-12 13:57 ` Rodrigo Vivi
2024-09-12 18:06 ` John Harrison
2024-09-16 15:32 ` Rodrigo Vivi
2024-09-16 17:46 ` John Harrison
2024-09-11 19:31 ` John Harrison
2024-09-10 19:33 ` Julia Filipchuk
2024-09-11 1:27 ` John Harrison
2024-09-05 20:50 ` [PATCH v7 04/10] drm/xe/guc: Copy GuC log prior to dumping John.C.Harrison
2024-09-11 0:48 ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 05/10] drm/xe/guc: Use a two stage dump for GuC logs and add more info John.C.Harrison
2024-09-11 0:48 ` Julia Filipchuk
2024-09-11 1:14 ` John Harrison
2024-09-05 20:51 ` [PATCH v7 06/10] drm/print: Introduce drm_line_printer John.C.Harrison
2024-09-05 20:51 ` [PATCH v7 07/10] drm/xe/guc: Dead CT helper John.C.Harrison
2024-09-11 0:09 ` John Harrison [this message]
2024-09-11 19:55 ` Julia Filipchuk
2024-09-11 20:13 ` John Harrison
2024-09-11 20:57 ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 08/10] drm/xe/guc: Dump entire CTB on errors John.C.Harrison
2024-09-11 20:12 ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 09/10] drm/xe/guc: Add GuC log to devcoredump captures John.C.Harrison
2024-09-11 20:25 ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 10/10] drm/xe/guc: Add a helper function for dumping GuC log to dmesg John.C.Harrison
2024-09-11 20:36 ` Julia Filipchuk
2024-09-11 20:41 ` John Harrison
2024-09-05 20:57 ` ✓ CI.Patch_applied: success for drm/xe/guc: Improve GuC log dumping and add to devcoredump (rev2) Patchwork
2024-09-05 20:57 ` ✗ CI.checkpatch: warning " Patchwork
2024-09-05 20:58 ` ✓ CI.KUnit: success " Patchwork
2024-09-05 21:10 ` ✓ CI.Build: " Patchwork
2024-09-05 21:13 ` ✓ CI.Hooks: " Patchwork
2024-09-05 21:14 ` ✗ CI.checksparse: warning " Patchwork
2024-09-05 22:06 ` ✗ CI.BAT: failure " Patchwork
2024-09-08 0:02 ` ✗ CI.FULL: " Patchwork
2024-09-12 9:16 ` [PATCH v7 00/10] drm/xe/guc: Improve GuC log dumping and add to devcoredump Jani Nikula
2024-09-12 18:50 ` John Harrison
2024-09-13 7:26 ` Jani Nikula
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86d3b041-3233-4a01-9186-9caf741161bc@intel.com \
--to=john.c.harrison@intel.com \
--cc=Intel-Xe@Lists.FreeDesktop.Org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox