From: Jani Nikula <jani.nikula@linux.intel.com>
To: Zhanjun Dong <zhanjun.dong@intel.com>, intel-xe@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/xe/guc: Add register defines for GuC based register capture
Date: Wed, 17 Jan 2024 10:23:27 +0200 [thread overview]
Message-ID: <87ply0qsg0.fsf@intel.com> (raw)
In-Reply-To: <20240116231600.154097-2-zhanjun.dong@intel.com>
On Tue, 16 Jan 2024, Zhanjun Dong <zhanjun.dong@intel.com> wrote:
> Add registers defines and list of registers for GuC based error state capture.
>
> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
> ---
> drivers/gpu/drm/xe/Kconfig | 11 +++
> drivers/gpu/drm/xe/Makefile | 1 +
> drivers/gpu/drm/xe/regs/xe_engine_regs.h | 12 +++
> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 20 +++++
> drivers/gpu/drm/xe/xe_guc.c | 7 ++
> drivers/gpu/drm/xe/xe_guc_capture.c | 101 +++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_guc_capture.h | 19 +++++
> 7 files changed, 171 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.c
> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.h
>
> diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> index 1b57ae38210d..236763569877 100644
> --- a/drivers/gpu/drm/xe/Kconfig
> +++ b/drivers/gpu/drm/xe/Kconfig
> @@ -83,6 +83,17 @@ config DRM_XE_FORCE_PROBE
>
> Use "!*" to block the probe of the driver for all known devices.
>
> +config DRM_XE_CAPTURE_ERROR
> + bool "Enable capturing GPU state following a hang"
> + depends on DRM_XE
> + default y
> + help
> + This option enables capturing the GPU state when a hang is detected.
> + This information is vital for triaging hangs and assists in debugging.
> + Please report any hang to your Intel representative to help with triaging.
> +
> + If in doubt, say "Y".
> +
> menu "drm/Xe Debugging"
> depends on DRM_XE
> depends on EXPERT
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index e16b84f79ddf..4cba24ff55e2 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -91,6 +91,7 @@ xe-y += xe_bb.o \
> xe_gt_topology.o \
> xe_guc.o \
> xe_guc_ads.o \
> + xe_guc_capture.o \
> xe_guc_ct.o \
> xe_guc_db_mgr.o \
> xe_guc_debugfs.o \
> diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
> index 0b1266c88a6a..06015703a33e 100644
> --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
> @@ -64,10 +64,16 @@
>
> #define RING_ACTHD_UDW(base) XE_REG((base) + 0x5c)
> #define RING_DMA_FADD_UDW(base) XE_REG((base) + 0x60)
> +#define RING_IPEIR(base) XE_REG((base) + 0x64)
> #define RING_IPEHR(base) XE_REG((base) + 0x68)
> +#define RING_INSTDONE(base) XE_REG((base) + 0x6c)
> +#define RING_INSTPS(base) XE_REG((base) + 0x70)
> +
> #define RING_ACTHD(base) XE_REG((base) + 0x74)
> #define RING_DMA_FADD(base) XE_REG((base) + 0x78)
> #define RING_HWS_PGA(base) XE_REG((base) + 0x80)
> +#define IPEIR(base) XE_REG((base) + 0x88)
> +
> #define RING_HWSTAM(base) XE_REG((base) + 0x98)
> #define RING_MI_MODE(base) XE_REG((base) + 0x9c)
> #define RING_NOPID(base) XE_REG((base) + 0x94)
> @@ -111,9 +117,12 @@
> #define FF_DOP_CLOCK_GATE_DISABLE REG_BIT(1)
> #define REPLAY_MODE_GRANULARITY REG_BIT(0)
>
> +#define RING_BBSTATE(base) XE_REG((base) + 0x110)
> #define RING_BBADDR(base) XE_REG((base) + 0x140)
> #define RING_BBADDR_UDW(base) XE_REG((base) + 0x168)
>
> +#define CCID(base) XE_REG((base) + 0x180)
> +
> #define BCS_SWCTRL(base) XE_REG((base) + 0x200, XE_REG_OPTION_MASKED)
> #define BCS_SWCTRL_DISABLE_256B REG_BIT(2)
>
> @@ -129,6 +138,9 @@
> #define CTX_CTRL_INHIBIT_SYN_CTX_SWITCH REG_BIT(3)
> #define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT REG_BIT(0)
>
> +#define RING_PDP_UDW(base, n) XE_REG((base) + 0x270 + (n) * 8 + 4)
> +#define RING_PDP_LDW(base, n) XE_REG((base) + 0x270 + (n) * 8)
> +
> #define RING_MODE(base) XE_REG((base) + 0x29c)
> #define GFX_DISABLE_LEGACY_MODE REG_BIT(3)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 4017319c6300..d68a8b447842 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -67,6 +67,8 @@
> #define VE1_AUX_INV XE_REG(0x42b8)
> #define AUX_INV REG_BIT(0)
>
> +#define AUX_ERR_DBG XE_REG(0x43f4)
> +
> #define XEHP_TILE_ADDR_RANGE(_idx) XE_REG_MCR(0x4900 + (_idx) * 4)
> #define XEHP_FLAT_CCS_BASE_ADDR XE_REG_MCR(0x4910)
>
> @@ -94,6 +96,8 @@
> #define FF_MODE2_TDS_TIMER_MASK REG_GENMASK(23, 16)
> #define FF_MODE2_TDS_TIMER_128 REG_FIELD_PREP(FF_MODE2_TDS_TIMER_MASK, 4)
>
> +#define XEHPG_INSTDONE_GEOM_SVG XE_REG_MCR(0x666c)
> +
> #define CACHE_MODE_1 XE_REG(0x7004, XE_REG_OPTION_MASKED)
> #define MSAA_OPTIMIZATION_REDUC_DISABLE REG_BIT(11)
>
> @@ -110,6 +114,10 @@
> #define FLSH_IGNORES_PSD REG_BIT(10)
> #define FD_END_COLLECT REG_BIT(5)
>
> +#define SC_INSTDONE XE_REG(0x7100)
> +#define SC_INSTDONE_EXTRA XE_REG(0x7104)
> +#define SC_INSTDONE_EXTRA2 XE_REG(0x7108)
> +
> #define COMMON_SLICE_CHICKEN4 XE_REG(0x7300, XE_REG_OPTION_MASKED)
> #define DISABLE_TDC_LOAD_BALANCING_CALC REG_BIT(6)
>
> @@ -299,6 +307,11 @@
>
> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>
> +#define FAULT_TLB_DATA0 XE_REG(0xceb8)
> +#define FAULT_TLB_DATA1 XE_REG(0xcebc)
> +
> +#define RING_FAULT_REG XE_REG(0xcec4)
> +
> #define XEHP_MERT_MOD_CTRL XE_REG_MCR(0xcf28)
> #define RENDER_MOD_CTRL XE_REG_MCR(0xcf2c)
> #define COMP_MOD_CTRL XE_REG_MCR(0xcf30)
> @@ -317,6 +330,11 @@
> #define INVALIDATION_BROADCAST_MODE_DIS REG_BIT(12)
> #define GLOBAL_INVALIDATION_MODE REG_BIT(2)
>
> +#define GAM_DONE XE_REG(0xcf68)
> +
> +#define SAMPLER_INSTDONE XE_REG_MCR(0xe160)
> +#define ROW_INSTDONE XE_REG_MCR(0xe164)
> +
> #define HALF_SLICE_CHICKEN5 XE_REG_MCR(0xe188, XE_REG_OPTION_MASKED)
> #define DISABLE_SAMPLE_G_PERFORMANCE REG_BIT(0)
>
> @@ -482,6 +500,8 @@
> #define GT_CS_MASTER_ERROR_INTERRUPT REG_BIT(3)
> #define GT_RENDER_USER_INTERRUPT REG_BIT(0)
>
> +#define SFC_DONE(n) XE_REG(0x1cc000 + (n) * 0x1000)
> +
> #define PVC_GT0_PACKAGE_ENERGY_STATUS XE_REG(0x281004)
> #define PVC_GT0_PACKAGE_RAPL_LIMIT XE_REG(0x281008)
> #define PVC_GT0_PACKAGE_POWER_SKU_UNIT XE_REG(0x281068)
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 235d27b17ff9..4f9257075905 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -17,6 +17,7 @@
> #include "xe_force_wake.h"
> #include "xe_gt.h"
> #include "xe_guc_ads.h"
> +#include "xe_guc_capture.h"
> #include "xe_guc_ct.h"
> #include "xe_guc_hwconfig.h"
> #include "xe_guc_log.h"
> @@ -277,6 +278,12 @@ int xe_guc_init(struct xe_guc *guc)
> if (ret)
> goto out;
>
> +#if IS_ENABLED(CONFIG_DRM_XE_CAPTURE_ERROR)
> + ret = xe_guc_capture_init(guc);
> + if (ret)
> + goto out;
> +#endif
No. Please make xe_guc_capture_init() a static inline no-op stub in the
header for CONFIG_DRM_XE_CAPTURE_ERROR=n. Examples of this are literally
everywhere in kernel. Do not add conditional compilation in code.
> +
> ret = xe_guc_ads_init(&guc->ads);
> if (ret)
> goto out;
> diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c
> new file mode 100644
> index 000000000000..0870bfd1b88d
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_capture.c
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021-2022 Intel Corporation
It's 2024.
> + */
> +
> +#include <linux/types.h>
> +
> +#include <drm/drm_print.h>
> +
> +#include "abi/guc_actions_abi.h"
> +#include "regs/xe_regs.h"
> +#include "regs/xe_engine_regs.h"
> +#include "regs/xe_gt_regs.h"
> +#include "regs/xe_guc_regs.h"
> +
> +#include "xe_bo.h"
> +#include "xe_device.h"
> +#include "xe_exec_queue_types.h"
> +#include "xe_hw_engine_types.h"
> +#include "xe_gt.h"
> +#include "xe_gt_printk.h"
> +#include "xe_guc.h"
> +#include "xe_guc_capture.h"
> +#include "xe_guc_ct.h"
> +
> +#include "xe_guc_log.h"
> +#include "xe_gt_mcr.h"
> +#include "xe_guc_submit.h"
> +#include "xe_macros.h"
> +#include "xe_map.h"
Superfluous newlines, not sorted.
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_CAPTURE_ERROR)
Please make the build of the entire file conditional in Makefile
instead.
> +
> +/*
> + * Define all device tables of GuC error capture register lists
> + * NOTE: For engine-registers, GuC only needs the register offsets
> + * from the engine-mmio-base
> + */
> +#define COMMON_XELP_BASE_GLOBAL \
> + { FORCEWAKE_GT, 0, 0, "FORCEWAKE" }, \
> + { FAULT_TLB_DATA0, 0, 0, "FAULT_TLB_DATA0" }, \
> + { FAULT_TLB_DATA1, 0, 0, "FAULT_TLB_DATA1" }, \
> + { AUX_ERR_DBG, 0, 0, "AUX_ERR_DBG" }, \
> + { GAM_DONE, 0, 0, "GAM_DONE" }, \
> + { RING_FAULT_REG, 0, 0, "FAULT_REG" }
> +
> +#define COMMON_BASE_ENGINE_INSTANCE \
> + { RING_PSMI_CTL(0), 0, 0, "RC PSMI" }, \
> + { RING_ESR(0), 0, 0, "ESR" }, \
> + { RING_EMR(0), 0, 0, "EMR" }, \
> + { RING_EIR(0), 0, 0, "EIR" }, \
> + { RING_EXECLIST_STATUS_HI(0), 0, 0, "RING_EXECLIST_STATUS_HI" }, \
> + { RING_EXECLIST_STATUS_LO(0), 0, 0, "RING_EXECLIST_STATUS_LO" }, \
> + { RING_DMA_FADD(0), 0, 0, "RING_DMA_FADD_LDW" }, \
> + { RING_DMA_FADD_UDW(0), 0, 0, "RING_DMA_FADD_UDW" }, \
> + { RING_IPEIR(0), 0, 0, "IPEIR" }, \
> + { RING_IPEHR(0), 0, 0, "IPEHR" }, \
> + { RING_INSTPS(0), 0, 0, "INSTPS" }, \
> + { RING_BBADDR(0), 0, 0, "RING_BBADDR_LOW32" }, \
> + { RING_BBADDR_UDW(0), 0, 0, "RING_BBADDR_UP32" }, \
> + { RING_BBSTATE(0), 0, 0, "BB_STATE" }, \
> + { CCID(0), 0, 0, "CCID" }, \
> + { RING_ACTHD(0), 0, 0, "ACTHD_LDW" }, \
> + { RING_ACTHD_UDW(0), 0, 0, "ACTHD_UDW" }, \
> + { INSTPM(0), 0, 0, "INSTPM" }, \
> + { RING_INSTDONE(0), 0, 0, "INSTDONE" }, \
> + { RING_NOPID(0), 0, 0, "RING_NOPID" }, \
> + { RING_START(0), 0, 0, "START" }, \
> + { RING_HEAD(0), 0, 0, "HEAD" }, \
> + { RING_TAIL(0), 0, 0, "TAIL" }, \
> + { RING_CTL(0), 0, 0, "CTL" }, \
> + { RING_MI_MODE(0), 0, 0, "MODE" }, \
> + { RING_CONTEXT_CONTROL(0), 0, 0, "RING_CONTEXT_CONTROL" }, \
> + { RING_HWS_PGA(0), 0, 0, "HWS" }, \
> + { RING_MODE(0), 0, 0, "GFX_MODE" }, \
> + { RING_PDP_LDW(0, 0), 0, 0, "PDP0_LDW" }, \
> + { RING_PDP_UDW(0, 0), 0, 0, "PDP0_UDW" }, \
> + { RING_PDP_LDW(0, 1), 0, 0, "PDP1_LDW" }, \
> + { RING_PDP_UDW(0, 1), 0, 0, "PDP1_UDW" }, \
> + { RING_PDP_LDW(0, 2), 0, 0, "PDP2_LDW" }, \
> + { RING_PDP_UDW(0, 2), 0, 0, "PDP2_UDW" }, \
> + { RING_PDP_LDW(0, 3), 0, 0, "PDP3_LDW" }, \
> + { RING_PDP_UDW(0, 3), 0, 0, "PDP3_UDW" }
> +
> +#define COMMON_XELP_BASE_RENDER \
> + { SC_INSTDONE, 0, 0, "SC_INSTDONE" }, \
> + { SC_INSTDONE_EXTRA, 0, 0, "SC_INSTDONE_EXTRA" }, \
> + { SC_INSTDONE_EXTRA2, 0, 0, "SC_INSTDONE_EXTRA2" }
> +
> +#define COMMON_XELP_BASE_VEC \
> + { SFC_DONE(0), 0, 0, "SFC_DONE[0]" }, \
> + { SFC_DONE(1), 0, 0, "SFC_DONE[1]" }, \
> + { SFC_DONE(2), 0, 0, "SFC_DONE[2]" }, \
> + { SFC_DONE(3), 0, 0, "SFC_DONE[3]" }
> +
> +int xe_guc_capture_init(struct xe_guc *guc)
> +{
> + return 0;
> +}
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_guc_capture.h b/drivers/gpu/drm/xe/xe_guc_capture.h
> new file mode 100644
> index 000000000000..35fedc78d9e2
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_capture.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021-2021 Intel Corporation
2024.
> + */
> +
> +#ifndef _XE_GUC_CAPTURE_H
> +#define _XE_GUC_CAPTURE_H
> +
> +#include <linux/types.h>
> +#include "xe_exec_queue_types.h"
You don't need either of those includes.
> +
> +struct xe_hw_engine;
> +
Superfluous newline.
> +struct xe_gt;
> +struct xe_guc;
> +
> +int xe_guc_capture_init(struct xe_guc *guc);
> +
> +#endif /* _XE_GUC_CAPTURE_H */
--
Jani Nikula, Intel
next prev parent reply other threads:[~2024-01-17 8:23 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-16 23:15 [PATCH v2] drm/xe/guc: Add GuC based register capture for error capture Zhanjun Dong
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Add register defines for GuC based register capture Zhanjun Dong
2024-01-17 8:23 ` Jani Nikula [this message]
2024-01-17 22:15 ` Dong, Zhanjun
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Expose dss per group for GuC error capture Zhanjun Dong
2024-01-17 8:26 ` Jani Nikula
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Update GuC ADS size for " Zhanjun Dong
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Add XE_LP steered register lists Zhanjun Dong
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Add capture size check in GuC log buffer Zhanjun Dong
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Check sizing of guc_capture output Zhanjun Dong
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Extract GuC error capture lists on G2H notification Zhanjun Dong
2024-01-16 23:15 ` [PATCH v2] drm/xe/guc: Pre-allocate output nodes for extraction Zhanjun Dong
2024-01-16 23:16 ` [PATCH v2] drm/xe/guc: Plumb GuC-capture into dev coredump Zhanjun Dong
2024-01-16 23:19 ` ✗ CI.Patch_applied: failure for drm/xe/guc: Extract GuC error capture lists on G2H notification Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ply0qsg0.fsf@intel.com \
--to=jani.nikula@linux.intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=zhanjun.dong@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.