From: Matthew Brost <matthew.brost@intel.com>
To: Francois Dugast <francois.dugast@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v2] drm/xe: Document GT statistics
Date: Fri, 27 Mar 2026 14:02:39 -0700 [thread overview]
Message-ID: <acbwbyspL0KAIWCf@gsse-cloud1.jf.intel.com> (raw)
In-Reply-To: <20260327202749.222794-1-francois.dugast@intel.com>
On Fri, Mar 27, 2026 at 09:27:49PM +0100, Francois Dugast wrote:
> In the context of porting applications to SVM, the Xe GT statistics are
> used by application developers to validate expected behavior such as
> proper alignment, page fault counts and migrations. As those statistics
> were designed for kernel developers, they assume a good understanding of
> driver internals, which application developers do not always have.
> Therefore, document the usage of GT statistics and clarify the meaning
> of the identifiers which correspond to the values exposed via debugfs.
> The documentation is kept close to the identifier declarations to make
> it easier to maintain when adding new entries in the future.
>
> v2: Fix page reclaim list (PRL) entries (Matthew Brost)
>
> Assisted-by: GitHub Copilot:claude-sonnet-4.6
> Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> ---
> Documentation/gpu/xe/index.rst | 1 +
> Documentation/gpu/xe/xe_gt_stats.rst | 11 +++
> drivers/gpu/drm/xe/xe_gt_stats.c | 41 +++++++++
> drivers/gpu/drm/xe/xe_gt_stats_types.h | 118 +++++++++++++++++++++++++
> 4 files changed, 171 insertions(+)
> create mode 100644 Documentation/gpu/xe/xe_gt_stats.rst
>
> diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst
> index bc432c95d1a3..874ffcb6da3a 100644
> --- a/Documentation/gpu/xe/index.rst
> +++ b/Documentation/gpu/xe/index.rst
> @@ -29,3 +29,4 @@ DG2, etc is provided to prototype the driver.
> xe_device
> xe-drm-usage-stats.rst
> xe_configfs
> + xe_gt_stats
> diff --git a/Documentation/gpu/xe/xe_gt_stats.rst b/Documentation/gpu/xe/xe_gt_stats.rst
> new file mode 100644
> index 000000000000..5ff806abaddb
> --- /dev/null
> +++ b/Documentation/gpu/xe/xe_gt_stats.rst
> @@ -0,0 +1,11 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +================
> +Xe GT Statistics
> +================
> +
> +.. kernel-doc:: drivers/gpu/drm/xe/xe_gt_stats.c
> + :doc: Xe GT Statistics
> +
> +.. kernel-doc:: drivers/gpu/drm/xe/xe_gt_stats_types.h
> + :internal:
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
> index 59b3b23a54c8..789397514f3e 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats.c
> +++ b/drivers/gpu/drm/xe/xe_gt_stats.c
> @@ -9,6 +9,47 @@
> #include "xe_device.h"
> #include "xe_gt_stats.h"
>
> +/**
> + * DOC: Xe GT Statistics
> + *
> + * Overview
> + * ========
> + *
> + * The Xe driver exposes per-GT statistics through the debugfs filesystem at::
> + *
> + * /sys/kernel/debug/dri/<device>/gt<id>/stats
> + *
> + * This interface requires the kernel to be built with ``CONFIG_DEBUG_FS=y``.
> + *
> + * Reading statistics
> + * ==================
> + *
> + * Reading the file prints all available statistics, one per line, in
> + * ``name: value`` format::
> + *
> + * $ cat /sys/kernel/debug/dri/0/gt0/stats
> + * svm_pagefault_count: 0
> + * tlb_inval_count: 1234
> + * ...
> + *
> + * All values are 64-bit unsigned integers aggregated across all CPUs.
> + * Counters accumulate since the driver was loaded or since the last explicit
> + * reset. Timing counters use microseconds as their unit; data volume counters
> + * use KiB.
> + *
> + * Resetting statistics
> + * ====================
> + *
> + * Writing a boolean true value to the file resets all counters to zero::
> + *
> + * echo 1 > /sys/kernel/debug/dri/0/gt0/stats
> + *
> + * Any value accepted by ``kstrtobool()`` (e.g. ``1``, ``y``, ``yes``,
> + * ``on``) triggers the reset. Resetting while the GPU is active may yield
> + * unpredictable intermediate values; it is recommended to reset only when
> + * the GPU is idle.
> + */
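One optional thought for the app-developer audience this targets: the usual
pattern is reset, run a workload, then read, and what you really want are
deltas between two snapshots of the file. A runnable sketch of that diffing
step (the printf lines below are stand-ins for two `cat` snapshots of
/sys/kernel/debug/dri/0/gt0/stats, and the sample values are made up):

```shell
# Sketch: compute per-workload deltas by diffing two snapshots of the
# "name: value" stats format. In practice the two printf lines would be:
#   cat /sys/kernel/debug/dri/0/gt0/stats > "$before"   # before workload
#   cat /sys/kernel/debug/dri/0/gt0/stats > "$after"    # after workload
before=$(mktemp); after=$(mktemp)
printf 'svm_pagefault_count: 10\ntlb_inval_count: 100\n' > "$before"
printf 'svm_pagefault_count: 25\ntlb_inval_count: 160\n' > "$after"
# First pass (NR==FNR) records the baseline; second pass prints name and delta.
awk 'NR==FNR { base[$1] = $2; next } { printf "%s %d\n", $1, $2 - base[$1] }' \
    "$before" "$after"
rm -f "$before" "$after"
```

With the sample input this prints `svm_pagefault_count: 15` and
`tlb_inval_count: 60`. Not blocking, just something that might be worth a
line in the rst at some point.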
> +
> static void xe_gt_stats_fini(struct drm_device *drm, void *arg)
> {
> struct xe_gt *gt = arg;
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> index 081c787ddcb6..425491bed6c4 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> @@ -8,6 +8,124 @@
>
> #include <linux/types.h>
>
> +/**
> + * enum xe_gt_stats_id - GT statistics identifiers
> + * @XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT: Total SVM page faults handled.
> + * @XE_GT_STATS_ID_TLB_INVAL: Total GPU Translation Lookaside Buffer (TLB)
> + * invalidations issued.
> + * @XE_GT_STATS_ID_SVM_TLB_INVAL_COUNT: TLB invalidations issued during SVM
> + * page-fault handling.
> + * @XE_GT_STATS_ID_SVM_TLB_INVAL_US: Cumulative time (µs) waiting for TLB
> + * invalidations during SVM page-fault handling.
> + *
> + * @XE_GT_STATS_ID_VMA_PAGEFAULT_COUNT: Buffer-object (non-SVM) page faults
> + * handled.
> + * @XE_GT_STATS_ID_VMA_PAGEFAULT_KB: Size (KiB) of VMAs involved in
> + * buffer-object page fault handling.
> + * @XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT: GPU prefetch faults for
> + * addresses with no valid backing.
> + *
> + * @XE_GT_STATS_ID_SVM_4K_PAGEFAULT_COUNT: SVM page faults resolved by
> + * mapping 4K pages.
> + * @XE_GT_STATS_ID_SVM_64K_PAGEFAULT_COUNT: SVM page faults resolved by
> + * mapping 64K pages.
> + * @XE_GT_STATS_ID_SVM_2M_PAGEFAULT_COUNT: SVM page faults resolved by
> + * mapping 2M pages.
> + * @XE_GT_STATS_ID_SVM_4K_VALID_PAGEFAULT_COUNT: Valid SVM page faults
> + * at 4K page size, where the GPU mapping was already valid; resolved without
> + * creating new mappings.
> + * @XE_GT_STATS_ID_SVM_64K_VALID_PAGEFAULT_COUNT: Valid SVM page faults at 64K
> + * page size.
> + * @XE_GT_STATS_ID_SVM_2M_VALID_PAGEFAULT_COUNT: Valid SVM page faults at 2M
> + * page size.
> + * @XE_GT_STATS_ID_SVM_4K_PAGEFAULT_US: Cumulative time (µs) handling 4K SVM
> + * page faults.
> + * @XE_GT_STATS_ID_SVM_64K_PAGEFAULT_US: Cumulative time (µs) handling 64K
> + * SVM page faults.
> + * @XE_GT_STATS_ID_SVM_2M_PAGEFAULT_US: Cumulative time (µs) handling 2M SVM
> + * page faults.
> + *
> + * @XE_GT_STATS_ID_SVM_4K_MIGRATE_COUNT: 4K pages moved from CPU to device
> + * memory.
> + * @XE_GT_STATS_ID_SVM_64K_MIGRATE_COUNT: 64K pages moved from CPU to device
> + * memory.
> + * @XE_GT_STATS_ID_SVM_2M_MIGRATE_COUNT: 2M pages moved from CPU to device
> + * memory.
> + * @XE_GT_STATS_ID_SVM_4K_MIGRATE_US: Cumulative time (µs) moving 4K pages
> + * from CPU to device memory.
> + * @XE_GT_STATS_ID_SVM_64K_MIGRATE_US: Cumulative time (µs) moving 64K pages
> + * from CPU to device memory.
> + * @XE_GT_STATS_ID_SVM_2M_MIGRATE_US: Cumulative time (µs) moving 2M pages
> + * from CPU to device memory.
> + *
> + * @XE_GT_STATS_ID_SVM_DEVICE_COPY_US: Cumulative time (µs) for memory copies to
> + * device, across all page sizes.
> + * @XE_GT_STATS_ID_SVM_4K_DEVICE_COPY_US: Cumulative time (µs) for memory copies
> + * of 4K pages to device.
> + * @XE_GT_STATS_ID_SVM_64K_DEVICE_COPY_US: Cumulative time (µs) for memory
> + * copies of 64K pages to device.
> + * @XE_GT_STATS_ID_SVM_2M_DEVICE_COPY_US: Cumulative time (µs) for memory copies
> + * of 2M pages to device.
> + * @XE_GT_STATS_ID_SVM_CPU_COPY_US: Cumulative time (µs) for memory copies to
> + * CPU, across all page sizes.
> + * @XE_GT_STATS_ID_SVM_4K_CPU_COPY_US: Cumulative time (µs) for memory copies of
> + * 4K pages to CPU.
> + * @XE_GT_STATS_ID_SVM_64K_CPU_COPY_US: Cumulative time (µs) for memory copies
> + * of 64K pages to CPU.
> + * @XE_GT_STATS_ID_SVM_2M_CPU_COPY_US: Cumulative time (µs) for memory copies of
> + * 2M pages to CPU.
> + * @XE_GT_STATS_ID_SVM_DEVICE_COPY_KB: Data (KiB) copied to device across all
> + * page sizes.
> + * @XE_GT_STATS_ID_SVM_4K_DEVICE_COPY_KB: Data (KiB) copied to device for 4K
> + * pages.
> + * @XE_GT_STATS_ID_SVM_64K_DEVICE_COPY_KB: Data (KiB) copied to device for
> + * 64K pages.
> + * @XE_GT_STATS_ID_SVM_2M_DEVICE_COPY_KB: Data (KiB) copied to device for 2M
> + * pages.
> + * @XE_GT_STATS_ID_SVM_CPU_COPY_KB: Data (KiB) copied to CPU across all page
> + * sizes.
> + * @XE_GT_STATS_ID_SVM_4K_CPU_COPY_KB: Data (KiB) copied to CPU for 4K pages.
> + * @XE_GT_STATS_ID_SVM_64K_CPU_COPY_KB: Data (KiB) copied to CPU for 64K pages.
> + * @XE_GT_STATS_ID_SVM_2M_CPU_COPY_KB: Data (KiB) copied to CPU for 2M pages.
> + *
> + * @XE_GT_STATS_ID_SVM_4K_GET_PAGES_US: Cumulative time (µs) getting CPU
> + * memory pages for GPU access at 4K page size.
> + * @XE_GT_STATS_ID_SVM_64K_GET_PAGES_US: Cumulative time (µs) getting CPU
> + * memory pages for GPU access at 64K page size.
> + * @XE_GT_STATS_ID_SVM_2M_GET_PAGES_US: Cumulative time (µs) getting CPU
> + * memory pages for GPU access at 2M page size.
> + * @XE_GT_STATS_ID_SVM_4K_BIND_US: Cumulative time (µs) binding 4K pages
> + * into the GPU page table.
> + * @XE_GT_STATS_ID_SVM_64K_BIND_US: Cumulative time (µs) binding 64K pages
> + * into the GPU page table.
> + * @XE_GT_STATS_ID_SVM_2M_BIND_US: Cumulative time (µs) binding 2M pages
> + * into the GPU page table.
> + *
> + * @XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_COUNT: Times the
> + * scheduler preempted a long-running (LR) GPU exec queue.
> + * @XE_GT_STATS_ID_HW_ENGINE_GROUP_SKIP_LR_QUEUE_COUNT: Times the scheduler
> + * skipped suspend because the system was idle.
> + * @XE_GT_STATS_ID_HW_ENGINE_GROUP_WAIT_DMA_QUEUE_COUNT: Times the driver
> + * stalled waiting for prior GPU work to complete before scheduling more.
> + * @XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_US: Cumulative time
> + * (µs) spent preempting long-running (LR) GPU exec queues.
> + * @XE_GT_STATS_ID_HW_ENGINE_GROUP_WAIT_DMA_QUEUE_US: Cumulative time (µs)
> + * stalled waiting for prior GPU work to complete.
> + *
> + * @XE_GT_STATS_ID_PRL_4K_ENTRY_COUNT: 4K-page entries from the page reclaim
> + * list that were processed.
> + * @XE_GT_STATS_ID_PRL_64K_ENTRY_COUNT: 64K-page entries from the page reclaim
> + * list that were processed.
> + * @XE_GT_STATS_ID_PRL_2M_ENTRY_COUNT: 2M-page entries from the page reclaim
> + * list that were processed.
> + * @XE_GT_STATS_ID_PRL_ISSUED_COUNT: Times a page reclamation was issued.
> + * @XE_GT_STATS_ID_PRL_ABORTED_COUNT: Times the page reclaim process was
> + * aborted.
> + *
> + * @__XE_GT_STATS_NUM_IDS: Number of valid IDs; not a real counter.
> + *
> + * See Documentation/gpu/xe/xe_gt_stats.rst.
> + */
> enum xe_gt_stats_id {
> XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_TLB_INVAL,
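Also not blocking: since the ``*_US`` counters are cumulative, the number
application developers usually want is the average, i.e. the ``*_US`` value
divided by the matching ``*_COUNT``. A small runnable sketch (the printf
input stands in for the stats file; the exact names in the output follow the
driver's stat table, so check what your kernel actually prints):

```shell
# Average 4K SVM fault handling time = svm_4k_pagefault_us / svm_4k_pagefault_count.
# The printf stands in for: cat /sys/kernel/debug/dri/0/gt0/stats
printf 'svm_4k_pagefault_count: 8\nsvm_4k_pagefault_us: 200\n' |
awk '/^svm_4k_pagefault_count:/ { n = $2 }
     /^svm_4k_pagefault_us:/    { us = $2 }
     END { if (n) printf "avg 4K fault: %.1f us\n", us / n }'
```

With the made-up sample values this prints `avg 4K fault: 25.0 us`.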
> --
> 2.43.0
>
Thread overview: 6+ messages
2026-03-27 20:27 [PATCH v2] drm/xe: Document GT statistics Francois Dugast
2026-03-27 21:02 ` Matthew Brost [this message]
2026-03-27 21:16 ` ✗ CI.checkpatch: warning for drm/xe: Document GT statistics (rev2) Patchwork
2026-03-27 21:18 ` ✓ CI.KUnit: success " Patchwork
2026-03-27 22:11 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-28 15:15 ` ✓ Xe.CI.FULL: " Patchwork