Intel-XE Archive on lore.kernel.org
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: simona.vetter@ffwll.ch, matthew.brost@intel.com,
	christian.koenig@amd.com, thomas.hellstrom@linux.intel.com,
	joonas.lahtinen@linux.intel.com, christoph.manszewski@intel.com,
	rodrigo.vivi@intel.com, andrzej.hajda@intel.com,
	matthew.auld@intel.com, maciej.patelczyk@intel.com,
	gwan-gyeong.mun@intel.com,
	Mika Kuoppala <mika.kuoppala@linux.intel.com>
Subject: [PATCH 22/22] drm/xe/eudebug: Enable EU pagefault handling
Date: Mon, 23 Feb 2026 16:03:17 +0200
Message-ID: <20260223140318.1822138-23-mika.kuoppala@linux.intel.com>
In-Reply-To: <20260223140318.1822138-1-mika.kuoppala@linux.intel.com>

From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>

Xe2 (and PVC) hardware has a limitation: a pagefault caused by an invalid
access halts the corresponding EUs. To address this, enable EU pagefault
handling, which allows pagefaulted EU threads to be unhalted and lets the
EU debugger be informed of the attention state of EU threads during
execution.

When a pagefault occurs, the DRM_XE_EUDEBUG_EVENT_PAGEFAULT event is sent
after the pagefault has been handled.
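The ordering can be sketched as a simplified model, not driver code.
struct sketch_pagefault, sketch_ack(), sketch_cleanup() and
sketch_finish_fault() are hypothetical stand-ins. Based on
guc_cleanup_fault() in this patch, the sketch assumes the debugger event
is emitted from the producer's optional cleanup_fault hook, which the
consumer calls only after ack_fault.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct xe_pagefault and
 * struct xe_pagefault_ops; only ack_fault/cleanup_fault mirror the patch. */
struct sketch_pagefault;

struct sketch_pagefault_ops {
	void (*ack_fault)(struct sketch_pagefault *pf, int err);     /* mandatory */
	void (*cleanup_fault)(struct sketch_pagefault *pf, int err); /* optional */
};

struct sketch_pagefault {
	const struct sketch_pagefault_ops *ops;
	char log[64]; /* records the order in which the callbacks ran */
};

static void log_step(struct sketch_pagefault *pf, const char *step)
{
	strncat(pf->log, step, sizeof(pf->log) - strlen(pf->log) - 1);
}

/* Stands in for the reply sent to the HW/FW interface. */
void sketch_ack(struct sketch_pagefault *pf, int err)
{
	(void)err;
	log_step(pf, "ack;");
}

/* Stands in for the producer cleanup hook that reports the fault to the
 * debugger; here it only records that the event would be sent. */
void sketch_cleanup(struct sketch_pagefault *pf, int err)
{
	(void)err;
	log_step(pf, "event;");
}

/* Consumer side: ack first, then run the producer's optional cleanup,
 * mirroring the order used in xe_pagefault_queue_work(). */
void sketch_finish_fault(struct sketch_pagefault *pf, int err)
{
	assert(pf->ops->ack_fault != NULL);
	pf->ops->ack_fault(pf, err);

	if (pf->ops->cleanup_fault)
		pf->ops->cleanup_fault(pf, err);
}
```

Because cleanup_fault is checked for NULL, a producer that does not
implement it sees exactly the previous ack-only behaviour.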

The pagefault handling mechanism allows a stalled EU thread to enter SIP
mode by installing a temporary null page in the page table entry where
the pagefault happened.
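A minimal model of the lookup-then-fallback decision made in
xe_pagefault_service() follows. It assumes a flat array of VMAs and a
4 KiB page. The sketch_* types and helpers are hypothetical, and unlike
the simplified helper here, the real xe_eudebug_create_vma() returns an
ERR_PTR rather than NULL on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */
#define SKETCH_MAX_VMAS 8

/* Hypothetical model of a VM: only what the fallback decision needs. */
struct sketch_vma {
	uint64_t start, end; /* covers [start, end) */
	bool null_page;      /* reads return zero, writes are dropped */
};

struct sketch_vm {
	struct sketch_vma vmas[SKETCH_MAX_VMAS];
	size_t nr_vmas;
	bool eudebug_attached;
};

struct sketch_vma *sketch_find_vma(struct sketch_vm *vm, uint64_t addr)
{
	for (size_t i = 0; i < vm->nr_vmas; i++)
		if (addr >= vm->vmas[i].start && addr < vm->vmas[i].end)
			return &vm->vmas[i];
	return NULL;
}

/* Install a temporary null-page VMA over the faulting page, but only
 * when a debugger is attached. */
struct sketch_vma *sketch_create_null_vma(struct sketch_vm *vm, uint64_t addr)
{
	struct sketch_vma *vma;

	if (!vm->eudebug_attached || vm->nr_vmas == SKETCH_MAX_VMAS)
		return NULL;

	vma = &vm->vmas[vm->nr_vmas++];
	vma->start = addr & ~(SKETCH_PAGE_SIZE - 1);
	vma->end = vma->start + SKETCH_PAGE_SIZE;
	vma->null_page = true;
	return vma;
}

/* Returns 0 when a VMA (real or null) backs @addr, -EINVAL otherwise. */
int sketch_resolve_fault(struct sketch_vm *vm, uint64_t addr,
			 struct sketch_vma **out)
{
	struct sketch_vma *vma = sketch_find_vma(vm, addr);

	if (!vma)
		vma = sketch_create_null_vma(vm, addr);

	*out = vma;
	return vma ? 0 : -EINVAL;
}
```

Without a debugger the fault still fails with -EINVAL, as before; with
one, the faulting page is backed by a null page so the fault can be acked
successfully.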

A brief description of the pagefault handling flow between the KMD and
the EU thread is as follows:

(1) An EU thread accesses an unallocated address.
(2) A pagefault occurs and the EU thread stalls.
(3) The Xe KMD forces an EU thread exception so that running EU threads
    enter SIP mode (the KMD sets the ForceException / ForceExternalHalt
    bits of the TD_CTL register).
    EU threads that are not stalled (not pagefaulted) enter SIP mode.
(4) The Xe KMD installs a temporary null page in the page table entry
    for the address where the pagefault happened.
(5) The Xe KMD replies to the GuC with a pagefault success message.
(6) The stalled EU thread resumes, as the pagefault condition has been
    resolved.
(7) The resumed EU thread enters SIP mode due to the forced exception
    set in (3).
(8) Adapted to consumer/producer pagefaults.
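Steps (1)-(7) can be modelled as a small state machine over EU threads.
This is an illustrative sketch only: the states, the
force_exception_pending flag and the sketch_* helpers are hypothetical
stand-ins for the TD_CTL programming and GuC messaging, and step (4) is
assumed to have completed before sketch_ack_fault() is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical EU thread states for the flow above. */
enum sketch_eu_state { EU_RUNNING, EU_STALLED_ON_FAULT, EU_IN_SIP };

struct sketch_eu_thread {
	enum sketch_eu_state state;
	bool force_exception_pending;
};

/* (3): the KMD forces an exception; running threads enter SIP at once,
 * a thread stalled on the fault only records the pending exception. */
void sketch_force_exception(struct sketch_eu_thread *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		t[i].force_exception_pending = true;
		if (t[i].state == EU_RUNNING)
			t[i].state = EU_IN_SIP;
	}
}

/* (5)-(7): once the fault is acked, stalled threads resume and then
 * take the pending forced exception. */
void sketch_ack_fault(struct sketch_eu_thread *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (t[i].state != EU_STALLED_ON_FAULT)
			continue;
		t[i].state = EU_RUNNING;        /* (6) fault resolved */
		if (t[i].force_exception_pending)
			t[i].state = EU_IN_SIP; /* (7) enter SIP */
	}
}

bool sketch_all_in_sip(const struct sketch_eu_thread *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].state != EU_IN_SIP)
			return false;
	return true;
}
```

The ordering matters in this model: if the fault were acked before the
forced exception was set, the resumed thread would simply keep running
instead of entering SIP.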

As this feature is designed to work only when eudebug is enabled, it
should have no impact on the regular recoverable pagefault code path.
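One way to keep the regular path unaffected is to attach debugger state
to a fault only when a debugger is connected and make every later hook a
no-op without it. The sketch below is hypothetical (sketch_fault and the
sketch_eudebug_* helpers are not the interfaces introduced in the
previous patch) and only illustrates that pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-fault debugger state. */
struct sketch_eudebug_pf {
	int events_sent;
};

struct sketch_fault {
	struct sketch_eudebug_pf *dbg; /* stays NULL without a debugger */
};

/* Allocate debugger state only if a debugger is connected. */
void sketch_eudebug_create(struct sketch_fault *pf, bool debugger_connected)
{
	pf->dbg = debugger_connected ? calloc(1, sizeof(*pf->dbg)) : NULL;
}

/* Report the fault to the debugger; a no-op on the regular path. */
void sketch_eudebug_service(struct sketch_fault *pf)
{
	if (pf->dbg)
		pf->dbg->events_sent++;
}

/* Release debugger state; free(NULL) is a no-op, so this is always safe. */
void sketch_eudebug_destroy(struct sketch_fault *pf)
{
	free(pf->dbg);
	pf->dbg = NULL;
}
```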

v2: - pf->q holds the vm ref so drop it (Mika)
    - streamline uapi (Mika)
    - clean up the pagefault through the producer interface (Mika)

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_guc_pagefault.c   |  8 +++++++
 drivers/gpu/drm/xe/xe_pagefault.c       | 31 ++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_pagefault_types.h |  9 +++++++
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
index d48f6ed103bb..6adf3bf73b1c 100644
--- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
@@ -8,6 +8,7 @@
 #include "xe_guc_ct.h"
 #include "xe_guc_pagefault.h"
 #include "xe_pagefault.h"
+#include "xe_eudebug_pagefault.h"
 
 static void guc_ack_fault(struct xe_pagefault *pf, int err)
 {
@@ -37,8 +38,15 @@ static void guc_ack_fault(struct xe_pagefault *pf, int err)
 	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
 }
 
+static void guc_cleanup_fault(struct xe_pagefault *pf, int err)
+{
+	xe_eudebug_pagefault_service(pf);
+	xe_eudebug_pagefault_destroy(pf, 0);
+}
+
 static const struct xe_pagefault_ops guc_pagefault_ops = {
 	.ack_fault = guc_ack_fault,
+	.cleanup_fault = guc_cleanup_fault,
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index 72f589fd2b64..9dcd854e99f9 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -10,6 +10,7 @@
 
 #include "xe_bo.h"
 #include "xe_device.h"
+#include "xe_eudebug_pagefault.h"
 #include "xe_gt_printk.h"
 #include "xe_gt_types.h"
 #include "xe_gt_stats.h"
@@ -171,6 +172,8 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
 	if (IS_ERR(vm))
 		return PTR_ERR(vm);
 
+	xe_eudebug_pagefault_create(vm, pf);
+
 	/*
 	 * TODO: Change to read lock? Using write lock for simplicity.
 	 */
@@ -184,9 +187,28 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
 	vma = xe_vm_find_vma_by_addr(vm, pf->consumer.page_addr);
 	if (!vma) {
 		err = -EINVAL;
-		goto unlock_vm;
+		vma = xe_eudebug_create_vma(vm, pf);
+		if (IS_ERR(vma)) {
+			err = PTR_ERR(vma);
+			vma = NULL;
+		}
 	}
 
+	if (vma) {
+		/*
+		 * A VMA may cover the faulting ppgtt address now even though
+		 * none did when the eudebug_pagefault instance was created:
+		 * this context did not hold vm->lock in between, so another
+		 * context may have allocated a VMA for the faulting address.
+		 * In either case a VMA now backs the address, so the fault
+		 * can be serviced.
+		 */
+		err = 0;
+	}
+
+	if (err)
+		goto unlock_vm;
+
 	atomic = xe_pagefault_access_is_atomic(pf->consumer.access_type);
 
 	if (xe_vma_is_cpu_addr_mirror(vma))
@@ -198,6 +220,10 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
 unlock_vm:
 	if (!err)
 		vm->usm.last_fault_vma = vma;
+
+	if (err)
+		xe_eudebug_pagefault_destroy(pf, err);
+
 	up_write(&vm->lock);
 	xe_vm_put(vm);
 
@@ -268,6 +294,9 @@ static void xe_pagefault_queue_work(struct work_struct *w)
 
 		pf.producer.ops->ack_fault(&pf, err);
 
+		if (pf.producer.ops->cleanup_fault)
+			pf.producer.ops->cleanup_fault(&pf, err);
+
 		if (time_after(jiffies, threshold)) {
 			queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
 			break;
diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
index 2bee858da597..9d2d29d35a4b 100644
--- a/drivers/gpu/drm/xe/xe_pagefault_types.h
+++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
@@ -43,6 +43,15 @@ struct xe_pagefault_ops {
 	 * sends the result to the HW/FW interface.
 	 */
 	void (*ack_fault)(struct xe_pagefault *pf, int err);
+
+	/**
+	 * @cleanup_fault: Cleanup for producer, if any
+	 * @pf: Page fault
+	 * @err: Error state of fault
+	 *
+	 * Optional; called by the page fault consumer after @ack_fault.
+	 */
+	void (*cleanup_fault)(struct xe_pagefault *pf, int err);
 };
 
 /**
-- 
2.43.0



Thread overview: 35+ messages
2026-02-23 14:02 [PATCH 00/22] Intel Xe GPU Debug Support (eudebug) v7 Mika Kuoppala
2026-02-23 14:02 ` [PATCH 01/22] drm/xe/eudebug: Introduce eudebug interface Mika Kuoppala
2026-02-23 14:02 ` [PATCH 02/22] drm/xe/eudebug: Add documentation Mika Kuoppala
2026-02-23 14:02 ` [PATCH 03/22] drm/xe/eudebug: Add connection establishment documentation Mika Kuoppala
2026-02-23 14:02 ` [PATCH 04/22] drm/xe/eudebug: Introduce discovery for resources Mika Kuoppala
2026-02-23 14:03 ` [PATCH 05/22] drm/xe/eudebug: Introduce exec_queue events Mika Kuoppala
2026-02-23 14:03 ` [PATCH 06/22] drm/xe: Add EUDEBUG_ENABLE exec queue property Mika Kuoppala
2026-02-23 14:03 ` [PATCH 07/22] drm/xe/eudebug: Mark guc contexts as debuggable Mika Kuoppala
2026-02-23 14:03 ` [PATCH 08/22] drm/xe: Introduce ADD_DEBUG_DATA and REMOVE_DEBUG_DATA vm bind ops Mika Kuoppala
2026-02-23 14:03 ` [PATCH 09/22] drm/xe/eudebug: Introduce vm bind and vm bind debug data events Mika Kuoppala
2026-02-23 14:03 ` [PATCH 10/22] drm/xe/eudebug: Add UFENCE events with acks Mika Kuoppala
2026-02-23 14:03 ` [PATCH 11/22] drm/xe/eudebug: vm open/pread/pwrite Mika Kuoppala
2026-02-23 14:03 ` [PATCH 12/22] drm/xe/eudebug: userptr vm pread/pwrite Mika Kuoppala
2026-02-23 14:03 ` [PATCH 13/22] drm/xe/eudebug: hw enablement for eudebug Mika Kuoppala
2026-02-23 14:03 ` [PATCH 14/22] drm/xe/eudebug: Introduce EU control interface Mika Kuoppala
2026-02-23 14:03 ` [PATCH 15/22] drm/xe/eudebug: Introduce per device attention scan worker Mika Kuoppala
2026-02-23 14:03 ` [PATCH 16/22] drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test Mika Kuoppala
2026-02-23 14:03 ` [PATCH 17/22] drm/xe: Implement SR-IOV and eudebug exclusivity Mika Kuoppala
2026-02-23 14:03 ` [PATCH 18/22] drm/xe: Add xe_client_debugfs and introduce debug_data file Mika Kuoppala
2026-02-23 14:03 ` [PATCH 19/22] drm/xe/eudebug: Add read/count/compare helper for eu attention Mika Kuoppala
2026-02-23 14:03 ` [PATCH 20/22] drm/xe/vm: Support for adding null page VMA to VM on request Mika Kuoppala
2026-02-23 14:03 ` [PATCH 21/22] drm/xe/eudebug: Introduce EU pagefault handling interface Mika Kuoppala
2026-02-23 19:08   ` Matthew Brost
2026-02-27 22:10     ` Gwan-gyeong Mun
2026-02-28  0:36       ` Matthew Brost
2026-02-23 14:03 ` Mika Kuoppala [this message]
2026-02-23 18:41   ` [PATCH 22/22] drm/xe/eudebug: Enable EU pagefault handling Matthew Brost
2026-02-27 22:11     ` Gwan-gyeong Mun
2026-02-27 23:11   ` Gustavo Sousa
2026-02-28  6:49     ` Gwan-gyeong Mun
2026-02-23 15:14 ` ✗ CI.checkpatch: warning for Intel Xe GPU Debug Support (eudebug) v7 Patchwork
2026-02-23 15:16 ` ✓ CI.KUnit: success " Patchwork
2026-02-23 15:31 ` ✗ CI.checksparse: warning " Patchwork
2026-02-23 15:51 ` ✓ Xe.CI.BAT: success " Patchwork
2026-02-24  8:42 ` ✗ Xe.CI.FULL: failure " Patchwork
