From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: simona.vetter@ffwll.ch, matthew.brost@intel.com,
christian.koenig@amd.com, thomas.hellstrom@linux.intel.com,
joonas.lahtinen@linux.intel.com, christoph.manszewski@intel.com,
rodrigo.vivi@intel.com, andrzej.hajda@intel.com,
matthew.auld@intel.com, maciej.patelczyk@intel.com,
gwan-gyeong.mun@intel.com,
Mika Kuoppala <mika.kuoppala@linux.intel.com>
Subject: [PATCH 20/20] drm/xe/eudebug: Enable EU pagefault handling
Date: Tue, 2 Dec 2025 15:52:39 +0200
Message-ID: <20251202135241.880267-21-mika.kuoppala@linux.intel.com>
In-Reply-To: <20251202135241.880267-1-mika.kuoppala@linux.intel.com>
From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
The XE2 (and PVC) HW has a limitation: a pagefault caused by an invalid
access halts the corresponding EUs. To solve this problem, enable
EU pagefault handling, which allows pagefaulted EU threads to be
unhalted and lets the EU debugger be informed about the attention state
of EU threads during execution.
If a pagefault occurs, send the DRM_XE_EUDEBUG_EVENT_PAGEFAULT event
after handling the pagefault.
Pagefault handling is a mechanism that allows a stalled EU thread to
enter SIP mode by installing a temporary null page in the page table
entry where the pagefault happened.
A brief description of the pagefault handling flow between the KMD
and the EU thread is as follows:
(1) EU thread accesses an unallocated address
(2) pagefault happens and the EU thread stalls
(3) XE KMD forces an EU thread exception to allow the running EU threads
to enter SIP mode (KMD sets the ForceException / ForceExternalHalt bit
of the TD_CTL register)
Non-stalled (non-pagefaulted) EU threads enter SIP mode
(4) XE KMD installs a temporary null page in the page table entry of the
address where the pagefault happened
(5) XE KMD replies to the GuC with a pagefault-successful message
(6) stalled EU thread resumes because the pagefault condition has been resolved
(7) resumed EU thread enters SIP mode due to the force exception set in (3)
(8) adapted to consumer/produced pagefaults
This feature is designed to work only when eudebug is enabled, so it
should have no impact on the regular recoverable pagefault code path.
v2: - pf->q holds the vm ref so drop it (Mika)
- streamline uapi (Mika)
- cleanup the pagefault through producer if (Mika)
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
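The flow (1)-(7) in the commit message can be pictured with a small
userspace model. This is only a sketch for illustration: every name in it
(eu_thread_model, pte_model, the model_* helpers) is hypothetical and none
of it is driver code. It assumes that a thread can take the forced
exception only while it is not stalled, which is how the commit message
describes the ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one EU thread; not a driver structure. */
struct eu_thread_model {
	bool stalled;		/* (2) blocked on an unresolved pagefault */
	bool force_exception;	/* (3) exception requested through TD_CTL */
	bool in_sip;		/* thread has entered SIP mode */
};

/* Hypothetical model of the faulting page table entry. */
struct pte_model {
	bool mapped;		/* (4) true once the null page is installed */
};

/* (1) + (2): touching an unmapped address stalls the thread. */
static void model_access(struct eu_thread_model *t,
			 const struct pte_model *pte)
{
	if (!pte->mapped)
		t->stalled = true;
}

/* Assumption: the forced exception is taken only when not stalled. */
static void model_try_enter_sip(struct eu_thread_model *t)
{
	if (t->force_exception && !t->stalled)
		t->in_sip = true;
}

/* (3): request the exception; non-stalled threads enter SIP mode now. */
static void model_kmd_force_exception(struct eu_thread_model *threads,
				      size_t n)
{
	for (size_t i = 0; i < n; i++) {
		threads[i].force_exception = true;
		model_try_enter_sip(&threads[i]);
	}
}

/* (4), (6), (7): map the null page, resume stalled threads, enter SIP. */
static void model_kmd_resolve_fault(struct eu_thread_model *threads,
				    size_t n, struct pte_model *pte)
{
	assert(threads && pte);

	pte->mapped = true;			/* (4) */
	/* (5) the reply to the GuC is not modelled here */
	for (size_t i = 0; i < n; i++) {
		threads[i].stalled = false;		/* (6) */
		model_try_enter_sip(&threads[i]);	/* (7) */
	}
}
```

In this model, a faulting thread stays out of SIP mode after step (3) and
only enters it once step (4) has made the address resolvable, while a
thread that never faulted enters SIP mode already at step (3).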
drivers/gpu/drm/xe/xe_guc_pagefault.c | 8 +++++++
drivers/gpu/drm/xe/xe_pagefault.c | 31 ++++++++++++++++++++++++-
drivers/gpu/drm/xe/xe_pagefault_types.h | 9 +++++++
3 files changed, 47 insertions(+), 1 deletion(-)
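The cleanup_fault hook added to struct xe_pagefault_ops is optional, so
the consumer checks it for NULL before calling it, while ack_fault is
always called. A minimal standalone sketch of that dispatch pattern
follows. The fault_model and fault_ops_model types and the model_*
functions are hypothetical stand-ins, not the driver's types.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct xe_pagefault. */
struct fault_model {
	int acked;	/* number of ack_fault calls seen */
	int cleaned;	/* number of cleanup_fault calls seen */
	int last_err;	/* error value passed to the last ack */
};

/* Hypothetical stand-in for struct xe_pagefault_ops. */
struct fault_ops_model {
	/* mandatory: report the result to the producer */
	void (*ack_fault)(struct fault_model *f, int err);
	/* optional: may be NULL when the producer has nothing to clean up */
	void (*cleanup_fault)(struct fault_model *f, int err);
};

static void model_ack(struct fault_model *f, int err)
{
	f->acked++;
	f->last_err = err;
}

static void model_cleanup(struct fault_model *f, int err)
{
	(void)err;	/* unused in this sketch */
	f->cleaned++;
}

/* Ack first, then run the optional cleanup hook if one is installed. */
static void model_finish_fault(const struct fault_ops_model *ops,
			       struct fault_model *f, int err)
{
	ops->ack_fault(f, err);
	if (ops->cleanup_fault)
		ops->cleanup_fault(f, err);
}
```

A producer that leaves cleanup_fault unset keeps working unchanged, which
is why the hook can be added without touching other producers.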
diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
index 719a18187a31..cd41023ebef9 100644
--- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
@@ -8,6 +8,7 @@
#include "xe_guc_ct.h"
#include "xe_guc_pagefault.h"
#include "xe_pagefault.h"
+#include "xe_eudebug_pagefault.h"
static void guc_ack_fault(struct xe_pagefault *pf, int err)
{
@@ -36,8 +37,15 @@ static void guc_ack_fault(struct xe_pagefault *pf, int err)
xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
}
+static void guc_cleanup_fault(struct xe_pagefault *pf, int err)
+{
+ xe_eudebug_pagefault_service(pf);
+ xe_eudebug_pagefault_destroy(pf, 0);
+}
+
static const struct xe_pagefault_ops guc_pagefault_ops = {
.ack_fault = guc_ack_fault,
+ .cleanup_fault = guc_cleanup_fault,
};
/**
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index afb06598b6e1..369749641f37 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -10,6 +10,7 @@
#include "xe_bo.h"
#include "xe_device.h"
+#include "xe_eudebug_pagefault.h"
#include "xe_gt_printk.h"
#include "xe_gt_types.h"
#include "xe_gt_stats.h"
@@ -171,6 +172,8 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
if (IS_ERR(vm))
return PTR_ERR(vm);
+ xe_eudebug_pagefault_create(vm, pf);
+
/*
* TODO: Change to read lock? Using write lock for simplicity.
*/
@@ -184,9 +187,28 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
vma = xe_vm_find_vma_by_addr(vm, pf->consumer.page_addr);
if (!vma) {
err = -EINVAL;
- goto unlock_vm;
+ vma = xe_eudebug_create_vma(vm, pf);
+ if (IS_ERR(vma)) {
+ err = PTR_ERR(vma);
+ vma = NULL;
+ }
}
+ if (vma) {
+ /*
+ * A vma now covers the faulting ppgtt address. This can be the
+ * case even though no such vma existed when the eudebug_pagefault
+ * instance was created: vm->lock was not held in between, so
+ * another context may have allocated a vma for the address
+ * before this context reacquired the lock. Either way the
+ * fault can be serviced, so clear the error.
+ */
+ err = 0;
+ }
+
+ if (err)
+ goto unlock_vm;
+
atomic = xe_pagefault_access_is_atomic(pf->consumer.access_type);
if (xe_vma_is_cpu_addr_mirror(vma))
@@ -198,6 +220,10 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
unlock_vm:
if (!err)
vm->usm.last_fault_vma = vma;
+
+ if (err)
+ xe_eudebug_pagefault_destroy(pf, err);
+
up_write(&vm->lock);
xe_vm_put(vm);
@@ -266,6 +292,9 @@ static void xe_pagefault_queue_work(struct work_struct *w)
pf.producer.ops->ack_fault(&pf, err);
+ if (pf.producer.ops->cleanup_fault)
+ pf.producer.ops->cleanup_fault(&pf, err);
+
if (time_after(jiffies, threshold)) {
queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
break;
diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
index c89d7fb698e0..ce82e39015ae 100644
--- a/drivers/gpu/drm/xe/xe_pagefault_types.h
+++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
@@ -43,6 +43,15 @@ struct xe_pagefault_ops {
* sends the result to the HW/FW interface.
*/
void (*ack_fault)(struct xe_pagefault *pf, int err);
+
+ /**
+ * @cleanup_fault: Cleanup for producer, if any
+ * @pf: Page fault
+ * @err: Error state of fault
+ *
+ * Page fault producer received cleanup request from consumer
+ */
+ void (*cleanup_fault)(struct xe_pagefault *pf, int err);
};
/**
--
2.43.0