public inbox for intel-xe@lists.freedesktop.org
From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
To: Mika Kuoppala <mika.kuoppala@linux.intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: <simona.vetter@ffwll.ch>, <matthew.brost@intel.com>,
	<christian.koenig@amd.com>, <thomas.hellstrom@linux.intel.com>,
	<joonas.lahtinen@linux.intel.com>, <gustavo.sousa@intel.com>,
	<jan.maslak@intel.com>, <dominik.karol.piatkowski@intel.com>,
	<rodrigo.vivi@intel.com>, <andrzej.hajda@intel.com>,
	<matthew.auld@intel.com>, <maciej.patelczyk@intel.com>
Subject: Re: [PATCH 22/24] drm/xe/eudebug: Introduce EU pagefault handling interface
Date: Thu, 30 Apr 2026 12:50:42 -0700	[thread overview]
Message-ID: <2574641c-a69a-4b46-9300-42751951d7bc@intel.com> (raw)
In-Reply-To: <20260430105121.712843-23-mika.kuoppala@linux.intel.com>



On 4/30/26 3:51 AM, Mika Kuoppala wrote:
> From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
> 
> The Xe2 (and PVC) HW has a limitation that a pagefault due to an invalid
> access will halt the corresponding EUs. To work around this, introduce
> EU pagefault handling functionality, which allows unhalting the
> pagefaulted eu threads and lets the EU debugger get informed about the
> attention state of eu threads during execution.
> 
> If a pagefault occurs, send the DRM_XE_EUDEBUG_EVENT_PAGEFAULT event
> after handling the pagefault. The pagefault eudebug event follows
> the newly added drm_xe_eudebug_event_pagefault type.
> While a pagefault is being handled, delivery of the
> DRM_XE_EUDEBUG_EVENT_EU_ATTENTION event to the client is suppressed.
> 
> Pagefault event delivery follows the policy below.
> (1) If EU debugger discovery has completed and the pagefaulted eu threads
>      turn on their attention bits, the pagefault handler delivers the
>      pagefault event directly.
> (2) If a pagefault occurs during the eu debugger discovery process, the
>      pagefault handler queues a pagefault event and sends the queued event
>      once discovery has completed and the pagefaulted eu threads turn on
>      their attention bits.
> (3) If a pagefaulted eu thread fails to turn on its attention bit within
>      the specified time, the attention scan worker sends the pagefault
>      event when it detects that the attention bit is turned on.
> 
> If multiple eu threads are running and pagefault on the same invalid
> address, send a single pagefault event (DRM_XE_EUDEBUG_EVENT_PAGEFAULT
> type) to the user debugger instead of one pagefault event per eu thread.
> If eu threads other than the one that previously caused the pagefault
> access new invalid addresses, send a new pagefault event.
> 
> As the attention scan worker sends the eu attention event whenever the
> attention bit is turned on, the user debugger receives the attention
> event immediately after the pagefault event.
> In this case, the pagefault event always precedes the attention event.
> 
> When the user debugger receives an attention event after a pagefault
> event, it can detect whether additional breakpoints or interrupts have
> occurred on top of the existing pagefault by comparing the eu threads
> where the pagefault occurred with the eu threads whose attention bits
> are newly enabled.
> 
> v2: use only force exception (Joonas, Mika)
> v3: rebased on v4 (Mika)
> v4: streamline uapi, cleanups (Mika)
> v5: struct member documentation (Mika)
> v6: fault to fault_type (Mika)
> v7: pagefault rework (Maciej)
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
> Signed-off-by: Jan Maślak <jan.maslak@intel.com>
> Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/xe/Makefile               |   2 +-
>   drivers/gpu/drm/xe/xe_eudebug.c           | 104 +++++-
>   drivers/gpu/drm/xe/xe_eudebug.h           |   8 +
>   drivers/gpu/drm/xe/xe_eudebug_hw.c        |  15 +-
>   drivers/gpu/drm/xe/xe_eudebug_pagefault.c | 412 ++++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_eudebug_pagefault.h |  63 ++++
>   drivers/gpu/drm/xe/xe_eudebug_types.h     |  61 +++-
>   drivers/gpu/drm/xe/xe_guc_pagefault.c     |   3 +-
>   drivers/gpu/drm/xe/xe_pagefault_types.h   |   1 +
>   include/uapi/drm/xe_drm_eudebug.h         |  12 +
>   10 files changed, 658 insertions(+), 23 deletions(-)
>   create mode 100644 drivers/gpu/drm/xe/xe_eudebug_pagefault.c
>   create mode 100644 drivers/gpu/drm/xe/xe_eudebug_pagefault.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index e43d89a45d39..53302104d05c 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -158,7 +158,7 @@ xe-$(CONFIG_DRM_XE_GPUSVM) += xe_svm.o
>   xe-$(CONFIG_DRM_GPUSVM) += xe_userptr.o
>   
>   # debugging shaders with gdb (eudebug) support
> -xe-$(CONFIG_DRM_XE_EUDEBUG) += xe_eudebug.o xe_eudebug_vm.o xe_eudebug_hw.o xe_gt_debug.o
> +xe-$(CONFIG_DRM_XE_EUDEBUG) += xe_eudebug.o xe_eudebug_vm.o xe_eudebug_hw.o xe_eudebug_pagefault.o xe_gt_debug.o
>   
>   # graphics hardware monitoring (HWMON) support
>   xe-$(CONFIG_HWMON) += xe_hwmon.o
> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> index 3f22924a1275..06cbb3de57f4 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug.c
> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> @@ -17,11 +17,15 @@
>   #include "xe_eudebug.h"
>   #include "xe_eudebug_hw.h"
>   #include "xe_eudebug_types.h"
> +#include "xe_eudebug_pagefault.h"
>   #include "xe_eudebug_vm.h"
>   #include "xe_exec_queue.h"
> +#include "xe_force_wake.h"
>   #include "xe_gt.h"
>   #include "xe_gt_debug.h"
> +#include "xe_gt_mcr.h"
>   #include "xe_hw_engine.h"
> +#include "regs/xe_gt_regs.h"
>   #include "xe_macros.h"
>   #include "xe_pm.h"
>   #include "xe_sriov_pf.h"
> @@ -261,9 +265,12 @@ static void xe_eudebug_free(struct kref *ref)
>   	while (kfifo_get(&d->events.fifo, &event))
>   		kfree(event);
>   
> +	xe_eudebug_pagefault_fini(d);
>   	xe_eudebug_resources_destroy(d);
> +	mutex_destroy(&d->pf_lock);
>   	mutex_destroy(&d->hw.lock);
>   	mutex_destroy(&d->target.lock);
> +
>   	XE_WARN_ON(d->target.xef);
>   
>   	xe_eudebug_assert(d, !kfifo_len(&d->events.fifo));
> @@ -440,7 +447,7 @@ static bool xe_eudebug_detach(struct xe_device *xe,
>   	eu_dbg(d, "session %lld detached with %d", d->session, err);
>   
>   	release_acks(d);
> -
> +	xe_eudebug_pagefault_signal(target);
>   	remove_debugger(target);
>   	xe_file_put(target);
>   
> @@ -1939,10 +1946,6 @@ static int xe_eudebug_handle_gt_attention(struct xe_gt *gt)
>   {
>   	int ret;
>   
> -	ret = xe_gt_eu_threads_needing_attention(gt);
> -	if (ret <= 0)
> -		return ret;
> -
>   	ret = xe_send_gt_attention(gt);
>   
>   	/* Discovery in progress, fake it */
> @@ -1952,6 +1955,65 @@ static int xe_eudebug_handle_gt_attention(struct xe_gt *gt)
>   	return ret;
>   }
>   
> +int xe_eudebug_send_pagefault_event(struct xe_eudebug *d,
> +				    struct xe_eudebug_pagefault *pf)
> +{
> +	struct drm_xe_eudebug_event_pagefault *ep;
> +	struct drm_xe_eudebug_event *event;
> +	int h_queue, h_lrc;
> +	u32 size = xe_gt_eu_attention_bitmap_size(pf->q->gt) * 3;
> +	u32 sz = struct_size(ep, bitmask, size);
> +	int ret;
> +
> +	XE_WARN_ON(pf->lrc_idx < 0 || pf->lrc_idx >= pf->q->width);
> +
> +	XE_WARN_ON(!xe_exec_queue_is_debuggable(pf->q));
> +
> +	h_queue = find_handle(d, XE_EUDEBUG_RES_TYPE_EXEC_QUEUE, pf->q);
> +	if (h_queue < 0)
> +		return h_queue;
> +
> +	h_lrc = find_handle(d, XE_EUDEBUG_RES_TYPE_LRC, pf->q->lrc[pf->lrc_idx]);
> +	if (h_lrc < 0)
> +		return h_lrc;
> +
> +	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_PAGEFAULT, 0,
> +					DRM_XE_EUDEBUG_EVENT_STATE_CHANGE, sz);
> +
> +	if (!event)
> +		return -ENOSPC;
> +
> +	ep = cast_event(ep, event);
> +	ep->exec_queue_handle = h_queue;
> +	ep->lrc_handle = h_lrc;
> +	ep->bitmask_size = size;
> +	ep->pagefault_address = pf->fault.addr;
> +
> +	memcpy(ep->bitmask, pf->attentions.before.att, pf->attentions.before.size);
> +	memcpy(ep->bitmask + pf->attentions.before.size,
> +	       pf->attentions.after.att, pf->attentions.after.size);
> +	memcpy(ep->bitmask + pf->attentions.before.size + pf->attentions.after.size,
> +	       pf->attentions.resolved.att, pf->attentions.resolved.size);
> +
> +	event->seqno = atomic_long_inc_return(&d->events.seqno);
> +
> +	ret = xe_eudebug_queue_event(d, event);
> +	if (ret)
> +		xe_eudebug_disconnect(d, ret);
> +
> +	return ret;
> +}
> +
> +static void handle_attention_fail(struct xe_gt *gt, int gt_id, int ret)
> +{
> +	/* TODO: error capture */
> +	drm_info(&gt_to_xe(gt)->drm,
> +		 "gt:%d unable to handle eu attention ret = %d\n",
> +		 gt_id, ret);
> +
> +	xe_gt_reset_async(gt);
> +}
> +
>   static void attention_poll_work(struct work_struct *work)
>   {
>   	struct xe_device *xe = container_of(work, typeof(*xe),
> @@ -1975,15 +2037,15 @@ static void attention_poll_work(struct work_struct *work)
>   			if (gt->info.type != XE_GT_TYPE_MAIN)
>   				continue;
>   
> -			ret = xe_eudebug_handle_gt_attention(gt);
> -			if (ret) {
> -				/* TODO: error capture */
> -				drm_info(&gt_to_xe(gt)->drm,
> -					 "gt:%d unable to handle eu attention ret=%d\n",
> -					 gt_id, ret);
> +			if (!xe_gt_eu_threads_needing_attention(gt))
> +				continue;
> +
> +			ret = xe_eudebug_handle_pagefaults(gt);
> +			if (!ret)
> +				ret = xe_eudebug_handle_gt_attention(gt);
>   
> -				xe_gt_reset_async(gt);
> -			}
> +			if (ret)
> +				handle_attention_fail(gt, gt_id, ret);
>   		}
>   
>   		xe_pm_runtime_put(xe);
> @@ -1992,12 +2054,12 @@ static void attention_poll_work(struct work_struct *work)
>   	schedule_delayed_work(&xe->eudebug.attention_dwork, delay);
>   }
>   
> -static void attention_poll_stop(struct xe_device *xe)
> +void xe_eudebug_attention_poll_stop(struct xe_device *xe)
>   {
>   	cancel_delayed_work_sync(&xe->eudebug.attention_dwork);
>   }
>   
> -static void attention_poll_start(struct xe_device *xe)
> +void xe_eudebug_attention_poll_start(struct xe_device *xe)
>   {
>   	mod_delayed_work(system_wq, &xe->eudebug.attention_dwork, 0);
>   }
> @@ -2042,6 +2104,8 @@ xe_eudebug_connect(struct xe_device *xe,
>   	kref_init(&d->ref);
>   	mutex_init(&d->target.lock);
>   	mutex_init(&d->hw.lock);
> +	mutex_init(&d->pf_lock);
> +	INIT_LIST_HEAD(&d->pagefaults);
>   	init_waitqueue_head(&d->events.write_done);
>   	init_waitqueue_head(&d->events.read_done);
>   	init_completion(&d->discovery);
> @@ -2079,7 +2143,7 @@ xe_eudebug_connect(struct xe_device *xe,
>   
>   	kref_get(&d->ref); /* for discovery */
>   	queue_work(xe->eudebug.wq, &d->discovery_work);
> -	attention_poll_start(xe);
> +	xe_eudebug_attention_poll_start(xe);
>   
>   	eu_dbg(d, "connected session %lld", d->session);
>   
> @@ -2092,6 +2156,7 @@ xe_eudebug_connect(struct xe_device *xe,
>   err_free_res:
>   	xe_eudebug_resources_destroy(d);
>   err_free:
> +	mutex_destroy(&d->pf_lock);
>   	mutex_destroy(&d->hw.lock);
>   	mutex_destroy(&d->target.lock);
>   	kfree(d);
> @@ -2101,6 +2166,7 @@ xe_eudebug_connect(struct xe_device *xe,
>   
>   void xe_eudebug_file_close(struct xe_file *xef)
>   {
> +	xe_eudebug_pagefault_signal(xef);
>   	remove_debugger(xef);
>   }
>   
> @@ -2162,9 +2228,9 @@ int xe_eudebug_enable(struct xe_device *xe, bool enable)
>   	mutex_unlock(&xe->eudebug.lock);
>   
>   	if (enable) {
> -		attention_poll_start(xe);
> +		xe_eudebug_attention_poll_start(xe);
>   	} else {
> -		attention_poll_stop(xe);
> +		xe_eudebug_attention_poll_stop(xe);
>   
>   		if (IS_SRIOV_PF(xe))
>   			xe_sriov_pf_end_lockdown(xe);
> @@ -2217,7 +2283,7 @@ static void xe_eudebug_fini(struct drm_device *dev, void *__unused)
>   
>   	xe_assert(xe, list_empty(&xe->eudebug.targets));
>   
> -	attention_poll_stop(xe);
> +	xe_eudebug_attention_poll_stop(xe);
>   }
>   
>   void xe_eudebug_init(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
> index b1f8a5fcc890..826b63c4ba09 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug.h
> +++ b/drivers/gpu/drm/xe/xe_eudebug.h
> @@ -13,12 +13,14 @@ struct drm_file;
>   struct xe_debug_data;
>   struct xe_device;
>   struct xe_file;
> +struct xe_gt;
>   struct xe_vm;
>   struct xe_vma;
>   struct xe_vma_ops;
>   struct xe_exec_queue;
>   struct xe_user_fence;
>   struct xe_eudebug;
> +struct xe_eudebug_pagefault;
>   
>   #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
>   
> @@ -76,6 +78,12 @@ struct xe_eudebug *xe_eudebug_get_nolock(struct xe_file *xef);
>   struct xe_eudebug *xe_eudebug_get_nolock_with_discovery(struct xe_file *xef);
>   void xe_eudebug_put(struct xe_eudebug *d);
>   
> +int xe_eudebug_send_pagefault_event(struct xe_eudebug *d,
> +				    struct xe_eudebug_pagefault *pf);
> +
> +void xe_eudebug_attention_poll_stop(struct xe_device *xe);
> +void xe_eudebug_attention_poll_start(struct xe_device *xe);
> +
>   #else
>   
>   static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
> diff --git a/drivers/gpu/drm/xe/xe_eudebug_hw.c b/drivers/gpu/drm/xe/xe_eudebug_hw.c
> index e6510e7b51a9..d67530ace186 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug_hw.c
> +++ b/drivers/gpu/drm/xe/xe_eudebug_hw.c
> @@ -340,6 +340,7 @@ static int do_eu_control(struct xe_eudebug *d,
>   	void __user * const bitmask_ptr = u64_to_user_ptr(arg->bitmask_ptr);
>   	struct xe_device *xe = d->xe;
>   	struct xe_exec_queue *q, *active;
> +	struct dma_fence *pf_fence;
>   	struct xe_lrc *lrc;
>   	unsigned int hw_attn_size, attn_size;
>   	u8 *bits = NULL;
> @@ -411,8 +412,20 @@ static int do_eu_control(struct xe_eudebug *d,
>   		goto out_free;
>   	}
>   
> -	ret = -EINVAL;
>   	mutex_lock(&d->hw.lock);
> +	do {
> +		pf_fence = dma_fence_get(d->pf_fence);
> +		if (pf_fence) {
> +			mutex_unlock(&d->hw.lock);
> +			ret = dma_fence_wait(pf_fence, true);
> +			dma_fence_put(pf_fence);
> +			if (ret)
> +				goto out_free;
> +			mutex_lock(&d->hw.lock);
> +		}
> +	} while (pf_fence);
> +
> +	ret = -EINVAL;
>   
>   	switch (arg->cmd) {
>   	case DRM_XE_EUDEBUG_EU_CONTROL_CMD_INTERRUPT_ALL:
> diff --git a/drivers/gpu/drm/xe/xe_eudebug_pagefault.c b/drivers/gpu/drm/xe/xe_eudebug_pagefault.c
> new file mode 100644
> index 000000000000..15389fcd042f
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_eudebug_pagefault.c
> @@ -0,0 +1,412 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2023-2025 Intel Corporation
> + */
> +
> +#include "xe_eudebug_pagefault.h"
> +
> +#include <linux/delay.h>
> +
> +#include "xe_exec_queue.h"
> +#include "xe_eudebug.h"
> +#include "xe_eudebug_hw.h"
> +#include "xe_force_wake.h"
> +#include "xe_gt_debug.h"
> +#include "xe_gt_mcr.h"
> +#include "regs/xe_gt_regs.h"
> +#include "xe_vm.h"
> +
> +static struct xe_gt *
> +epf_to_gt(struct xe_eudebug_pagefault *epf)
> +{
> +	return epf->q->gt;
> +}
> +
> +static void destroy_pagefault(struct xe_eudebug_pagefault *epf)
> +{
> +	xe_exec_queue_put(epf->q);
> +	kfree(epf);
> +}
> +
> +static void queue_pagefault(struct xe_eudebug *d,
> +			    struct xe_eudebug_pagefault *epf)
> +{
> +	mutex_lock(&d->pf_lock);
> +	list_add_tail(&epf->link, &d->pagefaults);
> +	mutex_unlock(&d->pf_lock);
> +}
> +
> +static const char *
> +pagefault_get_driver_name(struct dma_fence *dma_fence)
> +{
> +	return "xe";
> +}
> +
> +static const char *
> +pagefault_fence_get_timeline_name(struct dma_fence *dma_fence)
> +{
> +	return "eudebug_pagefault_fence";
> +}
> +
> +static const struct dma_fence_ops pagefault_fence_ops = {
> +	.get_driver_name = pagefault_get_driver_name,
> +	.get_timeline_name = pagefault_fence_get_timeline_name,
> +};
> +
> +struct pagefault_fence {
> +	struct dma_fence base;
> +	spinlock_t lock;
> +};
> +
> +static struct pagefault_fence *pagefault_fence_create(void)
> +{
> +	struct pagefault_fence *fence;
> +
> +	fence = kzalloc_obj(*fence, GFP_KERNEL);
> +	if (fence == NULL)
> +		return NULL;
> +
> +	spin_lock_init(&fence->lock);
> +	dma_fence_init(&fence->base, &pagefault_fence_ops, &fence->lock,
> +		       dma_fence_context_alloc(1), 1);
> +
> +	return fence;
> +}
> +
> +static void xe_eudebug_pagefault_set_private(struct xe_pagefault *pf,
> +					     struct xe_eudebug_pagefault *epf)
> +{
> +	u64 private = (u64)pf->producer.private;
> +
> +	XE_WARN_ON(private & XE_EUDEBUG_PAGEFAULT_PRIVATE_EUDEBUG);
> +
> +	epf->private = pf->producer.private;
> +	private = (u64)epf | XE_EUDEBUG_PAGEFAULT_PRIVATE_EUDEBUG;
> +	pf->producer.private = (void *)private;
> +}
> +
> +void *xe_eudebug_pagefault_get_private(void *private)
> +{
> +	if ((u64)private & XE_EUDEBUG_PAGEFAULT_PRIVATE_EUDEBUG) {
> +		struct xe_eudebug_pagefault *epf = (void *)((u64)private &
> +							    ~XE_EUDEBUG_PAGEFAULT_PRIVATE_EUDEBUG);
> +		return epf->private;
> +	}
> +	return private;
> +}
> +
> +int
> +xe_eudebug_pagefault_create(struct xe_vm *vm, struct xe_pagefault *pf)
> +{
> +	struct pagefault_fence *pf_fence;
> +	struct xe_eudebug_pagefault *epf;
> +	struct xe_gt *gt = pf->gt;
> +	struct xe_exec_queue *q;
> +	struct dma_fence *fence;
> +	struct xe_eudebug *d;
> +	unsigned int fw_ref;
> +	int lrc_idx;
> +	u32 td_ctl;
> +
> +	d = xe_eudebug_get_nolock_with_discovery(vm->xef);
> +	if (!d)
> +		return -ENOENT;
> +
> +	q = xe_gt_runalone_active_queue_get(gt, &lrc_idx);
> +	if (IS_ERR(q))
> +		goto err_put_eudebug;
> +
> +	if (XE_WARN_ON(q->vm != vm))
> +		goto err_put_exec_queue;
> +
> +	if (!xe_exec_queue_is_debuggable(q))
> +		goto err_put_exec_queue;
> +
> +	fw_ref = xe_force_wake_get(gt_to_fw(gt), q->hwe->domain);
> +	if (!fw_ref)
> +		goto err_put_exec_queue;
> +
> +	/*
> +	 * If there is no debug functionality (TD_CTL_GLOBAL_DEBUG_ENABLE, etc.),
> +	 * don't proceed pagefault routine for eu debugger.
> +	 */
> +	td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
> +	if (!td_ctl)
> +		goto err_put_fw;
> +
> +	epf = kzalloc_obj(*epf, GFP_KERNEL);
> +	if (!epf)
> +		goto err_put_fw;
> +
> +	xe_eudebug_attention_poll_stop(gt_to_xe(gt));
> +
> +	mutex_lock(&d->hw.lock);
> +	fence = dma_fence_get(d->pf_fence);
> +
> +	if (fence) {
> +		/*
> +		 * Unless there are parallel PF routines this should
> +		 * not happen.
> +		 */
> +		dma_fence_put(fence);
> +		goto err_unlock_hw_lock;
> +	}
> +
> +	pf_fence = pagefault_fence_create();
> +	if (!pf_fence)
> +		goto err_unlock_hw_lock;
> +
> +	d->pf_fence = &pf_fence->base;
> +
> +	INIT_LIST_HEAD(&epf->link);
> +
> +	xe_gt_eu_attentions_read(gt, &epf->attentions.before, 0);
> +
> +	if (td_ctl & TD_CTL_FORCE_EXCEPTION)
> +		eu_warn(d, "force exception already set!");
> +
> +	/* Halt regardless of thread dependencies */
> +	while (!(td_ctl & TD_CTL_FORCE_EXCEPTION)) {
> +		xe_gt_mcr_multicast_write(gt, TD_CTL,
> +					  td_ctl | TD_CTL_FORCE_EXCEPTION);
> +		udelay(200);
> +		td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
> +	}
> +
> +	xe_gt_eu_attentions_read(gt, &epf->attentions.after,
> +				 XE_GT_ATTENTION_TIMEOUT_MS);
> +
> +	mutex_unlock(&d->hw.lock);
> +
> +	/*
> +	 * xe_exec_queue_put() will be called from destroy_pagefault()
> +	 * or handle_pagefault()
> +	 */
> +	epf->q = q;
> +	epf->lrc_idx = lrc_idx;
> +	epf->fault.addr = pf->consumer.page_addr;
> +	epf->fault.type_level = pf->consumer.fault_type_level;
> +	epf->fault.access_type = pf->consumer.access_type;
> +
> +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> +	xe_eudebug_put(d);
> +
> +	xe_eudebug_pagefault_set_private(pf, epf);
> +
> +	return 0;
> +
> +err_unlock_hw_lock:
> +	mutex_unlock(&d->hw.lock);
> +	xe_eudebug_attention_poll_start(gt_to_xe(gt));
> +	kfree(epf);
> +err_put_fw:
> +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> +err_put_exec_queue:
> +	xe_exec_queue_put(q);
> +err_put_eudebug:
> +	xe_eudebug_put(d);
> +
> +	return -EINVAL;
> +}
> +
> +static struct xe_eudebug_pagefault *xe_debubug_get_epf(struct xe_pagefault *pf)
> +{
> +	u64 private = (u64)pf->producer.private;
> +
> +	if (private & XE_EUDEBUG_PAGEFAULT_PRIVATE_EUDEBUG)
> +		return (void *)(private & ~XE_EUDEBUG_PAGEFAULT_PRIVATE_EUDEBUG);
> +
> +	return NULL;
> +}
> +
> +struct xe_vma *xe_eudebug_create_vma(struct xe_vm *vm, struct xe_pagefault *pf)
> +{
> +	struct xe_vma *vma = NULL;
> +	struct xe_eudebug_pagefault *epf = xe_debubug_get_epf(pf);
> +
> +	if (!epf)
> +		return NULL;
> +
> +	vma = xe_vm_create_null_vma(vm, pf->consumer.page_addr);
> +	if (IS_ERR(vma))
> +		return vma;
> +
> +	return vma;
> +}
> +
> +static void
> +xe_eudebug_pagefault_process(struct xe_eudebug_pagefault *epf)
> +{
> +	struct xe_gt *gt = epf_to_gt(epf);
> +
> +	xe_gt_eu_attentions_read(gt, &epf->attentions.resolved,
> +				 XE_GT_ATTENTION_TIMEOUT_MS);
> +}
> +
> +static int send_queued_pagefaults_locked(struct xe_eudebug *d)
> +{
> +	struct xe_eudebug_pagefault *epf, *epf_temp;
> +	int ret = 0;
> +
> +	list_for_each_entry_safe(epf, epf_temp, &d->pagefaults, link) {
> +		ret = xe_eudebug_send_pagefault_event(d, epf);
> +
> +		list_del(&epf->link);
> +
> +		destroy_pagefault(epf);
> +
> +		if (ret)
> +			break;
> +	}
> +	return ret;
> +}
> +
> +static int send_queued_pagefaults(struct xe_eudebug *d)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&d->pf_lock);
> +	ret = send_queued_pagefaults_locked(d);
> +	mutex_unlock(&d->pf_lock);
> +
> +	return ret;
> +}
> +
> +static void
> +_xe_eudebug_pagefault_destroy(struct xe_eudebug_pagefault *epf, int err)
> +{
> +	struct xe_gt *gt = epf_to_gt(epf);
> +	struct xe_vm *vm = epf->q->vm;
> +	struct xe_eudebug *d;
> +	struct dma_fence *f;
> +	unsigned int fw_ref;
> +	bool queued = false;
> +	u32 td_ctl, ret = 0;
> +
> +	fw_ref = xe_force_wake_get(gt_to_fw(gt), epf->q->hwe->domain);
> +	if (!fw_ref) {
> +		struct xe_device *xe = gt_to_xe(gt);
> +
> +		drm_warn(&xe->drm, "Forcewake fail: Can not recover TD_CTL");
> +	} else {
> +		td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
> +		xe_gt_mcr_multicast_write(gt, TD_CTL, td_ctl &
> +					  ~(TD_CTL_FORCE_EXCEPTION));
> +		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> +	}
> +
> +	d = xe_eudebug_get_nolock_with_discovery(vm->xef);
> +	if (!d)
> +		goto epf_free;
> +
> +	if (!err) {
> +		if (completion_done(&d->discovery)) {
> +			/* Just in case there was a discovery */
> +			ret = send_queued_pagefaults_locked(d);
> +			if (!ret)
> +				ret = xe_eudebug_send_pagefault_event(d, epf);
> +		} else {
> +			queue_pagefault(d, epf);
> +			queued = true;
> +		}
> +	}
> +
> +	mutex_lock(&d->hw.lock);
> +	f = d->pf_fence;
> +	d->pf_fence = NULL;
> +	mutex_unlock(&d->hw.lock);
> +
> +	if (f) {
> +		dma_fence_signal(f);
> +		dma_fence_put(f);
> +	}
> +
> +	xe_eudebug_put(d);
> +
> + epf_free:
> +	if (!queued || ret)
> +		destroy_pagefault(epf);
> +
> +	xe_eudebug_attention_poll_start(gt_to_xe(gt));
> +}
> +
> +int xe_eudebug_handle_pagefaults(struct xe_gt *gt)
> +{
> +	struct xe_exec_queue *q;
> +	struct xe_eudebug *d;
> +	int ret, lrc_idx;
> +
> +	q = xe_gt_runalone_active_queue_get(gt, &lrc_idx);
> +	if (IS_ERR(q))
> +		return PTR_ERR(q);
> +
> +	if (!xe_exec_queue_is_debuggable(q)) {
> +		ret = -EPERM;
> +		goto out_exec_queue_put;
> +	}
> +
> +	d = xe_eudebug_get_nolock(q->vm->xef);
> +	if (!d) {
> +		ret = -ENOTCONN;
> +		goto out_exec_queue_put;
> +	}
> +
> +	ret = send_queued_pagefaults(d);
> +
> +	xe_eudebug_put(d);
> +
> +out_exec_queue_put:
> +	xe_exec_queue_put(q);
> +
> +	return ret;
> +}
> +
> +void xe_eudebug_pagefault_service(struct xe_pagefault *pf, int err)
> +{
> +	struct xe_eudebug_pagefault *epf = xe_debubug_get_epf(pf);
> +
> +	if (!epf)
> +		return;
> +
> +	if (!err)
> +		xe_eudebug_pagefault_process(epf);
> +
> +	_xe_eudebug_pagefault_destroy(epf, err);
> +}
> +
> +void xe_eudebug_pagefault_fini(struct xe_eudebug *d)
> +{
> +	struct xe_eudebug_pagefault *epf, *epf_temp;
> +
> +	/* Since it's the last reference no race here */
> +
> +	list_for_each_entry_safe(epf, epf_temp, &d->pagefaults, link) {
> +		list_del(&epf->link);
> +		destroy_pagefault(epf);
> +	}
> +
> +	XE_WARN_ON(d->pf_fence);
> +}
> +
> +void xe_eudebug_pagefault_signal(struct xe_file *xef)
> +{
> +	struct xe_eudebug *d;
> +	struct dma_fence *f;
> +
> +	mutex_lock(&xef->eudebug.lock);
> +	d = xef->eudebug.debugger;
> +	mutex_unlock(&xef->eudebug.lock);
> +
> +	if (!d)
> +		return;
> +
> +	mutex_lock(&d->hw.lock);
> +	f = d->pf_fence;
> +	d->pf_fence = NULL;
> +	mutex_unlock(&d->hw.lock);
> +
> +	if (f) {
> +		dma_fence_signal(f);
> +		dma_fence_put(f);
> +	}
> +}
> diff --git a/drivers/gpu/drm/xe/xe_eudebug_pagefault.h b/drivers/gpu/drm/xe/xe_eudebug_pagefault.h
> new file mode 100644
> index 000000000000..c7434e1c3bd3
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_eudebug_pagefault.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2023-2025 Intel Corporation
> + */
> +
> +#ifndef _XE_EUDEBUG_PAGEFAULT_H_
> +#define _XE_EUDEBUG_PAGEFAULT_H_
> +
> +#include <linux/types.h>
> +
> +struct xe_eudebug;
> +struct xe_gt;
> +struct xe_pagefault;
> +struct xe_eudebug_pagefault;
> +struct xe_vm;
> +struct xe_file;
> +
> +void xe_eudebug_pagefault_fini(struct xe_eudebug *d);
> +int xe_eudebug_handle_pagefaults(struct xe_gt *gt);
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
> +int xe_eudebug_pagefault_create(struct xe_vm *vm, struct xe_pagefault *pf);
> +struct xe_vma *xe_eudebug_create_vma(struct xe_vm *vm, struct xe_pagefault *pf);
> +void xe_eudebug_pagefault_service(struct xe_pagefault *pf, int err);
> +/*
> + * The (struct xe_pagefault *)->producer.private is a pointer which, for now,
> + * stores the pointer guc.
> + * EU Debug intercepts this pointer to store struct xe_eudebug_pagefault.
> + * Original pointer can be obtained via eudebug function below called with
> + * mentioned producer's private field.
> + */
> +#define XE_EUDEBUG_PAGEFAULT_PRIVATE_EUDEBUG	0x1
> +void *xe_eudebug_pagefault_get_private(void *private);
> +
> +void xe_eudebug_pagefault_signal(struct xe_file *xef);
> +#else
> +
In order to use EOPNOTSUPP, this header should `#include <linux/errno.h>`.
This version missed it.

G.G.

> +static inline int
> +xe_eudebug_pagefault_create(struct xe_vm *vm, struct xe_pagefault *pf)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline struct xe_vma *xe_eudebug_create_vma(struct xe_vm *vm, struct xe_pagefault *pf)
> +{
> +	return NULL;
> +}
> +
> +static inline void xe_eudebug_pagefault_service(struct xe_pagefault *pf, int err)
> +{
> +}
> +
> +static inline void *xe_eudebug_pagefault_get_private(void *private)
> +{
> +	return private;
> +}
> +
> +static inline void xe_eudebug_pagefault_signal(struct xe_file *xef)
> +{
> +}
> +#endif
> +
> +#endif /* _XE_EUDEBUG_PAGEFAULT_H_ */
> diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
> index 386b5c78ecff..46dac32fabf6 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug_types.h
> +++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
> @@ -15,6 +15,8 @@
>   #include <linux/wait.h>
>   #include <linux/xarray.h>
>   
> +#include "xe_gt_debug_types.h"
> +
>   struct xe_device;
>   struct task_struct;
>   struct xe_eudebug;
> @@ -37,7 +39,7 @@ enum xe_eudebug_state {
>   };
>   
>   #define CONFIG_DRM_XE_DEBUGGER_EVENT_QUEUE_SIZE 64
> -#define XE_EUDEBUG_MAX_EVENT_TYPE DRM_XE_EUDEBUG_EVENT_EU_ATTENTION
> +#define XE_EUDEBUG_MAX_EVENT_TYPE DRM_XE_EUDEBUG_EVENT_PAGEFAULT
>   
>   /**
>    * struct xe_eudebug_handle - eudebug resource handle
> @@ -164,6 +166,63 @@ struct xe_eudebug {
>   
>   	/** @ops: operations for eu_control */
>   	struct xe_eudebug_eu_control_ops *ops;
> +
> +	/** @pf_lock: guards access to pagefaults list*/
> +	struct mutex pf_lock;
> +	/** @pagefaults: xe_eudebug_pagefault list for pagefault event queuing */
> +	struct list_head pagefaults;
> +	/**
> +	 * @pf_fence: fence on operations of eus (eu thread control and attention)
> +	 * when page faults are being handled, protected by @eu_lock.
> +	 */
> +	struct dma_fence *pf_fence;
> +};
> +
> +/**
> + * struct xe_eudebug_pagefault - eudebug structure for queuing pagefault
> + */
> +struct xe_eudebug_pagefault {
> +	/** @link: link into the xe_eudebug.pagefaults */
> +	struct list_head link;
> +	/** @q: exec_queue which raised pagefault */
> +	struct xe_exec_queue *q;
> +	/** @lrc_idx: lrc index of the workload which raised pagefault */
> +	int lrc_idx;
> +
> +	/** @fault: pagefault raw partial data passed from guc */
> +	struct {
> +		/** @addr: ppgtt address where the pagefault occurred */
> +		u64 addr;
> +		u8 type_level;
> +		u8 access_type;
> +	} fault;
> +
> +	/** @attentions: attention states in different phases of fault */
> +	struct {
> +		/** @before: state of attention bits before page fault WA processing*/
> +		struct xe_eu_attentions before;
> +		/**
> +		 * @after: status of attention bits during page fault WA processing.
> +		 * It includes eu threads where attention bits are turned on for
> +		 * reasons other than page fault WA (breakpoint, interrupt, etc.).
> +		 */
> +		struct xe_eu_attentions after;
> +		/**
> +		 * @resolved: state of the attention bits after page fault WA.
> +		 * It includes the eu thread that caused the page fault.
> +		 * To determine the eu thread that caused the page fault,
> +		 * do XOR attentions.after and attentions.resolved.
> +		 */
> +		struct xe_eu_attentions resolved;
> +	} attentions;
> +
> +	/**
> +	 * @private: copied the (struct xe_pagefault *)->producer.private filed.
> +	 * EU Debugger masks private field in the struct xe_pagefault.
> +	 * The xe_eudebug_pagefault_get_private() function to extracts original
> +	 * private field regardless if it was shadowed or not.
> +	 */
> +	void *private;
>   };
>   
>   #endif /* _XE_EUDEBUG_TYPES_H_ */
> diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
> index 607e32392f46..038688ab63b4 100644
> --- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
> @@ -4,6 +4,7 @@
>    */
>   
>   #include "abi/guc_actions_abi.h"
> +#include "xe_eudebug_pagefault.h"
>   #include "xe_guc.h"
>   #include "xe_guc_ct.h"
>   #include "xe_guc_pagefault.h"
> @@ -35,7 +36,7 @@ static void guc_ack_fault(struct xe_pagefault *pf, int err)
>   		FIELD_PREP(PFR_ENG_CLASS, engine_class) |
>   		FIELD_PREP(PFR_PDATA, pdata),
>   	};
> -	struct xe_guc *guc = pf->producer.private;
> +	struct xe_guc *guc = xe_eudebug_pagefault_get_private(pf->producer.private);
>   
>   	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>   }
> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> index c4ee625b93dd..ab38e135f23d 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> @@ -10,6 +10,7 @@
>   
>   struct xe_gt;
>   struct xe_pagefault;
> +struct xe_eudebug_pagefault;
>   
>   /** enum xe_pagefault_access_type - Xe page fault access type */
>   enum xe_pagefault_access_type {
> diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
> index 54394a7e12ab..f7d035532be2 100644
> --- a/include/uapi/drm/xe_drm_eudebug.h
> +++ b/include/uapi/drm/xe_drm_eudebug.h
> @@ -53,6 +53,7 @@ struct drm_xe_eudebug_event {
>   #define DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_DEBUG_DATA	5
>   #define DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE	6
>   #define DRM_XE_EUDEBUG_EVENT_EU_ATTENTION	7
> +#define DRM_XE_EUDEBUG_EVENT_PAGEFAULT		8
>   
>   	/** @flags: Flags */
>   	__u16 flags;
> @@ -358,6 +359,17 @@ struct drm_xe_eudebug_event_eu_attention {
>   	__u8 bitmask[];
>   };
>   
> +struct drm_xe_eudebug_event_pagefault {
> +	struct drm_xe_eudebug_event base;
> +
> +	__u64 exec_queue_handle;
> +	__u64 lrc_handle;
> +	__u32 flags;
> +	__u32 bitmask_size;
> +	__u64 pagefault_address;
> +	__u8 bitmask[];
> +};
> +
>   #if defined(__cplusplus)
>   }
>   #endif


Thread overview: 32+ messages
2026-04-30 10:50 [PATCH 00/24] Intel Xe GPU Debug Support (eudebug) v8 Mika Kuoppala
2026-04-30 10:50 ` [PATCH 01/24] drm/xe/eudebug: Introduce eudebug interface Mika Kuoppala
2026-04-30 10:50 ` [PATCH 02/24] drm/xe/eudebug: Add documentation Mika Kuoppala
2026-04-30 10:50 ` [PATCH 03/24] drm/xe/eudebug: Add connection establishment documentation Mika Kuoppala
2026-04-30 10:51 ` [PATCH 04/24] drm/xe/eudebug: Introduce discovery for resources Mika Kuoppala
2026-04-30 10:51 ` [PATCH 05/24] drm/xe/eudebug: Introduce exec_queue events Mika Kuoppala
2026-04-30 10:51 ` [PATCH 06/24] drm/xe: Add EUDEBUG_ENABLE exec queue property Mika Kuoppala
2026-04-30 10:51 ` [PATCH 07/24] drm/xe/eudebug: Mark guc contexts as debuggable Mika Kuoppala
2026-04-30 10:51 ` [PATCH 08/24] drm/xe: Introduce ADD_DEBUG_DATA and REMOVE_DEBUG_DATA vm bind ops Mika Kuoppala
2026-04-30 10:51 ` [PATCH 09/24] drm/xe/eudebug: Introduce vm bind and vm bind debug data events Mika Kuoppala
2026-04-30 10:51 ` [PATCH 10/24] drm/xe/eudebug: Add ufence events with acks Mika Kuoppala
2026-04-30 10:51 ` [PATCH 11/24] drm/xe/eudebug: vm open/pread/pwrite Mika Kuoppala
2026-04-30 10:51 ` [PATCH 12/24] drm/xe/eudebug: userptr vm pread/pwrite Mika Kuoppala
2026-04-30 10:51 ` [PATCH 13/24] drm/xe/eudebug: hw enablement for eudebug Mika Kuoppala
2026-04-30 10:51 ` [PATCH 14/24] drm/xe/eudebug: Introduce EU control interface Mika Kuoppala
2026-04-30 10:51 ` [PATCH 15/24] drm/xe/eudebug: Introduce per device attention scan worker Mika Kuoppala
2026-04-30 10:51 ` [PATCH 16/24] drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test Mika Kuoppala
2026-04-30 14:16   ` Michal Wajdeczko
2026-04-30 10:51 ` [PATCH 17/24] drm/xe: Implement SR-IOV and eudebug exclusivity Mika Kuoppala
2026-04-30 10:51 ` [PATCH 18/24] drm/xe: Add xe_client_debugfs and introduce debug_data file Mika Kuoppala
2026-04-30 10:51 ` [PATCH 19/24] drm/xe/eudebug: Allow getting eudebug instance during discovery Mika Kuoppala
2026-04-30 10:51 ` [PATCH 20/24] drm/xe/eudebug: Add read/count/compare helper for eu attention Mika Kuoppala
2026-04-30 10:51 ` [PATCH 21/24] drm/xe/vm: Support for adding null page VMA to VM on request Mika Kuoppala
2026-04-30 10:51 ` [PATCH 22/24] drm/xe/eudebug: Introduce EU pagefault handling interface Mika Kuoppala
2026-04-30 19:50   ` Gwan-gyeong Mun [this message]
2026-04-30 10:51 ` [PATCH 23/24] drm/xe/eudebug: Enable EU pagefault handling Mika Kuoppala
2026-04-30 10:51 ` [PATCH 24/24] drm/xe/eudebug: Disable SVM in Xe for Eudebug Mika Kuoppala
2026-04-30 19:22   ` Matthew Brost
2026-04-30 11:09 ` ✗ CI.checkpatch: warning for Intel Xe GPU Debug Support (eudebug) v8 Patchwork
2026-04-30 11:10 ` ✓ CI.KUnit: success " Patchwork
2026-04-30 12:06 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-30 22:41 ` ✗ Xe.CI.FULL: failure " Patchwork
