Re: [PATCH 01/11] drm/xe: Stub out new pagefault layer

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Francois Dugast <francois.dugast@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	<thomas.hellstrom@linux.intel.com>,
	<himal.prasad.ghimiray@intel.com>, <michal.mrozek@intel.com>
Subject: Re: [PATCH 01/11] drm/xe: Stub out new pagefault layer
Date: Wed, 27 Aug 2025 17:29:46 +0200	[thread overview]
Message-ID: <aK8kauwpALcoSxwr@fdugast-desk> (raw)
In-Reply-To: <20250806062242.1090416-2-matthew.brost@intel.com>

On Tue, Aug 05, 2025 at 11:22:32PM -0700, Matthew Brost wrote:
> Stub out the new page fault layer and add kernel documentation. This is
> intended as a replacement for the GT page fault layer, enabling multiple
> producers to hook into a shared page fault consumer interface.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile             |   1 +
>  drivers/gpu/drm/xe/xe_pagefault.c       |  63 ++++++++++++
>  drivers/gpu/drm/xe/xe_pagefault.h       |  19 ++++
>  drivers/gpu/drm/xe/xe_pagefault_types.h | 125 ++++++++++++++++++++++++
>  4 files changed, 208 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_pagefault.c
>  create mode 100644 drivers/gpu/drm/xe/xe_pagefault.h
>  create mode 100644 drivers/gpu/drm/xe/xe_pagefault_types.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 8e0c3412a757..6fbebafe79c9 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -93,6 +93,7 @@ xe-y += xe_bb.o \
>  	xe_nvm.o \
>  	xe_oa.o \
>  	xe_observation.o \
> +	xe_pagefault.o \
>  	xe_pat.o \
>  	xe_pci.o \
>  	xe_pcode.o \
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> new file mode 100644
> index 000000000000..3ce0e8d74b9d
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include "xe_pagefault.h"
> +#include "xe_pagefault_types.h"
> +
> +/**
> + * DOC: Xe page faults
> + *
> + * Xe page faults are handled in two layers. The producer layer interacts with
> + * hardware or firmware to receive and parse faults into struct xe_pagefault,
> + * then forwards them to the consumer. The consumer layer services the faults
> + * (e.g., memory migration, page table updates) and acknowledges the result back
> + * to the producer, which then forwards the results to the hardware or firmware.
> + * The consumer uses a page fault queue sized to absorb all potential faults and
> + * a multi-threaded worker to process them. Multiple producers are supported,
> + * with a single shared consumer.

I am not through with the series yet but xe_pagefault seems to be the
consumer code only, while the producer code will be located elsewhere
such as in xe_guc*. If so, might be good to write it here or in the
functions below.

> + */
> +
> +/**
> + * xe_pagefault_init() - Page fault init
> + * @xe: xe device instance
> + *
> + * Initialize Xe page fault state. Must be done after reading fuses.
> + *
> + * Return: 0 on Success, errno on failure
> + */
> +int xe_pagefault_init(struct xe_device *xe)
> +{
> +	/* TODO - implement */
> +	return 0;
> +}
> +
> +/**
> + * xe_pagefault_reset() - Page fault reset for a GT
> + * @xe: xe device instance
> + * @gt: GT being reset
> + *
> + * Reset the Xe page fault state for a GT; that is, squash any pending faults on
> + * the GT.
> + */
> +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> +{
> +	/* TODO - implement */
> +}
> +
> +/**
> + * xe_pagefault_handler() - Page fault handler
> + * @xe: xe device instance
> + * @pf: Page fault
> + *
> + * Sink the page fault to a queue (i.e., a memory buffer) and queue a worker to
> + * service it. Safe to be called from IRQ or process context. Reclaim safe.
> + *
> + * Return: 0 on success, errno on failure
> + */
> +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> +{
> +	/* TODO - implement */
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
> new file mode 100644
> index 000000000000..bd0cdf9ed37f
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pagefault.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_PAGEFAULT_H_
> +#define _XE_PAGEFAULT_H_
> +
> +struct xe_device;
> +struct xe_gt;
> +struct xe_pagefault;
> +
> +int xe_pagefault_init(struct xe_device *xe);
> +
> +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
> +
> +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> new file mode 100644
> index 000000000000..fcff84f93dd8
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> @@ -0,0 +1,125 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_PAGEFAULT_TYPES_H_
> +#define _XE_PAGEFAULT_TYPES_H_
> +
> +#include <linux/workqueue.h>
> +
> +struct xe_pagefault;
> +struct xe_gt;
> +
> +/** enum xe_pagefault_access_type - Xe page fault access type */
> +enum xe_pagefault_access_type {
> +	/** @XE_PAGEFAULT_ACCESS_TYPE_READ: Read access type */
> +	XE_PAGEFAULT_ACCESS_TYPE_READ	= 0,
> +	/** @XE_PAGEFAULT_ACCESS_TYPE_WRITE: Write access type */
> +	XE_PAGEFAULT_ACCESS_TYPE_WRITE	= 1,
> +	/** @XE_PAGEFAULT_ACCESS_TYPE_ATOMIC: Atomic access type */
> +	XE_PAGEFAULT_ACCESS_TYPE_ATOMIC	= 2,
> +};
> +
> +/** enum xe_pagefault_type - Xe page fault type */
> +enum xe_pagefault_type {
> +	/** @XE_PAGEFAULT_TYPE_NOT_PRESENT: Not present */
> +	XE_PAGEFAULT_TYPE_NOT_PRESENT		= 0,
> +	/** @XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION: Write access violation */
> +	XE_PAGEFAULT_WRITE_ACCESS_VIOLATION	= 1,
> +	/** @XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION: Atomic access violation */
> +	XE_PAGEFAULT_ATOMIC_ACCESS_VIOLATION	= 2,
> +};
> +
> +/** struct xe_pagefault_ops - Xe pagefault ops (producer) */
> +struct xe_pagefault_ops {
> +	/**
> +	 * @ack_fault: Ack fault
> +	 * @pf: Page fault
> +	 * @err: Error state of fault
> +	 *
> +	 * Page fault producer receives acknowledgment from the consumer and
> +	 * sends the result to the HW/FW interface.
> +	 */
> +	void (*ack_fault)(struct xe_pagefault *pf, int err);
> +};
> +
> +/**
> + * struct xe_pagefault - Xe page fault
> + *
> + * Generic page fault structure for communication between producer and consumer.
> + * Carefully sized to be 64 bytes.
> + */
> +struct xe_pagefault {
> +	/**
> +	 * @gt: GT of fault
> +	 *
> +	 * XXX: We may want to decouple the GT from individual faults, as it's
> +	 * unclear whether future platforms will always have a GT for all page
> +	 * fault producers. Internally, the GT is used for stats, identifying
> +	 * the appropriate VRAM region, and locating the migration queue.
> +	 * Leaving this as-is for now, but we can revisit later to see if we
> +	 * can convert it to use the Xe device pointer instead.
> +	 */
> +	struct xe_gt *gt;
> +	/**
> +	 * @consumer: State for the software handling the fault. Populated by
> +	 * the producer and may be modified by the consumer to communicate
> +	 * information back to the producer upon fault acknowledgment.
> +	 */
> +	struct {
> +		/** @consumer.page_addr: address of page fault */
> +		u64 page_addr;
> +		/** @consumer.asid: address space ID */
> +		u32 asid;
> +		/** @consumer.access_type: access type */
> +		u8 access_type;

For type safety we shoud use enum xe_pagefault_access_type instead of u8.

> +		/** @consumer.fault_type: fault type */
> +		u8 fault_type;

Same here with enum xe_pagefault_type instead of u8.

> +#define XE_PAGEFAULT_LEVEL_NACK		0xff	/* Producer indicates nack fault */
> +		/** @consumer.fault_level: fault level */
> +		u8 fault_level;
> +		/** @consumer.engine_class: engine class */
> +		u8 engine_class;
> +		/** consumer.reserved: reserved bits for future expansion */
> +		u64 reserved;
> +	} consumer;
> +	/**
> +	 * @producer: State for the producer (i.e., HW/FW interface). Populated
> +	 * by the producer and should not be modified—or even inspected—by the
> +	 * consumer, except for calling operations.
> +	 */
> +	struct {
> +		/** @producer.private: private pointer */
> +		void *private;
> +		/** @producer.ops: operations */
> +		const struct xe_pagefault_ops *ops;
> +#define XE_PAGEFAULT_PRODUCER_MSG_LEN_DW	4
> +		/**
> +		 * producer.msg: page fault message, used by producer in fault

s/producer.msg/@producer.msg/

> +		 * acknowledgement to formulate response to HW/FW interface.

s/acknowledgement/acknowledgment/

> +		 */
> +		u32 msg[XE_PAGEFAULT_PRODUCER_MSG_LEN_DW];

It is clear from patch #6 why it is more convenient to have this in
struct xe_pagefault rather than local to the producer but this seems
to go a bit against the elegant abstraction provided by this new page
fault layers. producer.private could store a struct with {guc,msg}
but that is probably overengineering so up to you.

Francois

> +	} producer;
> +};
> +
> +/** struct xe_pagefault_queue: Xe pagefault queue (consumer) */
> +struct xe_pagefault_queue {
> +	/**
> +	 * @data: Data in queue containing struct xe_pagefault, protected by
> +	 * @lock
> +	 */
> +	void *data;
> +	/** @size: Size of queue in bytes */
> +	u32 size;
> +	/** @head: Head pointer in bytes, moved by producer, protected by @lock */
> +	u32 head;
> +	/** @tail: Tail pointer in bytes, moved by consumer, protected by @lock */
> +	u32 tail;
> +	/** @lock: protects page fault queue */
> +	spinlock_t lock;
> +	/** @worker: to process page faults */
> +	struct work_struct worker;
> +};
> +
> +#endif
> -- 
> 2.34.1
>

next prev parent reply	other threads:[~2025-08-27 15:30 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06  6:22 [PATCH 00/11] Pagefault refactor, fine grained fault locking, threaded prefetch Matthew Brost
2025-08-06  6:22 ` [PATCH 01/11] drm/xe: Stub out new pagefault layer Matthew Brost
2025-08-06 23:01   ` Summers, Stuart
2025-08-06 23:53     ` Matthew Brost
2025-08-07 17:20       ` Summers, Stuart
2025-08-07 18:10         ` Matthew Brost
2025-08-28 20:18           ` Summers, Stuart
2025-08-28 20:20             ` Matthew Brost
2025-08-27 15:29   ` Francois Dugast [this message]
2025-08-27 16:03     ` Matthew Brost
2025-08-27 16:25       ` Francois Dugast
2025-08-27 16:40         ` Matthew Brost
2025-08-27 18:00       ` Matthew Brost
2025-08-28 20:08   ` Summers, Stuart
2025-08-06  6:22 ` [PATCH 02/11] drm/xe: Implement xe_pagefault_init Matthew Brost
2025-08-06 23:08   ` Summers, Stuart
2025-08-06 23:59     ` Matthew Brost
2025-08-07 18:22       ` Summers, Stuart
2025-08-27 16:30   ` Francois Dugast
2025-08-27 16:49     ` Matthew Brost
2025-08-28 20:10   ` Summers, Stuart
2025-08-28 20:14     ` Matthew Brost
2025-08-28 20:19       ` Summers, Stuart
2025-08-06  6:22 ` [PATCH 03/11] drm/xe: Implement xe_pagefault_reset Matthew Brost
2025-08-06 23:16   ` Summers, Stuart
2025-08-07  0:12     ` Matthew Brost
2025-08-07 18:29       ` Summers, Stuart
2025-08-06  6:22 ` [PATCH 04/11] drm/xe: Implement xe_pagefault_handler Matthew Brost
2025-08-28 11:26   ` Francois Dugast
2025-08-28 20:24   ` Summers, Stuart
2025-08-06  6:22 ` [PATCH 05/11] drm/xe: Implement xe_pagefault_queue_work Matthew Brost
2025-08-28 12:29   ` Francois Dugast
2025-08-28 18:39     ` Matthew Brost
2025-08-28 22:04   ` Summers, Stuart
2025-08-29  0:51     ` Matthew Brost
2025-08-06  6:22 ` [PATCH 06/11] drm/xe: Add xe_guc_pagefault layer Matthew Brost
2025-08-28 13:27   ` Francois Dugast
2025-08-28 18:38     ` Matthew Brost
2025-08-28 22:11   ` Summers, Stuart
2025-08-29  0:54     ` Matthew Brost
2025-08-06  6:22 ` [PATCH 07/11] drm/xe: Remove unused GT page fault code Matthew Brost
2025-08-28 19:13   ` Summers, Stuart
2025-08-06  6:22 ` [PATCH 08/11] drm/xe: Fine grained page fault locking Matthew Brost
2025-08-06  6:22 ` [PATCH 09/11] drm/xe: Allow prefetch-only VM bind IOCTLs to use VM read lock Matthew Brost
2025-08-06  6:22 ` [PATCH 10/11] drm/xe: Thread prefetch of SVM ranges Matthew Brost
2025-08-28 22:55   ` Summers, Stuart
2025-08-29  1:06     ` Matthew Brost
2025-08-06  6:22 ` [PATCH 11/11] drm/xe: Add num_pf_queue modparam Matthew Brost
2025-08-28 22:58   ` Summers, Stuart
2025-08-06  6:36 ` ✗ CI.checkpatch: warning for Pagefault refactor, fine grained fault locking, threaded prefetch Patchwork
2025-08-06  6:36 ` ✗ CI.KUnit: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aK8kauwpALcoSxwr@fdugast-desk \
    --to=francois.dugast@intel.com \
    --cc=himal.prasad.ghimiray@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=michal.mrozek@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.