From: Matthew Brost <matthew.brost@intel.com>
To: Francois Dugast <francois.dugast@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <stuart.summers@intel.com>
Subject: Re: [PATCH v3 1/7] drm/xe: Stub out new pagefault layer
Date: Fri, 31 Oct 2025 09:41:07 -0700
Message-ID: <aQTmo3yHnPEmfxHl@lstrano-desk.jf.intel.com>
In-Reply-To: <aQS6hgtqWvaIglr_@fdugast-desk>
On Fri, Oct 31, 2025 at 02:33:02PM +0100, Francois Dugast wrote:
> On Mon, Oct 27, 2025 at 08:58:37PM -0700, Matthew Brost wrote:
> > Stub out the new page fault layer and add kernel documentation. This is
> > intended as a replacement for the GT page fault layer, enabling multiple
> > producers to hook into a shared page fault consumer interface.
>
> Nit: this patch and others in the series have been modified since the
> previous version. For future revisions, please include a brief changelog
> in the commit message to help review, thanks.
>
Yeah, I had changelogs but somehow lost them...
Will try to bring them back in v4.
Matt
> Francois
>
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/Makefile | 1 +
> > drivers/gpu/drm/xe/xe_pagefault.c | 65 +++++++++++
> > drivers/gpu/drm/xe/xe_pagefault.h | 19 ++++
> > drivers/gpu/drm/xe/xe_pagefault_types.h | 136 ++++++++++++++++++++++++
> > 4 files changed, 221 insertions(+)
> > create mode 100644 drivers/gpu/drm/xe/xe_pagefault.c
> > create mode 100644 drivers/gpu/drm/xe/xe_pagefault.h
> > create mode 100644 drivers/gpu/drm/xe/xe_pagefault_types.h
> >
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 82c6b3d29676..b35021e5b9eb 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -94,6 +94,7 @@ xe-y += xe_bb.o \
> > xe_nvm.o \
> > xe_oa.o \
> > xe_observation.o \
> > + xe_pagefault.o \
> > xe_pat.o \
> > xe_pci.o \
> > xe_pcode.o \
> > diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> > new file mode 100644
> > index 000000000000..d509a80cb1f3
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> > @@ -0,0 +1,65 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include "xe_pagefault.h"
> > +#include "xe_pagefault_types.h"
> > +
> > +/**
> > + * DOC: Xe page faults
> > + *
> > + * Xe page faults are handled in two layers. The producer layer interacts with
> > + * hardware or firmware to receive and parse faults into struct xe_pagefault,
> > + * then forwards them to the consumer. The consumer layer services the faults
> > + * (e.g., memory migration, page table updates) and acknowledges the result back
> > + * to the producer, which then forwards the results to the hardware or firmware.
> > + * The consumer uses a page fault queue sized to absorb all potential faults and
> > + * a multi-threaded worker to process them. Multiple producers are supported,
> > + * with a single shared consumer.
> > + *
> > + * xe_pagefault.c implements the consumer layer.
> > + */
> > +
> > +/**
> > + * xe_pagefault_init() - Page fault init
> > + * @xe: xe device instance
> > + *
> > + * Initialize Xe page fault state. Must be done after reading fuses.
> > + *
> > + * Return: 0 on success, errno on failure
> > + */
> > +int xe_pagefault_init(struct xe_device *xe)
> > +{
> > + /* TODO - implement */
> > + return 0;
> > +}
> > +
> > +/**
> > + * xe_pagefault_reset() - Page fault reset for a GT
> > + * @xe: xe device instance
> > + * @gt: GT being reset
> > + *
> > + * Reset the Xe page fault state for a GT; that is, squash any pending faults on
> > + * the GT.
> > + */
> > +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> > +{
> > + /* TODO - implement */
> > +}
> > +
> > +/**
> > + * xe_pagefault_handler() - Page fault handler
> > + * @xe: xe device instance
> > + * @pf: Page fault
> > + *
> > + * Sink the page fault to a queue (i.e., a memory buffer) and queue a worker to
> > + * service it. Safe to be called from IRQ or process context. Reclaim safe.
> > + *
> > + * Return: 0 on success, errno on failure
> > + */
> > +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> > +{
> > + /* TODO - implement */
> > + return 0;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
> > new file mode 100644
> > index 000000000000..bd0cdf9ed37f
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_pagefault.h
> > @@ -0,0 +1,19 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_PAGEFAULT_H_
> > +#define _XE_PAGEFAULT_H_
> > +
> > +struct xe_device;
> > +struct xe_gt;
> > +struct xe_pagefault;
> > +
> > +int xe_pagefault_init(struct xe_device *xe);
> > +
> > +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
> > +
> > +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
> > +
> > +#endif
> > diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> > new file mode 100644
> > index 000000000000..d3b516407d60
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> > @@ -0,0 +1,136 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_PAGEFAULT_TYPES_H_
> > +#define _XE_PAGEFAULT_TYPES_H_
> > +
> > +#include <linux/workqueue.h>
> > +
> > +struct xe_gt;
> > +struct xe_pagefault;
> > +
> > +/** enum xe_pagefault_access_type - Xe page fault access type */
> > +enum xe_pagefault_access_type {
> > + /** @XE_PAGEFAULT_ACCESS_TYPE_READ: Read access type */
> > + XE_PAGEFAULT_ACCESS_TYPE_READ = 0,
> > + /** @XE_PAGEFAULT_ACCESS_TYPE_WRITE: Write access type */
> > + XE_PAGEFAULT_ACCESS_TYPE_WRITE = 1,
> > + /** @XE_PAGEFAULT_ACCESS_TYPE_ATOMIC: Atomic access type */
> > + XE_PAGEFAULT_ACCESS_TYPE_ATOMIC = 2,
> > +};
> > +
> > +/** enum xe_pagefault_type - Xe page fault type */
> > +enum xe_pagefault_type {
> > + /** @XE_PAGEFAULT_TYPE_NOT_PRESENT: Not present */
> > + XE_PAGEFAULT_TYPE_NOT_PRESENT = 0,
> > + /** @XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION: Write access violation */
> > + XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION = 1,
> > + /** @XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION: Atomic access violation */
> > + XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION = 2,
> > +};
> > +
> > +/** struct xe_pagefault_ops - Xe pagefault ops (producer) */
> > +struct xe_pagefault_ops {
> > + /**
> > + * @ack_fault: Ack fault
> > + * @pf: Page fault
> > + * @err: Error state of fault
> > + *
> > + * Page fault producer receives acknowledgment from the consumer and
> > + * sends the result to the HW/FW interface.
> > + */
> > + void (*ack_fault)(struct xe_pagefault *pf, int err);
> > +};
> > +
> > +/**
> > + * struct xe_pagefault - Xe page fault
> > + *
> > + * Generic page fault structure for communication between producer and consumer.
> > + * Carefully sized to be 64 bytes. Upon a device page fault, the producer
> > + * populates this structure, and the consumer copies it into the page-fault
> > + * queue for deferred handling.
> > + */
> > +struct xe_pagefault {
> > + /**
> > + * @gt: GT of fault
> > + */
> > + struct xe_gt *gt;
> > + /**
> > + * @consumer: State for the software handling the fault. Populated by
> > + * the producer and may be modified by the consumer to communicate
> > + * information back to the producer upon fault acknowledgment.
> > + */
> > + struct {
> > + /** @consumer.page_addr: address of page fault */
> > + u64 page_addr;
> > + /** @consumer.asid: address space ID */
> > + u32 asid;
> > + /**
> > + * @consumer.access_type: access type, u8 rather than enum to
> > + * keep size compact
> > + */
> > + u8 access_type;
> > + /**
> > + * @consumer.fault_type: fault type, u8 rather than enum to
> > + * keep size compact
> > + */
> > + u8 fault_type;
> > +#define XE_PAGEFAULT_LEVEL_NACK 0xff /* Producer indicates nack fault */
> > + /** @consumer.fault_level: fault level */
> > + u8 fault_level;
> > + /** @consumer.engine_class: engine class */
> > + u8 engine_class;
> > + /** @consumer.engine_instance: engine instance */
> > + u8 engine_instance;
> > + /** @consumer.reserved: reserved bits for future expansion */
> > + u8 reserved[7];
> > + } consumer;
> > + /**
> > + * @producer: State for the producer (i.e., HW/FW interface). Populated
> > + * by the producer and should not be modified—or even inspected—by the
> > + * consumer, except for calling operations.
> > + */
> > + struct {
> > + /** @producer.private: private pointer */
> > + void *private;
> > + /** @producer.ops: operations */
> > + const struct xe_pagefault_ops *ops;
> > +#define XE_PAGEFAULT_PRODUCER_MSG_LEN_DW 4
> > + /**
> > + * @producer.msg: page fault message, used by producer in fault
> > + * acknowledgment to formulate response to HW/FW interface.
> > + * Included in the page-fault message because the producer
> > + * typically receives the fault in a context where memory cannot
> > + * be allocated (e.g., atomic context or the reclaim path).
> > + */
> > + u32 msg[XE_PAGEFAULT_PRODUCER_MSG_LEN_DW];
> > + } producer;
> > +};
> > +
> > +/**
> > + * struct xe_pagefault_queue - Xe pagefault queue (consumer)
> > + *
> > + * Used to capture all device page faults for deferred processing. Size this
> > + * queue to absorb the device’s worst-case number of outstanding faults.
> > + */
> > +struct xe_pagefault_queue {
> > + /**
> > + * @data: Data in queue containing struct xe_pagefault, protected by
> > + * @lock
> > + */
> > + void *data;
> > + /** @size: Size of queue in bytes */
> > + u32 size;
> > + /** @head: Head pointer in bytes, moved by producer, protected by @lock */
> > + u32 head;
> > + /** @tail: Tail pointer in bytes, moved by consumer, protected by @lock */
> > + u32 tail;
> > + /** @lock: protects page fault queue */
> > + spinlock_t lock;
> > + /** @worker: to process page faults */
> > + struct work_struct worker;
> > +};
> > +
> > +#endif
> > --
> > 2.34.1
> >
Thread overview: 16+ messages
2025-10-28 3:58 [PATCH v3 0/7] Pagefault refactor Matthew Brost
2025-10-28 3:58 ` [PATCH v3 1/7] drm/xe: Stub out new pagefault layer Matthew Brost
2025-10-31 13:33 ` Francois Dugast
2025-10-31 16:41 ` Matthew Brost [this message]
2025-10-28 3:58 ` [PATCH v3 2/7] drm/xe: Implement xe_pagefault_init Matthew Brost
2025-10-31 13:40 ` Francois Dugast
2025-10-31 16:40 ` Matthew Brost
2025-10-28 3:58 ` [PATCH v3 3/7] drm/xe: Implement xe_pagefault_reset Matthew Brost
2025-10-28 3:58 ` [PATCH v3 4/7] drm/xe: Implement xe_pagefault_handler Matthew Brost
2025-10-28 3:58 ` [PATCH v3 5/7] drm/xe: Implement xe_pagefault_queue_work Matthew Brost
2025-10-28 3:58 ` [PATCH v3 6/7] drm/xe: Add xe_guc_pagefault layer Matthew Brost
2025-10-28 3:58 ` [PATCH v3 7/7] drm/xe: Remove unused GT page fault code Matthew Brost
2025-10-28 4:05 ` ✗ CI.checkpatch: warning for Pagefault refactor (rev2) Patchwork
2025-10-28 4:06 ` ✓ CI.KUnit: success " Patchwork
2025-10-28 4:46 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-28 9:30 ` ✗ Xe.CI.Full: " Patchwork