Intel-XE Archive on lore.kernel.org
From: "Summers, Stuart" <stuart.summers@intel.com>
To: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Brost,  Matthew" <matthew.brost@intel.com>
Cc: "Dugast, Francois" <francois.dugast@intel.com>
Subject: Re: [PATCH v2 1/7] drm/xe: Stub out new pagefault layer
Date: Fri, 24 Oct 2025 18:35:12 +0000
Message-ID: <4ef590d14eccd4d962b3ab70338c597f05aec645.camel@intel.com>
In-Reply-To: <20251024180414.1379284-2-matthew.brost@intel.com>

On Fri, 2025-10-24 at 11:04 -0700, Matthew Brost wrote:
> Stub out the new page fault layer and add kernel documentation. This is
> intended as a replacement for the GT page fault layer, enabling multiple
> producers to hook into a shared page fault consumer interface.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> ---
> v2:
>  - Fix kernel doc typo (checkpatch)
>  - Remove comment around GT (Stuart)
>  - Add explanation around reclaim (Francois)
>  - Add comment around u8 vs enum (Francois)
>  - Include engine instance (Stuart)
> ---
>  drivers/gpu/drm/xe/Makefile             |   1 +
>  drivers/gpu/drm/xe/xe_pagefault.c       |  65 +++++++++++
>  drivers/gpu/drm/xe/xe_pagefault.h       |  19 ++++
>  drivers/gpu/drm/xe/xe_pagefault_types.h | 136 ++++++++++++++++++++++++
>  4 files changed, 221 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_pagefault.c
>  create mode 100644 drivers/gpu/drm/xe/xe_pagefault.h
>  create mode 100644 drivers/gpu/drm/xe/xe_pagefault_types.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 82c6b3d29676..b35021e5b9eb 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -94,6 +94,7 @@ xe-y += xe_bb.o \
>         xe_nvm.o \
>         xe_oa.o \
>         xe_observation.o \
> +       xe_pagefault.o \
>         xe_pat.o \
>         xe_pci.o \
>         xe_pcode.o \
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> new file mode 100644
> index 000000000000..d509a80cb1f3
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -0,0 +1,65 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include "xe_pagefault.h"
> +#include "xe_pagefault_types.h"
> +
> +/**
> + * DOC: Xe page faults
> + *
> + * Xe page faults are handled in two layers. The producer layer interacts with
> + * hardware or firmware to receive and parse faults into struct xe_pagefault,
> + * then forwards them to the consumer. The consumer layer services the faults
> + * (e.g., memory migration, page table updates) and acknowledges the result back
> + * to the producer, which then forwards the results to the hardware or firmware.
> + * The consumer uses a page fault queue sized to absorb all potential faults and
> + * a multi-threaded worker to process them. Multiple producers are supported,
> + * with a single shared consumer.
> + *
> + * xe_pagefault.c implements the consumer layer.
> + */
> +
> +/**
> + * xe_pagefault_init() - Page fault init
> + * @xe: xe device instance
> + *
> + * Initialize Xe page fault state. Must be done after reading fuses.
> + *
> + * Return: 0 on Success, errno on failure
> + */
> +int xe_pagefault_init(struct xe_device *xe)
> +{
> +       /* TODO - implement */
> +       return 0;
> +}
> +
> +/**
> + * xe_pagefault_reset() - Page fault reset for a GT
> + * @xe: xe device instance
> + * @gt: GT being reset
> + *
> + * Reset the Xe page fault state for a GT; that is, squash any pending faults on
> + * the GT.
> + */
> +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> +{
> +       /* TODO - implement */
> +}
> +
> +/**
> + * xe_pagefault_handler() - Page fault handler
> + * @xe: xe device instance
> + * @pf: Page fault
> + *
> + * Sink the page fault to a queue (i.e., a memory buffer) and queue a worker to
> + * service it. Safe to be called from IRQ or process context. Reclaim safe.
> + *
> + * Return: 0 on success, errno on failure
> + */
> +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> +{
> +       /* TODO - implement */
> +       return 0;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
> new file mode 100644
> index 000000000000..bd0cdf9ed37f
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pagefault.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_PAGEFAULT_H_
> +#define _XE_PAGEFAULT_H_
> +
> +struct xe_device;
> +struct xe_gt;
> +struct xe_pagefault;
> +
> +int xe_pagefault_init(struct xe_device *xe);
> +
> +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
> +
> +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> new file mode 100644
> index 000000000000..32bc3348b314
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> @@ -0,0 +1,136 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_PAGEFAULT_TYPES_H_
> +#define _XE_PAGEFAULT_TYPES_H_
> +
> +#include <linux/workqueue.h>
> +
> +struct xe_gt;
> +struct xe_pagefault;
> +
> +/** enum xe_pagefault_access_type - Xe page fault access type */
> +enum xe_pagefault_access_type {
> +       /** @XE_PAGEFAULT_ACCESS_TYPE_READ: Read access type */
> +       XE_PAGEFAULT_ACCESS_TYPE_READ   = 0,
> +       /** @XE_PAGEFAULT_ACCESS_TYPE_WRITE: Write access type */
> +       XE_PAGEFAULT_ACCESS_TYPE_WRITE  = 1,
> +       /** @XE_PAGEFAULT_ACCESS_TYPE_ATOMIC: Atomic access type */
> +       XE_PAGEFAULT_ACCESS_TYPE_ATOMIC = 2,
> +};
> +
> +/** enum xe_pagefault_type - Xe page fault type */
> +enum xe_pagefault_type {
> +       /** @XE_PAGEFAULT_TYPE_NOT_PRESENT: Not present */
> +       XE_PAGEFAULT_TYPE_NOT_PRESENT                   = 0,
> +       /** @XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION: Write access violation */
> +       XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION        = 1,
> +       /** @XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION: Atomic access violation */

Comment here doesn't match the enumerator name; it should be
"@XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION".

Thanks,
Stuart

> +       XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION       = 2,
> +};
> +
> +/** struct xe_pagefault_ops - Xe pagefault ops (producer) */
> +struct xe_pagefault_ops {
> +       /**
> +        * @ack_fault: Ack fault
> +        * @pf: Page fault
> +        * @err: Error state of fault
> +        *
> +        * Page fault producer receives acknowledgment from the consumer and
> +        * sends the result to the HW/FW interface.
> +        */
> +       void (*ack_fault)(struct xe_pagefault *pf, int err);
> +};
> +
> +/**
> + * struct xe_pagefault - Xe page fault
> + *
> + * Generic page fault structure for communication between producer and consumer.
> + * Carefully sized to be 64 bytes. Upon a device page fault, the producer
> + * populates this structure, and the consumer copies it into the page-fault
> + * queue for deferred handling.
> + */
> +struct xe_pagefault {
> +       /**
> +        * @gt: GT of fault
> +        */
> +       struct xe_gt *gt;
> +       /**
> +        * @consumer: State for the software handling the fault. Populated by
> +        * the producer and may be modified by the consumer to communicate
> +        * information back to the producer upon fault acknowledgment.
> +        */
> +       struct {
> +               /** @consumer.page_addr: address of page fault */
> +               u64 page_addr;
> +               /** @consumer.asid: address space ID */
> +               u32 asid;
> +               /**
> +                * @consumer.access_type: access type, u8 rather than enum to
> +                * keep size compact
> +                */
> +               u8 access_type;
> +               /**
> +                * @consumer.fault_type: fault type, u8 rather than enum to
> +                * keep size compact
> +                */
> +               u8 fault_type;
> +#define XE_PAGEFAULT_LEVEL_NACK                0xff    /* Producer indicates nack fault */
> +               /** @consumer.fault_level: fault level */
> +               u8 fault_level;
> +               /** @consumer.engine_class: engine class */
> +               u8 engine_class;
> +               /** @consumer.engine_instance: engine instance */
> +               u8 engine_instance;
> +               /** consumer.reserved: reserved bits for future expansion */
> +               u8 reserved[7];
> +       } consumer;
> +       /**
> +        * @producer: State for the producer (i.e., HW/FW interface). Populated
> +        * by the producer and should not be modified, or even inspected, by the
> +        * consumer, except for calling operations.
> +        */
> +       struct {
> +               /** @producer.private: private pointer */
> +               void *private;
> +               /** @producer.ops: operations */
> +               const struct xe_pagefault_ops *ops;
> +#define XE_PAGEFAULT_PRODUCER_MSG_LEN_DW       4
> +               /**
> +                * @producer.msg: page fault message, used by producer in fault
> +                * acknowledgment to formulate response to HW/FW interface.
> +                * Included in the page-fault message because the producer
> +                * typically receives the fault in a context where memory cannot
> +                * be allocated (e.g., atomic context or the reclaim path).
> +                */
> +               u32 msg[XE_PAGEFAULT_PRODUCER_MSG_LEN_DW];
> +       } producer;
> +};
> +
> +/**
> + * struct xe_pagefault_queue: Xe pagefault queue (consumer)
> + *
> + * Used to capture all device page faults for deferred processing. Size this
> + * queue to absorb the device’s worst-case number of outstanding faults.
> + */
> +struct xe_pagefault_queue {
> +       /**
> +        * @data: Data in queue containing struct xe_pagefault, protected by
> +        * @lock
> +        */
> +       void *data;
> +       /** @size: Size of queue in bytes */
> +       u32 size;
> +       /** @head: Head pointer in bytes, moved by producer, protected by @lock */
> +       u32 head;
> +       /** @tail: Tail pointer in bytes, moved by consumer, protected by @lock */
> +       u32 tail;
> +       /** @lock: protects page fault queue */
> +       spinlock_t lock;
> +       /** @worker: to process page faults */
> +       struct work_struct worker;
> +};
> +
> +#endif
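
The kernel-doc for struct xe_pagefault says it is "carefully sized to be 64 bytes", but the patch adds no check that enforces this. The following is only a sketch of a compile-time check that would catch an accidental size change. It assumes an LP64 target (8-byte pointers) and uses <stdint.h> types as userspace stand-ins for the kernel's u8/u32/u64. In-tree, the same idea would usually be written with static_assert() from <linux/build_bug.h> or with BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's u8/u32/u64 (assumption of this sketch). */
typedef uint8_t u8;
typedef uint32_t u32;
typedef uint64_t u64;

struct xe_gt;			/* opaque here, as in the patch */
struct xe_pagefault_ops;	/* only used through a pointer */

#define XE_PAGEFAULT_PRODUCER_MSG_LEN_DW	4

/* Field layout copied from the v2 patch; kernel-doc comments dropped. */
struct xe_pagefault {
	struct xe_gt *gt;
	struct {
		u64 page_addr;
		u32 asid;
		u8 access_type;
		u8 fault_type;
		u8 fault_level;
		u8 engine_class;
		u8 engine_instance;
		u8 reserved[7];
	} consumer;
	struct {
		void *private;
		const struct xe_pagefault_ops *ops;
		u32 msg[XE_PAGEFAULT_PRODUCER_MSG_LEN_DW];
	} producer;
};

/* The offsets below hold only on an LP64 target (8-byte pointers). */
static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");
static_assert(offsetof(struct xe_pagefault, consumer) == 8,
	      "@consumer follows @gt");
static_assert(sizeof(((struct xe_pagefault *)0)->consumer) == 24,
	      "@consumer block is 24 bytes");
static_assert(offsetof(struct xe_pagefault, producer) == 32,
	      "@producer starts at byte 32");
static_assert(sizeof(struct xe_pagefault) == 64,
	      "struct xe_pagefault must stay 64 bytes");
```

On LP64 the breakdown is 8 bytes for @gt, 24 for @consumer (8 + 4 + five u8 fields + 7 reserved), and 32 for @producer (two pointers plus four u32 words). Any new u8 in @consumer must therefore come out of @reserved to keep the total at 64.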

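The DOC comment and struct xe_pagefault_queue describe the consumer as a byte-indexed ring. The handler copies each fault in at @head without allocating. The worker then drains from @tail, services each fault, and acks the result through the producer's ack_fault op. Below is a minimal userspace sketch of that flow. It is not the series' implementation: every name in it is hypothetical, the entry is a trimmed stand-in for struct xe_pagefault, the full/empty convention and the -ENOSPC overflow code are assumptions, and the spinlock and work item are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct fault;

/* Plays the role of struct xe_pagefault_ops (hypothetical name). */
struct fault_ops {
	void (*ack_fault)(struct fault *pf, int err);
};

/* Trimmed, hypothetical stand-in for struct xe_pagefault. */
struct fault {
	uint64_t page_addr;
	uint32_t asid;
	const struct fault_ops *ops;
};

/*
 * Byte-indexed ring mirroring the data/size/head/tail fields of
 * struct xe_pagefault_queue; the spinlock and work item are omitted.
 */
struct fault_queue {
	unsigned char *data;
	uint32_t size;	/* bytes; assumed a nonzero multiple of sizeof(struct fault) */
	uint32_t head;	/* byte offset, advanced when a fault is queued */
	uint32_t tail;	/* byte offset, advanced when a fault is consumed */
};

static bool fault_queue_full(const struct fault_queue *q)
{
	/* One slot stays free so that head == tail always means "empty". */
	return (q->head + sizeof(struct fault)) % q->size == q->tail;
}

/* Handler side: copy into preallocated storage, never allocate. */
static int fault_queue_push(struct fault_queue *q, const struct fault *pf)
{
	assert(q->size && q->size % sizeof(*pf) == 0);
	if (fault_queue_full(q))
		return -ENOSPC;	/* assumed error code on overflow */
	/* memcpy because data is a plain byte buffer with no alignment guarantee */
	memcpy(q->data + q->head, pf, sizeof(*pf));
	q->head = (q->head + sizeof(*pf)) % q->size;
	return 0;
}

/* Worker side: copy out the oldest fault; false when the ring is empty. */
static bool fault_queue_pop(struct fault_queue *q, struct fault *out)
{
	if (q->head == q->tail)
		return false;
	memcpy(out, q->data + q->tail, sizeof(*out));
	q->tail = (q->tail + sizeof(*out)) % q->size;
	return true;
}

/* Placeholder for real servicing; fails on address 0 only to give a NACK case. */
static int service_fault(const struct fault *pf)
{
	return pf->page_addr ? 0 : -EFAULT;
}

/* Worker body: drain the ring and ack each result back to its producer. */
static int fault_queue_drain(struct fault_queue *q)
{
	struct fault pf;
	int n = 0;

	while (fault_queue_pop(q, &pf)) {
		pf.ops->ack_fault(&pf, service_fault(&pf));
		n++;
	}
	return n;
}

/* Example producer callback that just counts results (illustration only). */
static int acks, nacks;

static void example_ack(struct fault *pf, int err)
{
	(void)pf;
	if (err)
		nacks++;
	else
		acks++;
}

static const struct fault_ops example_ops = { .ack_fault = example_ack };
```

On LP64 each entry is 24 bytes, so a 96-byte buffer holds three faults before fault_queue_push() starts returning -ENOSPC. Because the handler cannot allocate, an overflow can only be reported, not absorbed. That is consistent with the patch's guidance to size the queue for the worst-case number of outstanding faults.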

Thread overview: 16+ messages
2025-10-24 18:04 [PATCH v2 0/7] Pagefault refactor Matthew Brost
2025-10-24 18:04 ` [PATCH v2 1/7] drm/xe: Stub out new pagefault layer Matthew Brost
2025-10-24 18:35   ` Summers, Stuart [this message]
2025-10-24 18:04 ` [PATCH v2 2/7] drm/xe: Implement xe_pagefault_init Matthew Brost
2025-10-24 18:55   ` Summers, Stuart
2025-10-24 19:18     ` Matthew Brost
2025-10-24 21:48       ` Matthew Brost
2025-10-24 18:04 ` [PATCH v2 3/7] drm/xe: Implement xe_pagefault_reset Matthew Brost
2025-10-24 18:04 ` [PATCH v2 4/7] drm/xe: Implement xe_pagefault_handler Matthew Brost
2025-10-24 18:04 ` [PATCH v2 5/7] drm/xe: Implement xe_pagefault_queue_work Matthew Brost
2025-10-24 18:04 ` [PATCH v2 6/7] drm/xe: Add xe_guc_pagefault layer Matthew Brost
2025-10-24 18:04 ` [PATCH v2 7/7] drm/xe: Remove unused GT page fault code Matthew Brost
2025-10-24 18:10 ` ✗ CI.checkpatch: warning for Pagefault refactor Patchwork
2025-10-24 18:11 ` ✓ CI.KUnit: success " Patchwork
2025-10-24 19:04 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-25  7:17 ` ✗ Xe.CI.Full: " Patchwork
