From: Matthew Brost <matthew.brost@intel.com>
To: Brian Nguyen <brian3.nguyen@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <tejas.upadhyay@intel.com>,
<shuicheng.lin@intel.com>, <stuart.summers@intel.com>
Subject: Re: [PATCH 06/11] drm/xe: Create page reclaim list on unbind
Date: Sat, 22 Nov 2025 11:18:07 -0800 [thread overview]
Message-ID: <aSIMb4EREPix3SDK@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <20251118090552.246243-7-brian3.nguyen@intel.com>
On Tue, Nov 18, 2025 at 05:05:47PM +0800, Brian Nguyen wrote:
> Page reclaim list (PRL) is preparation work for the page reclaim feature.
> The PRL is firstly owned by pt_update_ops and all other page reclaim
> operations will point back to this PRL. PRL generates its entries during
> the unbind page walker, updating the PRL.
>
> This PRL is restricted to a 4K page, so 512 page entries at most.
>
> Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
> ---
> drivers/gpu/drm/xe/Makefile | 1 +
> drivers/gpu/drm/xe/regs/xe_gtt_defs.h | 1 +
> drivers/gpu/drm/xe/xe_page_reclaim.c | 52 ++++++++++++
> drivers/gpu/drm/xe/xe_page_reclaim.h | 49 ++++++++++++
> drivers/gpu/drm/xe/xe_pt.c | 109 ++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_pt_types.h | 5 ++
> 6 files changed, 217 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_page_reclaim.c
> create mode 100644 drivers/gpu/drm/xe/xe_page_reclaim.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index e4b273b025d2..048e6c93271c 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -95,6 +95,7 @@ xe-y += xe_bb.o \
> xe_oa.o \
> xe_observation.o \
> xe_pagefault.o \
> + xe_page_reclaim.o \
> xe_pat.o \
> xe_pci.o \
> xe_pcode.o \
> diff --git a/drivers/gpu/drm/xe/regs/xe_gtt_defs.h b/drivers/gpu/drm/xe/regs/xe_gtt_defs.h
> index 4389e5a76f89..4d83461e538b 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gtt_defs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gtt_defs.h
> @@ -9,6 +9,7 @@
> #define XELPG_GGTT_PTE_PAT0 BIT_ULL(52)
> #define XELPG_GGTT_PTE_PAT1 BIT_ULL(53)
>
> +#define XE_PTE_ADDR_MASK GENMASK_ULL(51, 12)
> #define GGTT_PTE_VFID GENMASK_ULL(11, 2)
>
> #define GUC_GGTT_TOP 0xFEE00000
> diff --git a/drivers/gpu/drm/xe/xe_page_reclaim.c b/drivers/gpu/drm/xe/xe_page_reclaim.c
> new file mode 100644
> index 000000000000..a0d15efff58c
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_page_reclaim.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/kref.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +
> +#include "xe_page_reclaim.h"
> +
> +#include "regs/xe_gt_regs.h"
> +#include "xe_assert.h"
> +#include "xe_macros.h"
> +
> +/**
> + * xe_page_reclaim_list_invalidate() - Mark a PRL as invalid
> + * @prl: Page reclaim list to reset
> + *
> + * Clears the entries pointer and marks the list as invalid so
> + * future use know PRL is unusable. It is expected that the entries
> + * have already been released.
> + */
> +void xe_page_reclaim_list_invalidate(struct xe_page_reclaim_list *prl)
> +{
> + prl->entries = NULL;
> + prl->num_entries = XE_PAGE_RECLAIM_INVALID_LIST;
> +}
> +
> +/**
> + * xe_page_reclaim_list_alloc_entries() - Allocate page reclaim list entries
> + * @prl: Page reclaim list to allocate entries for
> + *
> + * Allocate one 4K page for the PRL entries, otherwise assign prl->entries to NULL.
> + */
> +int xe_page_reclaim_list_alloc_entries(struct xe_page_reclaim_list *prl)
> +{
> + struct page *page;
> +
> + XE_WARN_ON(prl->entries != NULL);
> + if (prl->entries)
> + return 0;
> +
> + page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> + if (page) {
> + prl->entries = page_address(page);
> + prl->num_entries = 0;
> + }
> +
> + return page ? 0 : -ENOMEM;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_page_reclaim.h b/drivers/gpu/drm/xe/xe_page_reclaim.h
> new file mode 100644
> index 000000000000..d066d7d97f79
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_page_reclaim.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_PAGE_RECLAIM_H_
> +#define _XE_PAGE_RECLAIM_H_
> +
> +#include <linux/kref.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>
> +
> +#define XE_PAGE_RECLAIM_MAX_ENTRIES 512
> +#define XE_PAGE_RECLAIM_LIST_MAX_SIZE SZ_4K
> +
> +struct xe_guc_page_reclaim_entry {
> + u32 valid:1;
> + u32 reclamation_size:6;
> + u32 reserved:5;
> + u32 address_lo:20;
> + u32 address_hi:20;
> + u32 reserved1:12;
This is wire interface with the GuC. Bitfields can based on endianess of
the CPU. I know this is a iGPU feature for now but it could possibly
change in the future, with that, to future proof can the layout of this
be setup via defines / macros?
> +} __packed;
> +
> +struct xe_page_reclaim_list {
> + /** @entries: array of page reclaim entries, page allocated */
> + struct xe_guc_page_reclaim_entry *entries;
> + /** @num_entries: number of entries */
> + int num_entries;
> +#define XE_PAGE_RECLAIM_INVALID_LIST -1
> +};
> +
> +void xe_page_reclaim_list_invalidate(struct xe_page_reclaim_list *prl);
> +int xe_page_reclaim_list_alloc_entries(struct xe_page_reclaim_list *prl);
> +static inline void xe_page_reclaim_entries_get(struct xe_guc_page_reclaim_entry *entries)
> +{
> + if (entries)
> + get_page(virt_to_page(entries));
> +}
> +
> +static inline void xe_page_reclaim_entries_put(struct xe_guc_page_reclaim_entry *entries)
> +{
> + if (entries)
> + put_page(virt_to_page(entries));
> +}
Kernel doc for static inlines.
> +
> +#endif /* _XE_PAGE_RECLAIM_H_ */
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 884127b4d97d..532a047676d4 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -12,6 +12,7 @@
> #include "xe_exec_queue.h"
> #include "xe_gt.h"
> #include "xe_migrate.h"
> +#include "xe_page_reclaim.h"
> #include "xe_pt_types.h"
> #include "xe_pt_walk.h"
> #include "xe_res_cursor.h"
> @@ -1538,6 +1539,9 @@ struct xe_pt_stage_unbind_walk {
> /* Output */
> /* @wupd: Structure to track the page-table updates we're building */
> struct xe_walk_update wupd;
> +
> + /** @prl: Backing pointer to page reclaim list in pt_update_ops */
> + struct xe_page_reclaim_list *prl;
> };
>
> /*
> @@ -1572,6 +1576,69 @@ static bool xe_pt_check_kill(u64 addr, u64 next, unsigned int level,
> return false;
> }
>
> +/* Huge 2MB leaf lives directly in a level-1 table and has no children */
> +static bool is_large_pte(struct xe_pt *pte)
> +{
> + return pte->level == 1 && !pte->base.children;
> +}
> +
> +/* page_size = 2^(reclamation_size + 12) */
> +#define COMPUTE_RECLAIM_ADDRESS_MASK(page_size) \
> +({ \
> + BUILD_BUG_ON(!__builtin_constant_p(page_size)); \
> + ilog2(page_size) - 12; \
s/12/XE_PTE_SHIFT ?
> +})
> +
> +static void generate_reclaim_entry(struct xe_tile *tile,
> + struct xe_page_reclaim_list *prl,
> + u64 pte,
> + struct xe_pt *xe_child)
Nit, xe_pt can be on the same line as 'u64 pte'.
> +{
> + struct xe_guc_page_reclaim_entry *reclaim_entries = prl->entries;
> + u64 phys_addr = pte & XE_PTE_ADDR_MASK;
> + const u64 field_mask = GENMASK_ULL(19, 0);
> + u32 reclamation_size;
Nit, I'd make the last variable declared on the stack for readability.
> + const uint max_entries = XE_PAGE_RECLAIM_MAX_ENTRIES;
> + int num_entries = prl->num_entries;
> +
> + xe_tile_assert(tile, xe_child->level <= MAX_HUGEPTE_LEVEL);
> + xe_tile_assert(tile, reclaim_entries);
> +
> + if (num_entries == XE_PAGE_RECLAIM_INVALID_LIST)
> + return;
> +
> + /* Overflow: mark as invalid through num_entries */
> + if (num_entries >= max_entries) {
> + prl->num_entries = XE_PAGE_RECLAIM_INVALID_LIST;
> + return;
> + }
> +
> + /**
> + * reclamation_size indicates the size of the page to be
> + * invalidated and flushed from non-coherent cache.
> + * Page size is computed as 2^(reclamation_size+12) bytes.
> + * Only valid for these specific levels.
> + */
> +
> + if (xe_child->level == 0 && !(pte & XE_PTE_PS64))
> + reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_4K); /* reclamation_size = 0 */
> + else if (xe_child->level == 0)
> + reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_64K); /* reclamation_size = 1 */
> + else if (is_large_pte(xe_child))
> + reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_2M); /* reclamation_size = 2 */
What happens if we have 1G page? That doesn't seem to be handled.
> + else
> + return;
> +
> + reclaim_entries[num_entries].valid = 1;
> + reclaim_entries[num_entries].reclamation_size =
> + reclamation_size;
> + reclaim_entries[num_entries].address_lo =
> + FIELD_GET(field_mask, phys_addr);
> + reclaim_entries[num_entries].address_hi =
> + FIELD_GET(field_mask, phys_addr >> 20);
As suggested above, use macros/defines here to setup the entry.
> + prl->num_entries++;
> +}
> +
> static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset,
> unsigned int level, u64 addr, u64 next,
> struct xe_ptw **child,
> @@ -1579,10 +1646,27 @@ static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset,
> struct xe_pt_walk *walk)
> {
> struct xe_pt *xe_child = container_of(*child, typeof(*xe_child), base);
> + struct xe_pt_stage_unbind_walk *xe_walk =
> + container_of(walk, typeof(*xe_walk), base);
> + struct xe_device *xe = tile_to_xe(xe_walk->tile);
>
> XE_WARN_ON(!*child);
> XE_WARN_ON(!level);
>
> + /* 4K and 64K Pages are level 0, large pte needs additional handling. */
> + if (xe_walk->prl && (xe_child->level == 0 || is_large_pte(xe_child))) {
And also here? 1G pages are unhandled? Please explain.
> + struct iosys_map *leaf_map = &xe_child->bo->vmap;
> + pgoff_t first = xe_pt_offset(addr, 0, walk);
> + pgoff_t count = xe_pt_num_entries(addr, next, 0, walk);
> +
> + for (pgoff_t i = 0; i < count; i++) {
> + u64 pte = xe_map_rd(xe, leaf_map, (first + i) * sizeof(u64), u64);
> +
> + generate_reclaim_entry(xe_walk->tile, xe_walk->prl,
> + pte, xe_child);
> + }
> + }
> +
> xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk);
>
> return 0;
> @@ -1654,6 +1738,8 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile,
> {
> u64 start = range ? xe_svm_range_start(range) : xe_vma_start(vma);
> u64 end = range ? xe_svm_range_end(range) : xe_vma_end(vma);
> + struct xe_vm_pgtable_update_op *pt_update_op =
> + container_of(entries, struct xe_vm_pgtable_update_op, entries[0]);
> struct xe_pt_stage_unbind_walk xe_walk = {
> .base = {
> .ops = &xe_pt_stage_unbind_ops,
> @@ -1665,6 +1751,7 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile,
> .modified_start = start,
> .modified_end = end,
> .wupd.entries = entries,
> + .prl = pt_update_op->prl,
> };
> struct xe_pt *pt = vm->pt_root[tile->id];
>
> @@ -1897,6 +1984,7 @@ static int unbind_op_prepare(struct xe_tile *tile,
> struct xe_vm_pgtable_update_ops *pt_update_ops,
> struct xe_vma *vma)
> {
> + struct xe_device *xe = tile_to_xe(tile);
> u32 current_op = pt_update_ops->current_op;
> struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
> int err;
> @@ -1914,6 +2002,13 @@ static int unbind_op_prepare(struct xe_tile *tile,
> pt_op->vma = vma;
> pt_op->bind = false;
> pt_op->rebind = false;
> + /* Maintain one PRL located in pt_update_ops that all others in unbind op reference */
> + if (xe->info.has_page_reclaim_hw_assist && !pt_update_ops->prl.entries) {
> + err = xe_page_reclaim_list_alloc_entries(&pt_update_ops->prl);
> + if (err < 0)
> + xe_page_reclaim_list_invalidate(&pt_update_ops->prl);
I don't think you need to call xe_page_reclaim_list_invalidate, right?
If xe_page_reclaim_list_alloc_entries fails the prl should be in the
init state.
> + }
> + pt_op->prl = (pt_update_ops->prl.entries) ? &pt_update_ops->prl : NULL;
>
> err = vma_reserve_fences(tile_to_xe(tile), vma);
> if (err)
> @@ -1921,6 +2016,13 @@ static int unbind_op_prepare(struct xe_tile *tile,
>
> pt_op->num_entries = xe_pt_stage_unbind(tile, xe_vma_vm(vma),
> vma, NULL, pt_op->entries);
> + /* Free PRL if list declared as invalid */
> + if (pt_update_ops->prl.entries &&
> + pt_update_ops->prl.num_entries == XE_PAGE_RECLAIM_INVALID_LIST) {
> + xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
> + pt_op->prl = NULL;
> + pt_update_ops->prl.entries = NULL;
Call xe_page_reclaim_list_invalidate for clarity?
> + }
>
> xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries,
> pt_op->num_entries, false);
> @@ -1979,6 +2081,7 @@ static int unbind_range_prepare(struct xe_vm *vm,
> pt_op->vma = XE_INVALID_VMA;
> pt_op->bind = false;
> pt_op->rebind = false;
> + pt_op->prl = NULL;
>
> pt_op->num_entries = xe_pt_stage_unbind(tile, vm, NULL, range,
> pt_op->entries);
> @@ -2096,6 +2199,7 @@ xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
> init_llist_head(&pt_update_ops->deferred);
> pt_update_ops->start = ~0x0ull;
> pt_update_ops->last = 0x0ull;
> + xe_page_reclaim_list_invalidate(&pt_update_ops->prl);
Can we introduce a function called xe_page_reclaim_list_init for
clarity? It might do the same thing as xe_page_reclaim_list_invalidate
but it would make this a little more clear. Likewise later in the series
when a job is created, you can call xe_page_reclaim_list_init there too.
> }
>
> /**
> @@ -2518,6 +2622,11 @@ void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
> &vops->pt_update_ops[tile->id];
> int i;
>
> + if (pt_update_ops->prl.entries) {
> + xe_page_reclaim_entries_put(pt_update_ops->prl.entries);
> + xe_page_reclaim_list_invalidate(&pt_update_ops->prl);
> + }
> +
> lockdep_assert_held(&vops->vm->lock);
> xe_vm_assert_held(vops->vm);
>
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> index 881f01e14db8..26e5295f118e 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -8,6 +8,7 @@
>
> #include <linux/types.h>
>
> +#include "xe_page_reclaim.h"
> #include "xe_pt_walk.h"
>
> struct xe_bo;
> @@ -85,6 +86,8 @@ struct xe_vm_pgtable_update_op {
> bool bind;
> /** @rebind: is a rebind */
> bool rebind;
> + /** @prl: Backing pointer to page reclaim list of pt_update_ops */
> + struct xe_page_reclaim_list *prl;
Can you move this above the bools in the layout of
xe_vm_pgtable_update_op, likely just below "struct xe_vma".
> };
>
> /** struct xe_vm_pgtable_update_ops: page table update operations */
> @@ -119,6 +122,8 @@ struct xe_vm_pgtable_update_ops {
> * slots are idle.
> */
> bool wait_vm_kernel;
> + /** @prl: embedded page reclaim list */
> + struct xe_page_reclaim_list prl;
Same thing here, move just below "struct xe_exec_queue".
Matt
> };
>
> #endif
> --
> 2.51.2
>
next prev parent reply other threads:[~2025-11-22 19:18 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-18 9:05 [PATCH 00/11] Page Reclamation Support for Xe3p Platforms Brian Nguyen
2025-11-18 9:05 ` [PATCH 01/11] [DO, NOT, REVIEW] drm/xe: Do not forward invalid TLB invalidation seqnos to upper layers Brian Nguyen
2025-11-18 9:05 ` [PATCH 02/11] drm/xe: Reset tlb fence timeout on invalid seqno received Brian Nguyen
2025-11-21 17:23 ` Lin, Shuicheng
2025-11-22 1:53 ` Nguyen, Brian3
2025-11-22 18:25 ` Matthew Brost
2025-11-25 11:01 ` Nguyen, Brian3
2025-11-18 9:05 ` [PATCH 03/11] drm/xe/xe_tlb_inval: Modify fence interface to support PPC flush Brian Nguyen
2025-11-21 18:02 ` Lin, Shuicheng
2025-11-22 1:54 ` Nguyen, Brian3
2025-11-22 19:32 ` Matthew Brost
2025-11-25 11:07 ` Nguyen, Brian3
2025-11-18 9:05 ` [PATCH 04/11] drm/xe: Add page reclamation info to device info Brian Nguyen
2025-11-21 18:15 ` Lin, Shuicheng
2025-11-22 18:31 ` Matthew Brost
2025-11-18 9:05 ` [PATCH 05/11] drm/xe/guc: Add page reclamation interface to GuC Brian Nguyen
2025-11-21 18:32 ` Lin, Shuicheng
2025-11-22 1:56 ` Nguyen, Brian3
2025-11-22 18:39 ` Matthew Brost
2025-11-25 11:13 ` Nguyen, Brian3
2025-11-18 9:05 ` [PATCH 06/11] drm/xe: Create page reclaim list on unbind Brian Nguyen
2025-11-21 21:29 ` Lin, Shuicheng
2025-11-22 1:57 ` Nguyen, Brian3
2025-11-22 19:18 ` Matthew Brost [this message]
2025-11-25 11:18 ` Nguyen, Brian3
2025-11-25 18:34 ` Matthew Brost
2025-11-25 19:01 ` Nguyen, Brian3
2025-11-25 19:07 ` Matthew Brost
2025-11-25 19:46 ` Nguyen, Brian3
2025-11-25 22:35 ` Matthew Brost
2025-11-26 2:33 ` Nguyen, Brian3
2025-11-18 9:05 ` [PATCH 07/11] drm/xe: Suballocate BO for page reclaim Brian Nguyen
2025-11-22 19:42 ` Matthew Brost
2025-11-25 11:20 ` Nguyen, Brian3
2025-11-18 9:05 ` [PATCH 08/11] drm/xe: Prep page reclaim in tlb inval job Brian Nguyen
2025-11-22 13:52 ` Michal Wajdeczko
2025-11-25 11:20 ` Nguyen, Brian3
2025-11-18 9:05 ` [PATCH 09/11] drm/xe: Append page reclamation action to tlb inval Brian Nguyen
2025-11-18 9:05 ` [PATCH 10/11] drm/xe: Optimize flushing of L2$ by skipping unnecessary page reclaim Brian Nguyen
2025-11-24 12:29 ` Matthew Auld
2025-11-25 6:12 ` Nguyen, Brian3
2025-11-25 11:48 ` Upadhyay, Tejas
2025-11-25 13:05 ` Upadhyay, Tejas
2025-11-18 9:05 ` [PATCH 11/11] drm/xe: Add debugfs support for page reclamation Brian Nguyen
2025-11-21 22:32 ` Lin, Shuicheng
2025-11-22 1:57 ` Nguyen, Brian3
2025-11-22 14:18 ` Michal Wajdeczko
2025-11-25 11:21 ` Nguyen, Brian3
2025-11-18 9:52 ` ✗ CI.checkpatch: warning for Page Reclamation Support for Xe3p Platforms Patchwork
2025-11-18 9:53 ` ✓ CI.KUnit: success " Patchwork
2025-11-18 13:02 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aSIMb4EREPix3SDK@lstrano-desk.jf.intel.com \
--to=matthew.brost@intel.com \
--cc=brian3.nguyen@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=shuicheng.lin@intel.com \
--cc=stuart.summers@intel.com \
--cc=tejas.upadhyay@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.