From: Matthew Brost <matthew.brost@intel.com>
To: Brian Nguyen <brian3.nguyen@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 3/4] drm/xe: Fix page reclaim entry handling for large pages
Date: Tue, 6 Jan 2026 08:41:56 -0800
Message-ID: <aV07VAFD8K8YaGlV@lstrano-desk.jf.intel.com>
In-Reply-To: <20260105233351.3753716-9-brian3.nguyen@intel.com>
On Tue, Jan 06, 2026 at 07:33:55AM +0800, Brian Nguyen wrote:
> For 64KB pages, XE_PTE_PS64 is set on each of the 16 consecutive 4KB
> entries, and all of them are considered leaf nodes, so the existing
> check was falsely adding multiple entries for the same 64KB page to the
> PRL.
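As an aside, the 64K collapse described above can be sketched standalone. `PS64_FLAG` and `count_reclaim_entries()` below are hypothetical stand-ins for illustration; the real XE_PTE_PS64 bit and PRL bookkeeping live in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit, standing in for the driver's XE_PTE_PS64 */
#define PS64_FLAG (1ull << 11)

/*
 * Count how many reclaim entries a run of level-0 PTEs should produce when
 * every group of 16 consecutive 4K slots carrying the 64K flag collapses
 * into a single 64K entry, mirroring the i += 15 skip in the patch.
 */
static size_t count_reclaim_entries(const uint64_t *ptes, size_t n)
{
	size_t entries = 0;

	for (size_t i = 0; i < n; i++) {
		entries++;
		if (ptes[i] & PS64_FLAG)
			i += 15; /* skip the other 15 slots of the 64K page */
	}
	return entries;
}
```

Without the skip, a single 64K page would contribute 16 PRL entries instead of one, which is the bug the commit message describes.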
>
> For larger entries such as a 2MB PDE, checking pte->base.children is
> insufficient since this array is always allocated for page directories
> at level 1 and above, so instead check the entry itself for the bit
> indicating it points at a large page.
>
> For unmaps, if the range is fully covered by a page directory, the page
> walker may finish without descending to the leaf nodes.
>
> For example, a 1GB range can be fully covered by 512 2MB pages if the
> alignment allows. In this case, the page walker walks until it reaches
> the directory corresponding to the 1GB range, completes its walk there,
> and the individual 2MB PDE leaves are never visited.
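The coverage condition above can be sketched standalone. `pt_covers()` and `PT_SHIFT()` below are hypothetical simplifications (4K base pages, 9 address bits per level), not the driver's xe_pt_covers() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4K base pages, 9 address bits per level: level 0 -> 4K, 1 -> 2M, 2 -> 1G */
#define PT_SHIFT(level) (12u + 9u * (level))

/*
 * True when [addr, next) exactly covers one aligned entry span at @level,
 * so a walker can stop at the directory without visiting the leaves below.
 */
static bool pt_covers(uint64_t addr, uint64_t next, unsigned int level)
{
	uint64_t span = 1ull << PT_SHIFT(level);

	return !(addr & (span - 1)) && next - addr == span;
}
```

An aligned 1GB unmap satisfies pt_covers(addr, addr + SZ_1G, 2) even though 512 2MB PDEs sit underneath, which is why the walk can finish without ever touching them.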
>
> PRL invalidation is also required in this case, so add a check for
> whether the PT entry covers the entire range, since the walker will
> complete its walk there.
>
> There are possible race conditions that can cause the driver to read a
> PTE that hasn't been written yet. The two scenarios are:
> - Another issued TLB invalidation, such as from userptr or an MMU
> notifier.
> - Dependencies on the original bind that has yet to be executed, with an
> unbind on that job.
>
> These race conditions are expected to be rare, so simply fall back to a
> full PPC flush invalidation instead.
>
> v2:
> - Reword commit and updated zero-pte handling. (Matthew B)
>
> Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind")
> Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 50 +++++++++++++++++++++++++++-----------
> 1 file changed, 36 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 2752a5a48a97..668a981696f9 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1576,12 +1576,6 @@ static bool xe_pt_check_kill(u64 addr, u64 next, unsigned int level,
> return false;
> }
>
> -/* Huge 2MB leaf lives directly in a level-1 table and has no children */
> -static bool is_2m_pte(struct xe_pt *pte)
> -{
> - return pte->level == 1 && !pte->base.children;
> -}
> -
> /* page_size = 2^(reclamation_size + XE_PTE_SHIFT) */
> #define COMPUTE_RECLAIM_ADDRESS_MASK(page_size) \
> ({ \
> @@ -1594,7 +1588,8 @@ static int generate_reclaim_entry(struct xe_tile *tile,
> u64 pte, struct xe_pt *xe_child)
> {
> struct xe_guc_page_reclaim_entry *reclaim_entries = prl->entries;
> - u64 phys_page = (pte & XE_PTE_ADDR_MASK) >> XE_PTE_SHIFT;
> + u64 phys_addr = pte & XE_PTE_ADDR_MASK;
> + u64 phys_page = phys_addr >> XE_PTE_SHIFT;
> int num_entries = prl->num_entries;
> u32 reclamation_size;
>
> @@ -1613,10 +1608,13 @@ static int generate_reclaim_entry(struct xe_tile *tile,
> */
> if (xe_child->level == 0 && !(pte & XE_PTE_PS64)) {
> reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_4K); /* reclamation_size = 0 */
> + xe_tile_assert(tile, phys_addr % SZ_4K == 0);
> } else if (xe_child->level == 0) {
> reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_64K); /* reclamation_size = 4 */
> - } else if (is_2m_pte(xe_child)) {
> + xe_tile_assert(tile, phys_addr % SZ_64K == 0);
> + } else if (xe_child->level == 1 && pte & XE_PDE_PS_2M) {
> reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_2M); /* reclamation_size = 9 */
> + xe_tile_assert(tile, phys_addr % SZ_2M == 0);
> } else {
> xe_page_reclaim_list_abort(tile->primary_gt, prl,
> "unsupported PTE level=%u pte=%#llx",
> @@ -1647,20 +1645,39 @@ static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset,
> struct xe_pt_stage_unbind_walk *xe_walk =
> container_of(walk, typeof(*xe_walk), base);
> struct xe_device *xe = tile_to_xe(xe_walk->tile);
> + pgoff_t first = xe_pt_offset(addr, xe_child->level, walk);
>
> XE_WARN_ON(!*child);
> XE_WARN_ON(!level);
> /* Check for leaf node */
> if (xe_walk->prl && xe_page_reclaim_list_valid(xe_walk->prl) &&
> - !xe_child->base.children) {
> + (!xe_child->base.children || !xe_child->base.children[first])) {
> struct iosys_map *leaf_map = &xe_child->bo->vmap;
> - pgoff_t first = xe_pt_offset(addr, 0, walk);
> - pgoff_t count = xe_pt_num_entries(addr, next, 0, walk);
> + pgoff_t count = xe_pt_num_entries(addr, next, xe_child->level, walk);
>
> for (pgoff_t i = 0; i < count; i++) {
> u64 pte = xe_map_rd(xe, leaf_map, (first + i) * sizeof(u64), u64);
> int ret;
>
> + /*
> + * In rare scenarios, the pte may not have been written yet due to race
> + * conditions. In such cases, invalidate the PRL and fall back to a
> + * full PPC invalidation.
> + if (!pte) {
> + xe_page_reclaim_list_abort(xe_walk->tile->primary_gt, xe_walk->prl,
> + "found zero pte at addr=%#llx", addr);
> + break;
> + }
> +
> + /* Ensure it is a defined page */
> + xe_tile_assert(xe_walk->tile,
> + xe_child->level == 0 ||
> + (pte & (XE_PTE_PS64 | XE_PDE_PS_2M | XE_PDPE_PS_1G)));
> +
> + /* One entry is added per 64KB page; its 16 contiguous 4K PTEs all have XE_PTE_PS64 */
> + if (pte & XE_PTE_PS64)
> + i += 15; /* Skip the other 15 consecutive 4K slots in the 64K page */
> +
> /* Account for NULL terminated entry on end (-1) */
> if (xe_walk->prl->num_entries < XE_PAGE_RECLAIM_MAX_ENTRIES - 1) {
> ret = generate_reclaim_entry(xe_walk->tile, xe_walk->prl,
> @@ -1677,9 +1694,14 @@ static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset,
> }
> }
>
> - /* If aborting page walk early, invalidate PRL since PTE may be dropped from this abort */
> - if (xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk) &&
> - xe_walk->prl && level > 1 && xe_child->base.children && xe_child->num_live != 0) {
> + /*
> + * If aborting page walk early or page walk finishes,
> + * invalidate PRL since PTE may be dropped from this abort
> + */
> + if ((xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk) ||
> + xe_pt_covers(addr, next, xe_child->level, &xe_walk->base)) &&
> + xe_walk->prl && level > 1 && (xe_child->base.children &&
> + xe_child->base.children[first]) && xe_child->num_live != 0) {
This is a pretty confusing if statement. I'm not really following the
'xe_child->base.children[first]' check. At minimum, can the comment above
this if statement explain all of the conditions here?
Matt
> xe_page_reclaim_list_abort(xe_walk->tile->primary_gt, xe_walk->prl,
> "kill at level=%u addr=%#llx next=%#llx num_live=%u\n",
> level, addr, next, xe_child->num_live);
> --
> 2.52.0
>
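For reference, the reclamation_size encoding noted above ("page_size = 2^(reclamation_size + XE_PTE_SHIFT)") works out as below. `reclamation_size()` is a hypothetical standalone helper, not the driver's COMPUTE_RECLAIM_ADDRESS_MASK() macro.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_SHIFT 12 /* 4K base page, standing in for XE_PTE_SHIFT */

/*
 * Invert page_size = 2^(reclamation_size + PTE_SHIFT): the field is just
 * log2(page_size) minus the base-page shift.
 */
static uint32_t reclamation_size(uint64_t page_size)
{
	uint32_t sz = 0;

	while ((page_size >> PTE_SHIFT) > (1ull << sz))
		sz++;
	return sz;
}
```

This matches the inline comments in the diff: 0 for 4K, 4 for 64K, and 9 for 2M.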