Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: Brian Nguyen <brian3.nguyen@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 3/4] drm/xe: Fix page reclaim entry handling for large pages
Date: Tue, 6 Jan 2026 08:41:56 -0800	[thread overview]
Message-ID: <aV07VAFD8K8YaGlV@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <20260105233351.3753716-9-brian3.nguyen@intel.com>

On Tue, Jan 06, 2026 at 07:33:55AM +0800, Brian Nguyen wrote:
> For 64KB pages, XE_PTE_PS64 is set on each of the consecutive 4KB PTEs, and
> all of them are considered leaf nodes, so the existing check was falsely
> adding multiple entries for the same 64KB page to the PRL.
> 
> For larger entries such as a 2MB PDE, checking pte->base.children is
> insufficient since this array is always defined for page directories at
> level 1 and above, so instead check that the entry itself points to the
> correct page.
> 
> For unmaps, if the range is fully covered by a page directory, the page
> walker may finish without descending to the leaf nodes.
> 
> For example, a 1GB range can be fully covered by 512 2MB pages if
> alignment allows. In this case, the page walker descends only as far as
> the directory corresponding to the 1GB range, completes its walk there,
> and the individual 2MB PDE leaves are never accessed.
> 
> In this case, PRL invalidation is also required, so add a check for a pt
> entry covering the entire range, since the walker will complete the walk
> there.
> 
> There are possible race conditions that can cause the driver to read a pte
> that has not been written yet. The two scenarios are:
>  - Another issued TLB invalidation, such as from userptr or an MMU notifier.
>  - Dependencies on the original bind that has yet to be executed, with an
>    unbind on that job.
> 
> These race conditions are expected to be rare, so simply fall back to a
> full PPC flush invalidation instead.
> 
> v2:
>  - Reworded commit message and updated zero-pte handling. (Matthew B)
> 
> Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind")
> Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_pt.c | 50 +++++++++++++++++++++++++++-----------
>  1 file changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 2752a5a48a97..668a981696f9 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1576,12 +1576,6 @@ static bool xe_pt_check_kill(u64 addr, u64 next, unsigned int level,
>  	return false;
>  }
>  
> -/* Huge 2MB leaf lives directly in a level-1 table and has no children */
> -static bool is_2m_pte(struct xe_pt *pte)
> -{
> -	return pte->level == 1 && !pte->base.children;
> -}
> -
>  /* page_size = 2^(reclamation_size + XE_PTE_SHIFT) */
>  #define COMPUTE_RECLAIM_ADDRESS_MASK(page_size)				\
>  ({									\
> @@ -1594,7 +1588,8 @@ static int generate_reclaim_entry(struct xe_tile *tile,
>  				  u64 pte, struct xe_pt *xe_child)
>  {
>  	struct xe_guc_page_reclaim_entry *reclaim_entries = prl->entries;
> -	u64 phys_page = (pte & XE_PTE_ADDR_MASK) >> XE_PTE_SHIFT;
> +	u64 phys_addr = pte & XE_PTE_ADDR_MASK;
> +	u64 phys_page = phys_addr >> XE_PTE_SHIFT;
>  	int num_entries = prl->num_entries;
>  	u32 reclamation_size;
>  
> @@ -1613,10 +1608,13 @@ static int generate_reclaim_entry(struct xe_tile *tile,
>  	 */
>  	if (xe_child->level == 0 && !(pte & XE_PTE_PS64)) {
>  		reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_4K);  /* reclamation_size = 0 */
> +		xe_tile_assert(tile, phys_addr % SZ_4K == 0);
>  	} else if (xe_child->level == 0) {
>  		reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_64K); /* reclamation_size = 4 */
> -	} else if (is_2m_pte(xe_child)) {
> +		xe_tile_assert(tile, phys_addr % SZ_64K == 0);
> +	} else if (xe_child->level == 1 && pte & XE_PDE_PS_2M) {
>  		reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_2M);  /* reclamation_size = 9 */
> +		xe_tile_assert(tile, phys_addr % SZ_2M == 0);
>  	} else {
>  		xe_page_reclaim_list_abort(tile->primary_gt, prl,
>  					   "unsupported PTE level=%u pte=%#llx",
> @@ -1647,20 +1645,39 @@ static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset,
>  	struct xe_pt_stage_unbind_walk *xe_walk =
>  		container_of(walk, typeof(*xe_walk), base);
>  	struct xe_device *xe = tile_to_xe(xe_walk->tile);
> +	pgoff_t first = xe_pt_offset(addr, xe_child->level, walk);
>  
>  	XE_WARN_ON(!*child);
>  	XE_WARN_ON(!level);
>  	/* Check for leaf node */
>  	if (xe_walk->prl && xe_page_reclaim_list_valid(xe_walk->prl) &&
> -	    !xe_child->base.children) {
> +	    (!xe_child->base.children || !xe_child->base.children[first])) {
>  		struct iosys_map *leaf_map = &xe_child->bo->vmap;
> -		pgoff_t first = xe_pt_offset(addr, 0, walk);
> -		pgoff_t count = xe_pt_num_entries(addr, next, 0, walk);
> +		pgoff_t count = xe_pt_num_entries(addr, next, xe_child->level, walk);
>  
>  		for (pgoff_t i = 0; i < count; i++) {
>  			u64 pte = xe_map_rd(xe, leaf_map, (first + i) * sizeof(u64), u64);
>  			int ret;
>  
> +			/*
> +			 * In rare scenarios, pte may not be written yet due to race conditions.
> +			 * In such cases, invalidate the PRL and fallback to full PPC invalidation.
> +			 */
> +			if (!pte) {
> +				xe_page_reclaim_list_abort(xe_walk->tile->primary_gt, xe_walk->prl,
> +							   "found zero pte at addr=%#llx", addr);
> +				break;
> +			}
> +
> +			/* Ensure it is a defined page */
> +			xe_tile_assert(xe_walk->tile,
> +				       xe_child->level == 0 ||
> +				       (pte & (XE_PTE_PS64 | XE_PDE_PS_2M | XE_PDPE_PS_1G)));
> +
> +			/* Add one entry per 64KB page; its contiguous 4K PTEs all have XE_PTE_PS64 */
> +			if (pte & XE_PTE_PS64)
> +				i += 15; /* Skip other 15 consecutive 4K pages in the 64K page */
> +
>  			/* Account for NULL terminated entry on end (-1) */
>  			if (xe_walk->prl->num_entries < XE_PAGE_RECLAIM_MAX_ENTRIES - 1) {
>  				ret = generate_reclaim_entry(xe_walk->tile, xe_walk->prl,
> @@ -1677,9 +1694,14 @@ static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset,
>  		}
>  	}
>  
> -	/* If aborting page walk early, invalidate PRL since PTE may be dropped from this abort */
> -	if (xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk) &&
> -	    xe_walk->prl && level > 1 && xe_child->base.children && xe_child->num_live != 0) {
> +	/*
> +	 * If aborting page walk early or page walk finishes,
> +	 * invalidate PRL since PTE may be dropped from this abort
> +	 */
> +	if ((xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk) ||
> +	     xe_pt_covers(addr, next, xe_child->level, &xe_walk->base)) &&
> +	    xe_walk->prl && level > 1 && (xe_child->base.children &&
> +	    xe_child->base.children[first]) && xe_child->num_live != 0) {

This is a pretty confusing if statement. I'm not really following the
'xe_child->base.children[first]' check. At a minimum, can the comment above
this if statement explain all the conditions here?

Matt 

>  		xe_page_reclaim_list_abort(xe_walk->tile->primary_gt, xe_walk->prl,
>  					   "kill at level=%u addr=%#llx next=%#llx num_live=%u\n",
>  					   level, addr, next, xe_child->num_live);
> -- 
> 2.52.0
> 


Thread overview: 14+ messages
2026-01-05 23:33 [PATCH 0/4] Page-reclaim fixes and PRL stats addition Brian Nguyen
2026-01-05 23:33 ` [PATCH 1/4] drm/xe: Remove debug comment in page reclaim Brian Nguyen
2026-01-06  2:15   ` Matthew Brost
2026-01-05 23:33 ` [PATCH 2/4] drm/xe: Add explicit abort page reclaim list Brian Nguyen
2026-01-06  2:23   ` Matthew Brost
2026-01-06 12:44     ` Nguyen, Brian3
2026-01-05 23:33 ` [PATCH 3/4] drm/xe: Fix page reclaim entry handling for large pages Brian Nguyen
2026-01-06 16:41   ` Matthew Brost [this message]
2026-01-06 17:12     ` Nguyen, Brian3
2026-01-05 23:33 ` [PATCH 4/4] drm/xe: Add page reclamation related stats Brian Nguyen
2026-01-05 23:41 ` ✓ CI.KUnit: success for Page-reclaim fixes and PRL stats addition (rev2) Patchwork
2026-01-06  1:12 ` ✗ Xe.CI.Full: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2026-01-07  1:04 [PATCH 0/4] Page-reclaim fixes and PRL stats addition Brian Nguyen
2026-01-07  1:04 ` [PATCH 3/4] drm/xe: Fix page reclaim entry handling for large pages Brian Nguyen
2026-01-08 16:22   ` Matthew Brost
