From: Matthew Brost <matthew.brost@intel.com>
To: "Nguyen, Brian3" <brian3.nguyen@intel.com>
Cc: "Summers, Stuart" <stuart.summers@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Roper, Matthew D" <matthew.d.roper@intel.com>
Subject: Re: [PATCH 1/2] drm/xe: Skip over non leaf pte for PRL generation
Date: Tue, 24 Feb 2026 23:19:35 -0800 [thread overview]
Message-ID: <aZ6ih7RgkmrBGTj5@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <SA1PR11MB883971CC360573D5C2D96AFFAA75A@SA1PR11MB8839.namprd11.prod.outlook.com>
On Tue, Feb 24, 2026 at 11:45:21PM -0700, Nguyen, Brian3 wrote:
> On Monday, February 23, 2026 6:03 PM, Nguyen, Brian3 wrote:
> > On Mon, Feb 23, 2026 at 05:45:23PM -0800, Matthew Brost wrote:
> > > On Mon, Feb 23, 2026 at 04:33:30PM -0700, Nguyen, Brian3 wrote:
> > > > On Monday, February 23, 2026 3:07 PM, Stuart Summers wrote:
> > > > > On Mon, 2026-02-23 at 14:59 -0800, Matthew Brost wrote:
> > > > > > On Mon, Feb 23, 2026 at 10:49:21PM +0000, Summers, Stuart wrote:
> > > > > > > On Thu, 2026-01-29 at 08:27 +0000, Brian Nguyen wrote:
> > > > > > > > The check using xe_child->base.children was insufficient in
> > > > > > > > determining if a pte was a leaf node. So explicitly check
> > > > > > > > for if a pte is a leaf through the bit checks.
> > > > > > > >
> > > > > > > > Fixes: b912138df299 ("drm/xe: Create page reclaim list on
> > > > > > > > unbind")
> > > > > >
> > > > > > Move the Fixes tag by other tags (Signed-off-by, Cc)
> > > > > >
> > > >
> > > > Got it, will move.
> > > >
> > > > > > > >
> > > > > > > > v2:
> > > > > > > > - Remove old assert. (Matt R)
> > > > > > > >
> > > > > > > > Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
> > > > > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > > > > ---
> > > > > > > > drivers/gpu/drm/xe/xe_pt.c | 13 ++++++++-----
> > > > > > > > 1 file changed, 8 insertions(+), 5 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > > > > > > b/drivers/gpu/drm/xe/xe_pt.c index
> > > > > > > > 6703a7049227..b73a356d0fa1
> > > > > > > > 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > > > > > > @@ -1655,7 +1655,7 @@ static int
> > > > > > > > xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t
> > > > > > > > offset,
> > > > > > > >
> > > > > > > > XE_WARN_ON(!*child);
> > > > > > > > XE_WARN_ON(!level);
> > > > > > > > - /* Check for leaf node */
> > > > > > > > + /* Optimistically check for leaf node, may not be
> > > > > > > > guaranteed
> > > > > > >
> > > > > > > I would keep the comments the same - we're still trying to
> > > > > > > check a leaf node here and we aren't really doing anything
> > > > > > > special. If we have questions, we can look at the commit
> > > > > > > history to determine what changed from your prior implementation.
> > > > > > >
> > > > > > > Or if you want documentation here, it's more interesting to me
> > > > > > > *why* we
> > > > > > > can't see this from the child alone than just the fact that we
> > > > > > > can't (which we can observe by the if condition).
> > > > > > >
> > > > > > > Also applies to the comment below too.
> > > > > > >
> > > >
> > > > Ahh got it. Let me reword the comments here with more details
> > > > depending on the changes we make in the comments below.
> > > >
> > > > > > > Thanks,
> > > > > > > Stuart
> > > > > > >
> > > > > > > > from children alone */
> > > > > > > > if (xe_walk->prl &&
> > > > > > > > xe_page_reclaim_list_valid(xe_walk-
> > > > > > > > >prl)
> > > > > > > > &&
> > > > > > > > (!xe_child->base.children || !xe_child-
> > > > > > > > > base.children[first])) {
> > > > > > > > struct iosys_map *leaf_map =
> > > > > > > > &xe_child->bo->vmap; @@ -1675,10 +1675,13 @@ static int
> > > > > > > > xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t
> > > > > > > > offset,
> > > > > > > > break;
> > > > > > > > }
> > > > > > > >
> > > > > > > > - /* Ensure it is a defined page */
> > > > > > > > - xe_tile_assert(xe_walk->tile,
> > > > > > > > - xe_child->level == 0
> > > > > > > > ||
> > > > > > > > - (pte & (XE_PTE_PS64 |
> > > > > > > > XE_PDE_PS_2M | XE_PDPE_PS_1G)));
> > > > > > > > + /*
> > > > > > > > + * The check for xe_pt's children is
> > > > > > > > insufficient to determine leaf.
> > > > > > > > + * If not leaf, break out and
> > > > > > > > +continue in
> > > > > > > > next page walk level.
> > > > > > > > + */
> > > > > > > > + if (xe_child->level > 0 &&
> > > > > > > > + !(pte & (XE_PTE_PS64 |
> > > > > > > > +XE_PDE_PS_2M |
> > > > > > > > XE_PDPE_PS_1G)))
> > > > > >
> > > > > > I don't think XE_PTE_PS64 needs to be checked here as that
> > > > > > should only be set at level 0.
> > > > >
> > > > > Oh that's a good catch. PS64 has a specific meaning within the PDE. Are we trying to check if this is > 4K basically?
> > > > >
> > > >
> > > > This was checking if one of the bits indicating that this is a leaf entry was raised and if not, skip adding this.
> > > > So, checking for if it’s a leaf PTE (which would have one of those bits raised), not necessarily if it is just > 4K.
> > > >
> > > > Previously I had thought to remove XE_PTE_PS64 since level == 0
> > > > covers the condition but just for clarity of checking all leaf PTE, I kept it with the assert and then kept it since I inverted condition in
> > this patch.
> > > > I'll remove XE_PTE_PS64 in next patch. Thanks.
> > > >
> > > > > Thanks,
> > > > > Stuart
> > > > >
> > > > > >
> > > > > > I agree we xe_child->base.children can be set at level > 0 but
> > > > > > now I'm thinking the outer if statement is wrong wrt to
> > > > > > 'xe_child->base.children[first]'. Couldn't
> > > > > > xe_child->base.children[first] be NULL when subsequent
> > > > > > xe_child->base.children[first + 1] be non-NULL?
> > > > > >
> > > >
> > > > I believe this was the issue I was running into with one of the test
> > > > cases. Having some sort of non-NULL subsequent child declared here
> > >
> > > Yes, but any check at outer level of individual children doesn't seem
> > > right.
> > >
> > > Let's give an example of a 6M unbind.
> > >
> > > Case 1:
> > > - children[0] == NULL (leaf)
> > > - children[1] == Valid (not a leaf)
> > > - children[2] == NULL (leaf)
> > >
> > > I believe you correct code correctly generate PRL for children[0],
> > > break on children[1], so children[3] is skipped. I don't think this is
> > > correct... On hitting children[1], I believe in addition to breaking
> > > the loop, you the PRL needs to be invalidated, right?
> > >
>
> Yea... I see the issue. My intent at the time was to ensure that the xe_pt being
> read is a leaf entry by reading children. My thought was that if the 1st one read
> was a leaf PTE, then we could keep iterating until there wasn't a leaf anymore
> and handle it on next page walk, but that seems insufficient now for these
> interleaved leaf pte and nonleaf pt.
>
> Still smacking my head at this trying to come up with a better solution, but
> current idea here is removing that children check from the outer if statement
> and adding the following inside of the for loop. Let me test to see if that is sufficient.
>
> ```
> for (pgoff_t i = 0; i < count; i++) {
> if (xe_child->base.children && xe_child->base.children[first + i])
> continue;
> ...
> }
> ```
Yes, I think something roughly like this.
See below, I confused myself shooting test cases from hip so unless I
type an implementation out I'll probably give further bad input...
>
> This would allow for us to add all the leaf nodes and skip over the ones aren't accessed
> by the page walker until later. This would work for the issue I previously ran into
> mentioned later and case 2 but it wouldn't work for case 1.
>
> - children[0] == NULL (leaf)
> - children[1] == Valid (not a leaf)
> - children[2] == NULL (leaf)
>
> In this case, could we do something dumb like check to see if this not a leaf would
> be at the edge, based on the index with count and alignment?
>
> If quick test results are good and no objections, I can try this as next patch to see...
>
> > > Case 2:
> > > - children[0] == Valid (not a leaf)
> > > - children[1] == NULL (leaf)
> > > - children[2] == NULL (leaf)
> > >
> > > The entire PRL generation is skipped but the PRL is not invalidated
> > > which seems to be problem as leafs are skipped here.
> > >
> > > So I believe this entire logic needs a bit more rework than this patch
> > > alone.
> > >
> > > I believe the logic is roughly - at level 1 / 0 always walk all the
> > > PTEs in the PRL is valid, at level 1 if any non-2M pages are found
> > > invalidate the PRL if we are not going to desend further down (i.e.,
> > > we cover the entire 2M range).
> > >
> > > This fairly complicated, so I'd have to type this out + get on test
> > > machine to provide further advise here.
> > >
> >
> > I can give you some test cases I suppose...
> >
>
> Thanks for the test case and examples. I did have a try with these to see
> the intended behavior here and everything worked out except the 2m mixed cases.
>
> FYI, took a look again at the particular test case I was triggering an issue
> with the original assert: igt@xe_vm@large-split-binds-268435456,
> which performs
> BO[0] = alloc(256M+4K)
> bind(BO[0], aligned VA, 128M)
> bind(BO[0]+offset(128M), aligned VA + 128M, 128M+4K)
> unbind(aligned VA, 128M)
> unbind(aligned VA+128M, 128M+4K)
>
> So there was an edge case here of what I believe is a page directory
> filled with 2M and a 4K page afterwards and num_entries of xe_pt was 65.
>
> PD[64] = 2M PTE
> PD[65] = 2M PTE
> ...
> PD[127] = 2M PTE
> PD[128] = PT -> 4K PTE (Triggers assert, shouldn't be added as a PRL entry)
> But can still proceed with PRL w/o invalidate because the next page walk
> should proceed to the 4K page here.
Yes, I think we would descend if the unbind doesn’t completely cover the
2M region and there are other present PTEs in the region. I believe we'd
short circuit the decend if the 4K PTE was only PTE in region though...
I could be wrong though as it has been a while since I've run this code
tracing every step.
In some of my examples above, I was thinking that within a single VMA
you could have a 2M page and then 512 4K pages in the covered region,
depending on how the physical backing gets allocated. Hitting this case
is really difficult because of how DRM buddy works, but it is possible.
>
> So
> - children[64] == NULL (leaf)
> - children[65] == NULL (leaf)
> ...
> - children[127] == NULL (leaf)
> - children[128] == VALID (not a leaf)
>
> > - Test case A - 2M binds:
> >
> > BO[0] = alloc(8M);
> > bind(BO[0], aligned VA, 8M);
> > unbind(aligned VA, 8M); /* Get four 2M PRL */
> >
> > - Test base B - 4k binds - Same as above but 4k
> >
> > - Test case C - 2M mixed:
> >
> > BO[0] = alloc(2M);
> > BO[1] = alloc(4K);
> > BO[2] = alloc(2M);
> > bind(BO[0], aligned VA, 2M);
> > bind(BO[1], aligned VA + 2M, 4K);
> > bind(BO[2], aligned VA + 4M, 2M);
> > unbind(aligned VA, 6M); /* PRL should be invalid */
> >
>
> A bit confused why this would require the PRL to be invalidated?
> There should be 3 unbind ops with the 6M range each could
> target one entry here since 3 BO.
Ah, yes. We would get 3 unbind ops here. I was thinking we'd get one 6M
unbind op and the middle part we wouldn't decend but I was clearly
confusing myself... This case should actually work.
>
> Hopefully understood your direction here and ran the test case and saw the OPs
> [drm:print_op [xe]] UNMAP: addr=0x0000000040000000, range=0x0000000000200000, keep=0
> [drm:print_op [xe]] UNMAP: addr=0x0000000040200000, range=0x0000000000001000, keep=0
> [drm:print_op [xe]] UNMAP: addr=0x0000000040400000, range=0x0000000000200000, keep=0
> [drm:unbind_op_prepare [xe]] Preparing unbind, with range [40000000...401fffff)
> [drm:xe_pt_stage_unbind_entry [xe]] PRL add entry: level=1 pte=0x20000001516004bb reclamation_size=9 prl_idx=0
> [drm:xe_vm_dbg_print_entries [xe]] unbind: 1 entries to update
> [drm:xe_vm_dbg_print_entries [xe]] 0: Update level 1 at (0 + 1) [40000000...40200000) f:0
> [drm:unbind_op_prepare [xe]] Preparing unbind, with range [40200000...40200fff)
> [drm:xe_pt_stage_unbind_entry [xe]] PRL add entry: level=0 pte=0x200000015048c43b reclamation_size=0 prl_idx=1
> [drm:xe_vm_dbg_print_entries [xe]] unbind: 1 entries to update
> [drm:xe_vm_dbg_print_entries [xe]] 0: Update level 1 at (1 + 1) [40200000...40400000) f:0
> [drm:unbind_op_prepare [xe]] Preparing unbind, with range [40400000...405fffff)
> [drm:xe_pt_stage_unbind_entry [xe]] PRL add entry: level=1 pte=0x20000001808004bb reclamation_size=9 prl_idx=2
> [drm:xe_vm_dbg_print_entries [xe]] unbind: 1 entries to update
> [drm:xe_vm_dbg_print_entries [xe]] 0: Update level 4 at (0 + 1) [0...1000000000000) f:0
>
> > - Test case C - 2M mixed:
> >
> > BO[0] = alloc(4K);
> > BO[1] = alloc(2M);
> > BO[2] = alloc(2M);
> > bind(BO[0], aligned VA, 4k);
> > bind(BO[1], aligned VA + 2M, 2M);
> > bind(BO[2], aligned VA + 4M, 2M); /* PRL should be invalid */
> > unbind(aligned VA, 6M); /* PRL should be invalid */
> >
>
This one should work too.
Matt
> Same question as above. I'm guessing that we are trying to
> trigger that case with the non-leaf in the middle of the children,
> but I believe we must make one BO that has 2M PTEs and
> scattered 4K PTEs for one of the 2M regions?
>
> Brian
>
> > - test case D - 2M, partial:
> >
> > BO[0] = alloc(2M);
> > BO[1] = alloc(4k);
> > bind(BO[0], aligned VA, 2M);
> > bind(BO[1], aligned VA + 2M, 4k);
> > unbind(aligned VA, 2M + 4k); /* Get one 2M PRL, one 4K PRL */
> >
> > - test case E - 2M, partial:
> >
> > BO[0] = alloc(4K);
> > BO[1] = alloc(2M);
> > bind(BO[0], aligned VA - 4K, 4K);
> > bind(BO[1], aligned VA, 2M);
> > unbind(aligned VA - 4K, 2M + 4K); /* Get one 2M PRL, one 4K PRL */
> >
> > - test case F - 1 GB, partial:
> >
> > BO[0] = alloc(1G);
> > BO[1] = alloc(2M);
> > bind(BO[0], aligned VA, 1G);
> > bind(BO[1], aligned VA + 1G, 2M);
> > unbind(aligned VA, 1G + 2M); /* PRL should be invalid */
> >
> > Matt
> >
> > > Matt
> > >
> > > > with the xe_child->base.children[first] = NULL. But determining if
> > > > this PTE is a leaf entry would require us to iterate through all the
> > > > childrens I believe, which is why I had kept that check (xe_child->base.children[first] != NULL) as sort of an optimistic filter and had
> > this new inner if statement to ensure that these entries are leaf entries.
> > > >
> > > > So, what is your suggestion here?
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > > > > Matt
> > > > > >
> > > > > > > > + break;
> > > > > > > >
> > > > > > > > /* An entry should be added for 64KB
> > > > > > > > but contigious 4K have XE_PTE_PS64 */
> > > > > > > > if (pte & XE_PTE_PS64)
> > > > > > >
> > > >
prev parent reply other threads:[~2026-02-25 7:19 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-29 8:27 [PATCH 1/2] drm/xe: Skip over non leaf pte for PRL generation Brian Nguyen
2026-01-29 8:27 ` [PATCH 2/2] drm/xe: Move page reclaim done_handler to own func Brian Nguyen
2026-01-29 21:40 ` Lin, Shuicheng
2026-02-23 22:45 ` Summers, Stuart
2026-02-23 22:51 ` Matthew Brost
2026-01-29 9:14 ` ✓ CI.KUnit: success for series starting with [1/2] drm/xe: Skip over non leaf pte for PRL generation Patchwork
2026-01-29 9:48 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-23 21:48 ` [PATCH 1/2] " Nguyen, Brian3
2026-02-23 22:49 ` Summers, Stuart
2026-02-23 22:59 ` Matthew Brost
2026-02-23 23:07 ` Summers, Stuart
2026-02-23 23:33 ` Nguyen, Brian3
2026-02-24 1:45 ` Matthew Brost
2026-02-24 2:02 ` Matthew Brost
2026-02-25 6:45 ` Nguyen, Brian3
2026-02-25 7:19 ` Matthew Brost [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZ6ih7RgkmrBGTj5@lstrano-desk.jf.intel.com \
--to=matthew.brost@intel.com \
--cc=brian3.nguyen@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.d.roper@intel.com \
--cc=stuart.summers@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox