Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>, <intel-xe@lists.freedesktop.org>
Subject: Re: [RFC PATCH] drm/xe/bo: Honor madvise(2) advices
Date: Sat, 29 Nov 2025 07:55:04 -0800
Message-ID: <aSsXWDSOVnI4V2jG@lstrano-desk.jf.intel.com>
In-Reply-To: <b7c3969245a5db71ced0c3aadc52c9531e68141d.camel@linux.intel.com>

On Sat, Nov 29, 2025 at 01:51:38PM +0100, Thomas Hellström wrote:
> On Fri, 2025-11-28 at 13:01 -0800, Matthew Brost wrote:
> > On Fri, Nov 28, 2025 at 12:57:15PM +0000, Matthew Auld wrote:
> > > On 28/11/2025 10:46, Thomas Hellström wrote:
> > > > The user can give advices as to how the CPU will access an
> > > > address range. Use those advices to determine the number of
> > > > bo pages to prefault on a page-fault.
> > > > 
> > > > Do this regardless of whether we can find a way to avoid the
> > > > fairly slow vm_insert_pfn_prot() to populate buffer
> > > > object maps.
> > > > 
> > > > Initially, fault up to 512 pages on sequential access and
> > > > a single page on random access.
> > > > 
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > > Signed-off-by: Thomas Hellström
> > > > <thomas.hellstrom@linux.intel.com>
> > > > ---
> > > >   drivers/gpu/drm/xe/xe_bo.c | 18 +++++++++++++++++-
> > > >   1 file changed, 17 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > index 6fd6ce6c6586..07d0d954f826 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -1821,15 +1821,31 @@ static int xe_bo_fault_migrate(struct xe_bo *bo, struct ttm_operation_ctx *ctx,
> > > >   	return err;
> > > >   }
> > > > +/*
> > > > + * Number of prefaulted pages for the MADV_SEQUENTIAL and
> > > > + * MADV_RANDOM madvise() advices.
> > > > + */
> > > > +#define XE_BO_VM_NUM_PREFAULT_SEQ  512
> > > > +#define XE_BO_VM_NUM_PREFAULT_RAND 1
> > > > +
> > > >   /* Call into TTM to populate PTEs, and register bo for PTE removal on runtime suspend. */
> > > >   static vm_fault_t __xe_bo_cpu_fault(struct vm_fault *vmf, struct xe_device *xe, struct xe_bo *bo)
> > > >   {
> > > > +	const struct vm_area_struct *vma = vmf->vma;
> > > > +	pgoff_t num_prefault;
> > > >   	vm_fault_t ret;
> > > >   	trace_xe_bo_cpu_fault(bo);
> > > > +	if (vma->vm_flags & VM_SEQ_READ)
> > > > +		num_prefault = XE_BO_VM_NUM_PREFAULT_SEQ;
> > > > +	else if (vma->vm_flags & VM_RAND_READ)
> > > > +		num_prefault = XE_BO_VM_NUM_PREFAULT_RAND;
> > > > +	else
> > > > +		num_prefault = TTM_BO_VM_NUM_PREFAULT;
> > > 
> > > Ah, interesting. Do we know if any UMD is making use of these
> > > special flags
> > > today? Just wondering if this might be a visible change or not?
> > > Also would
> > > it make sense to document/advertise this somewhere for UMD folks,
> > > in case
> > > this has an immediate benefit for them?
> > > 
> > 
> > I also have a question here - does Xe / TTM support faulting in THP
> > on
> > the CPU side? Is that something we should also look at doing based on
> > madvise / global THP settings? Would that help mitigate the slow
> > vm_insert_pfn_prot too?
> 
> It would probably help a lot, as long as we actually get 2MiB pages
> from TTM. 
> 

Hmm, yes, this seems like a pretty big win too, considering Mesa now
always allocates 2M BOs and then suballocates smaller allocations in
user space. So we should pretty much always be getting 2M pages /
faults.

> I had that implemented in TTM once with vmwgfx the only user, and it
> was working fine except one very important detail: I had implemented it
> based on vma information rather than PTE-based information, so
> get_user_pages_fast() didn't recognize these pages and was terribly
> confused. So it had to be ripped out.
> 
> If we're going to try that again, we need to talk to x86 arch to get a
> PMD_PUD_SPECIAL pmd/pud flag that behaves just like PTE_SPECIAL, so
> that things like get_user_pages_fast() ignore these huge PTEs. Auditing
> all page-walks in core-mm for this is non-trivial.
> 

Agree, core-mm page walks are non-trivial to audit. I recently looked
at the 2M device pages series, though, and it really wasn't all that
bad.

I'm out of my technical depth on the PTE_SPECIAL comment, but I can dig
in a bit here. We do have an x86 maintainer at Intel (Dave Hansen) to
whom we can float any ideas on this topic.

> But if that is done, we could bring in that stuff again, although
> Christian wasn't very fond of having it in TTM.
> 

We can perhaps bring this up as an option to Christian - from my limited
knowledge of this topic, it seems worthwhile to do regardless of the PAT
issue, as it just looks like a pretty big win. However, this is very
unlikely to make it into the customer kernels that are complaining about
this perf issue, so I think we need to explore other options too.

Matt

> But I think it would also be very beneficial for things like ioremap()
> and friends.
> 
> /Thomas
> 
> 
> > 
> > Matt
> > 
> > > I guess would be good to add an IGT which uses both flags, if we
> > > don't
> > > already?
> > > 
> > > Anyway, I think change makes sense,
> > > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > > 
> > > > +
> > > >   	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > > > -				       TTM_BO_VM_NUM_PREFAULT);
> > > > +				       num_prefault);
> > > >   	/*
> > > >   	 * When TTM is actually called to insert PTEs, ensure no blocking conditions
> > > >   	 * remain, in which case TTM may drop locks and return VM_FAULT_RETRY.
> > > 
> 

Thread overview: 8+ messages
2025-11-28 10:46 [RFC PATCH] drm/xe/bo: Honor madvise(2) advices Thomas Hellström
2025-11-28 10:53 ` ✓ CI.KUnit: success for " Patchwork
2025-11-28 12:57 ` [RFC PATCH] " Matthew Auld
2025-11-28 21:01   ` Matthew Brost
2025-11-29 12:51     ` Thomas Hellström
2025-11-29 15:55       ` Matthew Brost [this message]
2025-11-29 16:18         ` Thomas Hellström
2025-11-29 12:40   ` Thomas Hellström
