Intel-XE Archive on lore.kernel.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
	Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [RFC PATCH] drm/xe/bo: Honor madvise(2) advices
Date: Sat, 29 Nov 2025 13:51:38 +0100	[thread overview]
Message-ID: <b7c3969245a5db71ced0c3aadc52c9531e68141d.camel@linux.intel.com> (raw)
In-Reply-To: <aSoNkE3dldrSbbF9@lstrano-desk.jf.intel.com>

On Fri, 2025-11-28 at 13:01 -0800, Matthew Brost wrote:
> On Fri, Nov 28, 2025 at 12:57:15PM +0000, Matthew Auld wrote:
> > On 28/11/2025 10:46, Thomas Hellström wrote:
> > > The user can give advice as to how the CPU will access an
> > > address range. Use that advice to determine the number of
> > > bo pages to prefault on a page fault.
> > > 
> > > Do this regardless of whether we can find a way to avoid using the
> > > fairly slow vm_insert_pfn_prot() to populate buffer object maps.
> > > 
> > > Initially, fault up to 512 pages on sequential access and
> > > a single page on random access.
> > > 
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_bo.c | 18 +++++++++++++++++-
> > >   1 file changed, 17 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > index 6fd6ce6c6586..07d0d954f826 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -1821,15 +1821,31 @@ static int xe_bo_fault_migrate(struct xe_bo *bo, struct ttm_operation_ctx *ctx,
> > >   	return err;
> > >   }
> > > +/*
> > > + * Number of prefaulted pages for the MADV_SEQUENTIAL and
> > > + * MADV_RANDOM madvise() advice.
> > > + */
> > > +#define XE_BO_VM_NUM_PREFAULT_SEQ  512
> > > +#define XE_BO_VM_NUM_PREFAULT_RAND 1
> > > +
> > >   /* Call into TTM to populate PTEs, and register bo for PTE removal on runtime suspend. */
> > >   static vm_fault_t __xe_bo_cpu_fault(struct vm_fault *vmf, struct xe_device *xe, struct xe_bo *bo)
> > >   {
> > > +	const struct vm_area_struct *vma = vmf->vma;
> > > +	pgoff_t num_prefault;
> > >   	vm_fault_t ret;
> > >   	trace_xe_bo_cpu_fault(bo);
> > > +	if (vma->vm_flags & VM_SEQ_READ)
> > > +		num_prefault = XE_BO_VM_NUM_PREFAULT_SEQ;
> > > +	else if (vma->vm_flags & VM_RAND_READ)
> > > +		num_prefault = XE_BO_VM_NUM_PREFAULT_RAND;
> > > +	else
> > > +		num_prefault = TTM_BO_VM_NUM_PREFAULT;
> > 
> > Ah, interesting. Do we know if any UMD is making use of these special
> > flags today? Just wondering if this might be a visible change or not?
> > Also, would it make sense to document/advertise this somewhere for UMD
> > folks, in case this has an immediate benefit for them?
> > 
> 
> I also have a question here - does Xe / TTM support faulting in THP on
> the CPU side? Is that something we should also look at doing based on
> madvise / global THP settings? Would that help mitigate the slow
> vm_insert_pfn_prot too?

It would probably help a lot, as long as we actually get 2MiB pages
from TTM. 

I had that implemented in TTM once, with vmwgfx as the only user, and it
was working fine except for one very important detail: I had implemented
it based on vma information rather than PTE-based information, so
get_user_pages_fast() didn't recognize these pages and was terribly
confused. So it had to be ripped out.

If we're going to try that again, we need to talk to x86 arch to get a
PMD_PUD_SPECIAL pmd/pud flag that behaves just like PTE_SPECIAL, so
that things like get_user_pages_fast() ignore these huge PTEs. Auditing
all page-walks in core-mm for this is non-trivial.

But if that is done, we could bring in that stuff again, although
Christian wasn't very fond of having it in TTM.

But I think it would also be very beneficial for things like ioremap()
and friends.

/Thomas


> 
> Matt
> 
> > I guess it would be good to add an IGT which uses both flags, if we
> > don't have one already?
> > 
> > Anyway, I think the change makes sense,
> > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > 
> > > +
> > >   	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > > -				       TTM_BO_VM_NUM_PREFAULT);
> > > +				       num_prefault);
> > >   	/*
> > >   	 * When TTM is actually called to insert PTEs, ensure no blocking conditions
> > >   	 * remain, in which case TTM may drop locks and return VM_FAULT_RETRY.
> > 


Thread overview: 8+ messages
2025-11-28 10:46 [RFC PATCH] drm/xe/bo: Honor madvise(2) advices Thomas Hellström
2025-11-28 10:53 ` ✓ CI.KUnit: success for " Patchwork
2025-11-28 12:57 ` [RFC PATCH] " Matthew Auld
2025-11-28 21:01   ` Matthew Brost
2025-11-29 12:51     ` Thomas Hellström [this message]
2025-11-29 15:55       ` Matthew Brost
2025-11-29 16:18         ` Thomas Hellström
2025-11-29 12:40   ` Thomas Hellström
