From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Zeng, Oak" <oak.zeng@intel.com>,
"Brost, Matthew" <matthew.brost@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: Separating xe_vma- and page-table state
Date: Thu, 14 Mar 2024 09:52:23 +0100 [thread overview]
Message-ID: <5165406f368cc023a5d0fd9879e33b8ac01d8aa7.camel@linux.intel.com> (raw)
In-Reply-To: <SA1PR11MB69915B6A778A81CD41F6A544922A2@SA1PR11MB6991.namprd11.prod.outlook.com>
On Wed, 2024-03-13 at 17:06 +0000, Zeng, Oak wrote:
> Hi Thomas,
>
> For simplicity of the discussion, let's forget about BO vm_bind,
> forget about memory attributes for a moment... Only consider system
> allocator. So with the scheme below, we have a gigantic xe_vma in the
> background holding some immutable state, never split. And we have
> mutable page-table state which is created during GPU access and
> destroyed during CPU munmap/invalidation, dynamically
>
> For the mutable page-table state, you would maintain another RB-tree
> so you can search it, as I did in my POC; that tree lives in xe_svm.
> For the BO driver you don't need this extra tree, you just need the
> xe_vma tree, as xe_vma has a 1:1 mapping with page-table state there...
>
> I saw this scheme can be aligned with my POC....
>
> Mapping this scheme to the userptr "free without vm_unbind" thing, I
> can see that when the user frees, we can destroy the page-table state
> during the mmu notifier callback while keeping the xe_vma. Is this
> also how you look at it?
>
> That said, the "free without vm_unbind" thing should only affect our
> decision temporarily: once the system allocator is ready, UMD
> wouldn't need the userptr vm_bind anymore, so the problem will be
> solved more cleanly by the system allocator - UMD just removes the
> vm_bind and things would magically work with the system allocator. I
> guess what users really need is a system allocator, but we didn't
> have one at the time, so userptr technology was used. In the long
> term, the system allocator should eventually replace userptr.
I mostly agree on the above, I think.
>
> One thing I can't picture clearly is, how hard is it to change the
> current xekmd to separate xe_vma into mutable and immutable parts?
It's not that hard at all; it's mostly a matter of changing the xe_pt.c
interfaces. An obstacle, though, is that we don't want to do this
before Matt's big vm_bind refactoring is reviewed and in place.
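To make the interface split concrete, here is a minimal userspace sketch (all field names and the helper are hypothetical; only the xe_vma and xe_pt_state names come from this discussion):

```c
#include <stdbool.h>
#include <stdint.h>

/* Immutable VMA state, fixed at bind time (illustrative fields). */
struct xe_vma {
	uint64_t start, end;  /* GPU virtual address range */
	uint32_t attributes;  /* e.g. caching / placement attributes */
};

/* Mutable page-table state, owned by the xe_pt.c side. */
struct xe_pt_state {
	uint64_t populated_start, populated_end;
	bool invalidated;
};

/* The PT interface takes the vma as const: it may read the range and
 * attributes, but only the pt state is writable. */
static void xe_pt_populate(const struct xe_vma *vma,
			   struct xe_pt_state *pt)
{
	pt->populated_start = vma->start;
	pt->populated_end = vma->end;
	pt->invalidated = false;
}
```

The const qualifier makes the ownership split compiler-checked: any attempt by the PT code to write vma state fails to build.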
>
>
> Is the scheme without the split, with xe_vma maintaining both mutable
> and immutable state, simpler? It doesn't have the xe_svm concept: no
> xe_svm_range / page-table state, a single RB-tree per gpuvm, no need
> to re-construct xe_vma... Depending on how we want to solve the
> multiple-device problem, the xe_svm concept could come back, though...
For the ordinary VMA types we have today (userptr / BO / NULL) it's
neither simpler nor more complex IMO, but it makes the code clearer and
hopefully easier to maintain.
For the hmmptr/SVM case it's too early to answer. It really depends on
whether 1) we do a 1:1 mapping between xe_vma and svm_range, or
2) we do a 1:N mapping between xe_vma and svm_range. Both approaches
probably have their benefits, so I'd tend to favour Matt's suggestion
that we start off with 1), make it work, and then do a POC with 2) to
see what it looks like.
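As a sketch of what the 1:N variant implies (the helper name, struct, and 2 MiB chunk size are illustrative; the chunk size is just the example from earlier in the thread): within one large hmmptr xe_vma, a GPU fault would create an aligned chunk of page-table state clamped to the vma bounds.

```c
#include <stdint.h>

#define SVM_CHUNK_SIZE (2ull << 20) /* 2 MiB, aligned for large PTEs */

/* One populated chunk ("svm range") of a sparse hmmptr vma. */
struct svm_range {
	uint64_t start, end;
};

/* Compute the chunk a fault address falls into: align down to the
 * chunk size, then clamp to the vma's bounds. */
static struct svm_range svm_range_for_fault(uint64_t vma_start,
					    uint64_t vma_end,
					    uint64_t fault_addr)
{
	struct svm_range r;

	r.start = fault_addr & ~(SVM_CHUNK_SIZE - 1);
	r.end = r.start + SVM_CHUNK_SIZE;
	if (r.start < vma_start)
		r.start = vma_start;
	if (r.end > vma_end)
		r.end = vma_end;
	return r;
}
```

Each such range would carry its own mutable pt state, while the single xe_vma keeps the immutable range and attributes.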
Comments, suggestions?
/Thomas
>
> Oak
>
> > -----Original Message-----
> > From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Sent: Wednesday, March 13, 2024 6:56 AM
> > To: Brost, Matthew <matthew.brost@intel.com>; Zeng, Oak
> > <oak.zeng@intel.com>
> > Cc: intel-xe@lists.freedesktop.org
> > Subject: Re: Separating xe_vma- and page-table state
> >
> > On Wed, 2024-03-13 at 01:27 +0000, Matthew Brost wrote:
> > > On Tue, Mar 12, 2024 at 05:02:20PM -0600, Zeng, Oak wrote:
> > > > Hi Thomas,
> > >
> > >
> >
> > ....
> >
> > > Thomas:
> > >
> > > I like the idea of VMAs in the PT code functions being marked as
> > > const and the xe_pt_state as non-const. It makes ownership very
> > > clear.
> > >
> > > Not sure how that will fit into [1], as that series passes around
> > > a "struct xe_vm_ops", which is a list of "struct xe_vma_op". It
> > > does this to make "struct xe_vm_ops" a single atomic operation.
> > > The VMAs are extracted from either the GPUVM base operation or
> > > "struct xe_vma_op". Maybe these can be const? I'll look into
> > > that, but this might not work out in practice.
> > >
> > > Agreed, I'm also unsure how the 1:N xe_vma <-> xe_pt_state
> > > relationship fits hmmptrs. Could you explain your thinking here?
> >
> > There is a need for hmmptrs to be sparse. When we fault, we create
> > a chunk of PTEs that we populate. This chunk could potentially be
> > large, covering the whole CPU vma, or it could be limited to, say,
> > 2 MiB and aligned to allow for large page-table entries. In Oak's
> > POC these chunks are called "svm ranges".
> >
> > So the question arises: how do we map that to the current vma
> > management and page-table code? There are basically two ways:
> >
> > 1) Split VMAs so they are either fully populated or unpopulated;
> > each svm_range becomes an xe_vma.
> > 2) Create an xe_pt_range / xe_pt_state (or whatever we call it)
> > with a 1:1 mapping to the svm_range and a 1:N mapping from xe_vma
> > to pt state.
> >
> > Initially my thinking was that 1) would be the simplest approach
> > with the code we have today. I raised that briefly with Sima and he
> > answered "And why would we want to do that?", and the answer at
> > hand was of course that the page-table code works with vmas. Or
> > rather, that we mix vma state (the hmmptr range / attributes) and
> > page-table state (the regions of the hmmptr that are actually
> > populated), so it would be a consequence of our current
> > implementation's limitations.
> >
> > With the suggestion to separate vma state and pt state, the xe_svm
> > ranges map to pt state and are managed per hmmptr vma. The vmas
> > would then be split mainly as a result of UMD mapping something
> > else (a BO) on top, or UMD giving new memory attributes for a range
> > (madvise-type operations).
> >
> > /Thomas
> >
Thread overview: 11+ messages
2024-03-12 7:43 Separating xe_vma- and page-table state Thomas Hellström
2024-03-12 23:02 ` Zeng, Oak
2024-03-13 1:27 ` Matthew Brost
2024-03-13 2:16 ` Zeng, Oak
2024-03-13 3:16 ` Matthew Brost
2024-03-13 10:56 ` Thomas Hellström
2024-03-13 17:06 ` Zeng, Oak
2024-03-14 8:52 ` Thomas Hellström [this message]
2024-03-14 16:00 ` Zeng, Oak
2024-03-13 19:43 ` Matthew Brost
2024-03-14 8:57 ` Thomas Hellström