From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: "Zeng, Oak" <oak.zeng@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: Separating xe_vma- and page-table state
Date: Thu, 14 Mar 2024 09:57:12 +0100 [thread overview]
Message-ID: <af5a8b6434fad3c8e2647bfc914c954da5edc40c.camel@linux.intel.com> (raw)
In-Reply-To: <ZfIB41awiIoMvotS@DUT025-TGLU.fm.intel.com>
Hi, Matt,

On Wed, 2024-03-13 at 19:43 +0000, Matthew Brost wrote:
> On Wed, Mar 13, 2024 at 11:56:12AM +0100, Thomas Hellström wrote:
> > On Wed, 2024-03-13 at 01:27 +0000, Matthew Brost wrote:
> > > On Tue, Mar 12, 2024 at 05:02:20PM -0600, Zeng, Oak wrote:
> > > > Hi Thomas,
> > >
> > >
> >
> > ....
> >
> > > Thomas:
> > >
> > > I like the idea of VMAs in the PT code function being marked as
> > > const
> > > and having the xe_pt_state as non const. It makes ownership very
> > > clear.
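As an aside, the const split described above can be sketched in plain C. All struct and function names below are made up for illustration and are not the actual xe driver API:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative signatures only: the PT code takes the VMA as const --
 * it may read the range/attributes but never mutate VMA state -- while
 * the separately-owned pt_state is passed non-const and is the only
 * thing the PT code writes. Names here are hypothetical. */

struct vma { uint64_t start, end; };           /* stand-in for xe_vma */
struct pt_state { uint64_t populated_bytes; }; /* stand-in for PT-owned state */

static void pt_populate(const struct vma *vma, struct pt_state *pt)
{
	/* vma->start = 0;  <-- would not compile: the VMA is read-only here */
	pt->populated_bytes = vma->end - vma->start; /* PT state is writable */
}
```

The compiler then enforces the ownership rule: any accidental write to VMA state from PT code is a build error.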
> > >
> > > Not sure how that will fit into [1], as that series passes around
> > > a "struct xe_vm_ops" which is a list of "struct xe_vma_op". It
> > > does this to make "struct xe_vm_ops" a single atomic operation.
> > > The VMAs are extracted from either the GPUVM base operation or
> > > "struct xe_vma_op". Maybe these can be const? I'll look into
> > > that, but this might not work out in practice.
> > >
> > > Agree, also unsure how the 1:N xe_vma <-> xe_pt_state
> > > relationship fits in with hmmptrs. Could you explain your
> > > thinking here?
> >
> > There is a need for hmmptrs to be sparse. When we fault we create a
> > chunk of PTEs that we populate. This chunk could potentially be
> > large, covering the whole CPU vma, or it could be limited to, say,
> > 2 MiB and aligned to allow for large page-table entries. In Oak's
> > POC these chunks are called "svm ranges".
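The chunk computation described above can be sketched roughly as follows; the 2 MiB size, the clamping behaviour, and all names (`fault_chunk`, `struct range`) are assumptions for illustration, not actual xe code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: given a faulting address inside an hmmptr VMA,
 * compute the chunk of PTEs to populate -- here a fixed 2 MiB chunk,
 * aligned down to allow large page-table entries and clamped so it
 * never spills past the VMA bounds. */

#define CHUNK_SIZE (2ull << 20) /* 2 MiB */

struct range { uint64_t start, end; };

static struct range fault_chunk(uint64_t vma_start, uint64_t vma_end,
				uint64_t fault_addr)
{
	struct range r;

	r.start = fault_addr & ~(CHUNK_SIZE - 1); /* align down to 2 MiB */
	r.end = r.start + CHUNK_SIZE;

	/* Clamp to the VMA boundaries. */
	if (r.start < vma_start)
		r.start = vma_start;
	if (r.end > vma_end)
		r.end = vma_end;
	return r;
}
```

For example, a fault anywhere inside an aligned 2 MiB window populates that whole window, while faults near a VMA edge get a truncated chunk.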
> >
> > So the question arises: how do we map that to the current vma
> > management and page-table code? There are basically two ways:
> >
> > 1) Split VMAs so they are either fully populated or unpopulated;
> > each svm_range becomes an xe_vma.
> > 2) Create an xe_pt_range / xe_pt_state (or whatever) with a 1:1
> > mapping to the svm_range and a 1:N mapping with xe_vmas.
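Option 2)'s data layout could look roughly like the sketch below. The field and struct names are guesses for illustration only, not the actual xe structures:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout for option 2): page-table state split out of the
 * VMA. One vma (const to the PT code) owns N pt_state ranges, each 1:1
 * with a populated "svm range". */

struct pt_state {		/* one populated chunk; 1:1 with an svm range */
	uint64_t start, end;
	struct pt_state *next;	/* sibling chunks under the same VMA */
};

struct vma {			/* stand-in for xe_vma; 1:N with pt_state */
	uint64_t start, end;
	struct pt_state *pt_states; /* sparse: only faulted ranges present */
};

/* Bytes actually backed by PTEs -- the sparse population of the VMA. */
static uint64_t populated_bytes(const struct vma *v)
{
	const struct pt_state *p;
	uint64_t sum = 0;

	for (p = v->pt_states; p; p = p->next)
		sum += p->end - p->start;
	return sum;
}
```

The key property is that faulting in or freeing a chunk only touches the pt_state list; the vma itself, and the GPUVM tree holding it, stay untouched.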
> >
> > Initially my thinking was that 1) would be the simplest approach
> > with the code we have today. I raised that briefly with Sima and he
> > answered "And why would we want to do that?", and the answer at
> > hand was of course that the page-table code works with vmas. Or
> > rather that we mix vma state (the hmmptr range / attributes) and
> > page-table state (the regions of the hmmptr that are actually
> > populated), so it would be a consequence of our current
> > implementation (limitations).
> >
> > With the suggestion to separate vma state and pt state, the xe_svm
> > ranges map to pt state and are managed per hmmptr vma. The vmas
> > would then be split mainly as a result of UMD mapping something
> > else (a bo) on top, or UMD giving new memory attributes for a range
> > (madvise-type operations).
> >
>
> Thanks for the explanation.
>
> My 2 cents.
>
> Do an initial implementation based on 1) - split VMAs so they are
> either fully populated or unpopulated; each svm_range becomes an
> xe_vma.
>
> This is what I did in a quick PoC [A] and got a basic system
> allocator working in ~500 LoC. Is it fully baked? No, but it's also
> not all that far off.
>
> We get the uAPI in place, agree on all the semantics, get IGTs in
> place,
> and basic UMD support. We walk before we run...
>
> Then transform the code to 2). The real benefits of 2) are going to
> be the memory footprint of all the state and avoiding the
> fragmentation of the GPUVM RB tree that comes with splitting VMAs.
> I'll state one obvious benefit: xe_pt_state < xe_vma in size.
>
> I'll give a quick example of the fragmentation issue too.
>
> User binds 0 .. 2^46 - 1: 1 xe_vma
> Fault: 3 xe_vmas (split into 2 unpopulated xe_vmas, 1 populated
> xe_vma)
> Free: 3 xe_vmas (replace 1 populated xe_vma with 1 unpopulated
> xe_vma)
>
> Do this a couple of million times and we will have tons of
> fragmentation and unnecessary xe_vmas floating around. So pursuing
> 2) is probably worthwhile.
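The growth rate in the fault/free cycle above can be captured with toy arithmetic. This is purely an illustrative model of the splitting behaviour described, assuming no neighbour merging, not driver code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the fragmentation issue under option 1): each fault
 * splits one VMA into (unpopulated, populated, unpopulated), a net
 * gain of 2 VMAs, and a later free only swaps the populated piece
 * back to unpopulated -- it never merges neighbours. */

static uint64_t vma_count_after_cycles(uint64_t cycles)
{
	uint64_t count = 1;	/* one big bind: 0 .. 2^46 - 1 */
	uint64_t i;

	for (i = 0; i < cycles; i++)
		count += 2;	/* fault: 1 vma -> 3 vmas; free: net 0 */
	return count;
}
```

A couple of million cycles thus leaves millions of stale xe_vmas in the GPUVM RB tree unless merging logic is added.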
>
> 2) is going to be quite a bit more complicated to implement, though.
> Let me give a fairly simple example.
>
> User binds 0 .. 2^46 - 1: 1 xe_vma
> Tons of faults occur: 1 xe_vma, tons of xe_pt_states
> User binds a BO: 3 xe_vmas; need to split the xe_pt_states between
> the 2 new xe_vmas, possibly unmap some xe_pt_states, yikes...
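The pt_state redistribution step in that example amounts to classifying each chunk against the new BO range. The sketch below only classifies whole chunks (chunks straddling a boundary would also need splitting) and all names are illustrative, not actual xe driver API:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the "yikes" step under option 2): a BO bound at
 * [bo_start, bo_end) splits one huge VMA into three, and every
 * existing pt_state chunk must be reassigned to the left VMA, the
 * right VMA, unmapped (the BO now covers it), or split. */

enum pt_disposition { PT_LEFT_VMA, PT_UNMAP, PT_RIGHT_VMA, PT_SPLIT };

static enum pt_disposition classify_pt_state(uint64_t pt_start,
					     uint64_t pt_end,
					     uint64_t bo_start,
					     uint64_t bo_end)
{
	if (pt_end <= bo_start)
		return PT_LEFT_VMA;	/* wholly below the BO mapping */
	if (pt_start >= bo_end)
		return PT_RIGHT_VMA;	/* wholly above it */
	if (pt_start >= bo_start && pt_end <= bo_end)
		return PT_UNMAP;	/* fully shadowed by the BO */
	return PT_SPLIT;		/* straddles a boundary */
}
```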
>
> While 2) is more complicated, I don't think it is a complete rewrite
> from 1) either. [A] is built on top of [B], which introduces the
> concept of PT ops (in addition to the existing concepts of GPUVM ops
> / xe_vma_op). With a bit more separation I think getting 2) to work
> won't be all that difficult, and the upper layers (IOCTLs, migrate
> code, fault handler, etc.) should be largely unchanged from 1) ->
> 2).
>
> With a staging plan of 1) -> 2) I think this is doable. Thoughts?
>
> [A]
> https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commits/system_allocator_poc/?ref_type=heads
> [B] https://patchwork.freedesktop.org/series/125608/
>
> Matt
Without looking at the code above, yes, I think this sounds like a
good plan. Oak, it would be good to hear your comments on this as
well. I think, given 1), implementing 2) as a PoC wouldn't be too
hard. Then we also need to figure out how both approaches tie into the
possibility of making GPUVM helpers out of this...
/Thomas
>
> > /Thomas
Thread overview: 11+ messages
2024-03-12 7:43 Separating xe_vma- and page-table state Thomas Hellström
2024-03-12 23:02 ` Zeng, Oak
2024-03-13 1:27 ` Matthew Brost
2024-03-13 2:16 ` Zeng, Oak
2024-03-13 3:16 ` Matthew Brost
2024-03-13 10:56 ` Thomas Hellström
2024-03-13 17:06 ` Zeng, Oak
2024-03-14 8:52 ` Thomas Hellström
2024-03-14 16:00 ` Zeng, Oak
2024-03-13 19:43 ` Matthew Brost
2024-03-14 8:57 ` Thomas Hellström [this message]