Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: "Zeng, Oak" <oak.zeng@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: Separating xe_vma- and page-table state
Date: Wed, 13 Mar 2024 01:27:02 +0000	[thread overview]
Message-ID: <ZfEA5ieJ+M9rttI3@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <SA1PR11MB699139B15B97B2B056312F41922B2@SA1PR11MB6991.namprd11.prod.outlook.com>

On Tue, Mar 12, 2024 at 05:02:20PM -0600, Zeng, Oak wrote:
> Hi Thomas,

Going to reply to both Thomas and Oak to not split this thread.

> 
> > -----Original Message-----
> > From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Sent: Tuesday, March 12, 2024 3:43 AM
> > To: intel-xe@lists.freedesktop.org
> > Cc: Brost, Matthew <matthew.brost@intel.com>; Zeng, Oak
> > <oak.zeng@intel.com>
> > Subject: Separating xe_vma- and page-table state
> > 
> > Hi,
> > 
> > It's IMO become apparent both in the system allocator discussion and in
> > the patch that enables the presence of invalid vmas 
> > that we need to be
> > better at separating xe_vma and page-table state, so that xe_vma state
> > would contain things that are mostly immutable and that the user
> > requested: PAT index, memory attributes, requested tile presence etc,
> > whereas the page-table state would contain mutable state like actual
> > tile presence, invalidaton state and MMU notifier.

Thomas:

100% agree with the idea. There are 2 distict parts of xe_vma - what I
call GPUVM state and PT state. In addition, there is what I call IOCTL
state (sync objs, etc...) that is updated in various places in xe_vm.c.

Functions to update these respective states should be clearly seperated.
In the current implementaion is a complete mess in this regard, you
mention xe_vm_unbind_vma below, in its current state there is no way
this function could be easily be reused.

My series [1] vastly improves this seperation. It still could be split a
bit better though. In my next rebase I can look at this series through
the lens of clearly maintaining seperation + likely update [2] to do an
unbind rather than a invalidation.

[1] https://patchwork.freedesktop.org/series/125608/
[2] https://patchwork.freedesktop.org/series/130935/

> 
> It is a valid reasoning to me... if we want to do what community want us to do with system allocator, 
> And if we want to meet our umd's requirement of "free w/o vm_unbind", yes,  we need this "invalid" vma concept.
> 
> The strange thing is, it seems Matt can still achieve the goal without introducing invalid vma concept... it doesn't look like he has
> This concept in his patch....
>

Oak:

I'm a little confused by what mean by "invalid" vma concept. Does
that mean no GPU backing or page tables? That is essentially what I call
system allocator VMA in [3], right?

Also based on Thomas say below related to xe_vm_unbind_vma and [2] we
should be able to have VMA that points to a backing store (either
userptr or BO) but doesn't have GPU mappings. We do have this for
faulting VMs before the initial bind of a VMA but we cannot currently
dynamically change this (we can invalidate page tables but cannot remove
them without destroying the VMA). We should fix that.

[3] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/3fcc83c9b075364a9d83415ca73a9f9625543d7c
 
> 
> > 
> > So far we have had no reason to separate the two, but with hmmptr we
> > would likely end up with multiple page-table regions per xe-vma, 
> 
> Can we still maintain 1 xe-vma : 1 page table region relationship for simplicity?
> 
> This requires vma splitting. i.e., if you have a hmmptr vma cover range [0~100],
> And fault happens at range [40~60], then we will end up with 3 vmas:
> [0~40], a dummy hmmptr vma, not mapped gpu
> [40~60], hmmptr vma mapped to gp
> [60~100], dummy, not mapped to gpu.
> 
> Does this work for you? Or do you see a benefit of not splitting vma?
>

Thomas / Oak:

Agree for at least the initial implementation a 1 to 1 relationship will
be easier. Also unsure what the benefit of not splitting VMAs is?

That being said, I don't think this is something we need to decide now.
 
> 
> 
> and
> > with the patch discussed earlier we could've easily reused
> > xe_vm_unbind_vma() that only touches the mutable page-table state and
> > does the correct locking.
> > 
> > The page table could would then typically take a const xe_vma *, and
> > and xe_pt_state *, or whatever we choose to call it. All xe_vmas except
> > hmmptr ones would have an 1:1 xe_vma <-> xe_pt_state relationship.

Thomas:

I like the idea of VMAs in the PT code function being marked as const
and having the xe_pt_state as non const. It makes ownership very clear.

Not sure how that will fit into [1] as that series passes around
a "struct xe_vm_ops" which is a list of "struct xe_vma_op". It does this
to make "struct xe_vm_ops" a single atomic operation. The VMAs are
extracted either the GPUVM base operation or "struct xe_vma_op". Maybe
these can be const? I'll look into that but this might not work out in
practice.

Agree also unsure how 1:N xe_vma <-> xe_pt_state relationship fits in
hmmptrs. Could you explain your thinking here?

> > 
> 
> Matt has POC codes which works for me for system allocator w/o this vma/pt_state splitting: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commits/system_allocator_poc?ref_type=heads
> 
> Can you take a look also... 
> 

Thomas:

It might be helpful to look at that branch too, the last 3 patches [3]
[4] [5] implement a PoC system allocator design built on top of [1],
using existing userptrs for now, and is based on our discussions in the
system allocator design doc. Coding is sometimes the easiest way to
convay ideas and this roughly what I had in mind. Not complete by any
means and still tons of work to do to make this a working solution.

[4] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/3fcc83c9b075364a9d83415ca73a9f9625543d7c
[5] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/d2371e7c007406fb5e84e6605da238d12caa5c62
[6] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/91cf8b38428ae9180c92435d3519193d074e837a

> When I worked on system allocator HLD, I pictured the invalid vma concept also. But somehow Matt was able to make it working without such concept....
>

Oak:

Still unclear on this. Maybe we can chat offline to clear this up.

Matt

> Oak
> 
> > Thoughts, comments?
> > 
> > Thanks,
> > Thomas

  reply	other threads:[~2024-03-13  1:28 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-12  7:43 Separating xe_vma- and page-table state Thomas Hellström
2024-03-12 23:02 ` Zeng, Oak
2024-03-13  1:27   ` Matthew Brost [this message]
2024-03-13  2:16     ` Zeng, Oak
2024-03-13  3:16       ` Matthew Brost
2024-03-13 10:56     ` Thomas Hellström
2024-03-13 17:06       ` Zeng, Oak
2024-03-14  8:52         ` Thomas Hellström
2024-03-14 16:00           ` Zeng, Oak
2024-03-13 19:43       ` Matthew Brost
2024-03-14  8:57         ` Thomas Hellström

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfEA5ieJ+M9rttI3@DUT025-TGLU.fm.intel.com \
    --to=matthew.brost@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=oak.zeng@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox