From: Matthew Brost <matthew.brost@intel.com>
To: "Zeng, Oak" <oak.zeng@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
"Bommu, Krishnaiah" <krishnaiah.bommu@intel.com>,
"Thomas.Hellstrom@linux.intel.com"
<Thomas.Hellstrom@linux.intel.com>,
"Welty, Brian" <brian.welty@intel.com>
Subject: Re: [v2 31/31] drm/xe/svm: Migration from sram to vram for system allocator
Date: Fri, 7 Jun 2024 18:23:58 +0000 [thread overview]
Message-ID: <ZmNQPoG1WwssW4uJ@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <ZmNPEW4hD32E0E/1@DUT025-TGLU.fm.intel.com>
On Fri, Jun 07, 2024 at 06:18:57PM +0000, Matthew Brost wrote:
> On Fri, Jun 07, 2024 at 11:22:42AM -0600, Zeng, Oak wrote:
> > Hi Matt,
> >
> > > -----Original Message-----
> > > From: Brost, Matthew <matthew.brost@intel.com>
> > > Sent: Wednesday, April 10, 2024 10:55 PM
> > > To: Zeng, Oak <oak.zeng@intel.com>
> > > Cc: intel-xe@lists.freedesktop.org; Ghimiray, Himal Prasad
> > > <himal.prasad.ghimiray@intel.com>; Bommu, Krishnaiah
> > > <krishnaiah.bommu@intel.com>; Thomas.Hellstrom@linux.intel.com; Welty,
> > > Brian <brian.welty@intel.com>
> > > Subject: Re: [v2 31/31] drm/xe/svm: Migration from sram to vram for system
> > > allocator
> > >
> > > On Tue, Apr 09, 2024 at 04:17:42PM -0400, Oak Zeng wrote:
> > > > If applicable, migrate a vma from sram to vram for system allocator.
> > > > Traditional userptr is not migrated. Only userptr created during
> > > > fault (aka userptr split from a system allocator vma) can be
> > > > migrated.
> > > >
> > > > FIXME: The migration should be conditional on user memory attribute
> > > > settings. Add this logic when memory attributes are supported.
> > > >
> > > > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_gt_pagefault.c | 9 ++++++++-
> > > > drivers/gpu/drm/xe/xe_vm.c | 4 ----
> > > > 2 files changed, 8 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > index 668984f0769e..c6ba00049964 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > @@ -20,6 +20,7 @@
> > > > #include "xe_guc_ct.h"
> > > > #include "xe_migrate.h"
> > > > #include "xe_trace.h"
> > > > +#include "xe_svm.h"
> > > > #include "xe_vm.h"
> > > >
> > > > struct pagefault {
> > > > @@ -209,12 +210,18 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> > > >
> > > > if (xe_vma_is_userptr(vma) && write_locked) {
> > > > struct xe_userptr_vma *uvma = to_userptr_vma(vma);
> > > > + struct xe_userptr *userptr = &uvma->userptr;
> > > >
> > > > spin_lock(&vm->userptr.invalidated_lock);
> > > > - list_del_init(&uvma->userptr.invalidate_link);
> > > > + list_del_init(&userptr->invalidate_link);
> > > > spin_unlock(&vm->userptr.invalidated_lock);
> > > >
> > > > + mmap_read_lock(userptr->notifier.mm);
> > > > + /**FIXME: Add migration policy here*/
> > > > + if (xe_vma_is_fault_userptr(vma))
> > > > + xe_svm_migrate_vma_to_vram(vm, vma, tile);
> > >
> > > Agree we need a policy here...
> > >
> > > See my comments about locking in [1] thinking if we migrate we likely
> > > want to hold the mmap lock until at least the bind being issued to
> > > prevent races with the CPU fault handler, at least initially.
> >
> > As explained in [1], I will continue to use the mmap read lock for now. The read lock only covers migration and userptr-pin-pages, not the bind. Bind correctness is guaranteed by a retry mechanism and the userptr's notifier_lock.
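The retry scheme described here can be modeled roughly as follows. This is a userspace sketch, not the actual Xe code: all names (`notifier_read_begin`, `bind_with_retry`, etc.) are illustrative stand-ins for the kernel's mmu_interval_notifier helpers and the userptr notifier_lock.

```c
#include <stdbool.h>

/* Userspace model of the retry scheme: pin pages against a notifier
 * sequence number and retry the bind if an invalidation raced with
 * the pin.  All names are illustrative. */

static unsigned long notifier_seq;      /* bumped by each invalidation */
static unsigned long pages_valid_seq;   /* seq at which pages were pinned */
static int simulated_invalidations = 1; /* inject one racing invalidation */

static unsigned long notifier_read_begin(void)
{
	return notifier_seq;
}

static bool notifier_read_retry(unsigned long seq)
{
	/* In the real code this check runs under notifier_lock. */
	return seq != notifier_seq;
}

static void pin_pages(unsigned long seq)
{
	/* hmm_range_fault()-style population would happen here. */
	pages_valid_seq = seq;

	if (simulated_invalidations > 0) {
		simulated_invalidations--;
		notifier_seq++;         /* a CPU-side invalidation races in */
	}
}

/* Returns true once the pinned pages match the current notifier seq. */
static bool bind_with_retry(void)
{
	unsigned long seq;

	do {
		seq = notifier_read_begin();
		pin_pages(seq);
	} while (notifier_read_retry(seq));

	return pages_valid_seq == notifier_seq;
}
```

The first pass is invalidated mid-pin, so the loop runs a second time and the bind only proceeds once the sequence numbers agree.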
> >
>
> See my reply. I think that to avoid races, VRAM migration and grabbing
> VRAM pages via hmm_range_fault either need the write lock, or some core
> refactoring needs to be done.
>
> One more thought wrt these races: maybe we simply don't care, and
> racing CPU and GPU access to shared memory is akin to a user fault
> (i.e. the user must synchronize access or the behavior is undefined,
> possibly resulting in a segfault). If I recall correctly, my tests
> showed that when these races occurred, either memory corruption
> happened or the user got a segfault.
>
> FWIW, CUDA seems to have synchronize calls in their SVM documentation
> (e.g. pass a malloc'd address to a shader, execute the shader, call
> synchronize, then access the malloc'd address in the user program). I
> can dig up this documentation if you like.
>
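The contract described above can be modeled in plain C: the CPU may only touch the shared allocation after an explicit synchronize. Here `pthread_join` stands in for a synchronize call such as CUDA's `cudaDeviceSynchronize`; the names are illustrative, not a real GPU API.

```c
#include <pthread.h>
#include <stddef.h>

/* Userspace model: a worker thread plays the role of the GPU shader
 * writing a shared allocation; joining it plays the role of the
 * synchronize call.  All names are illustrative. */

#define BUF_LEN 4

static int shared_buf[BUF_LEN];

static void *gpu_shader(void *arg)
{
	(void)arg;
	for (size_t i = 0; i < BUF_LEN; i++)
		shared_buf[i] = (int)i * 2;   /* "GPU" writes shared memory */
	return NULL;
}

/* Launch the "shader", then synchronize before any CPU access. */
static void launch_and_synchronize(void)
{
	pthread_t t;

	pthread_create(&t, NULL, gpu_shader, NULL);
	pthread_join(t, NULL);  /* the synchronize call */
	/* Only after this point is CPU access to shared_buf defined. */
}
```

Reading `shared_buf` before the join would be exactly the kind of unsynchronized concurrent access whose behavior we would spec as undefined.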
Sorry, one more comment. I haven't looked at the Level Zero spec for
SVM; we should also look at that and perhaps spec it out such that CPU
and GPU access of SVM memory is mutually exclusive and concurrent
access is undefined.
Also missed a comment below.
> Agree binds certainly do not need the mmap lock.
>
> Matt
>
> > >
> > > [1] https://patchwork.freedesktop.org/patch/588542/?series=132229&rev=1
> > >
> > > > ret = xe_vma_userptr_pin_pages(uvma);
> > > > + mmap_read_unlock(userptr->notifier.mm);
> > > > if (ret)
> > > > goto unlock_vm;
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > index 498b36469d00..8a58fe144a02 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > @@ -71,16 +71,12 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
> > > > struct xe_vma *vma = &uvma->vma;
> > > > struct xe_vm *vm = xe_vma_vm(vma);
> > > > struct xe_device *xe = vm->xe;
> > > > - struct xe_userptr *userptr;
> > > > int ret;
> > > >
> > > > lockdep_assert_held(&vm->lock);
> > > > xe_assert(xe, xe_vma_is_userptr(vma));
> > > >
> > > > - userptr = &uvma->userptr;
> > > > - mmap_read_lock(userptr->notifier.mm);
> > > > ret = xe_userptr_populate_range(uvma);
> > > > - mmap_read_unlock(userptr->notifier.mm);
> > >
> > > Now you won't have the lock here for other callers of this function...
> > > Probably need to have locked / unlocked versions or an argument here.
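The locked/unlocked split suggested here might look like the sketch below. It is a userspace model: a simple flag stands in for the mmap lock, and the function names are made up for illustration, not the actual Xe API.

```c
#include <stdbool.h>

/* Userspace model of the locked/unlocked split.  A flag stands in
 * for mmap_read_lock(); all names are illustrative. */

static bool mmap_read_locked;
static int populate_calls;

static void model_mmap_read_lock(void)   { mmap_read_locked = true; }
static void model_mmap_read_unlock(void) { mmap_read_locked = false; }

/* _locked variant: caller must already hold the (modeled) mmap lock. */
static int pin_pages_locked(void)
{
	if (!mmap_read_locked)
		return -1;      /* lockdep would complain here */
	populate_calls++;       /* xe_userptr_populate_range() stand-in */
	return 0;
}

/* Unlocked variant: takes and drops the lock around the locked one. */
static int pin_pages(void)
{
	int ret;

	model_mmap_read_lock();
	ret = pin_pages_locked();
	model_mmap_read_unlock();

	return ret;
}
```

Callers that already hold the lock (e.g. the page-fault path that locked for migration) would use the `_locked` variant; everyone else keeps the existing wrapper.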
> >
> > This is addressed in v3.
> >
Missed this. Also, to be clear, in your design Xe shouldn't touch the
mmap lock either; that lock should be owned by the DRM layer.
Matt
> > Oak
> >
> > >
> > > Matt
> > >
> > > >
> > > > return ret;
> > > > }
> > > > --
> > > > 2.26.3
> > > >