Re: [Intel-xe] [PATCH v5 0/8] Port Xe to use GPUVA and implement NULL VM binds

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>, intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH v5 0/8] Port Xe to use GPUVA and implement NULL VM binds
Date: Tue, 4 Apr 2023 15:20:47 +0200	[thread overview]
Message-ID: <fc2de32e-9774-e5b0-3b43-e0e3eab74241@linux.intel.com> (raw)
In-Reply-To: <20230404014228.3738347-1-matthew.brost@intel.com>

Hi, Matthew,


On 4/4/23 03:42, Matthew Brost wrote:
> GPUVA is common code written primarily by Danilo with the idea being a
> common place to track GPUVAs (VMAs in Xe) within an address space (VMs
> in Xe), track all the GPUVAs attached to GEMs, and a common way
> implement VM binds / unbinds with MMAP / MUNMAP semantics via creating
> operation lists. All of this adds up to a common way to implement VK
> sparse bindings.
>
> This series pulls in the GPUVA code written by Danilo plus some small
> fixes by myself into 1 large patch. Once the GPUVA makes it upstream, we
> can rebase and drop this patch. I believe what lands upstream should be
> nearly identical to this patch at least from an API perspective.
>
> The last three patches port Xe to GPUVA and add support for NULL VM binds
> (writes dropped, read zero, VK sparse support). An example of the
> semantics of this is below.

Going through thew new code in xe_vm.c I'm still concerned about our 
error handling. In the cases we attempt to handle, for example an 
-ENOMEM, the sync ioctl appears to be doing a fragile unwind whereas the 
async worker puts the VM in an error state that can only be exited by a 
RESTART ioctl (pls correct me if I'm wrong here). We could of course do 
the same for sync ioctls, but I'm still wondering what happens if, for 
example a MAP operation on a dual-gt VM fails after binding on the first gt?

I think we need to clearly define the desired behaviour in the uAPI and 
then make sure we do the necessary changes with that in mind, and Even 
if I prefer the !async_worker solution this can be don orthogonally to 
that discussion. Ideally I'd see we find a way to *not* put the VM  in 
an error state like this, and I figure there are various ways to aid in 
this, for example keeping a progress cookie in the user-space data or 
deferring any failed bind operations to the next rebind worker or exec, 
but in the end I think this boils down to allocating all needed 
resources up-front for operations or groups of operations that are not 
allowed to fail once we start to modify the page-tables.

Thoughts?

/Thomas




>
> MAP 0x0000-0x8000 to NULL 	- 0x0000-0x8000 writes dropped + read zero
> MAP 0x4000-0x5000 to a GEM 	- 0x0000-0x4000, 0x5000-0x8000 writes dropped + read zero; 0x4000-0x5000 mapped to a GEM
> UNMAP 0x3000-0x6000		- 0x0000-0x3000, 0x6000-0x8000 writes dropped + read zero
> UNMAP 0x0000-0x8000		- Nothing mapped
>
> No changins to existing behavior, rather just new functionality./
>
> v2: Fix CI build failure
> v3: Export mas_preallocate, add patch to avoid rebinds
> v5: Bug fixes, rebase, xe_vma size optimizations
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>
> Danilo Krummrich (2):
>    maple_tree: split up MA_STATE() macro
>    drm: manager to keep track of GPUs VA mappings
>
> Matthew Brost (6):
>    maple_tree: Export mas_preallocate
>    drm/xe: Port Xe to GPUVA
>    drm/xe: NULL binding implementation
>    drm/xe: Avoid doing rebinds
>    drm/xe: Reduce the number list links in xe_vma
>    drm/xe: Optimize size of xe_vma allocation
>
>   Documentation/gpu/drm-mm.rst                |   31 +
>   drivers/gpu/drm/Makefile                    |    1 +
>   drivers/gpu/drm/drm_debugfs.c               |   56 +
>   drivers/gpu/drm/drm_gem.c                   |    3 +
>   drivers/gpu/drm/drm_gpuva_mgr.c             | 1890 ++++++++++++++++++
>   drivers/gpu/drm/xe/xe_bo.c                  |   10 +-
>   drivers/gpu/drm/xe/xe_bo.h                  |    1 +
>   drivers/gpu/drm/xe/xe_device.c              |    2 +-
>   drivers/gpu/drm/xe/xe_exec.c                |    4 +-
>   drivers/gpu/drm/xe/xe_gt_pagefault.c        |   29 +-
>   drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |   14 +-
>   drivers/gpu/drm/xe/xe_guc_ct.c              |    6 +-
>   drivers/gpu/drm/xe/xe_migrate.c             |    8 +-
>   drivers/gpu/drm/xe/xe_pt.c                  |  176 +-
>   drivers/gpu/drm/xe/xe_trace.h               |   10 +-
>   drivers/gpu/drm/xe/xe_vm.c                  | 1947 +++++++++----------
>   drivers/gpu/drm/xe/xe_vm.h                  |   76 +-
>   drivers/gpu/drm/xe/xe_vm_madvise.c          |   87 +-
>   drivers/gpu/drm/xe/xe_vm_types.h            |  276 ++-
>   include/drm/drm_debugfs.h                   |   24 +
>   include/drm/drm_drv.h                       |    7 +
>   include/drm/drm_gem.h                       |   75 +
>   include/drm/drm_gpuva_mgr.h                 |  734 +++++++
>   include/linux/maple_tree.h                  |    7 +-
>   include/uapi/drm/xe_drm.h                   |    8 +
>   lib/maple_tree.c                            |    1 +
>   26 files changed, 4203 insertions(+), 1280 deletions(-)
>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>   create mode 100644 include/drm/drm_gpuva_mgr.h
>

next prev parent reply	other threads:[~2023-04-04 13:20 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04  1:42 [Intel-xe] [PATCH v5 0/8] Port Xe to use GPUVA and implement NULL VM binds Matthew Brost
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 1/8] maple_tree: split up MA_STATE() macro Matthew Brost
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 2/8] maple_tree: Export mas_preallocate Matthew Brost
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 3/8] drm: manager to keep track of GPUs VA mappings Matthew Brost
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 4/8] drm/xe: Port Xe to GPUVA Matthew Brost
2023-04-21 12:52   ` Thomas Hellström
2023-04-21 12:56     ` Thomas Hellström
2023-04-21 14:54   ` Thomas Hellström
2023-04-25 11:41   ` Thomas Hellström
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 5/8] drm/xe: NULL binding implementation Matthew Brost
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 6/8] drm/xe: Avoid doing rebinds Matthew Brost
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 7/8] drm/xe: Reduce the number list links in xe_vma Matthew Brost
2023-04-04  1:42 ` [Intel-xe] [PATCH v5 8/8] drm/xe: Optimize size of xe_vma allocation Matthew Brost
2023-04-04  1:44 ` [Intel-xe] ✓ CI.Patch_applied: success for Port Xe to use GPUVA and implement NULL VM binds (rev5) Patchwork
2023-04-04  1:45 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-04-04  1:49 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-04-04  2:09 ` [Intel-xe] ○ CI.BAT: info " Patchwork
2023-04-04 13:20 ` Thomas Hellström [this message]
2023-04-04 14:56   ` [Intel-xe] [PATCH v5 0/8] Port Xe to use GPUVA and implement NULL VM binds Matthew Brost
2023-04-05 15:31     ` Thomas Hellström
2023-04-06  1:20       ` Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fc2de32e-9774-e5b0-3b43-e0e3eab74241@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.