[PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged)


* [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged)
@ 2023-08-03 16:52 Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock Danilo Krummrich
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

This patch series provides a new UAPI for the Nouveau driver in order to
support Vulkan features, such as sparse bindings and sparse residency.

Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
keep track of GPU virtual address (VA) mappings in a more generic way (merged
into drm-misc/drm-misc-next since V8).

The DRM GPUVA manager is intended to help drivers implement userspace-manageable
GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
serves the following purposes in this context.

    1) Provide infrastructure to track GPU VA allocations and mappings,
       using an interval tree (RB-tree).

    2) Generically connect GPU VA mappings to their backing buffers, in
       particular DRM GEM objects.

    3) Provide a common implementation to perform more complex mapping
       operations on the GPU VA space. In particular splitting and merging
       of GPU VA mappings, e.g. for intersecting mapping requests or partial
       unmap requests.
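
To illustrate the splitting case in 3), here is a minimal sketch of how a
partial unmap request can leave up to two surviving pieces of an existing
mapping. The types and the va_split_on_unmap() helper are hypothetical and
only model the arithmetic; they are not the GPUVA manager's actual interface,
and address overflow is deliberately not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a tracked GPU VA mapping. */
struct va_mapping {
	uint64_t addr;      /* start address in the GPU VA space */
	uint64_t range;     /* size in bytes */
	uint64_t bo_offset; /* offset into the backing buffer */
};

/* Result of unmapping a sub-range: up to two surviving pieces. */
struct va_split {
	bool has_prev;
	bool has_next;
	struct va_mapping prev; /* piece below the unmapped range */
	struct va_mapping next; /* piece above the unmapped range */
};

/*
 * Compute what remains of @m after unmapping [req_addr, req_addr + req_range).
 * Returns false if the request does not intersect the mapping at all.
 * Assumption: neither range wraps around the 64-bit address space.
 */
bool va_split_on_unmap(const struct va_mapping *m, uint64_t req_addr,
		       uint64_t req_range, struct va_split *out)
{
	uint64_t m_end = m->addr + m->range;
	uint64_t req_end = req_addr + req_range;

	*out = (struct va_split){0};

	if (req_range == 0 || req_end <= m->addr || req_addr >= m_end)
		return false;

	if (req_addr > m->addr) {
		out->has_prev = true;
		out->prev.addr = m->addr;
		out->prev.range = req_addr - m->addr;
		out->prev.bo_offset = m->bo_offset;
	}

	if (req_end < m_end) {
		out->has_next = true;
		out->next.addr = req_end;
		out->next.range = m_end - req_end;
		/* The tail keeps pointing at the same bytes of the buffer. */
		out->next.bo_offset = m->bo_offset + (req_end - m->addr);
	}

	return true;
}
```

Unmapping the middle of a mapping yields both a lower and an upper piece,
while unmapping from either edge yields only one.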

The new VM_BIND Nouveau UAPI builds on top of the DRM GPUVA manager, itself
providing the following new interfaces.

    1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
       for UMDs to specify the portion of VA space managed by the kernel and
       userspace, respectively.

    2) Allocate and free a VA space region as well as bind and unbind memory
       to the GPU's VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

    3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
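
As a rough sketch of how a UMD might fill in the arguments for the first two
interfaces: the struct layouts and constants below are local mirrors of the
definitions added to include/uapi/drm/nouveau_drm.h in patch 02 (with
fixed-width stdint types in place of __u32/__u64), and the fill_*() helpers
are hypothetical. Real code would include the uAPI header instead and pass
the structures to ioctl() on a DRM file descriptor, which is not shown here.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uAPI definitions added in patch 02 of this series. */
#define DRM_NOUVEAU_VM_BIND_OP_MAP   0x0
#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x1

struct drm_nouveau_vm_init {
	uint64_t kernel_managed_addr;
	uint64_t kernel_managed_size;
};

struct drm_nouveau_vm_bind_op {
	uint32_t op;
	uint32_t flags;
	uint32_t handle;
	uint32_t pad;
	uint64_t addr;
	uint64_t bo_offset;
	uint64_t range;
};

struct drm_nouveau_vm_bind {
	uint32_t op_count;
	uint32_t flags;
	uint32_t wait_count;
	uint32_t sig_count;
	uint64_t wait_ptr;
	uint64_t sig_ptr;
	uint64_t op_ptr;
};

/* Hypothetical helper: describe the VA region the kernel keeps for itself. */
void fill_vm_init(struct drm_nouveau_vm_init *init, uint64_t addr,
		  uint64_t size)
{
	memset(init, 0, sizeof(*init));
	init->kernel_managed_addr = addr;
	init->kernel_managed_size = size;
}

/* Hypothetical helper: map @range bytes of GEM object @handle at @addr. */
void fill_map_op(struct drm_nouveau_vm_bind_op *op, uint32_t handle,
		 uint64_t addr, uint64_t bo_offset, uint64_t range)
{
	memset(op, 0, sizeof(*op)); /* also clears @pad, which should be 0 */
	op->op = DRM_NOUVEAU_VM_BIND_OP_MAP;
	op->handle = handle;
	op->addr = addr;
	op->bo_offset = bo_offset;
	op->range = range;
}

/* Hypothetical helper: wrap an op array into a synchronous request. */
void fill_vm_bind_sync(struct drm_nouveau_vm_bind *req,
		       const struct drm_nouveau_vm_bind_op *ops,
		       uint32_t op_count)
{
	memset(req, 0, sizeof(*req)); /* flags == 0: synchronous, no syncs */
	req->op_count = op_count;
	req->op_ptr = (uint64_t)(uintptr_t)ops;
}
```

The request built by fill_vm_bind_sync() carries no wait or signal syncs,
which matches the synchronous default described for VM_BIND.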

Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC make use of the DRM
scheduler to queue jobs and support asynchronous processing with DRM syncobjs
as the synchronization mechanism.

By default, DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing, whereas
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
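
The asynchronous path can be sketched as follows: an EXEC request that waits
on a binary syncobj and signals a point on a timeline syncobj. As before, the
struct layouts and constants are local mirrors of the definitions from patch
02, while sync_binary(), sync_timeline() and fill_exec() are hypothetical
helpers. Submitting the request via ioctl() is left out, and the handles are
assumed to come from the usual DRM syncobj interfaces.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uAPI definitions added in patch 02 of this series. */
#define DRM_NOUVEAU_SYNC_SYNCOBJ          0x0
#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1

struct drm_nouveau_sync {
	uint32_t flags;
	uint32_t handle;
	uint64_t timeline_value;
};

struct drm_nouveau_exec_push {
	uint64_t va;
	uint64_t va_len;
};

struct drm_nouveau_exec {
	uint32_t channel;
	uint32_t push_count;
	uint32_t wait_count;
	uint32_t sig_count;
	uint64_t wait_ptr;
	uint64_t sig_ptr;
	uint64_t push_ptr;
};

/* Hypothetical helper: reference a binary syncobj by handle. */
struct drm_nouveau_sync sync_binary(uint32_t handle)
{
	struct drm_nouveau_sync s = {
		.flags = DRM_NOUVEAU_SYNC_SYNCOBJ,
		.handle = handle,
	};
	return s;
}

/* Hypothetical helper: reference one point on a timeline syncobj. */
struct drm_nouveau_sync sync_timeline(uint32_t handle, uint64_t point)
{
	struct drm_nouveau_sync s = {
		.flags = DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ,
		.handle = handle,
		.timeline_value = point,
	};
	return s;
}

/* Hypothetical helper: build an EXEC request from caller-owned arrays. */
void fill_exec(struct drm_nouveau_exec *req, uint32_t channel,
	       const struct drm_nouveau_exec_push *push, uint32_t push_count,
	       const struct drm_nouveau_sync *wait, uint32_t wait_count,
	       const struct drm_nouveau_sync *sig, uint32_t sig_count)
{
	memset(req, 0, sizeof(*req));
	req->channel = channel;
	req->push_count = push_count;
	req->push_ptr = (uint64_t)(uintptr_t)push;
	req->wait_count = wait_count;
	req->wait_ptr = (uint64_t)(uintptr_t)wait;
	req->sig_count = sig_count;
	req->sig_ptr = (uint64_t)(uintptr_t)sig;
}
```

A VM_BIND request would use the same drm_nouveau_sync arrays, but only when
DRM_NOUVEAU_VM_BIND_RUN_ASYNC is set in its flags field.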

The new VM_BIND UAPI for Nouveau also makes use of drm_exec (execution context
for GEM buffers) by Christian König. Since the patch implementing drm_exec was
not yet merged into drm-next it is part of this series, as well as a small fix
for this patch, which was found while testing this series.

This patch series is also available at [1].

There is a Mesa NVK merge request by Dave Airlie [2] implementing the
corresponding userspace parts for this series.

The Vulkan CTS test suite passes the sparse binding and sparse residency test
cases for the new UAPI together with Dave's Mesa work.

There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
and hence the DRM GPU VA manager. However, most of them are testing the DRM GPU
VA manager's logic through Nouveau's new UAPI and should be considered merely
as helpers for the implementation.

However, I absolutely intend to change those test cases to proper kunit test
cases for the DRM GPUVA manager, once and if we agree on its usefulness and
design.

[1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
    https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
[2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
[3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind

Changes in V2:
==============
  Nouveau:
    - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
      signalling critical sections. Updates to the VA space are split up into
      three separate stages, of which only the second executes in a fence
      signalling critical section:

        1. update the VA space, allocate new structures and page tables
        2. (un-)map the requested memory bindings
        3. free structures and page tables

    - Separated generic job scheduler code from specific job implementations.
    - Separated the EXEC and VM_BIND implementation of the UAPI.
    - Reworked the locking parts of the nvkm/vmm RAW interface, such that
      (un-)map operations can be executed in fence signalling critical sections.

  GPUVA Manager:
    - made drm_gpuva_regions optional for users of the GPUVA manager
    - allow NULL GEMs for drm_gpuva entries
    - switched from drm_mm to maple_tree for tracking drm_gpuva /
      drm_gpuva_region entries
    - provide callbacks for users to allocate custom drm_gpuva_op structures to
      allow inheritance
    - added user bits to drm_gpuva_flags
    - added a prefetch operation type in order to support generating prefetch
      operations in the same way other operations are generated
    - hand the responsibility for mutual exclusion for a GEM's
      drm_gpuva list to the user; simplified corresponding (un-)link functions

  Maple Tree:
    - I added two maple tree patches to the series, one to support custom tree
      walk macros and one to hand the locking responsibility to the user of the
      GPUVA manager without pre-defined lockdep checks.

Changes in V3:
==============
  Nouveau:
    - Reworked the Nouveau VM_BIND UAPI to do the job cleanup (including page
      table cleanup) within a workqueue rather than the job_free() callback of
      the scheduler itself. A job_free() callback can stall the execution (run()
      callback) of the next job in the queue. Since the page table cleanup
      requires taking the same locks that are taken for page table
      allocation, doing it directly in the job_free() callback would still
      violate the fence signalling critical path.
    - Separated Nouveau fence allocation and emit, such that we do not violate
      the fence signalling critical path in EXEC jobs.
    - Implement "regions" (for handling sparse mappings through PDEs and dual
      page tables) within Nouveau.
    - Drop the requirement for every mapping to be contained within a region.
    - Add necessary synchronization of VM_BIND job operation sequences in order
      to work around limitations in page table handling. This will be addressed
      in a future re-work of Nouveau's page table handling.
    - Fixed a couple of race conditions found through more testing. Thanks to
      Dave for consistently trying to break it. :-)

  GPUVA Manager:
    - Implement pre-allocation capabilities for tree modifications within fence
      signalling critical sections.
    - Implement accessors to apply tree modifications while walking the GPUVA
      tree in order to actually support processing of drm_gpuva_ops through
      callbacks in fence signalling critical sections rather than through
      pre-allocated operation lists.
    - Remove merging of GPUVAs; the kernel has limited to no knowledge about
      the semantics of mapping sequences. Hence, merging is purely speculative.
      It seems that gaining a significant (or at least a measurable) performance
      increase through merging is way more likely to happen when userspace is
      responsible for merging mappings up to the next larger page size if
      possible.
    - Since merging was removed, regions pretty much lose their right to exist.
      They might still be useful for handling dual page tables or similar
      mechanisms, but since Nouveau seems to be the only driver having a need
      for this for now, regions were removed from the GPUVA manager.
    - Fixed a couple of maple_tree related issues; thanks to Liam for helping me
      out.

Changes in V4:
==============
  Nouveau:
    - Refactored how specific VM_BIND and EXEC jobs are created and how their
      arguments are passed to the generic job implementation.
    - Fixed a UAF race condition where bind job ops could have been freed
      already while still waiting for a job cleanup to finish. This is because,
      in certain cases, we need to wait for mappings to actually be unmapped
      before creating sparse regions in the same area.
    - Re-based the code onto drm_exec v4 patch.

  GPUVA Manager:
    - Fixed a maple tree related bug when pre-allocating MA states.
      (Boris Brezillon)
    - Made struct drm_gpuva_fn_ops a const object in all occurrences.
      (Boris Brezillon)

Changes in V5:
==============
  Nouveau:
    - Link and unlink GPUVAs outside the fence signalling critical path in
      nouveau_uvmm_bind_job_submit() holding the dma-resv lock. Mutual exclusion
      of BO evicts causing mapping invalidation and regular mapping operations
      is ensured with dma-fences.

  GPUVA Manager:
    - Removed the separate GEM GPUVA list lock. Link and unlink as well as
      iterating the GEM's GPUVA list should be protected with the GEM's dma-resv
      lock instead.
    - Renamed DRM_GPUVA_EVICTED flag to DRM_GPUVA_INVALIDATED. Mappings do not
      get evicted, they might get invalidated due to eviction.
    - Maple tree uses the 'unsigned long' type for node entries. While this
      works for GPU VA spaces larger than 32-bit on 64-bit kernels, the GPU VA
      space is limited to 32-bit on 32-bit kernels.
      As long as we do not have a 64-bit capable maple tree for 32-bit kernels,
      the GPU VA manager contains checks to throw warnings when GPU VA entries
      exceed the maple tree's storage capabilities.
    - Extended the Documentation and added example code as requested by Donald
      Robson.

Changes in V6:
==============

  Nouveau:
    - Re-based the code onto drm_exec v5 patch.

  GPUVA Manager:
    - Switch from maple tree to RB-tree.

      It turned out that mas_preallocate() requires the maple tree not to change
      in between pre-allocating nodes with mas_preallocate() and inserting an
      entry with the help of the pre-allocated memory (mas_insert_prealloc()).

      However, considering that drivers typically implement interfaces where
      jobs to create GPU mappings can be submitted by userspace, are queued up
      by the kernel and are processed asynchronously in dma-fence signalling
      critical paths, this is a major issue. In the ioctl() used to submit a job
      we'd need to pre-allocate memory with mas_preallocate(), however,
      previously queued up jobs could concurrently alter the maple tree
      resulting in potentially insufficient pre-allocated memory for the
      currently submitted job at execution time.

      There is a detailed and still ongoing discussion about this topic on the
      -mm list [1]. So far the only solution seems to be to use GFP_ATOMIC
      and allocate memory directly in the fence signalling critical path, where
      we need it. However, I think that is not what we want to rely on.

      I think we should definitely continue in trying to find a solution on how
      to fit in the maple tree (or how to make the maple tree fit in). However,
      for now it seems to be more expedient to move on using an RB-tree.

      [1] https://lore.kernel.org/lkml/20230612203953.2093911-15-Liam.Howlett@oracle.com/

    - Provide a flag to let drivers optionally provide their own lock to protect
      linking and unlinking of GPUVAs to GEM objects. The DRM GPUVA manager
      still does not take the locks itself, but rather contains lockdep checks
      on either the GEM's dma-resv lock (default) or, if
      DRM_GPUVA_MANAGER_LOCK_EXTERN is set, the driver-provided lock.
      (Boris Brezillon)

Changes in V7:
==============
  Nouveau:
    - Rebase to drm_exec v7.
    - Move drm_gem_gpuva_init() before ttm_bo_init_validate(), but after
      initialization of the corresponding dma-resv.

  GPUVA Manager:
    - Fix drm_gpuva_find_first() range parameter in drm_gpuva_for_each_va*
      macros. (Boris)
    - Simplify drm_gpuva_for_each_va* macros using a __drm_gpuva_next() helper.
      (Boris)
    - Move lockdep checks for an optional external GEM gpuva list lock out of
      the GPUVA Manager to drm_gem.h. (Boris)
    - Fix code style issues pointed out by Thomas.
    - Switch to EXPORT_SYMBOL_GPL(). (Christoph)

Changes in V8:
==============
  Nouveau:
    - n/a

  GPUVA Manager:
    - Fix documentation about locking the GEM's GPUVA list. (Donald)
    - Fix a few minor checkpatch warnings.

Changes in V9:
==============
  Nouveau:
    - uAPI header (Faith, Dave):
      - documented preconditions to successfully initialize the VM_BIND uAPI
      - renamed drm_nouveau_vm_init unmanaged_{addr,size} to
        kernel_managed_{addr,size}
      - add NOUVEAU_GEM_DOMAIN_NO_SHARE flag
    - allow VM_BIND and EXEC jobs with op_count == 0 (Faith)
    - add a common dma-resv object for the VM and handle
      NOUVEAU_GEM_DOMAIN_NO_SHARE accordingly
    - add armed_submit() callback to nouveau_job
    - make use of drm_gpuva_map() rather than open code the GPUVA initialization

  GPUVA Manager:
    - n/a (merged into drm-misc/drm-misc-next since V8)

  DRM GEM:
    - added a patch to fix lockdep checks of GEM GPUVA locks

Danilo Krummrich (11):
  drm/gem: fix lockdep check for dma-resv lock
  drm/nouveau: new VM_BIND uapi interfaces
  drm/nouveau: get vmm via nouveau_cli_vmm()
  drm/nouveau: bo: initialize GEM GPU VA interface
  drm/nouveau: move usercopy helpers to nouveau_drv.h
  drm/nouveau: fence: separate fence alloc and emit
  drm/nouveau: fence: fail to emit when fence context is killed
  drm/nouveau: chan: provide nouveau_channel_kill()
  drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  drm/nouveau: implement new VM_BIND uAPI
  drm/nouveau: debugfs: implement DRM GPU VA debugfs

 Documentation/gpu/driver-uapi.rst             |   11 +
 drivers/gpu/drm/nouveau/Kbuild                |    3 +
 drivers/gpu/drm/nouveau/Kconfig               |    2 +
 drivers/gpu/drm/nouveau/dispnv04/crtc.c       |    9 +-
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   26 +-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   19 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   20 +-
 drivers/gpu/drm/nouveau/nouveau_abi16.c       |   24 +
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c          |  221 +-
 drivers/gpu/drm/nouveau/nouveau_bo.h          |    3 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c        |   22 +-
 drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
 drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   39 +
 drivers/gpu/drm/nouveau/nouveau_dmem.c        |    9 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c         |   27 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h         |   93 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c        |  436 ++++
 drivers/gpu/drm/nouveau/nouveau_exec.h        |   54 +
 drivers/gpu/drm/nouveau/nouveau_fence.c       |   23 +-
 drivers/gpu/drm/nouveau/nouveau_fence.h       |    5 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c         |   86 +-
 drivers/gpu/drm/nouveau/nouveau_gem.h         |    3 +-
 drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
 drivers/gpu/drm/nouveau/nouveau_prime.c       |   13 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |  444 ++++
 drivers/gpu/drm/nouveau/nouveau_sched.h       |  127 ++
 drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c        | 1946 +++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h        |  108 +
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            |  100 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  213 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  197 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   25 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |   16 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   16 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |   27 +-
 include/drm/drm_gem.h                         |   15 +-
 include/uapi/drm/nouveau_drm.h                |  217 ++
 40 files changed, 4362 insertions(+), 250 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h


base-commit: e4774e9968b26dc5d225ce629af8081ddab0029a
-- 
2.41.0



* [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-08  7:21   ` Boris Brezillon
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 02/11] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

When no custom lock is set to protect a GEM's GPUVA list, lockdep checks
should fall back to the GEM object's dma-resv lock. With the current
implementation we're setting the lock_dep_map of the GEM object's 'resv'
pointer (in case no custom lock_dep_map is set yet) on
drm_gem_private_object_init().

However, the GEM object's 'resv' pointer might still change after
drm_gem_private_object_init() is called, e.g. through
ttm_bo_init_reserved(). This can result in the wrong lock being tracked.

To fix this, call dma_resv_held() directly from
drm_gem_gpuva_assert_lock_held() and fall back to the GEM's lock_dep_map
pointer only if an actual custom lock is set.

Fixes: e6303f323b1a ("drm: manager to keep track of GPUs VA mappings")
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 include/drm/drm_gem.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index c0b13c43b459..bc9f6aa2f3fe 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -551,15 +551,17 @@ int drm_gem_evict(struct drm_gem_object *obj);
  * @lock: the lock used to protect the gpuva list. The locking primitive
  * must contain a dep_map field.
  *
- * Call this if you're not proctecting access to the gpuva list
- * with the dma-resv lock, otherwise, drm_gem_gpuva_init() takes care
- * of initializing lock_dep_map for you.
+ * Call this if you're not proctecting access to the gpuva list with the
+ * dma-resv lock, but with a custom lock.
  */
 #define drm_gem_gpuva_set_lock(obj, lock) \
-	if (!(obj)->gpuva.lock_dep_map) \
+	if (!WARN((obj)->gpuva.lock_dep_map, \
+		  "GEM GPUVA lock should be set only once.")) \
 		(obj)->gpuva.lock_dep_map = &(lock)->dep_map
 #define drm_gem_gpuva_assert_lock_held(obj) \
-	lockdep_assert(lock_is_held((obj)->gpuva.lock_dep_map))
+	lockdep_assert((obj)->gpuva.lock_dep_map ? \
+		       lock_is_held((obj)->gpuva.lock_dep_map) : \
+		       dma_resv_held((obj)->resv))
 #else
 #define drm_gem_gpuva_set_lock(obj, lock) do {} while (0)
 #define drm_gem_gpuva_assert_lock_held(obj) do {} while (0)
@@ -573,11 +575,12 @@ int drm_gem_evict(struct drm_gem_object *obj);
  *
  * Calling this function is only necessary for drivers intending to support the
  * &drm_driver_feature DRIVER_GEM_GPUVA.
+ *
+ * See also drm_gem_gpuva_set_lock().
  */
 static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
 {
 	INIT_LIST_HEAD(&obj->gpuva.list);
-	drm_gem_gpuva_set_lock(obj, &obj->resv->lock.base);
 }
 
 /**
-- 
2.41.0



* [PATCH drm-misc-next v9 02/11] drm/nouveau: new VM_BIND uapi interfaces
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 03/11] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich,
	Dave Airlie

This commit provides the interfaces for the new UAPI motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
   DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
   VA area.

2) Bind and unbind GPU VA space mappings via the new
   DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.

Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
asynchronous processing with DRM syncobjs as the synchronization mechanism.

By default, DRM_IOCTL_NOUVEAU_VM_BIND processes synchronously, whereas
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.

Co-authored-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 Documentation/gpu/driver-uapi.rst |   8 ++
 include/uapi/drm/nouveau_drm.h    | 217 ++++++++++++++++++++++++++++++
 2 files changed, 225 insertions(+)

diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
index 4411e6919a3d..9c7ca6e33a68 100644
--- a/Documentation/gpu/driver-uapi.rst
+++ b/Documentation/gpu/driver-uapi.rst
@@ -6,3 +6,11 @@ drm/i915 uAPI
 =============
 
 .. kernel-doc:: include/uapi/drm/i915_drm.h
+
+drm/nouveau uAPI
+================
+
+VM_BIND / EXEC uAPI
+-------------------
+
+.. kernel-doc:: include/uapi/drm/nouveau_drm.h
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 853a327433d3..b567892c128d 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -38,6 +38,8 @@ extern "C" {
 #define NOUVEAU_GEM_DOMAIN_GART      (1 << 2)
 #define NOUVEAU_GEM_DOMAIN_MAPPABLE  (1 << 3)
 #define NOUVEAU_GEM_DOMAIN_COHERENT  (1 << 4)
+/* The BO will never be shared via import or export. */
+#define NOUVEAU_GEM_DOMAIN_NO_SHARE  (1 << 5)
 
 #define NOUVEAU_GEM_TILE_COMP        0x00030000 /* nv50-only */
 #define NOUVEAU_GEM_TILE_LAYOUT_MASK 0x0000ff00
@@ -126,6 +128,215 @@ struct drm_nouveau_gem_cpu_fini {
 	__u32 handle;
 };
 
+/**
+ * struct drm_nouveau_sync - sync object
+ *
+ * This structure serves as synchronization mechanism for (potentially)
+ * asynchronous operations such as EXEC or VM_BIND.
+ */
+struct drm_nouveau_sync {
+	/**
+	 * @flags: the flags for a sync object
+	 *
+	 * The first 8 bits are used to determine the type of the sync object.
+	 */
+	__u32 flags;
+#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
+#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
+#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
+	/**
+	 * @handle: the handle of the sync object
+	 */
+	__u32 handle;
+	/**
+	 * @timeline_value:
+	 *
+	 * The timeline point of the sync object in case the syncobj is of
+	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
+	 */
+	__u64 timeline_value;
+};
+
+/**
+ * struct drm_nouveau_vm_init - GPU VA space init structure
+ *
+ * Used to initialize the GPU's VA space for a user client, telling the kernel
+ * which portion of the VA space is managed by the UMD and kernel respectively.
+ *
+ * For the UMD to use the VM_BIND uAPI, this must be called before any BOs or
+ * channels are created; if called afterwards DRM_IOCTL_NOUVEAU_VM_INIT fails
+ * with -ENOSYS.
+ */
+struct drm_nouveau_vm_init {
+	/**
+	 * @kernel_managed_addr: start address of the kernel managed VA space
+	 * region
+	 */
+	__u64 kernel_managed_addr;
+	/**
+	 * @kernel_managed_size: size of the kernel managed VA space region in
+	 * bytes
+	 */
+	__u64 kernel_managed_size;
+};
+
+/**
+ * struct drm_nouveau_vm_bind_op - VM_BIND operation
+ *
+ * This structure represents a single VM_BIND operation. UMDs should pass
+ * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
+ */
+struct drm_nouveau_vm_bind_op {
+	/**
+	 * @op: the operation type
+	 */
+	__u32 op;
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_MAP:
+ *
+ * Map a GEM object to the GPU's VA space. Optionally, the
+ * &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to instruct the kernel to
+ * create sparse mappings for the given range.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x0
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
+ *
+ * Unmap an existing mapping in the GPU's VA space. If the region the mapping
+ * is located in is a sparse region, new sparse mappings are created where the
+ * unmapped (memory backed) mapping was mapped previously. To remove a sparse
+ * region the &DRM_NOUVEAU_VM_BIND_SPARSE must be set.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x1
+	/**
+	 * @flags: the flags for a &drm_nouveau_vm_bind_op
+	 */
+	__u32 flags;
+/**
+ * @DRM_NOUVEAU_VM_BIND_SPARSE:
+ *
+ * Indicates that an allocated VA space region should be sparse.
+ */
+#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
+	/**
+	 * @handle: the handle of the DRM GEM object to map
+	 */
+	__u32 handle;
+	/**
+	 * @pad: 32 bit padding, should be 0
+	 */
+	__u32 pad;
+	/**
+	 * @addr:
+	 *
+	 * the address the VA space region or (memory backed) mapping should be mapped to
+	 */
+	__u64 addr;
+	/**
+	 * @bo_offset: the offset within the BO backing the mapping
+	 */
+	__u64 bo_offset;
+	/**
+	 * @range: the size of the requested mapping in bytes
+	 */
+	__u64 range;
+};
+
+/**
+ * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
+ */
+struct drm_nouveau_vm_bind {
+	/**
+	 * @op_count: the number of &drm_nouveau_vm_bind_op
+	 */
+	__u32 op_count;
+	/**
+	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
+	 */
+	__u32 flags;
+/**
+ * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
+ *
+ * Indicates that the given VM_BIND operation should be executed asynchronously
+ * by the kernel.
+ *
+ * If this flag is not supplied the kernel executes the associated operations
+ * synchronously and doesn't accept any &drm_nouveau_sync objects.
+ */
+#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
+	/**
+	 * @wait_count: the number of wait &drm_nouveau_syncs
+	 */
+	__u32 wait_count;
+	/**
+	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
+	 */
+	__u32 sig_count;
+	/**
+	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
+	 */
+	__u64 wait_ptr;
+	/**
+	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
+	 */
+	__u64 sig_ptr;
+	/**
+	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
+	 */
+	__u64 op_ptr;
+};
+
+/**
+ * struct drm_nouveau_exec_push - EXEC push operation
+ *
+ * This structure represents a single EXEC push operation. UMDs should pass an
+ * array of this structure via struct drm_nouveau_exec's &push_ptr field.
+ */
+struct drm_nouveau_exec_push {
+	/**
+	 * @va: the virtual address of the push buffer mapping
+	 */
+	__u64 va;
+	/**
+	 * @va_len: the length of the push buffer mapping
+	 */
+	__u64 va_len;
+};
+
+/**
+ * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
+ */
+struct drm_nouveau_exec {
+	/**
+	 * @channel: the channel to execute the push buffer in
+	 */
+	__u32 channel;
+	/**
+	 * @push_count: the number of &drm_nouveau_exec_push ops
+	 */
+	__u32 push_count;
+	/**
+	 * @wait_count: the number of wait &drm_nouveau_syncs
+	 */
+	__u32 wait_count;
+	/**
+	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
+	 */
+	__u32 sig_count;
+	/**
+	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
+	 */
+	__u64 wait_ptr;
+	/**
+	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
+	 */
+	__u64 sig_ptr;
+	/**
+	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
+	 */
+	__u64 push_ptr;
+};
+
 #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
 #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
 #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
@@ -136,6 +347,9 @@ struct drm_nouveau_gem_cpu_fini {
 #define DRM_NOUVEAU_NVIF               0x07
 #define DRM_NOUVEAU_SVM_INIT           0x08
 #define DRM_NOUVEAU_SVM_BIND           0x09
+#define DRM_NOUVEAU_VM_INIT            0x10
+#define DRM_NOUVEAU_VM_BIND            0x11
+#define DRM_NOUVEAU_EXEC               0x12
 #define DRM_NOUVEAU_GEM_NEW            0x40
 #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
 #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
@@ -197,6 +411,9 @@ struct drm_nouveau_svm_bind {
 #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
 #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
 
+#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
+#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
+#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
 #if defined(__cplusplus)
 }
 #endif
-- 
2.41.0



* [PATCH drm-misc-next v9 03/11] drm/nouveau: get vmm via nouveau_cli_vmm()
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 02/11] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 04/11] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Provide a getter function for the client's current vmm context. Since
we'll add a new (u)vmm context for UMD bindings in subsequent commits,
this will keep the code clean.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h  | 9 +++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c  | 6 +++---
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index c2ec91cc845d..7724fe63067d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -204,7 +204,7 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 	struct nouveau_drm *drm = cli->drm;
 	struct nouveau_bo *nvbo;
 	struct nvif_mmu *mmu = &cli->mmu;
-	struct nvif_vmm *vmm = cli->svm.cli ? &cli->svm.vmm : &cli->vmm.vmm;
+	struct nvif_vmm *vmm = &nouveau_cli_vmm(cli)->vmm;
 	int i, pi = -1;
 
 	if (!*size) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 3dfbc374478e..6d639314250a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -149,7 +149,7 @@ nouveau_channel_prep(struct nouveau_drm *drm, struct nvif_device *device,
 
 	chan->device = device;
 	chan->drm = drm;
-	chan->vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	chan->vmm = nouveau_cli_vmm(cli);
 	atomic_set(&chan->killed, 0);
 
 	/* allocate memory for dma push buffer */
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b5de312a523f..81350e685b50 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -112,6 +112,15 @@ struct nouveau_cli_work {
 	struct dma_fence_cb cb;
 };
 
+static inline struct nouveau_vmm *
+nouveau_cli_vmm(struct nouveau_cli *cli)
+{
+	if (cli->svm.cli)
+		return &cli->svm;
+
+	return &cli->vmm;
+}
+
 void nouveau_cli_work_queue(struct nouveau_cli *, struct dma_fence *,
 			    struct nouveau_cli_work *);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index ab9062e50977..45ca4eb98f54 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -103,7 +103,7 @@ nouveau_gem_object_open(struct drm_gem_object *gem, struct drm_file *file_priv)
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct device *dev = drm->dev->dev;
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 	int ret;
 
@@ -180,7 +180,7 @@ nouveau_gem_object_close(struct drm_gem_object *gem, struct drm_file *file_priv)
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct device *dev = drm->dev->dev;
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : & cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 	int ret;
 
@@ -269,7 +269,7 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem,
 {
 	struct nouveau_cli *cli = nouveau_cli(file_priv);
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 
 	if (is_power_of_2(nvbo->valid_domains))
-- 
2.41.0



* [PATCH drm-misc-next v9 04/11] drm/nouveau: bo: initialize GEM GPU VA interface
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (2 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 03/11] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 05/11] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Initialize the GEM's DRM GPU VA manager interface in preparation for the
(u)vmm implementation, provided by subsequent commits, to make use of it.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7724fe63067d..057bc995f19b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -215,11 +215,14 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 	nvbo = kzalloc(sizeof(struct nouveau_bo), GFP_KERNEL);
 	if (!nvbo)
 		return ERR_PTR(-ENOMEM);
+
 	INIT_LIST_HEAD(&nvbo->head);
 	INIT_LIST_HEAD(&nvbo->entry);
 	INIT_LIST_HEAD(&nvbo->vma_list);
 	nvbo->bo.bdev = &drm->ttm.bdev;
 
+	drm_gem_gpuva_init(&nvbo->bo.base);
+
 	/* This is confusing, and doesn't actually mean we want an uncached
 	 * mapping, but is what NOUVEAU_GEM_DOMAIN_COHERENT gets translated
 	 * into in nouveau_gem_new().
-- 
2.41.0



* [PATCH drm-misc-next v9 05/11] drm/nouveau: move usercopy helpers to nouveau_drv.h
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (3 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 04/11] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit Danilo Krummrich
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Move the usercopy helpers to a common driver header file to make them
usable for the new API added in subsequent commits.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_drv.h | 26 ++++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c | 26 --------------------------
 2 files changed, 26 insertions(+), 26 deletions(-)
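
Note: u_memcpya() follows an allocate, copy, or fail-and-free flow. A
rough userspace analogue is sketched below. sketch_memdup_array() is a
hypothetical name, malloc()/memcpy() stand in for kvmalloc() and
copy_from_user(), and the explicit overflow guard on nmemb * size is an
addition of this sketch, not something the moved helper does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of an array copy helper. Returns 0 and stores the
 * new buffer in *out on success, or a negative errno value on failure.
 */
static int
sketch_memdup_array(const void *src, size_t nmemb, size_t size, void **out)
{
	size_t bytes;
	void *mem;

	/* Reject element counts whose total byte size does not fit. */
	if (size != 0 && nmemb > SIZE_MAX / size)
		return -EOVERFLOW;

	bytes = nmemb * size;

	/* Allocate at least one byte so a zero-length copy still succeeds. */
	mem = malloc(bytes ? bytes : 1);
	if (!mem)
		return -ENOMEM;

	memcpy(mem, src, bytes);
	*out = mem;
	return 0;
}
```

The caller owns the returned buffer and releases it with free(), which
plays the role of u_free() here.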

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 81350e685b50..d28236021971 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -130,6 +130,32 @@ nouveau_cli(struct drm_file *fpriv)
 	return fpriv ? fpriv->driver_priv : NULL;
 }
 
+static inline void
+u_free(void *addr)
+{
+	kvfree(addr);
+}
+
+static inline void *
+u_memcpya(uint64_t user, unsigned int nmemb, unsigned int size)
+{
+	void *mem;
+	void __user *userptr = (void __force __user *)(uintptr_t)user;
+
+	size *= nmemb;
+
+	mem = kvmalloc(size, GFP_KERNEL);
+	if (!mem)
+		return ERR_PTR(-ENOMEM);
+
+	if (copy_from_user(mem, userptr, size)) {
+		u_free(mem);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return mem;
+}
+
 #include <nvif/object.h>
 #include <nvif/parent.h>
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 45ca4eb98f54..a48f42aaeab9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -613,32 +613,6 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 	return 0;
 }
 
-static inline void
-u_free(void *addr)
-{
-	kvfree(addr);
-}
-
-static inline void *
-u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
-{
-	void *mem;
-	void __user *userptr = (void __force __user *)(uintptr_t)user;
-
-	size *= nmemb;
-
-	mem = kvmalloc(size, GFP_KERNEL);
-	if (!mem)
-		return ERR_PTR(-ENOMEM);
-
-	if (copy_from_user(mem, userptr, size)) {
-		u_free(mem);
-		return ERR_PTR(-EFAULT);
-	}
-
-	return mem;
-}
-
 static int
 nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 				struct drm_nouveau_gem_pushbuf *req,
-- 
2.41.0



* [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (4 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 05/11] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-07 18:07   ` Christian König
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 07/11] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
in order to emit fences within DMA fence signalling critical sections
(e.g. as typically done in the DRM GPU scheduler's run_job() callback) we
need to separate fence allocation and fence emitting.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/dispnv04/crtc.c |  9 ++++-
 drivers/gpu/drm/nouveau/nouveau_bo.c    | 52 +++++++++++++++----------
 drivers/gpu/drm/nouveau/nouveau_chan.c  |  6 ++-
 drivers/gpu/drm/nouveau/nouveau_dmem.c  |  9 +++--
 drivers/gpu/drm/nouveau/nouveau_fence.c | 16 +++-----
 drivers/gpu/drm/nouveau/nouveau_fence.h |  3 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c   |  5 ++-
 7 files changed, 59 insertions(+), 41 deletions(-)
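
Note: the split boils down to a two-phase pattern: allocate where
failing and sleeping are still allowed, then emit without allocating. A
minimal userspace sketch of that shape follows. All sketch_* names and the
seqno bookkeeping are hypothetical and only loosely model
nouveau_fence_new(), nouveau_fence_emit() and nouveau_fence_unref().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_chan {
	bool has_fence_ctx;	/* models chan->fence being set */
	unsigned int seqno;
};

struct sketch_fence {
	struct sketch_chan *chan;
	unsigned int seqno;
};

/* Phase 1: allocation only. May fail, so call it before the critical
 * section begins. */
static int
sketch_fence_new(struct sketch_fence **pfence)
{
	struct sketch_fence *fence = calloc(1, sizeof(*fence));

	if (!fence)
		return -ENOMEM;

	*pfence = fence;
	return 0;
}

/* Phase 2: no allocation; binds the pre-allocated fence to a channel. */
static int
sketch_fence_emit(struct sketch_fence *fence, struct sketch_chan *chan)
{
	if (!chan->has_fence_ctx)
		return -ENODEV;

	fence->chan = chan;
	fence->seqno = ++chan->seqno;
	return 0;
}

/* Drop the fence and clear the caller's pointer; NULL is tolerated. */
static void
sketch_fence_unref(struct sketch_fence **pfence)
{
	free(*pfence);
	*pfence = NULL;
}
```

A caller that fails in phase 2 still owns the fence from phase 1 and has
to drop it, which is why the converted call sites gain an unref on the
emit error path.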

diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
index a6f2e681bde9..a34924523133 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -1122,11 +1122,18 @@ nv04_page_flip_emit(struct nouveau_channel *chan,
 	PUSH_NVSQ(push, NV_SW, NV_SW_PAGE_FLIP, 0x00000000);
 	PUSH_KICK(push);
 
-	ret = nouveau_fence_new(chan, false, pfence);
+	ret = nouveau_fence_new(pfence);
 	if (ret)
 		goto fail;
 
+	ret = nouveau_fence_emit(*pfence, chan);
+	if (ret)
+		goto fail_fence_unref;
+
 	return 0;
+
+fail_fence_unref:
+	nouveau_fence_unref(pfence);
 fail:
 	spin_lock_irqsave(&dev->event_lock, flags);
 	list_del(&s->head);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 057bc995f19b..e9cbbf594e6f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -820,29 +820,39 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
 		mutex_lock(&cli->mutex);
 	else
 		mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
+
 	ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, ctx->interruptible);
-	if (ret == 0) {
-		ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
-		if (ret == 0) {
-			ret = nouveau_fence_new(chan, false, &fence);
-			if (ret == 0) {
-				/* TODO: figure out a better solution here
-				 *
-				 * wait on the fence here explicitly as going through
-				 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
-				 *
-				 * Without this the operation can timeout and we'll fallback to a
-				 * software copy, which might take several minutes to finish.
-				 */
-				nouveau_fence_wait(fence, false, false);
-				ret = ttm_bo_move_accel_cleanup(bo,
-								&fence->base,
-								evict, false,
-								new_reg);
-				nouveau_fence_unref(&fence);
-			}
-		}
+	if (ret)
+		goto out_unlock;
+
+	ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
+	if (ret)
+		goto out_unlock;
+
+	ret = nouveau_fence_new(&fence);
+	if (ret)
+		goto out_unlock;
+
+	ret = nouveau_fence_emit(fence, chan);
+	if (ret) {
+		nouveau_fence_unref(&fence);
+		goto out_unlock;
 	}
+
+	/* TODO: figure out a better solution here
+	 *
+	 * wait on the fence here explicitly as going through
+	 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+	 *
+	 * Without this the operation can timeout and we'll fallback to a
+	 * software copy, which might take several minutes to finish.
+	 */
+	nouveau_fence_wait(fence, false, false);
+	ret = ttm_bo_move_accel_cleanup(bo, &fence->base, evict, false,
+					new_reg);
+	nouveau_fence_unref(&fence);
+
+out_unlock:
 	mutex_unlock(&cli->mutex);
 	return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 6d639314250a..f69be4c8f9f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -62,9 +62,11 @@ nouveau_channel_idle(struct nouveau_channel *chan)
 		struct nouveau_fence *fence = NULL;
 		int ret;
 
-		ret = nouveau_fence_new(chan, false, &fence);
+		ret = nouveau_fence_new(&fence);
 		if (!ret) {
-			ret = nouveau_fence_wait(fence, false, false);
+			ret = nouveau_fence_emit(fence, chan);
+			if (!ret)
+				ret = nouveau_fence_wait(fence, false, false);
 			nouveau_fence_unref(&fence);
 		}
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 789857faa048..4ad40e42cae1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -209,7 +209,8 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 		goto done;
 	}
 
-	nouveau_fence_new(dmem->migrate.chan, false, &fence);
+	if (!nouveau_fence_new(&fence))
+		nouveau_fence_emit(fence, dmem->migrate.chan);
 	migrate_vma_pages(&args);
 	nouveau_dmem_fence_done(&fence);
 	dma_unmap_page(drm->dev->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
@@ -402,7 +403,8 @@ nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
 		}
 	}
 
-	nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
+	if (!nouveau_fence_new(&fence))
+		nouveau_fence_emit(fence, chunk->drm->dmem->migrate.chan);
 	migrate_device_pages(src_pfns, dst_pfns, npages);
 	nouveau_dmem_fence_done(&fence);
 	migrate_device_finalize(src_pfns, dst_pfns, npages);
@@ -675,7 +677,8 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
 		addr += PAGE_SIZE;
 	}
 
-	nouveau_fence_new(drm->dmem->migrate.chan, false, &fence);
+	if (!nouveau_fence_new(&fence))
+		nouveau_fence_emit(fence, drm->dmem->migrate.chan);
 	migrate_vma_pages(args);
 	nouveau_dmem_fence_done(&fence);
 	nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ee5e9d40c166..e946408f945b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -210,6 +210,9 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 	struct nouveau_fence_priv *priv = (void*)chan->drm->fence;
 	int ret;
 
+	if (unlikely(!chan->fence))
+		return -ENODEV;
+
 	fence->channel  = chan;
 	fence->timeout  = jiffies + (15 * HZ);
 
@@ -396,25 +399,16 @@ nouveau_fence_unref(struct nouveau_fence **pfence)
 }
 
 int
-nouveau_fence_new(struct nouveau_channel *chan, bool sysmem,
-		  struct nouveau_fence **pfence)
+nouveau_fence_new(struct nouveau_fence **pfence)
 {
 	struct nouveau_fence *fence;
-	int ret = 0;
-
-	if (unlikely(!chan->fence))
-		return -ENODEV;
 
 	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
 	if (!fence)
 		return -ENOMEM;
 
-	ret = nouveau_fence_emit(fence, chan);
-	if (ret)
-		nouveau_fence_unref(&fence);
-
 	*pfence = fence;
-	return ret;
+	return 0;
 }
 
 static const char *nouveau_fence_get_get_driver_name(struct dma_fence *fence)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 0ca2bc85adf6..7c73c7c9834a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -17,8 +17,7 @@ struct nouveau_fence {
 	unsigned long timeout;
 };
 
-int  nouveau_fence_new(struct nouveau_channel *, bool sysmem,
-		       struct nouveau_fence **);
+int  nouveau_fence_new(struct nouveau_fence **);
 void nouveau_fence_unref(struct nouveau_fence **);
 
 int  nouveau_fence_emit(struct nouveau_fence *, struct nouveau_channel *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index a48f42aaeab9..9c8d1b911a01 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -873,8 +873,11 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 		}
 	}
 
-	ret = nouveau_fence_new(chan, false, &fence);
+	ret = nouveau_fence_new(&fence);
+	if (!ret)
+		ret = nouveau_fence_emit(fence, chan);
 	if (ret) {
+		nouveau_fence_unref(&fence);
 		NV_PRINTK(err, cli, "error fencing pushbuf: %d\n", ret);
 		WIND_RING(chan);
 		goto out;
-- 
2.41.0



* [PATCH drm-misc-next v9 07/11] drm/nouveau: fence: fail to emit when fence context is killed
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (5 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 08/11] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

The new VM_BIND UAPI implementation introduced in subsequent commits
will allow asynchronous jobs processing push buffers and emitting
fences.

If a fence context is killed, e.g. due to a channel fault, jobs which
are already queued for execution might still emit new fences. In such a
case a job would hang forever.

To fix that, fail to emit a new fence on a killed fence context with
-ENODEV to unblock the job.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 7 +++++++
 drivers/gpu/drm/nouveau/nouveau_fence.h | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)
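
Note: the idea is that kill and emit serialize on the same lock, so a
fence can no longer be queued once the context is marked killed. The
userspace sketch below models this with a pthread mutex and a small
pending array. The sketch_* names, the fixed-size array and the -ENOSPC
case are assumptions made for illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define SKETCH_MAX_PENDING 8

struct sketch_fence {
	bool signaled;
	int error;
};

struct sketch_fctx {
	pthread_mutex_t lock;
	struct sketch_fence *pending[SKETCH_MAX_PENDING];
	int npending;
	bool killed;
};

/* Signal every pending fence with an error and mark the context dead. */
static void
sketch_fctx_kill(struct sketch_fctx *fctx, int error)
{
	pthread_mutex_lock(&fctx->lock);
	for (int i = 0; i < fctx->npending; i++) {
		fctx->pending[i]->error = error;
		fctx->pending[i]->signaled = true;
	}
	fctx->npending = 0;
	fctx->killed = true;
	pthread_mutex_unlock(&fctx->lock);
}

/* Queue a fence unless the context was killed. The flag is checked under
 * the lock so that no fence can slip in after the kill. */
static int
sketch_fence_emit(struct sketch_fctx *fctx, struct sketch_fence *fence)
{
	int ret = 0;

	pthread_mutex_lock(&fctx->lock);
	if (fctx->killed)
		ret = -ENODEV;
	else if (fctx->npending == SKETCH_MAX_PENDING)
		ret = -ENOSPC;
	else
		fctx->pending[fctx->npending++] = fence;
	pthread_mutex_unlock(&fctx->lock);

	return ret;
}
```

A job that gets -ENODEV back from emit can bail out instead of waiting on
a fence that nobody will ever signal.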

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index e946408f945b..77c739a55b19 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -96,6 +96,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
 		if (nouveau_fence_signal(fence))
 			nvif_event_block(&fctx->event);
 	}
+	fctx->killed = 1;
 	spin_unlock_irqrestore(&fctx->lock, flags);
 }
 
@@ -229,6 +230,12 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 		dma_fence_get(&fence->base);
 		spin_lock_irq(&fctx->lock);
 
+		if (unlikely(fctx->killed)) {
+			spin_unlock_irq(&fctx->lock);
+			dma_fence_put(&fence->base);
+			return -ENODEV;
+		}
+
 		if (nouveau_fence_update(chan, fctx))
 			nvif_event_block(&fctx->event);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 7c73c7c9834a..2c72d96ef17d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -44,7 +44,7 @@ struct nouveau_fence_chan {
 	char name[32];
 
 	struct nvif_event event;
-	int notify_ref, dead;
+	int notify_ref, dead, killed;
 };
 
 struct nouveau_fence_priv {
-- 
2.41.0



* [PATCH drm-misc-next v9 08/11] drm/nouveau: chan: provide nouveau_channel_kill()
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (6 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 07/11] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 09/11] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

The new VM_BIND UAPI implementation introduced in subsequent commits
will allow asynchronous jobs processing push buffers and emitting fences.

If a job times out, we need a way to recover from this situation. For
now, simply kill the channel to unblock all hung up jobs and signal
userspace that the device is dead on the next EXEC or VM_BIND ioctl.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_chan.c | 14 +++++++++++---
 drivers/gpu/drm/nouveau/nouveau_chan.h |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)
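
Note: with an exported kill function there are two ways into the same
teardown, the explicit call and the event handler, and the handler skips
the work if the channel is already marked killed. A small userspace sketch
of that shape, using C11 atomics, is given below. The sketch_* names and
the kill_count field are hypothetical and exist only to make the behaviour
observable.

```c
#include <stdatomic.h>

struct sketch_chan {
	atomic_int killed;
	int kill_count;	/* how often the teardown work ran; illustration only */
};

/* Explicit kill, e.g. from a job timeout path. */
static void
sketch_channel_kill(struct sketch_chan *chan)
{
	atomic_store(&chan->killed, 1);
	chan->kill_count++;
}

/* Event handler path: only kill if nobody did so already. As in the
 * patch, the check and the store are two separate steps. */
static void
sketch_channel_killed_event(struct sketch_chan *chan)
{
	if (!atomic_load(&chan->killed))
		sketch_channel_kill(chan);
}
```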

diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index f69be4c8f9f2..1fd5ccf41128 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -40,6 +40,14 @@ MODULE_PARM_DESC(vram_pushbuf, "Create DMA push buffers in VRAM");
 int nouveau_vram_pushbuf;
 module_param_named(vram_pushbuf, nouveau_vram_pushbuf, int, 0400);
 
+void
+nouveau_channel_kill(struct nouveau_channel *chan)
+{
+	atomic_set(&chan->killed, 1);
+	if (chan->fence)
+		nouveau_fence_context_kill(chan->fence, -ENODEV);
+}
+
 static int
 nouveau_channel_killed(struct nvif_event *event, void *repv, u32 repc)
 {
@@ -47,9 +55,9 @@ nouveau_channel_killed(struct nvif_event *event, void *repv, u32 repc)
 	struct nouveau_cli *cli = (void *)chan->user.client;
 
 	NV_PRINTK(warn, cli, "channel %d killed!\n", chan->chid);
-	atomic_set(&chan->killed, 1);
-	if (chan->fence)
-		nouveau_fence_context_kill(chan->fence, -ENODEV);
+
+	if (unlikely(!atomic_read(&chan->killed)))
+		nouveau_channel_kill(chan);
 
 	return NVIF_EVENT_DROP;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.h b/drivers/gpu/drm/nouveau/nouveau_chan.h
index bad7466bd0d5..5de2ef4e98c2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.h
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.h
@@ -66,6 +66,7 @@ int  nouveau_channel_new(struct nouveau_drm *, struct nvif_device *, bool priv,
 			 u32 vram, u32 gart, struct nouveau_channel **);
 void nouveau_channel_del(struct nouveau_channel **);
 int  nouveau_channel_idle(struct nouveau_channel *);
+void nouveau_channel_kill(struct nouveau_channel *);
 
 extern int nouveau_vram_pushbuf;
 
-- 
2.41.0



* [PATCH drm-misc-next v9 09/11] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (7 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 08/11] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 11/11] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
  2023-08-03 21:44 ` [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Dave Airlie
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

The new VM_BIND UAPI uses the DRM GPU VA manager to manage the VA space.
Hence, we need a way to manipulate the MMU's page tables without going
through the internal range allocator implemented by nvkm/vmm.

This patch adds a raw interface for nvkm/vmm to pass the responsibility
for managing the address space and the corresponding map/unmap/sparse
operations to the upper layers.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |  26 ++-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |  19 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |  20 +-
 drivers/gpu/drm/nouveau/nouveau_svm.c         |   2 +-
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |   4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            | 100 +++++++-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    | 213 ++++++++++++++++--
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c | 197 ++++++++++++----
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |  25 ++
 .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |  16 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |  16 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |  27 ++-
 12 files changed, 566 insertions(+), 99 deletions(-)
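
Note: the raw interface funnels five operations through a single
method, with one argument struct carrying an op code. The userspace sketch
below shows that shape with a per-op wrapper on the caller side and a
dispatcher on the receiving side. The sketch_* names, the reduced field
set and the alignment rule in the dispatcher are assumptions of this
sketch; they are not taken from nvkm.

```c
#include <errno.h>
#include <stdint.h>

enum sketch_raw_op {
	SKETCH_RAW_GET,
	SKETCH_RAW_PUT,
	SKETCH_RAW_MAP,
	SKETCH_RAW_UNMAP,
	SKETCH_RAW_SPARSE,
};

struct sketch_raw_args {
	uint8_t version;
	uint8_t op;
	uint8_t shift;
	uint64_t addr;
	uint64_t size;
};

/* Caller side: one thin wrapper per operation fills the shared struct. */
static struct sketch_raw_args
sketch_raw_get_args(uint64_t addr, uint64_t size, uint8_t shift)
{
	struct sketch_raw_args args = {
		.version = 0,
		.op = SKETCH_RAW_GET,
		.addr = addr,
		.size = size,
		.shift = shift,
	};

	return args;
}

/* Receiver side: validate the arguments, then dispatch on the op code. */
static int
sketch_raw_mthd(const struct sketch_raw_args *args)
{
	uint64_t mask;

	if (args->version != 0)
		return -ENOSYS;

	/* Assumed rule for this sketch: ranges are aligned to 1 << shift. */
	if (args->shift >= 64)
		return -EINVAL;
	mask = (UINT64_C(1) << args->shift) - 1;
	if (args->size == 0 || ((args->addr | args->size) & mask))
		return -EINVAL;

	switch (args->op) {
	case SKETCH_RAW_GET:
	case SKETCH_RAW_PUT:
	case SKETCH_RAW_MAP:
	case SKETCH_RAW_UNMAP:
	case SKETCH_RAW_SPARSE:
		return 0;	/* a real backend would touch page tables here */
	default:
		return -EINVAL;
	}
}
```

Keeping a single method with an op field means a further operation needs
only a new op code and wrapper, not a new method number.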

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
index 9c7ff56831c5..a5a182b3c28d 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
@@ -3,7 +3,10 @@
 struct nvif_vmm_v0 {
 	__u8  version;
 	__u8  page_nr;
-	__u8  managed;
+#define NVIF_VMM_V0_TYPE_UNMANAGED                                         0x00
+#define NVIF_VMM_V0_TYPE_MANAGED                                           0x01
+#define NVIF_VMM_V0_TYPE_RAW                                               0x02
+	__u8  type;
 	__u8  pad03[5];
 	__u64 addr;
 	__u64 size;
@@ -17,6 +20,7 @@ struct nvif_vmm_v0 {
 #define NVIF_VMM_V0_UNMAP                                                  0x04
 #define NVIF_VMM_V0_PFNMAP                                                 0x05
 #define NVIF_VMM_V0_PFNCLR                                                 0x06
+#define NVIF_VMM_V0_RAW                                                    0x07
 #define NVIF_VMM_V0_MTHD(i)                                         ((i) + 0x80)
 
 struct nvif_vmm_page_v0 {
@@ -66,6 +70,26 @@ struct nvif_vmm_unmap_v0 {
 	__u64 addr;
 };
 
+struct nvif_vmm_raw_v0 {
+	__u8 version;
+#define NVIF_VMM_RAW_V0_GET	0x0
+#define NVIF_VMM_RAW_V0_PUT	0x1
+#define NVIF_VMM_RAW_V0_MAP	0x2
+#define NVIF_VMM_RAW_V0_UNMAP	0x3
+#define NVIF_VMM_RAW_V0_SPARSE	0x4
+	__u8  op;
+	__u8  sparse;
+	__u8  ref;
+	__u8  shift;
+	__u32 argc;
+	__u8  pad01[7];
+	__u64 addr;
+	__u64 size;
+	__u64 offset;
+	__u64 memory;
+	__u64 argv;
+};
+
 struct nvif_vmm_pfnmap_v0 {
 	__u8  version;
 	__u8  page;
diff --git a/drivers/gpu/drm/nouveau/include/nvif/vmm.h b/drivers/gpu/drm/nouveau/include/nvif/vmm.h
index a2ee92201ace..0ecedd0ee0a5 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/vmm.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/vmm.h
@@ -4,6 +4,12 @@
 struct nvif_mem;
 struct nvif_mmu;
 
+enum nvif_vmm_type {
+	UNMANAGED,
+	MANAGED,
+	RAW,
+};
+
 enum nvif_vmm_get {
 	ADDR,
 	PTES,
@@ -30,8 +36,9 @@ struct nvif_vmm {
 	int page_nr;
 };
 
-int nvif_vmm_ctor(struct nvif_mmu *, const char *name, s32 oclass, bool managed,
-		  u64 addr, u64 size, void *argv, u32 argc, struct nvif_vmm *);
+int nvif_vmm_ctor(struct nvif_mmu *, const char *name, s32 oclass,
+		  enum nvif_vmm_type, u64 addr, u64 size, void *argv, u32 argc,
+		  struct nvif_vmm *);
 void nvif_vmm_dtor(struct nvif_vmm *);
 int nvif_vmm_get(struct nvif_vmm *, enum nvif_vmm_get, bool sparse,
 		 u8 page, u8 align, u64 size, struct nvif_vma *);
@@ -39,4 +46,12 @@ void nvif_vmm_put(struct nvif_vmm *, struct nvif_vma *);
 int nvif_vmm_map(struct nvif_vmm *, u64 addr, u64 size, void *argv, u32 argc,
 		 struct nvif_mem *, u64 offset);
 int nvif_vmm_unmap(struct nvif_vmm *, u64);
+
+int nvif_vmm_raw_get(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift);
+int nvif_vmm_raw_put(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift);
+int nvif_vmm_raw_map(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift,
+		     void *argv, u32 argc, struct nvif_mem *mem, u64 offset);
+int nvif_vmm_raw_unmap(struct nvif_vmm *vmm, u64 addr, u64 size,
+		       u8 shift, bool sparse);
+int nvif_vmm_raw_sparse(struct nvif_vmm *vmm, u64 addr, u64 size, bool ref);
 #endif
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
index 70e7887ef4b4..2fd2f2433fc7 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
@@ -17,6 +17,7 @@ struct nvkm_vma {
 	bool part:1; /* Region was split from an allocated region by map(). */
 	bool busy:1; /* Region busy (for temporarily preventing user access). */
 	bool mapped:1; /* Region contains valid pages. */
+	bool no_comp:1; /* Force no memory compression. */
 	struct nvkm_memory *memory; /* Memory currently mapped into VMA. */
 	struct nvkm_tags *tags; /* Compression tag reference. */
 };
@@ -27,10 +28,26 @@ struct nvkm_vmm {
 	const char *name;
 	u32 debug;
 	struct kref kref;
-	struct mutex mutex;
+
+	struct {
+		struct mutex vmm;
+		struct mutex ref;
+		struct mutex map;
+	} mutex;
 
 	u64 start;
 	u64 limit;
+	struct {
+		struct {
+			u64 addr;
+			u64 size;
+		} p;
+		struct {
+			u64 addr;
+			u64 size;
+		} n;
+		bool raw;
+	} managed;
 
 	struct nvkm_vmm_pt *pd;
 	struct list_head join;
@@ -70,6 +87,7 @@ struct nvkm_vmm_map {
 
 	const struct nvkm_vmm_page *page;
 
+	bool no_comp;
 	struct nvkm_tags *tags;
 	u64 next;
 	u64 type;
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a74ba8d84ba7..186351ecf72f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -350,7 +350,7 @@ nouveau_svmm_init(struct drm_device *dev, void *data,
 	 * VMM instead of the standard one.
 	 */
 	ret = nvif_vmm_ctor(&cli->mmu, "svmVmm",
-			    cli->vmm.vmm.object.oclass, true,
+			    cli->vmm.vmm.object.oclass, MANAGED,
 			    args->unmanaged_addr, args->unmanaged_size,
 			    &(struct gp100_vmm_v0) {
 				.fault_replay = true,
diff --git a/drivers/gpu/drm/nouveau/nouveau_vmm.c b/drivers/gpu/drm/nouveau/nouveau_vmm.c
index 67d6619fcd5e..a6602c012671 100644
--- a/drivers/gpu/drm/nouveau/nouveau_vmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_vmm.c
@@ -128,8 +128,8 @@ nouveau_vmm_fini(struct nouveau_vmm *vmm)
 int
 nouveau_vmm_init(struct nouveau_cli *cli, s32 oclass, struct nouveau_vmm *vmm)
 {
-	int ret = nvif_vmm_ctor(&cli->mmu, "drmVmm", oclass, false, PAGE_SIZE,
-				0, NULL, 0, &vmm->vmm);
+	int ret = nvif_vmm_ctor(&cli->mmu, "drmVmm", oclass, UNMANAGED,
+				PAGE_SIZE, 0, NULL, 0, &vmm->vmm);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nvif/vmm.c b/drivers/gpu/drm/nouveau/nvif/vmm.c
index 6053d6dc2184..99296f03371a 100644
--- a/drivers/gpu/drm/nouveau/nvif/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvif/vmm.c
@@ -104,6 +104,90 @@ nvif_vmm_get(struct nvif_vmm *vmm, enum nvif_vmm_get type, bool sparse,
 	return ret;
 }
 
+int
+nvif_vmm_raw_get(struct nvif_vmm *vmm, u64 addr, u64 size,
+		 u8 shift)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_GET,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_put(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_PUT,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_map(struct nvif_vmm *vmm, u64 addr, u64 size, u8 shift,
+		 void *argv, u32 argc, struct nvif_mem *mem, u64 offset)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_MAP,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+		.memory = nvif_handle(&mem->object),
+		.offset = offset,
+		.argv = (u64)(uintptr_t)argv,
+		.argc = argc,
+	};
+
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_unmap(struct nvif_vmm *vmm, u64 addr, u64 size,
+		   u8 shift, bool sparse)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_UNMAP,
+		.addr = addr,
+		.size = size,
+		.shift = shift,
+		.sparse = sparse,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_sparse(struct nvif_vmm *vmm, u64 addr, u64 size, bool ref)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_SPARSE,
+		.addr = addr,
+		.size = size,
+		.ref = ref,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
 void
 nvif_vmm_dtor(struct nvif_vmm *vmm)
 {
@@ -112,8 +196,9 @@ nvif_vmm_dtor(struct nvif_vmm *vmm)
 }
 
 int
-nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass, bool managed,
-	      u64 addr, u64 size, void *argv, u32 argc, struct nvif_vmm *vmm)
+nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass,
+	      enum nvif_vmm_type type, u64 addr, u64 size, void *argv, u32 argc,
+	      struct nvif_vmm *vmm)
 {
 	struct nvif_vmm_v0 *args;
 	u32 argn = sizeof(*args) + argc;
@@ -125,9 +210,18 @@ nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass, bool managed,
 	if (!(args = kmalloc(argn, GFP_KERNEL)))
 		return -ENOMEM;
 	args->version = 0;
-	args->managed = managed;
 	args->addr = addr;
 	args->size = size;
+
+	switch (type) {
+	case UNMANAGED: args->type = NVIF_VMM_V0_TYPE_UNMANAGED; break;
+	case MANAGED: args->type = NVIF_VMM_V0_TYPE_MANAGED; break;
+	case RAW: args->type = NVIF_VMM_V0_TYPE_RAW; break;
+	default:
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
 	memcpy(args->data, argv, argc);
 
 	ret = nvif_object_ctor(&mmu->object, name ? name : "nvifVmm", 0,
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
index 524cd3c0e3fe..38b7ced934b1 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
@@ -58,10 +58,13 @@ nvkm_uvmm_mthd_pfnclr(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_vmm_in_managed_range(vmm, addr, size) && vmm->managed.raw)
+		return -EINVAL;
+
 	if (size) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		ret = nvkm_vmm_pfn_unmap(vmm, addr, size);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 
 	return ret;
@@ -88,10 +91,13 @@ nvkm_uvmm_mthd_pfnmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_vmm_in_managed_range(vmm, addr, size) && vmm->managed.raw)
+		return -EINVAL;
+
 	if (size) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		ret = nvkm_vmm_pfn_map(vmm, page, addr, size, phys);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 
 	return ret;
@@ -113,7 +119,10 @@ nvkm_uvmm_mthd_unmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
-	mutex_lock(&vmm->mutex);
+	if (nvkm_vmm_in_managed_range(vmm, addr, 0) && vmm->managed.raw)
+		return -EINVAL;
+
+	mutex_lock(&vmm->mutex.vmm);
 	vma = nvkm_vmm_node_search(vmm, addr);
 	if (ret = -ENOENT, !vma || vma->addr != addr) {
 		VMM_DEBUG(vmm, "lookup %016llx: %016llx",
@@ -134,7 +143,7 @@ nvkm_uvmm_mthd_unmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	nvkm_vmm_unmap_locked(vmm, vma, false);
 	ret = 0;
 done:
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	return ret;
 }
 
@@ -159,13 +168,16 @@ nvkm_uvmm_mthd_map(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_vmm_in_managed_range(vmm, addr, size) && vmm->managed.raw)
+		return -EINVAL;
+
 	memory = nvkm_umem_search(client, handle);
 	if (IS_ERR(memory)) {
 		VMM_DEBUG(vmm, "memory %016llx %ld\n", handle, PTR_ERR(memory));
 		return PTR_ERR(memory);
 	}
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	if (ret = -ENOENT, !(vma = nvkm_vmm_node_search(vmm, addr))) {
 		VMM_DEBUG(vmm, "lookup %016llx", addr);
 		goto fail;
@@ -198,7 +210,7 @@ nvkm_uvmm_mthd_map(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 		}
 	}
 	vma->busy = true;
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 
 	ret = nvkm_memory_map(memory, offset, vmm, vma, argv, argc);
 	if (ret == 0) {
@@ -207,11 +219,11 @@ nvkm_uvmm_mthd_map(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 		return 0;
 	}
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	vma->busy = false;
 	nvkm_vmm_unmap_region(vmm, vma);
 fail:
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	nvkm_memory_unref(&memory);
 	return ret;
 }
@@ -232,7 +244,7 @@ nvkm_uvmm_mthd_put(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	vma = nvkm_vmm_node_search(vmm, args->v0.addr);
 	if (ret = -ENOENT, !vma || vma->addr != addr || vma->part) {
 		VMM_DEBUG(vmm, "lookup %016llx: %016llx %d", addr,
@@ -248,7 +260,7 @@ nvkm_uvmm_mthd_put(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	nvkm_vmm_put_locked(vmm, vma);
 	ret = 0;
 done:
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	return ret;
 }
 
@@ -275,10 +287,10 @@ nvkm_uvmm_mthd_get(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	ret = nvkm_vmm_get_locked(vmm, getref, mapref, sparse,
 				  page, align, size, &vma);
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	if (ret)
 		return ret;
 
@@ -314,6 +326,167 @@ nvkm_uvmm_mthd_page(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	return 0;
 }
 
+static inline int
+nvkm_uvmm_page_index(struct nvkm_uvmm *uvmm, u64 size, u8 shift, u8 *refd)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	const struct nvkm_vmm_page *page;
+
+	if (likely(shift)) {
+		for (page = vmm->func->page; page->shift; page++) {
+			if (shift == page->shift)
+				break;
+		}
+
+		if (!page->shift || !IS_ALIGNED(size, 1ULL << page->shift)) {
+			VMM_DEBUG(vmm, "page %d %016llx", shift, size);
+			return -EINVAL;
+		}
+	} else {
+		return -EINVAL;
+	}
+	*refd = page - vmm->func->page;
+
+	return 0;
+}
+
+static int
+nvkm_uvmm_mthd_raw_get(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	return nvkm_vmm_raw_get(vmm, args->addr, args->size, refd);
+}
+
+static int
+nvkm_uvmm_mthd_raw_put(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	nvkm_vmm_raw_put(vmm, args->addr, args->size, refd);
+
+	return 0;
+}
+
+static int
+nvkm_uvmm_mthd_raw_map(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_client *client = uvmm->object.client;
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	struct nvkm_vma vma = {
+		.addr = args->addr,
+		.size = args->size,
+		.used = true,
+		.mapref = false,
+		.no_comp = true,
+	};
+	struct nvkm_memory *memory;
+	u64 handle = args->memory;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	vma.page = vma.refd = refd;
+
+	memory = nvkm_umem_search(client, args->memory);
+	if (IS_ERR(memory)) {
+		VMM_DEBUG(vmm, "memory %016llx %ld\n", handle, PTR_ERR(memory));
+		return PTR_ERR(memory);
+	}
+
+	ret = nvkm_memory_map(memory, args->offset, vmm, &vma,
+			      (void *)args->argv, args->argc);
+
+	nvkm_memory_unref(&vma.memory);
+	nvkm_memory_unref(&memory);
+	return ret;
+}
+
+static int
+nvkm_uvmm_mthd_raw_unmap(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	u8 refd;
+	int ret;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	ret = nvkm_uvmm_page_index(uvmm, args->size, args->shift, &refd);
+	if (ret)
+		return ret;
+
+	nvkm_vmm_raw_unmap(vmm, args->addr, args->size,
+			   args->sparse, refd);
+
+	return 0;
+}
+
+static int
+nvkm_uvmm_mthd_raw_sparse(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+
+	if (!nvkm_vmm_in_managed_range(vmm, args->addr, args->size))
+		return -EINVAL;
+
+	return nvkm_vmm_raw_sparse(vmm, args->addr, args->size, args->ref);
+}
+
+static int
+nvkm_uvmm_mthd_raw(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
+{
+	union {
+		struct nvif_vmm_raw_v0 v0;
+	} *args = argv;
+	int ret = -ENOSYS;
+
+	if (!uvmm->vmm->managed.raw)
+		return -EINVAL;
+
+	if ((ret = nvif_unpack(ret, &argv, &argc, args->v0, 0, 0, true)))
+		return ret;
+
+	switch (args->v0.op) {
+	case NVIF_VMM_RAW_V0_GET:
+		return nvkm_uvmm_mthd_raw_get(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_PUT:
+		return nvkm_uvmm_mthd_raw_put(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_MAP:
+		return nvkm_uvmm_mthd_raw_map(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_UNMAP:
+		return nvkm_uvmm_mthd_raw_unmap(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_SPARSE:
+		return nvkm_uvmm_mthd_raw_sparse(uvmm, &args->v0);
+	default:
+		return -EINVAL;
+	};
+}
+
 static int
 nvkm_uvmm_mthd(struct nvkm_object *object, u32 mthd, void *argv, u32 argc)
 {
@@ -326,6 +499,7 @@ nvkm_uvmm_mthd(struct nvkm_object *object, u32 mthd, void *argv, u32 argc)
 	case NVIF_VMM_V0_UNMAP : return nvkm_uvmm_mthd_unmap (uvmm, argv, argc);
 	case NVIF_VMM_V0_PFNMAP: return nvkm_uvmm_mthd_pfnmap(uvmm, argv, argc);
 	case NVIF_VMM_V0_PFNCLR: return nvkm_uvmm_mthd_pfnclr(uvmm, argv, argc);
+	case NVIF_VMM_V0_RAW   : return nvkm_uvmm_mthd_raw   (uvmm, argv, argc);
 	case NVIF_VMM_V0_MTHD(0x00) ... NVIF_VMM_V0_MTHD(0x7f):
 		if (uvmm->vmm->func->mthd) {
 			return uvmm->vmm->func->mthd(uvmm->vmm,
@@ -366,10 +540,11 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 	struct nvkm_uvmm *uvmm;
 	int ret = -ENOSYS;
 	u64 addr, size;
-	bool managed;
+	bool managed, raw;
 
 	if (!(ret = nvif_unpack(ret, &argv, &argc, args->v0, 0, 0, more))) {
-		managed = args->v0.managed != 0;
+		managed = args->v0.type == NVIF_VMM_V0_TYPE_MANAGED;
+		raw = args->v0.type == NVIF_VMM_V0_TYPE_RAW;
 		addr = args->v0.addr;
 		size = args->v0.size;
 	} else
@@ -377,12 +552,13 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 
 	if (!(uvmm = kzalloc(sizeof(*uvmm), GFP_KERNEL)))
 		return -ENOMEM;
+
 	nvkm_object_ctor(&nvkm_uvmm, oclass, &uvmm->object);
 	*pobject = &uvmm->object;
 
 	if (!mmu->vmm) {
-		ret = mmu->func->vmm.ctor(mmu, managed, addr, size, argv, argc,
-					  NULL, "user", &uvmm->vmm);
+		ret = mmu->func->vmm.ctor(mmu, managed || raw, addr, size,
+					  argv, argc, NULL, "user", &uvmm->vmm);
 		if (ret)
 			return ret;
 
@@ -393,6 +569,7 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 
 		uvmm->vmm = nvkm_vmm_ref(mmu->vmm);
 	}
+	uvmm->vmm->managed.raw = raw;
 
 	page = uvmm->vmm->func->page;
 	args->v0.page_nr = 0;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
index ae793f400ba1..eb5fcadcb39a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
@@ -676,41 +676,18 @@ nvkm_vmm_ptes_sparse(struct nvkm_vmm *vmm, u64 addr, u64 size, bool ref)
 	return 0;
 }
 
-static void
-nvkm_vmm_ptes_unmap_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
-			u64 addr, u64 size, bool sparse, bool pfn)
-{
-	const struct nvkm_vmm_desc_func *func = page->desc->func;
-	nvkm_vmm_iter(vmm, page, addr, size, "unmap + unref",
-		      false, pfn, nvkm_vmm_unref_ptes, NULL, NULL,
-		      sparse ? func->sparse : func->invalid ? func->invalid :
-							      func->unmap);
-}
-
-static int
-nvkm_vmm_ptes_get_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
-		      u64 addr, u64 size, struct nvkm_vmm_map *map,
-		      nvkm_vmm_pte_func func)
-{
-	u64 fail = nvkm_vmm_iter(vmm, page, addr, size, "ref + map", true,
-				 false, nvkm_vmm_ref_ptes, func, map, NULL);
-	if (fail != ~0ULL) {
-		if ((size = fail - addr))
-			nvkm_vmm_ptes_unmap_put(vmm, page, addr, size, false, false);
-		return -ENOMEM;
-	}
-	return 0;
-}
-
 static void
 nvkm_vmm_ptes_unmap(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
 		    u64 addr, u64 size, bool sparse, bool pfn)
 {
 	const struct nvkm_vmm_desc_func *func = page->desc->func;
+
+	mutex_lock(&vmm->mutex.map);
 	nvkm_vmm_iter(vmm, page, addr, size, "unmap", false, pfn,
 		      NULL, NULL, NULL,
 		      sparse ? func->sparse : func->invalid ? func->invalid :
 							      func->unmap);
+	mutex_unlock(&vmm->mutex.map);
 }
 
 static void
@@ -718,33 +695,108 @@ nvkm_vmm_ptes_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
 		  u64 addr, u64 size, struct nvkm_vmm_map *map,
 		  nvkm_vmm_pte_func func)
 {
+	mutex_lock(&vmm->mutex.map);
 	nvkm_vmm_iter(vmm, page, addr, size, "map", false, false,
 		      NULL, func, map, NULL);
+	mutex_unlock(&vmm->mutex.map);
 }
 
 static void
-nvkm_vmm_ptes_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
-		  u64 addr, u64 size)
+nvkm_vmm_ptes_put_locked(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			 u64 addr, u64 size)
 {
 	nvkm_vmm_iter(vmm, page, addr, size, "unref", false, false,
 		      nvkm_vmm_unref_ptes, NULL, NULL, NULL);
 }
 
+static void
+nvkm_vmm_ptes_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+		  u64 addr, u64 size)
+{
+	mutex_lock(&vmm->mutex.ref);
+	nvkm_vmm_ptes_put_locked(vmm, page, addr, size);
+	mutex_unlock(&vmm->mutex.ref);
+}
+
 static int
 nvkm_vmm_ptes_get(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
 		  u64 addr, u64 size)
 {
-	u64 fail = nvkm_vmm_iter(vmm, page, addr, size, "ref", true, false,
-				 nvkm_vmm_ref_ptes, NULL, NULL, NULL);
+	u64 fail;
+
+	mutex_lock(&vmm->mutex.ref);
+	fail = nvkm_vmm_iter(vmm, page, addr, size, "ref", true, false,
+			     nvkm_vmm_ref_ptes, NULL, NULL, NULL);
 	if (fail != ~0ULL) {
 		if (fail != addr)
-			nvkm_vmm_ptes_put(vmm, page, addr, fail - addr);
+			nvkm_vmm_ptes_put_locked(vmm, page, addr, fail - addr);
+		mutex_unlock(&vmm->mutex.ref);
+		return -ENOMEM;
+	}
+	mutex_unlock(&vmm->mutex.ref);
+	return 0;
+}
+
+static void
+__nvkm_vmm_ptes_unmap_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			  u64 addr, u64 size, bool sparse, bool pfn)
+{
+	const struct nvkm_vmm_desc_func *func = page->desc->func;
+
+	nvkm_vmm_iter(vmm, page, addr, size, "unmap + unref",
+		      false, pfn, nvkm_vmm_unref_ptes, NULL, NULL,
+		      sparse ? func->sparse : func->invalid ? func->invalid :
+							      func->unmap);
+}
+
+static void
+nvkm_vmm_ptes_unmap_put(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			u64 addr, u64 size, bool sparse, bool pfn)
+{
+	if (vmm->managed.raw) {
+		nvkm_vmm_ptes_unmap(vmm, page, addr, size, sparse, pfn);
+		nvkm_vmm_ptes_put(vmm, page, addr, size);
+	} else {
+		__nvkm_vmm_ptes_unmap_put(vmm, page, addr, size, sparse, pfn);
+	}
+}
+
+static int
+__nvkm_vmm_ptes_get_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+			u64 addr, u64 size, struct nvkm_vmm_map *map,
+			nvkm_vmm_pte_func func)
+{
+	u64 fail = nvkm_vmm_iter(vmm, page, addr, size, "ref + map", true,
+				 false, nvkm_vmm_ref_ptes, func, map, NULL);
+	if (fail != ~0ULL) {
+		if ((size = fail - addr))
+			nvkm_vmm_ptes_unmap_put(vmm, page, addr, size, false, false);
 		return -ENOMEM;
 	}
 	return 0;
 }
 
-static inline struct nvkm_vma *
+static int
+nvkm_vmm_ptes_get_map(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
+		      u64 addr, u64 size, struct nvkm_vmm_map *map,
+		      nvkm_vmm_pte_func func)
+{
+	int ret;
+
+	if (vmm->managed.raw) {
+		ret = nvkm_vmm_ptes_get(vmm, page, addr, size);
+		if (ret)
+			return ret;
+
+		nvkm_vmm_ptes_map(vmm, page, addr, size, map, func);
+
+		return 0;
+	} else {
+		return __nvkm_vmm_ptes_get_map(vmm, page, addr, size, map, func);
+	}
+}
+
+struct nvkm_vma *
 nvkm_vma_new(u64 addr, u64 size)
 {
 	struct nvkm_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
@@ -1045,7 +1097,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 	vmm->debug = mmu->subdev.debug;
 	kref_init(&vmm->kref);
 
-	__mutex_init(&vmm->mutex, "&vmm->mutex", key ? key : &_key);
+	__mutex_init(&vmm->mutex.vmm, "&vmm->mutex.vmm", key ? key : &_key);
+	mutex_init(&vmm->mutex.ref);
+	mutex_init(&vmm->mutex.map);
 
 	/* Locate the smallest page size supported by the backend, it will
 	 * have the deepest nesting of page tables.
@@ -1101,6 +1155,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 		if (addr && (ret = nvkm_vmm_ctor_managed(vmm, 0, addr)))
 			return ret;
 
+		vmm->managed.p.addr = 0;
+		vmm->managed.p.size = addr;
+
 		/* NVKM-managed area. */
 		if (size) {
 			if (!(vma = nvkm_vma_new(addr, size)))
@@ -1114,6 +1171,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 		size = vmm->limit - addr;
 		if (size && (ret = nvkm_vmm_ctor_managed(vmm, addr, size)))
 			return ret;
+
+		vmm->managed.n.addr = addr;
+		vmm->managed.n.size = size;
 	} else {
 		/* Address-space fully managed by NVKM, requiring calls to
 		 * nvkm_vmm_get()/nvkm_vmm_put() to allocate address-space.
@@ -1362,9 +1422,9 @@ void
 nvkm_vmm_unmap(struct nvkm_vmm *vmm, struct nvkm_vma *vma)
 {
 	if (vma->memory) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		nvkm_vmm_unmap_locked(vmm, vma, false);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 }
 
@@ -1423,6 +1483,8 @@ nvkm_vmm_map_locked(struct nvkm_vmm *vmm, struct nvkm_vma *vma,
 	nvkm_vmm_pte_func func;
 	int ret;
 
+	map->no_comp = vma->no_comp;
+
 	/* Make sure we won't overrun the end of the memory object. */
 	if (unlikely(nvkm_memory_size(map->memory) < map->offset + vma->size)) {
 		VMM_DEBUG(vmm, "overrun %016llx %016llx %016llx",
@@ -1507,10 +1569,15 @@ nvkm_vmm_map(struct nvkm_vmm *vmm, struct nvkm_vma *vma, void *argv, u32 argc,
 	     struct nvkm_vmm_map *map)
 {
 	int ret;
-	mutex_lock(&vmm->mutex);
+
+	if (nvkm_vmm_in_managed_range(vmm, vma->addr, vma->size) &&
+	    vmm->managed.raw)
+		return nvkm_vmm_map_locked(vmm, vma, argv, argc, map);
+
+	mutex_lock(&vmm->mutex.vmm);
 	ret = nvkm_vmm_map_locked(vmm, vma, argv, argc, map);
 	vma->busy = false;
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
 	return ret;
 }
 
@@ -1620,9 +1687,9 @@ nvkm_vmm_put(struct nvkm_vmm *vmm, struct nvkm_vma **pvma)
 {
 	struct nvkm_vma *vma = *pvma;
 	if (vma) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		nvkm_vmm_put_locked(vmm, vma);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 		*pvma = NULL;
 	}
 }
@@ -1769,9 +1836,49 @@ int
 nvkm_vmm_get(struct nvkm_vmm *vmm, u8 page, u64 size, struct nvkm_vma **pvma)
 {
 	int ret;
-	mutex_lock(&vmm->mutex);
+	mutex_lock(&vmm->mutex.vmm);
 	ret = nvkm_vmm_get_locked(vmm, false, true, false, page, 0, size, pvma);
-	mutex_unlock(&vmm->mutex);
+	mutex_unlock(&vmm->mutex.vmm);
+	return ret;
+}
+
+void
+nvkm_vmm_raw_unmap(struct nvkm_vmm *vmm, u64 addr, u64 size,
+		   bool sparse, u8 refd)
+{
+	const struct nvkm_vmm_page *page = &vmm->func->page[refd];
+
+	nvkm_vmm_ptes_unmap(vmm, page, addr, size, sparse, false);
+}
+
+void
+nvkm_vmm_raw_put(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd)
+{
+	const struct nvkm_vmm_page *page = vmm->func->page;
+
+	nvkm_vmm_ptes_put(vmm, &page[refd], addr, size);
+}
+
+int
+nvkm_vmm_raw_get(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd)
+{
+	const struct nvkm_vmm_page *page = vmm->func->page;
+
+	if (unlikely(!size))
+		return -EINVAL;
+
+	return nvkm_vmm_ptes_get(vmm, &page[refd], addr, size);
+}
+
+int
+nvkm_vmm_raw_sparse(struct nvkm_vmm *vmm, u64 addr, u64 size, bool ref)
+{
+	int ret;
+
+	mutex_lock(&vmm->mutex.ref);
+	ret = nvkm_vmm_ptes_sparse(vmm, addr, size, ref);
+	mutex_unlock(&vmm->mutex.ref);
+
 	return ret;
 }
 
@@ -1779,9 +1886,9 @@ void
 nvkm_vmm_part(struct nvkm_vmm *vmm, struct nvkm_memory *inst)
 {
 	if (inst && vmm && vmm->func->part) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		vmm->func->part(vmm, inst);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 }
 
@@ -1790,9 +1897,9 @@ nvkm_vmm_join(struct nvkm_vmm *vmm, struct nvkm_memory *inst)
 {
 	int ret = 0;
 	if (vmm->func->join) {
-		mutex_lock(&vmm->mutex);
+		mutex_lock(&vmm->mutex.vmm);
 		ret = vmm->func->join(vmm, inst);
-		mutex_unlock(&vmm->mutex);
+		mutex_unlock(&vmm->mutex.vmm);
 	}
 	return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
index f6188aa9171c..f9bc30cdb2b3 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
@@ -163,6 +163,7 @@ int nvkm_vmm_new_(const struct nvkm_vmm_func *, struct nvkm_mmu *,
 		  u32 pd_header, bool managed, u64 addr, u64 size,
 		  struct lock_class_key *, const char *name,
 		  struct nvkm_vmm **);
+struct nvkm_vma *nvkm_vma_new(u64 addr, u64 size);
 struct nvkm_vma *nvkm_vmm_node_search(struct nvkm_vmm *, u64 addr);
 struct nvkm_vma *nvkm_vmm_node_split(struct nvkm_vmm *, struct nvkm_vma *,
 				     u64 addr, u64 size);
@@ -173,6 +174,30 @@ void nvkm_vmm_put_locked(struct nvkm_vmm *, struct nvkm_vma *);
 void nvkm_vmm_unmap_locked(struct nvkm_vmm *, struct nvkm_vma *, bool pfn);
 void nvkm_vmm_unmap_region(struct nvkm_vmm *, struct nvkm_vma *);
 
+int nvkm_vmm_raw_get(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd);
+void nvkm_vmm_raw_put(struct nvkm_vmm *vmm, u64 addr, u64 size, u8 refd);
+void nvkm_vmm_raw_unmap(struct nvkm_vmm *vmm, u64 addr, u64 size,
+			bool sparse, u8 refd);
+int nvkm_vmm_raw_sparse(struct nvkm_vmm *, u64 addr, u64 size, bool ref);
+
+static inline bool
+nvkm_vmm_in_managed_range(struct nvkm_vmm *vmm, u64 start, u64 size)
+{
+	u64 p_start = vmm->managed.p.addr;
+	u64 p_end = p_start + vmm->managed.p.size;
+	u64 n_start = vmm->managed.n.addr;
+	u64 n_end = n_start + vmm->managed.n.size;
+	u64 end = start + size;
+
+	if (start >= p_start && end <= p_end)
+		return true;
+
+	if (start >= n_start && end <= n_end)
+		return true;
+
+	return false;
+}
+
 #define NVKM_VMM_PFN_ADDR                                 0xfffffffffffff000ULL
 #define NVKM_VMM_PFN_ADDR_SHIFT                                              12
 #define NVKM_VMM_PFN_APER                                 0x00000000000000f0ULL
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c
index 5438384d9a67..5e857c02e9aa 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c
@@ -287,15 +287,17 @@ gf100_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
 			return -EINVAL;
 		}
 
-		ret = nvkm_memory_tags_get(memory, device, tags,
-					   nvkm_ltc_tags_clear,
-					   &map->tags);
-		if (ret) {
-			VMM_DEBUG(vmm, "comp %d", ret);
-			return ret;
+		if (!map->no_comp) {
+			ret = nvkm_memory_tags_get(memory, device, tags,
+						   nvkm_ltc_tags_clear,
+						   &map->tags);
+			if (ret) {
+				VMM_DEBUG(vmm, "comp %d", ret);
+				return ret;
+			}
 		}
 
-		if (map->tags->mn) {
+		if (!map->no_comp && map->tags->mn) {
 			u64 tags = map->tags->mn->offset + (map->offset >> 17);
 			if (page->shift == 17 || !gm20x) {
 				map->type |= tags << 44;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index 17899fc95b2d..f3630d0e0d55 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -453,15 +453,17 @@ gp100_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
 			return -EINVAL;
 		}
 
-		ret = nvkm_memory_tags_get(memory, device, tags,
-					   nvkm_ltc_tags_clear,
-					   &map->tags);
-		if (ret) {
-			VMM_DEBUG(vmm, "comp %d", ret);
-			return ret;
+		if (!map->no_comp) {
+			ret = nvkm_memory_tags_get(memory, device, tags,
+						   nvkm_ltc_tags_clear,
+						   &map->tags);
+			if (ret) {
+				VMM_DEBUG(vmm, "comp %d", ret);
+				return ret;
+			}
 		}
 
-		if (map->tags->mn) {
+		if (!map->no_comp && map->tags->mn) {
 			tags = map->tags->mn->offset + (map->offset >> 16);
 			map->ctag |= ((1ULL << page->shift) >> 16) << 36;
 			map->type |= tags << 36;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c
index b7548dcd72c7..ff08ad5005a9 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c
@@ -296,19 +296,22 @@ nv50_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
 			return -EINVAL;
 		}
 
-		ret = nvkm_memory_tags_get(memory, device, tags, NULL,
-					   &map->tags);
-		if (ret) {
-			VMM_DEBUG(vmm, "comp %d", ret);
-			return ret;
-		}
+		if (!map->no_comp) {
+			ret = nvkm_memory_tags_get(memory, device, tags, NULL,
+						   &map->tags);
+			if (ret) {
+				VMM_DEBUG(vmm, "comp %d", ret);
+				return ret;
+			}
 
-		if (map->tags->mn) {
-			u32 tags = map->tags->mn->offset + (map->offset >> 16);
-			map->ctag |= (u64)comp << 49;
-			map->type |= (u64)comp << 47;
-			map->type |= (u64)tags << 49;
-			map->next |= map->ctag;
+			if (map->tags->mn) {
+				u32 tags = map->tags->mn->offset +
+					   (map->offset >> 16);
+				map->ctag |= (u64)comp << 49;
+				map->type |= (u64)comp << 47;
+				map->type |= (u64)tags << 49;
+				map->next |= map->ctag;
+			}
 		}
 	}
 
-- 
2.41.0
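
The range check that gates the raw ops above can be tried outside the
kernel. What follows is a minimal userspace sketch, not the driver code:
struct managed, in_managed_range() and managed_range_demo() are
hypothetical stand-ins for vmm->managed and nvkm_vmm_in_managed_range(),
and the address layout is made up. It keeps the same comparisons,
including the unchecked start + size addition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two client-managed areas (p and n)
 * that the patch records in vmm->managed. */
struct range { uint64_t addr, size; };
struct managed { struct range p, n; };

/* Same comparisons as nvkm_vmm_in_managed_range() in the patch: the
 * request must lie entirely inside the area before (p) or the area
 * after (n) the NVKM-managed region. */
bool in_managed_range(const struct managed *m, uint64_t start, uint64_t size)
{
	uint64_t p_end = m->p.addr + m->p.size;
	uint64_t n_end = m->n.addr + m->n.size;
	uint64_t end = start + size; /* unchecked, as in the patch */

	if (start >= m->p.addr && end <= p_end)
		return true;

	if (start >= m->n.addr && end <= n_end)
		return true;

	return false;
}

/* Made-up layout: p = [0x0, 0x1000), NVKM area = [0x1000, 0x9000),
 * n = [0x9000, 0x10000). */
void managed_range_demo(void)
{
	struct managed m = {
		.p = { .addr = 0x0,    .size = 0x1000 },
		.n = { .addr = 0x9000, .size = 0x7000 },
	};

	assert(in_managed_range(&m, 0x0, 0x1000));     /* all of p */
	assert(in_managed_range(&m, 0x9000, 0x7000));  /* all of n */
	assert(!in_managed_range(&m, 0x0800, 0x1000)); /* straddles p's end */
	assert(!in_managed_range(&m, 0x1000, 0x1000)); /* inside NVKM area */
	assert(in_managed_range(&m, 0x1000, 0));       /* zero size at p's end */
}
```

With these comparisons a zero-size query placed exactly at the end of an
area is still accepted, which is relevant to callers that pass a size of
0, as the unmap path in the patch does.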



* [PATCH drm-misc-next v9 11/11] drm/nouveau: debugfs: implement DRM GPU VA debugfs
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (8 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 09/11] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
@ 2023-08-03 16:52 ` Danilo Krummrich
  2023-08-03 21:44 ` [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Dave Airlie
  10 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-03 16:52 UTC (permalink / raw)
  To: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Provide the driver indirection iterating over all DRM GPU VA spaces to
enable the common 'gpuvas' debugfs file for dumping DRM GPU VA spaces.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_debugfs.c | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
index 99d022a91afc..053f703f2f68 100644
--- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
+++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
@@ -203,6 +203,44 @@ nouveau_debugfs_pstate_open(struct inode *inode, struct file *file)
 	return single_open(file, nouveau_debugfs_pstate_get, inode->i_private);
 }
 
+static void
+nouveau_debugfs_gpuva_regions(struct seq_file *m, struct nouveau_uvmm *uvmm)
+{
+	MA_STATE(mas, &uvmm->region_mt, 0, 0);
+	struct nouveau_uvma_region *reg;
+
+	seq_puts  (m, " VA regions  | start              | range              | end                \n");
+	seq_puts  (m, "----------------------------------------------------------------------------\n");
+	mas_for_each(&mas, reg, ULONG_MAX)
+		seq_printf(m, "             | 0x%016llx | 0x%016llx | 0x%016llx\n",
+			   reg->va.addr, reg->va.range, reg->va.addr + reg->va.range);
+}
+
+static int
+nouveau_debugfs_gpuva(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct nouveau_drm *drm = nouveau_drm(node->minor->dev);
+	struct nouveau_cli *cli;
+
+	mutex_lock(&drm->clients_lock);
+	list_for_each_entry(cli, &drm->clients, head) {
+		struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
+
+		if (!uvmm)
+			continue;
+
+		nouveau_uvmm_lock(uvmm);
+		drm_debugfs_gpuva_info(m, &uvmm->umgr);
+		seq_puts(m, "\n");
+		nouveau_debugfs_gpuva_regions(m, uvmm);
+		nouveau_uvmm_unlock(uvmm);
+	}
+	mutex_unlock(&drm->clients_lock);
+
+	return 0;
+}
+
 static const struct file_operations nouveau_pstate_fops = {
 	.owner = THIS_MODULE,
 	.open = nouveau_debugfs_pstate_open,
@@ -214,6 +252,7 @@ static const struct file_operations nouveau_pstate_fops = {
 static struct drm_info_list nouveau_debugfs_list[] = {
 	{ "vbios.rom",  nouveau_debugfs_vbios_image, 0, NULL },
 	{ "strap_peek", nouveau_debugfs_strap_peek, 0, NULL },
+	DRM_DEBUGFS_GPUVA_INFO(nouveau_debugfs_gpuva, NULL),
 };
 #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list)
 
-- 
2.41.0
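
To give an idea of what the region table emitted by
nouveau_debugfs_gpuva_regions() looks like, here is a small userspace
sketch that reuses the format strings from the patch. struct va_region,
format_region_row() and dump_regions() are hypothetical names for
illustration only; the real code walks a maple tree with mas_for_each()
and writes through seq_file rather than stdio.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the va.addr / va.range pair of a
 * struct nouveau_uvma_region. */
struct va_region { uint64_t addr, range; };

/* Format one table row with the same format string as the patch's
 * seq_printf() call. Returns snprintf()'s result. */
int format_region_row(char *buf, size_t len, const struct va_region *reg)
{
	return snprintf(buf, len,
			"             | 0x%016llx | 0x%016llx | 0x%016llx\n",
			(unsigned long long)reg->addr,
			(unsigned long long)reg->range,
			(unsigned long long)(reg->addr + reg->range));
}

/* Print the header followed by one row per region; an array replaces
 * the maple tree iteration of the real code. */
void dump_regions(FILE *out, const struct va_region *regs, size_t n)
{
	char row[128];
	size_t i;

	assert(out != NULL);

	fputs(" VA regions  | start              | range              | end                \n", out);
	fputs("----------------------------------------------------------------------------\n", out);
	for (i = 0; i < n; i++) {
		format_region_row(row, sizeof(row), &regs[i]);
		fputs(row, out);
	}
}
```

Calling dump_regions(stdout, regs, n) on a small array prints the header
followed by one zero-padded hexadecimal row per region, with the end
column computed as start plus range.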



* Re: [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged)
  2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
                   ` (9 preceding siblings ...)
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 11/11] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
@ 2023-08-03 21:44 ` Dave Airlie
  10 siblings, 0 replies; 17+ messages in thread
From: Dave Airlie @ 2023-08-03 21:44 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: daniel, tzimmermann, mripard, corbet, christian.koenig, bskeggs,
	Liam.Howlett, matthew.brost, boris.brezillon, alexdeucher,
	ogabbay, bagasdotme, willy, jason, donald.robson, dri-devel,
	nouveau, linux-doc, linux-kernel

On Fri, 4 Aug 2023 at 02:52, Danilo Krummrich <dakr@redhat.com> wrote:
>
> This patch series provides a new UAPI for the Nouveau driver in order to
> support Vulkan features, such as sparse bindings and sparse residency.
>

Now that Faith has reviewed the uAPI and userspace work, I think we
should try and steer this in.

The only thing I see is the SPDX + MIT header in some places. I think
we can drop the MIT bits where SPDX is there, and leave copyright and
authorship (if you like). Personally, I've been leaving authorship up
to git, as it saves trouble with people randomly emailing you about
things you wrote 10 years ago.

Otherwise for the series:
Reviewed-by: Dave Airlie <airlied@redhat.com>

Dave.


* Re: [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit Danilo Krummrich
@ 2023-08-07 18:07   ` Christian König
  2023-08-07 18:54     ` Danilo Krummrich
  0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2023-08-07 18:07 UTC (permalink / raw)
  To: Danilo Krummrich, airlied, daniel, tzimmermann, mripard, corbet,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson,
	Daniel Vetter
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

On 03.08.23 at 18:52, Danilo Krummrich wrote:
> The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
> in order to emit fences within DMA fence signalling critical sections
> (e.g. as typically done in the DRM GPU schedulers run_job() callback) we
> need to separate fence allocation and fence emitting.

At least from the description that sounds like it might be illegal.
Daniel, can you take a look as well?

What exactly are you doing here?

Regards,
Christian.

>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>   drivers/gpu/drm/nouveau/dispnv04/crtc.c |  9 ++++-
>   drivers/gpu/drm/nouveau/nouveau_bo.c    | 52 +++++++++++++++----------
>   drivers/gpu/drm/nouveau/nouveau_chan.c  |  6 ++-
>   drivers/gpu/drm/nouveau/nouveau_dmem.c  |  9 +++--
>   drivers/gpu/drm/nouveau/nouveau_fence.c | 16 +++-----
>   drivers/gpu/drm/nouveau/nouveau_fence.h |  3 +-
>   drivers/gpu/drm/nouveau/nouveau_gem.c   |  5 ++-
>   7 files changed, 59 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index a6f2e681bde9..a34924523133 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -1122,11 +1122,18 @@ nv04_page_flip_emit(struct nouveau_channel *chan,
>   	PUSH_NVSQ(push, NV_SW, NV_SW_PAGE_FLIP, 0x00000000);
>   	PUSH_KICK(push);
>   
> -	ret = nouveau_fence_new(chan, false, pfence);
> +	ret = nouveau_fence_new(pfence);
>   	if (ret)
>   		goto fail;
>   
> +	ret = nouveau_fence_emit(*pfence, chan);
> +	if (ret)
> +		goto fail_fence_unref;
> +
>   	return 0;
> +
> +fail_fence_unref:
> +	nouveau_fence_unref(pfence);
>   fail:
>   	spin_lock_irqsave(&dev->event_lock, flags);
>   	list_del(&s->head);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 057bc995f19b..e9cbbf594e6f 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -820,29 +820,39 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
>   		mutex_lock(&cli->mutex);
>   	else
>   		mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
> +
>   	ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, ctx->interruptible);
> -	if (ret == 0) {
> -		ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
> -		if (ret == 0) {
> -			ret = nouveau_fence_new(chan, false, &fence);
> -			if (ret == 0) {
> -				/* TODO: figure out a better solution here
> -				 *
> -				 * wait on the fence here explicitly as going through
> -				 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> -				 *
> -				 * Without this the operation can timeout and we'll fallback to a
> -				 * software copy, which might take several minutes to finish.
> -				 */
> -				nouveau_fence_wait(fence, false, false);
> -				ret = ttm_bo_move_accel_cleanup(bo,
> -								&fence->base,
> -								evict, false,
> -								new_reg);
> -				nouveau_fence_unref(&fence);
> -			}
> -		}
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = drm->ttm.move(chan, bo, bo->resource, new_reg);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = nouveau_fence_new(&fence);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = nouveau_fence_emit(fence, chan);
> +	if (ret) {
> +		nouveau_fence_unref(&fence);
> +		goto out_unlock;
>   	}
> +
> +	/* TODO: figure out a better solution here
> +	 *
> +	 * wait on the fence here explicitly as going through
> +	 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> +	 *
> +	 * Without this the operation can timeout and we'll fallback to a
> +	 * software copy, which might take several minutes to finish.
> +	 */
> +	nouveau_fence_wait(fence, false, false);
> +	ret = ttm_bo_move_accel_cleanup(bo, &fence->base, evict, false,
> +					new_reg);
> +	nouveau_fence_unref(&fence);
> +
> +out_unlock:
>   	mutex_unlock(&cli->mutex);
>   	return ret;
>   }
> diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
> index 6d639314250a..f69be4c8f9f2 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_chan.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
> @@ -62,9 +62,11 @@ nouveau_channel_idle(struct nouveau_channel *chan)
>   		struct nouveau_fence *fence = NULL;
>   		int ret;
>   
> -		ret = nouveau_fence_new(chan, false, &fence);
> +		ret = nouveau_fence_new(&fence);
>   		if (!ret) {
> -			ret = nouveau_fence_wait(fence, false, false);
> +			ret = nouveau_fence_emit(fence, chan);
> +			if (!ret)
> +				ret = nouveau_fence_wait(fence, false, false);
>   			nouveau_fence_unref(&fence);
>   		}
>   
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 789857faa048..4ad40e42cae1 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -209,7 +209,8 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
>   		goto done;
>   	}
>   
> -	nouveau_fence_new(dmem->migrate.chan, false, &fence);
> +	if (!nouveau_fence_new(&fence))
> +		nouveau_fence_emit(fence, dmem->migrate.chan);
>   	migrate_vma_pages(&args);
>   	nouveau_dmem_fence_done(&fence);
>   	dma_unmap_page(drm->dev->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> @@ -402,7 +403,8 @@ nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
>   		}
>   	}
>   
> -	nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
> +	if (!nouveau_fence_new(&fence))
> +		nouveau_fence_emit(fence, chunk->drm->dmem->migrate.chan);
>   	migrate_device_pages(src_pfns, dst_pfns, npages);
>   	nouveau_dmem_fence_done(&fence);
>   	migrate_device_finalize(src_pfns, dst_pfns, npages);
> @@ -675,7 +677,8 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
>   		addr += PAGE_SIZE;
>   	}
>   
> -	nouveau_fence_new(drm->dmem->migrate.chan, false, &fence);
> +	if (!nouveau_fence_new(&fence))
> +		nouveau_fence_emit(fence, drm->dmem->migrate.chan);
>   	migrate_vma_pages(args);
>   	nouveau_dmem_fence_done(&fence);
>   	nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index ee5e9d40c166..e946408f945b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -210,6 +210,9 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
>   	struct nouveau_fence_priv *priv = (void*)chan->drm->fence;
>   	int ret;
>   
> +	if (unlikely(!chan->fence))
> +		return -ENODEV;
> +
>   	fence->channel  = chan;
>   	fence->timeout  = jiffies + (15 * HZ);
>   
> @@ -396,25 +399,16 @@ nouveau_fence_unref(struct nouveau_fence **pfence)
>   }
>   
>   int
> -nouveau_fence_new(struct nouveau_channel *chan, bool sysmem,
> -		  struct nouveau_fence **pfence)
> +nouveau_fence_new(struct nouveau_fence **pfence)
>   {
>   	struct nouveau_fence *fence;
> -	int ret = 0;
> -
> -	if (unlikely(!chan->fence))
> -		return -ENODEV;
>   
>   	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>   	if (!fence)
>   		return -ENOMEM;
>   
> -	ret = nouveau_fence_emit(fence, chan);
> -	if (ret)
> -		nouveau_fence_unref(&fence);
> -
>   	*pfence = fence;
> -	return ret;
> +	return 0;
>   }
>   
>   static const char *nouveau_fence_get_get_driver_name(struct dma_fence *fence)
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
> index 0ca2bc85adf6..7c73c7c9834a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
> @@ -17,8 +17,7 @@ struct nouveau_fence {
>   	unsigned long timeout;
>   };
>   
> -int  nouveau_fence_new(struct nouveau_channel *, bool sysmem,
> -		       struct nouveau_fence **);
> +int  nouveau_fence_new(struct nouveau_fence **);
>   void nouveau_fence_unref(struct nouveau_fence **);
>   
>   int  nouveau_fence_emit(struct nouveau_fence *, struct nouveau_channel *);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index a48f42aaeab9..9c8d1b911a01 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -873,8 +873,11 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
>   		}
>   	}
>   
> -	ret = nouveau_fence_new(chan, false, &fence);
> +	ret = nouveau_fence_new(&fence);
> +	if (!ret)
> +		ret = nouveau_fence_emit(fence, chan);
>   	if (ret) {
> +		nouveau_fence_unref(&fence);
>   		NV_PRINTK(err, cli, "error fencing pushbuf: %d\n", ret);
>   		WIND_RING(chan);
>   		goto out;



* Re: [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit
  2023-08-07 18:07   ` Christian König
@ 2023-08-07 18:54     ` Danilo Krummrich
  2023-08-08  6:06       ` Christian König
  0 siblings, 1 reply; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-07 18:54 UTC (permalink / raw)
  To: Christian König, airlied, daniel, tzimmermann, mripard,
	corbet, bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

Hi Christian,

On 8/7/23 20:07, Christian König wrote:
> Am 03.08.23 um 18:52 schrieb Danilo Krummrich:
>> The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
>> in order to emit fences within DMA fence signalling critical sections
>> (e.g. as typically done in the DRM GPU schedulers run_job() callback) we
>> need to separate fence allocation and fence emitting.
> 
> At least from the description that sounds like it might be illegal. 
> Daniel can you take a look as well.
> 
> What exactly are you doing here?

I'm basically doing exactly the same as amdgpu_fence_emit() does in
amdgpu_ib_schedule(), called by amdgpu_job_run().

The difference - and this is what this patch is for - is that I separate
the fence allocation from emitting the fence, such that the fence
structure is allocated before the job is submitted to the GPU scheduler.
amdgpu instead allocates the fence structure with GFP_ATOMIC within
amdgpu_fence_emit() in this case.
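
To make the two-phase pattern concrete, here is a minimal userspace
sketch, not nouveau code. It assumes a simple flag can stand in for
"inside a DMA fence signalling critical section"; every model_* name is
hypothetical and exists only for illustration. The point it shows is
that the allocating step runs at submit time, while the step run from a
run_job()-style callback never allocates.

```c
/* Userspace model only: none of these names are nouveau or DRM APIs. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_fence {
	bool emitted;
	unsigned int seqno;
};

/* True while the model is "inside" a fence signalling critical section. */
static bool in_signalling_section;
static unsigned int next_seqno = 1;

/* Phase 1: allocates, so it must only run outside the critical section. */
static int model_fence_new(struct model_fence **pfence)
{
	struct model_fence *fence;

	assert(!in_signalling_section);

	fence = calloc(1, sizeof(*fence));
	if (!fence)
		return -ENOMEM;

	*pfence = fence;
	return 0;
}

/* Phase 2: only fills in the preallocated fence, no allocation here. */
static int model_fence_emit(struct model_fence *fence)
{
	fence->seqno = next_seqno++;
	fence->emitted = true;
	return 0;
}

static void model_fence_free(struct model_fence **pfence)
{
	free(*pfence);
	*pfence = NULL;
}

/* Stand-in for a scheduler run_job()-style callback. */
static int model_run_job(struct model_fence *fence)
{
	int ret;

	in_signalling_section = true;
	ret = model_fence_emit(fence);
	in_signalling_section = false;

	return ret;
}
```

A caller would invoke model_fence_new() when the job is created, hand
the fence to the job, and let model_run_job() emit it later. Calling
model_fence_new() from inside model_run_job() would trip the assertion,
which is the situation the separation is meant to rule out.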

- Danilo




* Re: [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit
  2023-08-07 18:54     ` Danilo Krummrich
@ 2023-08-08  6:06       ` Christian König
  0 siblings, 0 replies; 17+ messages in thread
From: Christian König @ 2023-08-08  6:06 UTC (permalink / raw)
  To: Danilo Krummrich, airlied, daniel, tzimmermann, mripard, corbet,
	bskeggs, Liam.Howlett, matthew.brost, boris.brezillon,
	alexdeucher, ogabbay, bagasdotme, willy, jason, donald.robson
  Cc: dri-devel, nouveau, linux-doc, linux-kernel



Am 07.08.23 um 20:54 schrieb Danilo Krummrich:
> Hi Christian,
>
> On 8/7/23 20:07, Christian König wrote:
>> Am 03.08.23 um 18:52 schrieb Danilo Krummrich:
>>> The new (VM_BIND) UAPI exports DMA fences through DRM syncobjs. Hence,
>>> in order to emit fences within DMA fence signalling critical sections
>>> (e.g. as typically done in the DRM GPU schedulers run_job() 
>>> callback) we
>>> need to separate fence allocation and fence emitting.
>>
>> At least from the description that sounds like it might be illegal. 
>> Daniel can you take a look as well.
>>
>> What exactly are you doing here?
>
> I'm basically doing exactly the same as amdgpu_fence_emit() does in 
> amdgpu_ib_schedule() called by amdgpu_job_run().
>
> The difference - and this is what this patch is for - is that I 
> separate the fence allocation from emitting the fence, such that the 
> fence structure is allocated before the job is submitted to the GPU 
> scheduler. amdgpu solves this with GFP_ATOMIC within 
> amdgpu_fence_emit() to allocate the fence structure in this case.

Yeah, that use case is perfectly valid. Maybe update the commit message
a bit to better describe that.

Something like "Separate fence allocation and emitting to avoid
allocation within DMA fence signalling critical sections inside the DRM
scheduler. This helps implement the new UAPI....".

Regards,
Christian.




* Re: [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock
  2023-08-03 16:52 ` [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock Danilo Krummrich
@ 2023-08-08  7:21   ` Boris Brezillon
  2023-08-09 22:40     ` Danilo Krummrich
  0 siblings, 1 reply; 17+ messages in thread
From: Boris Brezillon @ 2023-08-08  7:21 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, alexdeucher, ogabbay,
	bagasdotme, willy, jason, donald.robson, dri-devel, nouveau,
	linux-doc, linux-kernel

On Thu,  3 Aug 2023 18:52:20 +0200
Danilo Krummrich <dakr@redhat.com> wrote:

> When no custom lock is set to protect a GEM's GPUVA list, lockdep checks
> should fall back to the GEM object's dma-resv lock. With the current
> implementation we're setting the lock_dep_map of the GEM object's 'resv'
> pointer (in case no custom lock_dep_map is set yet) on
> drm_gem_private_object_init().
> 
> However, the GEM object's 'resv' pointer might still change after
> drm_gem_private_object_init() is called, e.g. through
> ttm_bo_init_reserved(). This can result in the wrong lock being tracked.
> 
> To fix this, call dma_resv_held() directly from
> drm_gem_gpuva_assert_lock_held() and fall back to the GEM's lock_dep_map
> pointer only if an actual custom lock is set.
> 
> Fixes: e6303f323b1a ("drm: manager to keep track of GPUs VA mappings")
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

but I'm wondering if it wouldn't be a good thing to add a
drm_gem_set_resv() helper, so the core can control drm_gem_object::resv
re-assignments (block them if it's happening after the GEM has been
exposed to the outside world or update auxiliary data if it's happening
before that).
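
To make the suggestion concrete, here is a rough user-space sketch of what such a helper might look like. Nothing here is existing kernel API: drm_gem_set_resv() does not exist in the tree this patch applies to, and model_resv, model_gem_object, the "exposed" flag, and the -EBUSY return value are all assumptions made up for illustration. The only detail taken from the real struct is that a GEM object has both an embedded '_resv' and a 'resv' pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct dma_resv; only identity matters in this model. */
struct model_resv {
	int id;
};

/* Stand-in for struct drm_gem_object, reduced to the fields used here. */
struct model_gem_object {
	struct model_resv _resv;  /* embedded default reservation object */
	struct model_resv *resv;  /* points at _resv or at a shared one */
	bool exposed;             /* hypothetical: object is visible to the outside world */
};

static void model_gem_init(struct model_gem_object *obj)
{
	assert(obj != NULL);
	obj->_resv.id = 0;
	obj->resv = &obj->_resv;
	obj->exposed = false;
}

/*
 * Hypothetical helper: the single place where 'resv' may be re-assigned.
 * Re-assignment is refused once the object has been exposed; before that,
 * any auxiliary data derived from 'resv' could be updated here.
 */
static int model_gem_set_resv(struct model_gem_object *obj,
			      struct model_resv *resv)
{
	assert(obj != NULL);

	if (obj->exposed)
		return -EBUSY;

	/* NULL means "go back to the embedded reservation object". */
	obj->resv = resv ? resv : &obj->_resv;
	return 0;
}
```

With something along these lines, a caller that wants a shared reservation object would go through model_gem_set_resv() before publishing the object, and a late re-assignment would fail instead of silently changing which lock protects the object.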

> ---
>  include/drm/drm_gem.h | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index c0b13c43b459..bc9f6aa2f3fe 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -551,15 +551,17 @@ int drm_gem_evict(struct drm_gem_object *obj);
>   * @lock: the lock used to protect the gpuva list. The locking primitive
>   * must contain a dep_map field.
>   *
> - * Call this if you're not proctecting access to the gpuva list
> - * with the dma-resv lock, otherwise, drm_gem_gpuva_init() takes care
> - * of initializing lock_dep_map for you.
> + * Call this if you're not proctecting access to the gpuva list with the
> + * dma-resv lock, but with a custom lock.
>   */
>  #define drm_gem_gpuva_set_lock(obj, lock) \
> -	if (!(obj)->gpuva.lock_dep_map) \
> +	if (!WARN((obj)->gpuva.lock_dep_map, \
> +		  "GEM GPUVA lock should be set only once.")) \
>  		(obj)->gpuva.lock_dep_map = &(lock)->dep_map
>  #define drm_gem_gpuva_assert_lock_held(obj) \
> -	lockdep_assert(lock_is_held((obj)->gpuva.lock_dep_map))
> +	lockdep_assert((obj)->gpuva.lock_dep_map ? \
> +		       lock_is_held((obj)->gpuva.lock_dep_map) : \
> +		       dma_resv_held((obj)->resv))
>  #else
>  #define drm_gem_gpuva_set_lock(obj, lock) do {} while (0)
>  #define drm_gem_gpuva_assert_lock_held(obj) do {} while (0)
> @@ -573,11 +575,12 @@ int drm_gem_evict(struct drm_gem_object *obj);
>   *
>   * Calling this function is only necessary for drivers intending to support the
>   * &drm_driver_feature DRIVER_GEM_GPUVA.
> + *
> + * See also drm_gem_gpuva_set_lock().
>   */
>  static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
>  {
>  	INIT_LIST_HEAD(&obj->gpuva.list);
> -	drm_gem_gpuva_set_lock(obj, &obj->resv->lock.base);
>  }
>  
>  /**



* Re: [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock
  2023-08-08  7:21   ` Boris Brezillon
@ 2023-08-09 22:40     ` Danilo Krummrich
  0 siblings, 0 replies; 17+ messages in thread
From: Danilo Krummrich @ 2023-08-09 22:40 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: airlied, daniel, tzimmermann, mripard, corbet, christian.koenig,
	bskeggs, Liam.Howlett, matthew.brost, alexdeucher, ogabbay,
	bagasdotme, willy, jason, donald.robson, dri-devel, nouveau,
	linux-doc, linux-kernel

On 8/8/23 09:21, Boris Brezillon wrote:
> On Thu,  3 Aug 2023 18:52:20 +0200
> Danilo Krummrich <dakr@redhat.com> wrote:
> 
>> When no custom lock is set to protect a GEMs GPUVA list, lockdep checks
>> should fall back to the GEM objects dma-resv lock. With the current
>> implementation we're setting the lock_dep_map of the GEM objects 'resv'
>> pointer (in case no custom lock_dep_map is set yet) on
>> drm_gem_private_object_init().
>>
>> However, the GEM objects 'resv' pointer might still change after
>> drm_gem_private_object_init() is called, e.g. through
>> ttm_bo_init_reserved(). This can result in the wrong lock being tracked.
>>
>> To fix this, call dma_resv_held() directly from
>> drm_gem_gpuva_assert_lock_held() and fall back to the GEMs lock_dep_map
>> pointer only if an actual custom lock is set.
>>
>> Fixes: e6303f323b1a ("drm: manager to keep track of GPUs VA mappings")
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> 
> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
> 
> but I'm wondering if it wouldn't be a good thing to add a
> drm_gem_set_resv() helper, so the core can control drm_gem_object::resv
> re-assignments (block them if it's happening after the GEM has been
> exposed to the outside world or update auxiliary data if it's happening
> before that).

I agree, this might be a good idea. There are quite a few places where 
drm_gem_object::resv is set from external code.
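
For reference, the problem this patch fixes can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: model_lock stands in for a lock with a dep_map, "held" replaces the real lockdep state, and old_check()/new_check() are made-up names that mirror the old and new bodies of drm_gem_gpuva_assert_lock_held(), not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a lock that lockdep can query. */
struct model_lock {
	bool held;
};

struct model_obj {
	struct model_lock _resv;    /* embedded default lock */
	struct model_lock *resv;    /* may be re-pointed after init */
	struct model_lock *custom;  /* NULL unless a custom lock was set */
	struct model_lock *cached;  /* old scheme: snapshot of 'resv' taken at init */
};

static void model_init(struct model_obj *obj)
{
	assert(obj != NULL);
	obj->_resv.held = false;
	obj->resv = &obj->_resv;
	obj->custom = NULL;
	obj->cached = obj->resv;  /* goes stale if 'resv' changes later */
}

/* Old behaviour: always check the lock recorded at init time. */
static bool old_check(const struct model_obj *obj)
{
	return obj->cached->held;
}

/* New behaviour: custom lock if one is set, else the current 'resv'. */
static bool new_check(const struct model_obj *obj)
{
	return obj->custom ? obj->custom->held : obj->resv->held;
}
```

If 'resv' is re-pointed to a shared lock after model_init() and that shared lock is then taken, old_check() still looks at the embedded lock and reports it as not held, while new_check() follows the current pointer.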

> 
>> ---
>>   include/drm/drm_gem.h | 15 +++++++++------
>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index c0b13c43b459..bc9f6aa2f3fe 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -551,15 +551,17 @@ int drm_gem_evict(struct drm_gem_object *obj);
>>    * @lock: the lock used to protect the gpuva list. The locking primitive
>>    * must contain a dep_map field.
>>    *
>> - * Call this if you're not proctecting access to the gpuva list
>> - * with the dma-resv lock, otherwise, drm_gem_gpuva_init() takes care
>> - * of initializing lock_dep_map for you.
>> + * Call this if you're not proctecting access to the gpuva list with the
>> + * dma-resv lock, but with a custom lock.
>>    */
>>   #define drm_gem_gpuva_set_lock(obj, lock) \
>> -	if (!(obj)->gpuva.lock_dep_map) \
>> +	if (!WARN((obj)->gpuva.lock_dep_map, \
>> +		  "GEM GPUVA lock should be set only once.")) \
>>   		(obj)->gpuva.lock_dep_map = &(lock)->dep_map
>>   #define drm_gem_gpuva_assert_lock_held(obj) \
>> -	lockdep_assert(lock_is_held((obj)->gpuva.lock_dep_map))
>> +	lockdep_assert((obj)->gpuva.lock_dep_map ? \
>> +		       lock_is_held((obj)->gpuva.lock_dep_map) : \
>> +		       dma_resv_held((obj)->resv))
>>   #else
>>   #define drm_gem_gpuva_set_lock(obj, lock) do {} while (0)
>>   #define drm_gem_gpuva_assert_lock_held(obj) do {} while (0)
>> @@ -573,11 +575,12 @@ int drm_gem_evict(struct drm_gem_object *obj);
>>    *
>>    * Calling this function is only necessary for drivers intending to support the
>>    * &drm_driver_feature DRIVER_GEM_GPUVA.
>> + *
>> + * See also drm_gem_gpuva_set_lock().
>>    */
>>   static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
>>   {
>>   	INIT_LIST_HEAD(&obj->gpuva.list);
>> -	drm_gem_gpuva_set_lock(obj, &obj->resv->lock.base);
>>   }
>>   
>>   /**
> 



end of thread, other threads:[~2023-08-09 22:43 UTC | newest]

Thread overview: 17+ messages
2023-08-03 16:52 [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 01/11] drm/gem: fix lockdep check for dma-resv lock Danilo Krummrich
2023-08-08  7:21   ` Boris Brezillon
2023-08-09 22:40     ` Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 02/11] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 03/11] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 04/11] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 05/11] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 06/11] drm/nouveau: fence: separate fence alloc and emit Danilo Krummrich
2023-08-07 18:07   ` Christian König
2023-08-07 18:54     ` Danilo Krummrich
2023-08-08  6:06       ` Christian König
2023-08-03 16:52 ` [PATCH drm-misc-next v9 07/11] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 08/11] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 09/11] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
2023-08-03 16:52 ` [PATCH drm-misc-next v9 11/11] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
2023-08-03 21:44 ` [PATCH drm-misc-next v9 00/11] Nouveau VM_BIND UAPI & DRM GPUVA Manager (merged) Dave Airlie
