public inbox for linux-kernel@vger.kernel.org
From: Boris Brezillon <boris.brezillon@collabora.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@redhat.com>,
	airlied@gmail.com, daniel@ffwll.ch, matthew.brost@intel.com,
	thomas.hellstrom@linux.intel.com, sarah.walker@imgtec.com,
	donald.robson@imgtec.com, faith.ekstrand@collabora.com,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH drm-misc-next v4 4/8] drm/gpuvm: add common dma-resv per struct drm_gpuvm
Date: Thu, 21 Sep 2023 17:27:02 +0200
Message-ID: <20230921172702.1b9a49a9@collabora.com>
In-Reply-To: <72ea51ca-f7b0-2e2a-b276-6c6c7413374b@amd.com>

On Thu, 21 Sep 2023 16:34:54 +0200
Christian König <christian.koenig@amd.com> wrote:

> On 21.09.23 at 16:25, Boris Brezillon wrote:
> > On Thu, 21 Sep 2023 15:34:44 +0200
> > Danilo Krummrich <dakr@redhat.com> wrote:
> >  
> >> On 9/21/23 09:39, Christian König wrote:  
> >>> On 20.09.23 at 16:42, Danilo Krummrich wrote:  
> >>>> Provide a common dma-resv for GEM objects not being used outside of this
> >>>> GPU-VM. This is used in a subsequent patch to generalize dma-resv,
> >>>> external and evicted object handling and GEM validation.
> >>>>
> >>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> >>>> ---
> >>>>    drivers/gpu/drm/drm_gpuvm.c            |  9 +++++++--
> >>>>    drivers/gpu/drm/nouveau/nouveau_uvmm.c |  2 +-
> >>>>    include/drm/drm_gpuvm.h                | 17 ++++++++++++++++-
> >>>>    3 files changed, 24 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> >>>> index bfea4a8a19ec..cbf4b738a16c 100644
> >>>> --- a/drivers/gpu/drm/drm_gpuvm.c
> >>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
> >>>> @@ -655,6 +655,7 @@ drm_gpuva_range_valid(struct drm_gpuvm *gpuvm,
> >>>>    /**
> >>>>     * drm_gpuvm_init() - initialize a &drm_gpuvm
> >>>>     * @gpuvm: pointer to the &drm_gpuvm to initialize
> >>>> + * @drm: the drivers &drm_device
> >>>>     * @name: the name of the GPU VA space
> >>>>     * @start_offset: the start offset of the GPU VA space
> >>>>     * @range: the size of the GPU VA space
> >>>> @@ -668,7 +669,7 @@ drm_gpuva_range_valid(struct drm_gpuvm *gpuvm,
> >>>>     * &name is expected to be managed by the surrounding driver structures.
> >>>>     */
> >>>>    void
> >>>> -drm_gpuvm_init(struct drm_gpuvm *gpuvm,
> >>>> +drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> >>>>               const char *name,
> >>>>               u64 start_offset, u64 range,
> >>>>               u64 reserve_offset, u64 reserve_range,
> >>>> @@ -694,6 +695,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
> >>>>                                 reserve_range)))
> >>>>                __drm_gpuva_insert(gpuvm, &gpuvm->kernel_alloc_node);
> >>>>        }
> >>>> +
> >>>> +    drm_gem_private_object_init(drm, &gpuvm->d_obj, 0);
> >>>>    }
> >>>>    EXPORT_SYMBOL_GPL(drm_gpuvm_init);
> >>>> @@ -713,7 +716,9 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
> >>>>            __drm_gpuva_remove(&gpuvm->kernel_alloc_node);
> >>>>        WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
> >>>> -         "GPUVA tree is not empty, potentially leaking memory.");
> >>>> +         "GPUVA tree is not empty, potentially leaking memory.\n");
> >>>> +
> >>>> +    drm_gem_private_object_fini(&gpuvm->d_obj);
> >>>>    }
> >>>>    EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
> >>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> >>>> index 6c86b64273c3..a80ac8767843 100644
> >>>> --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> >>>> +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> >>>> @@ -1836,7 +1836,7 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
> >>>>        uvmm->kernel_managed_addr = kernel_managed_addr;
> >>>>        uvmm->kernel_managed_size = kernel_managed_size;
> >>>> -    drm_gpuvm_init(&uvmm->base, cli->name,
> >>>> +    drm_gpuvm_init(&uvmm->base, cli->drm->dev, cli->name,
> >>>>                   NOUVEAU_VA_SPACE_START,
> >>>>                   NOUVEAU_VA_SPACE_END,
> >>>>                   kernel_managed_addr, kernel_managed_size,
> >>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> >>>> index 0e802676e0a9..6666c07d7c3e 100644
> >>>> --- a/include/drm/drm_gpuvm.h
> >>>> +++ b/include/drm/drm_gpuvm.h
> >>>> @@ -240,14 +240,29 @@ struct drm_gpuvm {
> >>>>         * @ops: &drm_gpuvm_ops providing the split/merge steps to drivers
> >>>>         */
> >>>>        const struct drm_gpuvm_ops *ops;
> >>>> +
> >>>> +    /**
> >>>> +     * @d_obj: Dummy GEM object; used internally to pass the GPU VMs
> >>>> +     * dma-resv to &drm_exec. Provides the GPUVM's &dma-resv.
> >>>> +     */
> >>>> +    struct drm_gem_object d_obj;  
> >>> Yeah, as pointed out in the other mail that won't work like this.  
> >> Which one? Seems that I missed it.
> >>  
> >>> The GPUVM contains GEM objects and therefore should probably have a reference to those objects.
> >>>
> >>> When those GEM objects now use the dma-resv object embedded inside the GPUVM then they also need a reference to the GPUVM to make sure the dma-resv object won't be freed before they are freed.  
> >> My assumption here is that GEM objects being local to a certain VM never out-live the VM. We never share it with anyone, otherwise it would be external and hence wouldn't carry the VM's dma-resv. The only references I see are from the VM itself (which is fine) and from userspace. The latter isn't a problem as long as all GEM handles are closed before the VM is destroyed on FD close.  
> > But we don't want to rely on userspace doing the right thing (calling
> > GEM_CLOSE before releasing the VM), do we?
> >
> > BTW, even though my private BOs have a ref to their exclusive VM, I just
> > ran into a bug because drm_gem_shmem_free() acquires the resv lock
> > (which is questionable, but that's not the topic :-)) and
> > I was calling vm_put(bo->exclusive_vm) before drm_gem_shmem_free(),
> > leading to a use-after-free when the gem->resv is acquired. This has
> > nothing to do with drm_gpuvm, but it proves that this sort of bug is
> > likely to happen if we don't pay attention.
> >  
> >> Do I miss something? Do we have use cases where this isn't true?  
> > The other case I can think of is GEM being v[un]map-ed (kernel
> > mapping) after the VM was released.  
> 
> I think the file reference and the VM stay around in those cases as 
> well, but yes I also think we have use cases which won't work.
> 
> >  
> >>> This is a circular reference dependency.  
> > FWIW, I solved that by having a vm_destroy() function that kills all the
> > mappings in a VM, which in turn releases all the refs the VM had on
> > private BOs. Then, it's just a matter of waiting for all private GEMs
> > to be destroyed to get the final steps of the VM destruction, which is
> > really just about releasing resources (it's called panthor_vm_release()
> > in my case) executed when the VM refcount drops to zero.
> >  
> >>> The simplest solution I can see is to let the driver provide the GEM object to use. Amdgpu uses the root page directory object for this.  
> >> Sure, we can do that, if we see cases where VM local GEM objects can out-live the VM.  
> >>> Apart from that I strongly think that we shouldn't let the GPUVM code create a driver GEM object. We did that in TTM for the ghost objects and it turned out to be a bad idea.  
> > Would that really solve the circular ref issue? I mean, if you're
> > taking the root page dir object as your VM resv, you still have to make
> > sure it outlives the private GEMs, which means, you either need
> > to take a ref on the object, leading to the same circular ref mess, or
> > you need to reset private GEMs resvs before destroying this root page
> > dir GEM (whose lifecycle is likely the same as your VM object which
> > embeds the drm_gpuvm instance).  
> 
> Yes it does help, see how amdgpu does it:
> 
> The VM references all BOs, e.g. page tables as well as user BOs.
> 
> The BOs which use the dma-resv of the root page directory also reference 
> the root page directory's BO.
> 
> So when the VM drops all references, the page tables and user BOs are 
> released first and the root page directory, which everybody references, last.

Right, now I see how having a dynamically allocated GEM on which both
the VM and private BOs hold a reference solves the problem.
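
To spell out the ordering, here is a rough userspace model of that scheme.
It is only a sketch: struct resv_obj, struct private_bo, struct vm and the
helpers are made-up stand-ins, not drm_gpuvm or amdgpu code. The point it
illustrates is that the object providing the resv is freed by whoever drops
the last reference, so a BO free path that still touches the resv stays safe
even if the VM lets go of its reference first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only: these structs stand in for the GEM object that
 * provides the shared dma-resv, a private BO and a VM. All names are
 * hypothetical; none of this is the drm_gpuvm API. */
struct resv_obj {
	int refcount;
	bool *freed;		/* set when the last reference is dropped */
};

struct private_bo {
	struct resv_obj *resv;	/* each private BO holds a reference */
};

struct vm {
	struct resv_obj *resv;	/* the VM holds a reference as well */
	struct private_bo *bos[2];
};

static struct resv_obj *resv_get(struct resv_obj *r)
{
	r->refcount++;
	return r;
}

static void resv_put(struct resv_obj *r)
{
	if (--r->refcount == 0) {
		*r->freed = true;
		free(r);
	}
}

static struct private_bo *bo_create(struct resv_obj *r)
{
	struct private_bo *bo = malloc(sizeof(*bo));

	assert(bo);
	bo->resv = resv_get(r);
	return bo;
}

static void bo_free(struct private_bo *bo)
{
	/* The free path may still use the resv (e.g. to lock it), so the
	 * reference is dropped only after that last use. */
	assert(bo->resv->refcount > 0);
	resv_put(bo->resv);
	free(bo);
}

/* Returns true if the resv owner was alive whenever a BO was about to be
 * freed, and had been freed by the end of the teardown. */
static bool vm_teardown_demo(void)
{
	bool freed = false, alive_while_freeing = true;
	struct resv_obj *root = malloc(sizeof(*root));
	struct vm vm;
	int i;

	assert(root);
	root->refcount = 1;	/* creation reference, owned by the VM */
	root->freed = &freed;
	vm.resv = root;
	for (i = 0; i < 2; i++)
		vm.bos[i] = bo_create(root);

	/* Drop the VM's own reference first; the BOs still pin the owner. */
	resv_put(vm.resv);
	for (i = 0; i < 2; i++) {
		alive_while_freeing = alive_while_freeing && !freed;
		bo_free(vm.bos[i]);
	}
	return alive_while_freeing && freed;
}
```

With an embedded (non-refcounted) resv owner the same sequence would have
the BO free path touching memory that went away with the VM.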

> 
> > Making it driver-specific just moves the responsibility back to drivers
> > (and also allows re-using a real GEM object instead of a dummy one,
> > but I'm not sure we care about saving a few hundred bytes at that
> > point), which is a good way to not take the blame if the driver does
> > something wrong, but also doesn't really help people do the right thing.  
> 
> The additional memory usage is irrelevant, but we have very very bad 
> experience with TTM using dummy objects similar to this here.
> 
> They tend to end up in driver specific functions and then the driver 
> will try to upcast those dummy objects to driver specific BOs. In the end you 
> get really hard to figure out memory corruptions.

Hm, I see. Anyway, I guess creating a dummy GEM is simple enough that
we can leave it to drivers (for drivers that don't have a real GEM to
pass, of course).
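
For drivers that do end up passing a dummy object around, the upcast
problem described above can at least be guarded against. A minimal
userspace sketch, assuming a funcs-pointer check before the upcast; struct
gem_object, struct gem_funcs, struct driver_bo and to_driver_bo() are
hypothetical stand-ins, not existing DRM definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a generic GEM object and its funcs table. */
struct gem_funcs {
	const char *name;
};

struct gem_object {
	const struct gem_funcs *funcs;
};

/* A driver BO embeds the generic object. */
struct driver_bo {
	unsigned long flags;
	struct gem_object base;
};

static const struct gem_funcs driver_bo_funcs = { .name = "driver-bo" };
static const struct gem_funcs dummy_funcs = { .name = "dummy" };

/* Userspace equivalent of the usual container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Upcast only when the object really is one of ours. A dummy object is
 * not embedded in a struct driver_bo, so an unchecked container_of()
 * would yield a pointer to unrelated memory. */
static struct driver_bo *to_driver_bo(struct gem_object *obj)
{
	if (!obj || obj->funcs != &driver_bo_funcs)
		return NULL;
	return container_of(obj, struct driver_bo, base);
}

static int upcast_demo(void)
{
	struct driver_bo bo = {
		.flags = 0x1,
		.base = { .funcs = &driver_bo_funcs },
	};
	struct gem_object dummy = { .funcs = &dummy_funcs };

	/* Bit 0: a real BO round-trips; bit 1: the dummy is rejected. */
	return (to_driver_bo(&bo.base) == &bo ? 1 : 0) |
	       (to_driver_bo(&dummy) == NULL ? 2 : 0);
}
```

This only catches the mistake at the upcast site; it does not keep the
dummy out of driver-specific paths in the first place.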


Thread overview: 29+ messages
2023-09-20 14:42 [PATCH drm-misc-next v4 0/8] [RFC] DRM GPUVA Manager GPU-VM features Danilo Krummrich
2023-09-20 14:42 ` [PATCH drm-misc-next v4 1/8] drm/gpuvm: rename struct drm_gpuva_manager to struct drm_gpuvm Danilo Krummrich
2023-09-21  6:48   ` Christian König
2023-09-25  0:42     ` Dave Airlie
2023-09-20 14:42 ` [PATCH drm-misc-next v4 2/8] drm/gpuvm: allow building as module Danilo Krummrich
2023-09-25  0:42   ` Dave Airlie
2023-09-20 14:42 ` [PATCH drm-misc-next v4 3/8] drm/nouveau: uvmm: rename 'umgr' to 'base' Danilo Krummrich
2023-09-25  0:43   ` Dave Airlie
2023-09-20 14:42 ` [PATCH drm-misc-next v4 4/8] drm/gpuvm: add common dma-resv per struct drm_gpuvm Danilo Krummrich
2023-09-21  7:39   ` Christian König
2023-09-21 13:34     ` Danilo Krummrich
2023-09-21 14:21       ` Christian König
2023-09-21 14:25       ` Boris Brezillon
2023-09-21 14:34         ` Christian König
2023-09-21 15:27           ` Boris Brezillon [this message]
2023-09-21 15:30           ` Danilo Krummrich
2023-09-21 14:38         ` Danilo Krummrich
2023-09-20 14:42 ` [PATCH drm-misc-next v4 5/8] drm/gpuvm: add an abstraction for a VM / BO combination Danilo Krummrich
2023-09-20 14:42 ` [PATCH drm-misc-next v4 6/8] drm/gpuvm: add drm_gpuvm_flags to drm_gpuvm Danilo Krummrich
2023-09-20 16:40   ` kernel test robot
2023-09-22 11:42   ` Boris Brezillon
2023-09-22 11:58   ` Boris Brezillon
2023-09-27 16:52     ` Danilo Krummrich
2023-09-28 12:19       ` Boris Brezillon
2023-09-20 14:42 ` [PATCH drm-misc-next v4 7/8] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation Danilo Krummrich
2023-09-22 11:45   ` Boris Brezillon
2023-09-27 16:59     ` Danilo Krummrich
2023-09-20 14:42 ` [PATCH drm-misc-next v4 8/8] drm/nouveau: GPUVM dma-resv/extobj handling, " Danilo Krummrich
2023-09-28 12:09 ` [PATCH drm-misc-next v4 0/8] [RFC] DRM GPUVA Manager GPU-VM features Boris Brezillon
