From: Danilo Krummrich <dakr@kernel.org>
To: Rob Clark <robdclark@gmail.com>
Cc: dri-devel@lists.freedesktop.org, freedreno@lists.freedesktop.org,
	linux-arm-msm@vger.kernel.org,
	Connor Abbott <cwabbott0@gmail.com>,
	Rob Clark <robdclark@chromium.org>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 01/40] drm/gpuvm: Don't require obj lock in destructor path
Date: Tue, 20 May 2025 17:49:42 +0200
Message-ID: <aCyklmgRUq1wGb5S@cassiopeiae>
In-Reply-To: <CAF6AEGvegfkAeMA9-3PZN3wectQwt7=YVHoDxoK2fJcjOLbH2g@mail.gmail.com>

On Tue, May 20, 2025 at 08:45:24AM -0700, Rob Clark wrote:
> On Tue, May 20, 2025 at 8:21 AM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Tue, May 20, 2025 at 07:57:36AM -0700, Rob Clark wrote:
> > > On Tue, May 20, 2025 at 12:23 AM Danilo Krummrich <dakr@kernel.org> wrote:
> > > > On Mon, May 19, 2025 at 10:51:24AM -0700, Rob Clark wrote:
> > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > > > index f9eb56f24bef..1e89a98caad4 100644
> > > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > > @@ -1511,7 +1511,9 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > > >       drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
> > > > >       drm_gpuvm_bo_list_del(vm_bo, evict, lock);
> > > > >
> > > > > -     drm_gem_gpuva_assert_lock_held(obj);
> > > > > +     if (kref_read(&obj->refcount) > 0)
> > > > > +             drm_gem_gpuva_assert_lock_held(obj);
> > > >
> > > > Again, this is broken. What if the reference count drops to zero right after
> > > > the kref_read() check, but before drm_gem_gpuva_assert_lock_held() is called?
> > >
> > > No, it is not.  If you find yourself having this race condition, then
> > > you already have bigger problems.  There are only two valid cases when
> > > drm_gpuvm_bo_destroy() is called.  Either:
> > >
> > > 1) You somehow hold a reference to the GEM object, in which case the
> > > refcount will be a positive integer.  Maybe you race but on either
> > > side of the race you have a value that is greater than zero.
> > > 2) Or, you are calling this in the GEM object destructor path, in
> > > which case no one else should have a reference to the object, so it
> > > isn't possible to race
> >
> > What about:
> >
> > 3) You destroy the VM_BO, because the VM is destroyed, but someone else (e.g.
> >    another VM) holds a reference to this BO, which is dropped concurrently?
> 
> I mean, that is already broken, so I'm not sure what your point is?

No, it's not. In upstream GPUVM the last thing drm_gpuvm_bo_destroy() does is
call drm_gem_object_put(), because a struct drm_gpuvm_bo holds a reference to
the GEM object.

The above is only racy with your patch that disables this reference count and
leaves this trap for the caller.
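
To illustrate the lifetime argument, here is a minimal userspace sketch, not
the actual GPUVM code. It assumes a simplified model in which the VM_BO takes
its own reference on the object and drops it as the last step of destruction.
All names (struct obj, obj_get(), obj_put(), vm_bo_init(), vm_bo_destroy(),
demo_strong_ref()) are hypothetical stand-ins, and the assert() only stands in
for a check that needs the object to still be alive.

```c
/* Userspace model only; every name here is hypothetical, not kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int refcount;
	bool freed;		/* set when the last reference is dropped */
};

struct vm_bo {
	struct obj *obj;	/* strong reference, taken in vm_bo_init() */
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->refcount, 1);	/* creator's reference */
	o->freed = false;
}

static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

static void obj_put(struct obj *o)
{
	/* atomic_fetch_sub() returns the previous value */
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		o->freed = true;
}

static void vm_bo_init(struct vm_bo *vb, struct obj *o)
{
	obj_get(o);
	vb->obj = o;
}

/* Returns the refcount observed before the final put. */
static int vm_bo_destroy(struct vm_bo *vb)
{
	int seen = atomic_load(&vb->obj->refcount);

	/* Safe: our own reference keeps the count above zero here. */
	assert(seen > 0);

	/* Dropping the reference is the very last step. */
	obj_put(vb->obj);
	return seen;
}

/* Another holder drops its reference first; the object must survive. */
static bool demo_strong_ref(void)
{
	struct obj o;
	struct vm_bo vb;

	obj_init(&o);		/* refcount == 1 */
	vm_bo_init(&vb, &o);	/* refcount == 2 */
	obj_put(&o);		/* other holder goes away, refcount == 1 */
	if (o.freed)
		return false;
	vm_bo_destroy(&vb);	/* refcount == 0, object released */
	return o.freed;
}
```

In this model, the object cannot go away between the check and the final put,
no matter when other holders drop their references. Without the obj_get() in
vm_bo_init(), the same check would depend entirely on the caller.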

> 
> This patch is specifically about the case where VMAs are torn down in
> gem->free_object().
> 
> BR,
> -R
> 
> > Please don't tell me "but MSM doesn't do that". This is generic infrastructure,
> > it is perfectly valid for drivers to do that.
> >
> > > If the refcount drops to zero after the check, you are about to blow
> > > up regardless.
> >
> > Exactly, that's why the whole approach of removing the reference count a VM_BO
> > has on the BO, i.e. the proposed DRM_GPUVM_VA_WEAK_REF is broken.
> >
> > As mentioned, make it DRM_GPUVM_MSM_LEGACY_QUIRK and get an approval from Dave /
> > Sima for it.
> >
> > You can't make DRM_GPUVM_VA_WEAK_REF work as a generic thing without breaking
> > the whole design and lifetimes of GPUVM.
> >
> > We'd just end up with tons of traps for drivers with lots of WARN_ON() paths and
> > footguns like the one above if a driver works slightly differently than MSM.
