Re: [PATCH] drm/msm/gem: Drop obj lock in msm_gem_free_object()

public inbox for linux-arm-msm@vger.kernel.org
 help / color / mirror / Atom feed

From: Daniel Vetter <daniel@ffwll.ch>
To: Rob Clark <robdclark@gmail.com>
Cc: Stephen Boyd <swboyd@chromium.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	freedreno <freedreno@lists.freedesktop.org>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	Rob Clark <robdclark@chromium.org>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	Dmitry Baryshkov <dmitry.baryshkov@linaro.org>,
	Sean Paul <sean@poorly.run>, David Airlie <airlied@linux.ie>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH] drm/msm/gem: Drop obj lock in msm_gem_free_object()
Date: Fri, 24 Jun 2022 23:36:30 +0200	[thread overview]
Message-ID: <YrYuXvI8ZyHGWAwZ@phenom.ffwll.local> (raw)
In-Reply-To: <CAF6AEGvXUsyVUubhPJy0B=ZDoxofFgPkKh=7g2BEnRwt1N+qmw@mail.gmail.com>

On Fri, Jun 24, 2022 at 02:28:25PM -0700, Rob Clark wrote:
> On Fri, Jun 24, 2022 at 1:58 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Jun 16, 2022 at 06:59:46AM -0700, Rob Clark wrote:
> > > On Thu, Jun 16, 2022 at 1:28 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > >
> > > > Quoting Rob Clark (2022-06-13 13:50:32)
> > > > > diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> > > > > index d608339c1643..432032ad4aed 100644
> > > > > --- a/drivers/gpu/drm/msm/msm_gem.h
> > > > > +++ b/drivers/gpu/drm/msm/msm_gem.h
> > > > > @@ -229,7 +229,19 @@ msm_gem_unlock(struct drm_gem_object *obj)
> > > > >  static inline bool
> > > > >  msm_gem_is_locked(struct drm_gem_object *obj)
> > > > >  {
> > > > > -       return dma_resv_is_locked(obj->resv);
> > > > > +       /*
> > > > > +        * Destroying the object is a special case.. msm_gem_free_object()
> > > > > +        * calls many things that WARN_ON if the obj lock is not held.  But
> > > > > +        * acquiring the obj lock in msm_gem_free_object() can cause a
> > > > > +        * locking order inversion between reservation_ww_class_mutex and
> > > > > +        * fs_reclaim.
> > > > > +        *
> > > > > +        * This deadlock is not actually possible, because no one should
> > > > > +        * be already holding the lock when msm_gem_free_object() is called.
> > > > > +        * Unfortunately lockdep is not aware of this detail.  So when the
> > > > > +        * refcount drops to zero, we pretend it is already locked.
> > > > > +        */
> > > > > +       return dma_resv_is_locked(obj->resv) || (kref_read(&obj->refcount) == 0);
> > > >
> > > > Instead of modifying this function can we push down the fact that this
> > > > function is being called from the free path and skip checking this
> > > > condition in that case? Or add some "_locked/free_path" wrappers that
> > > > skip the lock assertion? That would make it clearer to understand while
> > > > reading the code that it is locked when it is asserted to be locked, and
> > > > that we don't care when we're freeing because all references to the
> > > > object are gone.
> > >
> > > that was my earlier attempt, and I wasn't too happy with the result.
> > > And then I realized if refcount==0 then by definition we aren't racing
> > > with anyone else ;-)
> >
> > I think that's not entirely correct, at least not for fairly reasonable
> > shrinker designs:
> >
> > If the shrinker trylocks for throwing stuff out it might race with a
> > concurrent finalization. Depends a bit upon your design, but usually
> > that's possible.
> 
> Kinda but in fact no.  At least not if your shrinker is designed properly.
> 
> The shrinker does kref_get_unless_zero() and bails in the case that
> we've already started finalizing.  See:
> 
> https://patchwork.freedesktop.org/patch/490160/

Oh you have the order differently than what I'd have typed. If you do
dma_resv_trylock under the lru lock then the kref_get_unless_zero isn't
needed.

Ofc if you do it like you do then you need your locking check trickery.
 
> > There won't be a problem since you'll serialize on a lock eventually. But
> > if you drop all your locking debug checks like this then it's very hard to
> > figure this out :-)
> >
> > Ofc you can adjust your refcounting scheme to make this impossible, but
> > then there's also not really any need to call any functions which might
> > need some locking, since by that time all that stuff must have been
> > cleaned up already (or the refcount could not have dropped to zero). Like
> > if any iova mapping holds a reference you never have these problems.
> >
> > Long story short, this kind of design freaks me out big time. Especially
> > when it involves both a cross-driver refcount like the gem_bo one and a
> > cross driver lock ...
> >
> > The standard way to fix this is to trylock dma_resv on cleanup and push to
> > a worker if you can't get it. This is what ttm and i915 does. Might be
> > good to lift that into drm_gem.c helpers and just use it.
> 
> We used to do that (but unconditionally).. and got rid of it because
> it was causing jank issues (lots of queued up finalizers delaying
> retire work, or something along those lines, IIRC).  I guess back then
> struct_mutex was also involved, and it might not be as bad if we only
> took the async path if we didn't win the trylock.  But IMO that is
> totally unnecessary.  And I kinda am skeptical about pushing too much
> locking stuff to drm core.

Yeah with dev->struct_mutex and unconditionally it's going to be terrible.
It really should't be that bad.

Pulling back into the big picture, I really don't like drivers to spin
their own world for this stuff. And with the ttm drivers (and the i915-gem
one or so) doing one thing, I think msm should do the same. Unless there's
a reason why that's really stupid, and then we should probably switch ttm
over to that too?

If ever driver uses dma_resv differently in the cleanup paths (which are
really the tricky ones) then cross driver code reading becomes an exercise
in pitfall counting and leg regeneration :-(

Also I really don't care about which bikeshed we settle on, as least as
they're all the same.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

next prev parent reply	other threads:[~2022-06-24 21:36 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-13 20:50 [PATCH] drm/msm/gem: Drop obj lock in msm_gem_free_object() Rob Clark
2022-06-16  8:28 ` Stephen Boyd
2022-06-16 13:59   ` Rob Clark
2022-06-24 20:58     ` Daniel Vetter
2022-06-24 21:28       ` Rob Clark
2022-06-24 21:36         ` Daniel Vetter [this message]
2022-06-24 21:49           ` Rob Clark

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YrYuXvI8ZyHGWAwZ@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=airlied@linux.ie \
    --cc=dmitry.baryshkov@linaro.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=quic_abhinavk@quicinc.com \
    --cc=robdclark@chromium.org \
    --cc=robdclark@gmail.com \
    --cc=sean@poorly.run \
    --cc=swboyd@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox