From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
"Simona Vetter" <simona.vetter@ffwll.ch>
Cc: Matthew Brost <matthew.brost@intel.com>, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 4/7] drm/ttm: move LRU walk defines into new internal header
Date: Fri, 11 Oct 2024 18:52:58 +0200 [thread overview]
Message-ID: <20228fe728d710477c8c856422ee66c2b0bb00e5.camel@linux.intel.com> (raw)
In-Reply-To: <08a82bc9-b9bd-4491-8a25-b8d0e6cb20d2@amd.com>
On Thu, 2024-10-10 at 10:00 +0200, Christian König wrote:
> Am 09.10.24 um 16:17 schrieb Thomas Hellström:
> > On Wed, 2024-10-09 at 15:39 +0200, Thomas Hellström wrote:
> > > On Mon, 2024-10-07 at 11:08 +0200, Christian König wrote:
> > > > Hi Thomas,
> > > >
> > > > I'm on sick leave, but I will try to answer questions and share
> > > > some
> > > > thoughts on this to unblock you.
> > > >
> > > > Am 18.09.24 um 14:57 schrieb Thomas Hellström:
> > > > > Sima, Christian
> > > > >
> > > > > I've updated the shrinker series now with a guarded for_each
> > > > > macro
> > > > > instead:
> > > > >
> > > > > https://patchwork.freedesktop.org/patch/614514/?series=131815&rev=9
> > > > Yeah that looks like a big step in the right direction.
> > > >
> > > > > (Note I forgot to remove the export of the previous LRU
> > > > > walker).
> > > > >
> > > > > so the midlayer argument is now not an issue anymore. The
> > > > > cleanup.h
> > > > > guard provides some additional protection against drivers
> > > > > exiting
> > > > > the
> > > > > LRU loop early.
> > > > >
> > > > > So remaining is the question whether the driver is allowed to
> > > > > discard a
> > > > > suggested bo to shrink from TTM.
> > > > >
> > > > > Arguments for:
> > > > >
> > > > > 1) Not allowing that would require teaching TTM about
> > > > > purgeable
> > > > > objects.
> > > > I think that is actually not correct. TTM already knows about
> > > > purgeable
> > > > objects.
> > > >
> > > > E.g. when TTM asks the driver what to do with evicted objects
> > > > it is
> > > > perfectly valid to return an empty placement list indicating
> > > > that
> > > > the
> > > > backing store can just be thrown away.
> > > >
> > > > We use this for things like temporary buffers for example.
> > > >
> > > > That this doesn't apply to swapping looks like a design bug in
> > > > the
> > > > driver/TTM interface to me.
> > > Yes we can do that with TTM, but for shrinking there's no point
> > > in
> > > trying to shrink when there is no swap-space left, other than
> > > purgeable
> > > object. The number of shrinkable objects returned in
> > > shrinker::count
> > > needs to reflect that, and *then* only those objects should be
> > > selected
> > > for shrinking. If we were to announce that to TTM using a
> > > callback,
> > > we're actually back to version 1 of this series which was
> > > rejected by
> > > you exactly since it was using callbacks a year or so ago????
>
> Yeah that indeed doesn't sound like a good idea.
>
> On the other hand the decision that only purgeable objects should be
> selected when there is no swap space left sounds like something TTM
> should do and not the driver.
>
> > > > > 2) Devices who need the blitter during shrinking would want
> > > > > to
> > > > > punt
> > > > > runtime_pm_get() to kswapd to avoid sleeping direct reclaim.
> > > > I think the outcome of the discussion is that runtime PM should
> > > > never
> > > > be
> > > > mixed into TTM swapping.
> > > >
> > > > You can power up blocks to enable a HW blitter for swapping,
> > > > but
> > > > this
> > > > then can't be driven by the runtime PM framework.
> > > Still that power-on might be sleeping, so what's the difference
> > > using
> > > runtime-pm or not? Why should the driver implement yet another
> > > power
> > > interface, just because TTM refuses to publish a sane LRU walk
> > > interface?
>
> See the discussion with Sima around the PM functions.
>
> My understanding might be wrong, but I now think that with local
> memory
> you can't do the i915 approach where the PM functions are so low
> level
> that they can also be called inside the shrinker.
>
> So you basically have PM functions for your whole device and PM
> functions for only the HW blocks necessary for the shrinker.
>
> > >
> > > > > 3) If those devices end up blitting (LNL) to be able to
> > > > > shrink,
> > > > > they
> > > > > would want to punt waiting for the fence to signal to kswapd
> > > > > to
> > > > > avoid
> > > > > waiting in direct reclaim.
> > > > Mhm, what do you mean with that?
> > > When we fired the blitter from direct reclaim, we get a fence. If
> > > we
> > > wait for it in direct reclaim we will be sleeping waiting for
> > > gpu,
> > > which is bad form. We'd like return a failure so the object is
> > > retried
> > > when idle, or from kswapd.
>
> Oh, that is a really good point. So basically you want to avoid
> dependencies on the dma_fence.
>
> > > > > 4) It looks like we need to resort to folio_trylock in the
> > > > > shmem
> > > > > backup
> > > > > backend when shrinking is called for gfp_t = GFP_NOFS. A
> > > > > failing
> > > > > trylock will require a new bo.
> > > > Why would a folio trylock succeed with one BO and not another?
> > > Good point. We'd fail anyway but would perhaps need to call
> > > SHRINK_STOP..
> > >
> > > > And why would that trylock something the device specific driver
> > > > should
> > > > handle?
> > > It happens in the TTM shrinker helper called from the driver in
> > > the
> > > spirit of demidlayering.
> > >
> > > > > Arguments against:
> > > > > None really. I thought the idea of demidlayering would be to
> > > > > allow
> > > > > the
> > > > > driver more freedom.
> > > > Well that is a huge argument against it. Giving drivers more
> > > > freedom
> > > > is
> > > > absolutely not something which turned out to be valuable in the
> > > > past.
> > > So then what's the point of demidlayering?
>
> So that drivers can prepare the environment for TTM to work with
> instead
> of TTM asking for it.
>
> In your case for example that means powering up HW blocks so that BOs
> could be moved.
>
> > > > Instead we should put device drivers in a very strict corset of
> > > > validated approaches to do things.
> > > >
> > > > Background is that in my experience driver developers are
> > > > perfectly
> > > > willing to do unclean approaches which only work in like 99% of
> > > > all
> > > > cases just to get a bit more performance or simpler driver
> > > > implementation.
> > > >
> > > > Those approaches are not legal and in my opinion as subsystem
> > > > maintainer
> > > > I think we need to be more strict and push back much harder on
> > > > stuff
> > > > like that.
> > > Still, historically that has made developers abandon common
> > > components
> > > for driver-specific solutions.
> > >
> > > And the question is still not answered.
> > >
> > > Exactly *why* can't the driver fail and continue traversing the
> > > LRU,
> > > because all our argumentation revolves around this and you have
> > > yet
> > > to
> > > provide an objective reason why.
> > And in the end, while I think there definitely things worthy of
> > discussion in this series,
> >
> > I don't think there is a point in trying to land a LNL shrinker
> > operating against a crippled TTM LRU walk interface, nor do I think
> > anyone would want to attempt to port i915 over, and reworking it
> > three
> > times I'm now pretty familiar with what works and what doesn't.
> >
> > So question becomes, with proper reviews can I merge the series
> > here as
> > is, *with* the de-midlayered LRU walk and noting your advise
> > against
> > it, or not?
>
> More or less yes, I still have a bad feeling about it but we probably
> need to see the whole thing getting used to judge if it really works
> or not.
>
> I mean it's not UAPI we are talking about, so even if we find in
> 5years
> from now that it was a bad idea we can still roll it back and try
> something else.
>
> So yeah, feel free to go ahead.
Thanks, I'll respin a version moving the checks you pointed out,
(get_nr_swap_pages() + all other checks TTM can really do) etc, into
TTM helpers to simplify such a change in the future if it becomes
needed.
/Thomas
>
> Regards,
> Christian.
>
>
> >
> > Thanks,
> > Thomas
> >
> >
> >
> >
> >
> > > /Thomas
> > >
> > >
> > >
> > >
> > > > Regards,
> > > > Christian.
> > > >
> > > > > So any feedback appreciated. If that is found acceptable we
> > > > > can
> > > > > proceed
> > > > > with reviewing this patch and also with the shrinker series.
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > >
> > > > > On Mon, 2024-09-02 at 13:07 +0200, Daniel Vetter wrote:
> > > > > > On Wed, Aug 28, 2024 at 02:20:34PM +0200, Christian König
> > > > > > wrote:
> > > > > > > Am 27.08.24 um 19:53 schrieb Daniel Vetter:
> > > > > > > > On Tue, Aug 27, 2024 at 06:52:13PM +0200, Daniel Vetter
> > > > > > > > wrote:
> > > > > > > > > On Thu, Aug 22, 2024 at 03:19:29PM +0200, Christian
> > > > > > > > > König
> > > > > > > > > wrote:
> > > > > > > > > > Completely agree that this is complicated, but I
> > > > > > > > > > still
> > > > > > > > > > don't
> > > > > > > > > > see the need
> > > > > > > > > > for it.
> > > > > > > > > >
> > > > > > > > > > Drivers just need to use pm_runtime_get_if_in_use()
> > > > > > > > > > inside
> > > > > > > > > > the shrinker and
> > > > > > > > > > postpone all hw activity until resume.
> > > > > > > > > Not good enough, at least long term I think. Also
> > > > > > > > > postponing hw
> > > > > > > > > activity
> > > > > > > > > to resume doesn't solve the deadlock issue, if you
> > > > > > > > > still
> > > > > > > > > need
> > > > > > > > > to grab ttm
> > > > > > > > > locks on resume.
> > > > > > > > Pondered this specific aspect some more, and I think
> > > > > > > > you
> > > > > > > > still
> > > > > > > > have a race
> > > > > > > > here (even if you avoid the deadlock): If the
> > > > > > > > condiditional
> > > > > > > > rpm_get call
> > > > > > > > fails there's no guarantee that the device will
> > > > > > > > suspend/resume
> > > > > > > > and clean
> > > > > > > > up the GART mapping.
> > > > > > > Well I think we have a major disconnect here. When the
> > > > > > > device
> > > > > > > is
> > > > > > > powered
> > > > > > > down there is no GART mapping to clean up any more.
> > > > > > >
> > > > > > > In other words GART is a table in local memory (VRAM)
> > > > > > > when
> > > > > > > the
> > > > > > > device is
> > > > > > > powered down this table is completely destroyed. Any BO
> > > > > > > which
> > > > > > > was
> > > > > > > mapped
> > > > > > > inside this table is now not mapped any more.
> > > > > > >
> > > > > > > So when the shrinker wants to evict a BO which is marked
> > > > > > > as
> > > > > > > mapped
> > > > > > > to GART
> > > > > > > and the device is powered down we just skip the GART
> > > > > > > unmapping
> > > > > > > part
> > > > > > > because
> > > > > > > that has already implicitly happened during power down.
> > > > > > >
> > > > > > > Before mapping any BO into the GART again we power the
> > > > > > > GPU up
> > > > > > > through the
> > > > > > > runtime PM calls. And while powering it up again the GART
> > > > > > > is
> > > > > > > restored.
> > > > > > My point is that you can't tell whether the device will
> > > > > > power
> > > > > > down or
> > > > > > not,
> > > > > > you can only tell whether there's a chance it might be
> > > > > > powering
> > > > > > down
> > > > > > and
> > > > > > so you can't get at the rpm reference without deadlock
> > > > > > issues.
> > > > > >
> > > > > > > > The race gets a bit smaller if you use
> > > > > > > > pm_runtime_get_if_active(), but even then you might
> > > > > > > > catch
> > > > > > > > it
> > > > > > > > right when
> > > > > > > > resume almost finished.
> > > > > > > What race are you talking about?
> > > > > > >
> > > > > > > The worst thing which could happen is that we restore a
> > > > > > > GART
> > > > > > > entry
> > > > > > > which
> > > > > > > isn't needed any more, but that is pretty much irrelevant
> > > > > > > since
> > > > > > > we
> > > > > > > only
> > > > > > > clear them to avoid some hw bugs.
> > > > > > The race I'm seeing is where you thought the GART entry is
> > > > > > not
> > > > > > issue,
> > > > > > tossed an object, but the device didn't suspend, so might
> > > > > > still
> > > > > > use
> > > > > > it.
> > > > > >
> > > > > > I guess if we're clearly separating the sw allocation of
> > > > > > the
> > > > > > TTM_TT
> > > > > > with
> > > > > > the physical entries in the GART that should all work, but
> > > > > > feels
> > > > > > a
> > > > > > bit
> > > > > > tricky. The race I've seen is essentially these two getting
> > > > > > out
> > > > > > of
> > > > > > sync.
> > > > > >
> > > > > > So maybe it was me who's stuck.
> > > > > >
> > > > > > What I wonder is whether it works in practice, since on the
> > > > > > restore
> > > > > > side
> > > > > > you need to get some locks to figure out which gart
> > > > > > mappings
> > > > > > exist
> > > > > > and
> > > > > > need restoring. And that's the same locks as the shrinker
> > > > > > needs
> > > > > > to
> > > > > > figure
> > > > > > out whether it might need to reap a gart mapping.
> > > > > >
> > > > > > Or do you just copy the gart entries over and restore them
> > > > > > exactly
> > > > > > as-is,
> > > > > > so that there's no shared locks?
> > > > > >
> > > > > > > > That means we'll have ttm bo hanging around with GART
> > > > > > > > allocations/mappings
> > > > > > > > which aren't actually valid anymore (since they might
> > > > > > > > escape
> > > > > > > > the
> > > > > > > > cleanup
> > > > > > > > upon resume due to the race). That doesn't feel like a
> > > > > > > > solid
> > > > > > > > design
> > > > > > > > either.
> > > > > > > I'm most likely missing something, but I'm really
> > > > > > > scratching
> > > > > > > my
> > > > > > > head where
> > > > > > > you see a problem here.
> > > > > > I guess one issue is that at least traditionally, igfx
> > > > > > drivers
> > > > > > have
> > > > > > nested
> > > > > > runtime pm within dma_resv lock. And dgpu drivers the other
> > > > > > way
> > > > > > round.
> > > > > > Which is a bit awkward if you're trying for common code.
> > > > > >
> > > > > > Cheers, Sima
next prev parent reply other threads:[~2024-10-11 16:53 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-10 12:42 Using drm_exec in TTM Christian König
2024-07-10 12:42 ` [PATCH 1/7] dma-buf/dma-resv: Introduce dma_resv_trylock_ctx() Christian König
2024-07-10 12:42 ` [PATCH 2/7] drm/exec: don't immediately add the prelocked obj Christian König
2024-07-10 12:42 ` [PATCH 3/7] drm/exec: provide trylock interface for eviction Christian König
2024-07-10 12:42 ` [PATCH 4/7] drm/ttm: move LRU walk defines into new internal header Christian König
2024-07-10 18:19 ` Matthew Brost
2024-07-11 12:01 ` Christian König
2024-07-11 16:00 ` Matthew Brost
2024-08-06 8:29 ` Thomas Hellström
2024-08-19 11:03 ` Christian König
2024-08-19 11:38 ` Thomas Hellström
2024-08-19 14:14 ` Daniel Vetter
2024-08-19 15:26 ` Christian König
2024-08-20 10:37 ` Thomas Hellström
2024-08-20 15:45 ` Christian König
2024-08-20 16:00 ` Thomas Hellström
2024-08-21 8:14 ` Christian König
2024-08-21 8:57 ` Thomas Hellström
2024-08-21 9:31 ` Thomas Hellström
2024-08-21 9:48 ` Christian König
2024-08-21 12:00 ` Thomas Hellström
2024-08-22 6:47 ` Thomas Hellström
2024-08-22 7:55 ` Christian König
2024-08-22 8:21 ` Thomas Hellström
2024-08-22 8:36 ` Thomas Hellström
2024-08-22 9:29 ` Christian König
2024-08-22 13:16 ` Thomas Hellström
2024-08-27 16:58 ` Daniel Vetter
2024-08-22 9:23 ` Daniel Vetter
2024-08-22 13:19 ` Christian König
2024-08-27 16:52 ` Daniel Vetter
2024-08-27 17:53 ` Daniel Vetter
2024-08-28 12:20 ` Christian König
2024-08-28 14:05 ` Thomas Hellström
2024-08-28 15:25 ` Christian König
2024-08-28 15:35 ` Alex Deucher
2024-08-28 15:45 ` Thomas Hellström
2024-09-02 11:07 ` Daniel Vetter
2024-09-18 12:57 ` Thomas Hellström
2024-10-02 11:30 ` Thomas Hellström
2024-10-02 11:32 ` Christian König
2024-10-02 11:36 ` Thomas Hellström
2024-10-07 9:08 ` Christian König
2024-10-09 13:39 ` Thomas Hellström
2024-10-09 14:17 ` Thomas Hellström
2024-10-10 8:00 ` Christian König
2024-10-11 16:52 ` Thomas Hellström [this message]
2024-08-27 18:24 ` Alex Deucher
2024-09-02 11:00 ` Daniel Vetter
2024-08-21 7:02 ` Thomas Hellström
2024-07-10 12:42 ` [PATCH 5/7] drm/ttm: move needs_unlock into the walk Christian König
2024-07-10 12:43 ` [PATCH 6/7] drm/ttm: support using drm_exec during eviction v2 Christian König
2024-07-10 12:43 ` [PATCH 7/7] drm/amdgpu: use drm_exec during BO validation Christian König
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20228fe728d710477c8c856422ee66c2b0bb00e5.camel@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=christian.koenig@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=simona.vetter@ffwll.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.