dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Dave Airlie <airlied@gmail.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 1/5] drm/vmwgfx: unbind in vmw_ttm_unpopulate
Date: Mon, 26 Jul 2021 10:03:22 +1000	[thread overview]
Message-ID: <CAPM=9txffDPERe_iZ2obsekJcbjfph32bca-18ROCJqEPByQWg@mail.gmail.com> (raw)
In-Reply-To: <4c30a961-0a73-3f69-7733-2491549a8743@gmail.com>

On Fri, 23 Jul 2021 at 19:40, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 23.07.21 um 11:21 schrieb Daniel Vetter:
> > On Fri, Jul 23, 2021 at 11:13 AM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> Am 23.07.21 um 10:47 schrieb Daniel Vetter:
> >>> On Thu, Jul 22, 2021 at 02:41:23PM +0200, Christian König wrote:
> >>>> Doing this in vmw_ttm_destroy() is to late.
> >>>>
> >>>> It turned out that this is not a good idea at all because it leaves pointers
> >>>> to freed up system memory pages in the GART tables of the drivers.
> >>> So I wanted to review this series, and I can't reconcile your claim here
> >>> with the demidlayering Dave has done. The driver patches here don't
> >>> ouright undo what Dave has done, but that means the bug has been
> >>> preexisting since forever (or is due to some other change?), and your
> >>> commit message is a bit confusing here.
> >>>
> >>> The final patch just undoes the demidlayering from Dave, and I really
> >>> don't see where there's even a functional change there.
> >>>
> >>> And even these patches here don't really change a hole lot with the
> >>> calling sequence for at least final teardown: ttm_tt_destroy_common calls
> >>> ttm_tt_unpopulate as the first thing, so at least there there's no change.
> >>>
> >>> Can you pls elaborate more clearly what exactly you're fixing and what
> >>> exactly needs to be reordered and where this bug is from (commit sha1)? As
> >>> is I'm playing detective and the evidence presented is extremely since and
> >>> I can't reconcile it at all.
> >>>
> >>> I mean I know you don't like typing commit message and documentation, but
> >>> it does get occasionally rather frustrating on the reviewer side if I have
> >>> to interpolate between some very sparse hints for this stuff :-/
> >> Yeah, when have seen the history it's rather obvious what's wrong here
> >> and I expected Dave to review it himself.
> >>
> >> Previously we had three states in TTM for a tt object: Allocated ->
> >> Populated -> Bound which on destruction where done in the order unbind
> >> -> unpopulate -> free.
> >>
> >> Dave moved handling of the bound state into the drivers since it is
> >> basically a driver decision and not a TTM decision what should be bound
> >> and what not (that part perfectly makes sense).
> > I haven't reviewed all the patches from Dave, only the one you pointed
> > at (in the last patch). And that one I still can't match up with your
> > description. If there's other commits relevant, can you pls dig them
> > out?
> >
> > Like it all makes sense what you're saying and matches the code, I
> > just can't match it up with the commit you're referencing.
>
> That is the patch directly following the one I've mentioned:
>
> commit 37bff6542c4e140a11657406c1bab50a40329cc1
> Author: Dave Airlie <airlied@redhat.com>
> Date:   Thu Sep 17 13:24:50 2020 +1000
>
>      drm/ttm: move unbind into the tt destroy.
>
>      This moves unbind into the driver side on destroy paths.
>
> I will add a Fixes tag to make that clear.
>
> But this patch also just moves the undbind from the TTM destroy path to
> the driver destroy path.
>
> To be honest I'm not 100% sure either when the when the unbind moved
> from the unpopulate path into the destroy path, but I think that this
> wasn't always the case. Let me try to dig that up.
>
> >> The problem is that he also moved doing the unbind into the free
> >> callback instead of the unpopulate callback. This result in stale page
> >> pointers in the GART if that unpopulate operation isn't immediately
> >> followed by a free.
> >>
> >> Thinking more about it if we do populated->unpopulated->populated then
> >> we would also have stale pointers to the old pages which is even worse.
> >>
> >> This is also not de-midlayering since we already have a proper
> >> ttm_tt_init()/ttm_tt_fini() functions which should work nicely for the
> >> tt object.
> > Well you're last patch moves the ttm_tt_destroy_common stuff back into
> > ttm, which kinda is de-demidlayering. So I'm confused.
>
> Ah, yes that is correct. I've also considered to move this in
> ttm_tt_fini instead of there.
>
> But that would be a larger change and I wanted to fix the problem at
> hand first, potentially even adding a CC stable tag.
>
> > Other bit: I think it'd be good to document this properly in the
> > callbacks, and maybe ideally go about and kerneldoc-ify the entire
> > ttm_tt.h header. Otherwise when we eventually (never?) get around to
> > that, everyone has forgotten these semantic details and issues again.
>
> Already working towards including more of the TTM headers and code files
> in kerneldoc. But not quite there yet.
>
> But you know, normal human: Only equipped with one head and two hands
> and not cloneable.

I'm the same, but I'm not seeing where this problem happens at all, do
we have any backtraces or demos for this?

I split bind/unbind into the driver, but the driver should still
always be moving things to unbound states before an unpopulate is
called, is there a driver doing something unexpected here?

at worst I'd like to see a WARN_ON put in first and a test in igt that
triggers it, because right now I'm not see that path through the
drivers/ttm that leads to unpopulated pages ending up happening while
bound.

From 5.14-rc3 unpopulate is called from ttm_bo_pipeline_gutting in
non-ghost path and there is no unbind,
pipeline gutting is called from evict/validate, when there is no
placement suggested for the object, is this case getting hit somewhere
without the driver having previously unbound things?

ttm_tt_destroy_common: calls unpopulate, everyone calls this directly
after unbinding
ttm_tt_swapout: calls unpopulate, we don't swapout objects from TT
directly, we should always unbind first, this used to have an assert
against that,
ttm_tt_populate: call unpopulate in failure path

So any place I can see unpopulate getting called with a bound TT
should be a bug, and fixed, we could protect against it better but I'm
not seeing the need for this series to outright revert things back as
helping.

Dave.

  reply	other threads:[~2021-07-26  0:03 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-22 12:41 [PATCH 1/5] drm/vmwgfx: unbind in vmw_ttm_unpopulate Christian König
2021-07-22 12:41 ` [PATCH 2/5] drm/amdgpu: unbind in amdgpu_ttm_tt_unpopulate Christian König
2021-07-22 12:41 ` [PATCH 3/5] drm/nouveau: unbind in nouveau_ttm_tt_unpopulate Christian König
2021-07-22 12:41 ` [PATCH 4/5] drm/radeon: unbind in radeon_ttm_tt_unpopulate() Christian König
2021-07-22 12:41 ` [PATCH 5/5] drm/ttm: revert "flip tt destroy ordering." Christian König
2021-07-23  8:47 ` [PATCH 1/5] drm/vmwgfx: unbind in vmw_ttm_unpopulate Daniel Vetter
2021-07-23  9:13   ` Christian König
2021-07-23  9:21     ` Daniel Vetter
2021-07-23  9:40       ` Christian König
2021-07-26  0:03         ` Dave Airlie [this message]
2021-07-26 19:35           ` Christian König
2021-07-26 20:03             ` Dave Airlie
2021-07-27  9:28               ` Daniel Vetter
2021-07-28 11:18               ` Christian König
  -- strict thread matches above, loose matches on Subject: below --
2021-07-28 13:05 Christian König
2021-08-23 11:05 ` Christian König
2021-08-23 11:15   ` Thomas Hellström
2021-08-26  8:49     ` Daniel Vetter
2021-08-26 10:11       ` Christian König
2021-08-26 12:50         ` Daniel Vetter
2021-07-22  8:55 Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPM=9txffDPERe_iZ2obsekJcbjfph32bca-18ROCJqEPByQWg@mail.gmail.com' \
    --to=airlied@gmail.com \
    --cc=airlied@redhat.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).