From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: linaro-mm-sig@lists.linaro.org, intel-gfx@lists.freedesktop.org,
sumit.semwal@linaro.org, dri-devel@lists.freedesktop.org,
linux-media@vger.kernel.org
Subject: Re: RFC: Unpinned DMA-buf handling
Date: Tue, 5 Nov 2019 14:46:07 +0100 [thread overview]
Message-ID: <20191105134607.GH10326@phenom.ffwll.local> (raw)
In-Reply-To: <20191029104049.9011-1-christian.koenig@amd.com>
On Tue, Oct 29, 2019 at 11:40:44AM +0100, Christian König wrote:
> The basic idea stayed the same since the last version of those patches.
> The exporter can provide explicit pin/unpin functions and the importer a
> move_notify callback. This allows us to avoid pinning buffers while
> importers have a mapping for them.
>
> In difference to the last version the locking changes were separated
> from this patchset and committed to drm-misc-next.
>
> This allows drivers to implement the new locking semantics without the
> extra unpinned handling, but of course the changed locking semantics is
> still a prerequisite to the unpinned handling.
>
> The last time this set was send out the discussion ended by questioning
> if the move_notify callback was really the right approach of notifying
> the importers that a buffer is about to change its placement. A possible
> alternative would be to add a special crafted fence object instead.
>
> Let's discuss on the different approaches once more,
So here's my pile of higher-level thoughts on things still to discuss. I
don't think we need a code-answer for all of these, but at least a rough
idea to make sure we're not walling ourselves into a corner.
- The entire eviction fence stuff amdkfd does. It is kinda a very special
version of ->move_notify, except it's also passing around an active bit
for an entire set of buffers in an efficient way. This active bit works
for amdkfd where we evict the entire context (logically at least, ofc
all the unevicted buffers and their pagetables stay). I don't think
it'll work of a more traditional execbuf driver.
I think we need some way to move lru/active information between drivers
that works. Including making sure that drivers don't spend all the time
walking over all the active buffers in their lru first, but also not
burning down too much cpu time. So either lazy lru updates, or some bulk
move thing, or something else. Or alternatively we spec out explicitly
that lru updates will _not_ happen across drivers, and that drivers need
to lru-bump buffers while evicting when they notice they're still busy
(so some kind of lazy lru update).
- How will we handle the acquire ctx? Sooner or later I expect that when
an importer calls into the exporter to validate the buffers we need to
to have something like what you added as a stall point in ttm in
commit d367bd2a5e2b12cb9135b30df94af8211196e8cf
Author: Christian König <christian.koenig@amd.com>
Date: Wed May 22 09:51:47 2019 +0200
drm/ttm: fix busy memory to fail other user v10
Now we can do the same trick you've done of fishing out the acquire ctx
from the buffer we're trying to get validated, instead of an explicit
parameter to dma_buf_map_attachment. But the other change is that
callers need to be able to handle EDEADLCK, and that's a huge one. I'm
leaning towards requiring EDEADLCK handling for dynamic importers from
the go, using the fake deadlck injection debug knob to enforce it.
Explicit argument would be nice, but oh well.
- Related, we need to have an idea for how we should handle the TODO
comment in ttm_mem_evict_wait_busy across drivers. Other drivers might
hang onto a lot of buffers they don't really need, simply because they
evicted them and kept them locked (i915 very likely will do that).
This is one of the questions I don't think we need to solve right away,
but good to have a solution in mind. I think a dma_resv->can_evict flag,
which allows the lru evict code to throw out locked buffers (only locked
by our own ctx ofc) would solve this. But not 100% sure. Also making it
can_evict would make it opt-in as an optimization.
- ->move_notify needs to guarantee that all access stops, or we have a
huge leak between security domains. I think there's three ways to do
that:
- Preempt the entire context right away. This is what amdkfd does
(except with the eviction fence, not the move_notify callback). Then
when you reschedule the context make sure all the pagetables are up to
date again.
- Synchronously punch out the pagetables in ->notify_move, and let
gpu-side page faulting handle the fallout. Not sure anyone is doing
that right now, but we at least discussed that as an idea here at
intel.
- Add an async pagetable update job, which has the current latest job as
a dependency, and adds a new fence to the dma_resv object to signal
when the pagetables are updated. This would all be scheduled from the
->notify_move callback, so would need to make sure this is officially
allowed.
If we don't have any of these then some later batch (which didn't
declare it's going to need the buffer we've evicted) could access
whatever new buffer has been placed at the same locations through the
old pagetables.
I did try to figure out how you solve this in amdgpu right now, but for
normal CS ioctl the only pagetable update code I could find was in the
cs ioctl itself. That's too late I think.
- Related to the above, and since I think the ttm hack to privatize the
resv_object to avoid unecessary stalls: I think a last_acces fence on
the attachment would be really nice. That way a driver could make sure
it's only blocking on its own stuff and not on stuff another gpu is
doing. But not sure it makes sense to have that in dma-buf structures,
drivers can just track this on their own (and ttm would need a big
overhaul since right now it totally ignores that there might be multiple
mappings of the same underlying buffer object, that's all left to
drivers to wrangle).
- Finally, we need to spell out the semantics of when you need to call
dma_buf_attachment_unmap after a notify_move. I'm kinda leaning towards
that you first need to unmap the old mapping, before you create a new
one. But that might be too tough to implement for drivers, and result in
stalls. So probably we need multiple mappings, and then we need to make
it clear whether you can do that on the same attachment, or whether you
need to do something else. Either way this must be really really clear.
I think the above is all the big questions from the past few discussion
rounds that we still have.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: linaro-mm-sig@lists.linaro.org, intel-gfx@lists.freedesktop.org,
sumit.semwal@linaro.org, dri-devel@lists.freedesktop.org,
linux-media@vger.kernel.org
Subject: Re: [Intel-gfx] RFC: Unpinned DMA-buf handling
Date: Tue, 5 Nov 2019 14:46:07 +0100 [thread overview]
Message-ID: <20191105134607.GH10326@phenom.ffwll.local> (raw)
Message-ID: <20191105134607.u04lP7vghq5tcPRLAgPOg0J6f8Fy6ct16r-HIsvdfaw@z> (raw)
In-Reply-To: <20191029104049.9011-1-christian.koenig@amd.com>
On Tue, Oct 29, 2019 at 11:40:44AM +0100, Christian König wrote:
> The basic idea stayed the same since the last version of those patches.
> The exporter can provide explicit pin/unpin functions and the importer a
> move_notify callback. This allows us to avoid pinning buffers while
> importers have a mapping for them.
>
> In difference to the last version the locking changes were separated
> from this patchset and committed to drm-misc-next.
>
> This allows drivers to implement the new locking semantics without the
> extra unpinned handling, but of course the changed locking semantics is
> still a prerequisite to the unpinned handling.
>
> The last time this set was send out the discussion ended by questioning
> if the move_notify callback was really the right approach of notifying
> the importers that a buffer is about to change its placement. A possible
> alternative would be to add a special crafted fence object instead.
>
> Let's discuss on the different approaches once more,
So here's my pile of higher-level thoughts on things still to discuss. I
don't think we need a code-answer for all of these, but at least a rough
idea to make sure we're not walling ourselves into a corner.
- The entire eviction fence stuff amdkfd does. It is kinda a very special
version of ->move_notify, except it's also passing around an active bit
for an entire set of buffers in an efficient way. This active bit works
for amdkfd where we evict the entire context (logically at least, ofc
all the unevicted buffers and their pagetables stay). I don't think
it'll work of a more traditional execbuf driver.
I think we need some way to move lru/active information between drivers
that works. Including making sure that drivers don't spend all the time
walking over all the active buffers in their lru first, but also not
burning down too much cpu time. So either lazy lru updates, or some bulk
move thing, or something else. Or alternatively we spec out explicitly
that lru updates will _not_ happen across drivers, and that drivers need
to lru-bump buffers while evicting when they notice they're still busy
(so some kind of lazy lru update).
- How will we handle the acquire ctx? Sooner or later I expect that when
an importer calls into the exporter to validate the buffers we need to
to have something like what you added as a stall point in ttm in
commit d367bd2a5e2b12cb9135b30df94af8211196e8cf
Author: Christian König <christian.koenig@amd.com>
Date: Wed May 22 09:51:47 2019 +0200
drm/ttm: fix busy memory to fail other user v10
Now we can do the same trick you've done of fishing out the acquire ctx
from the buffer we're trying to get validated, instead of an explicit
parameter to dma_buf_map_attachment. But the other change is that
callers need to be able to handle EDEADLCK, and that's a huge one. I'm
leaning towards requiring EDEADLCK handling for dynamic importers from
the go, using the fake deadlck injection debug knob to enforce it.
Explicit argument would be nice, but oh well.
- Related, we need to have an idea for how we should handle the TODO
comment in ttm_mem_evict_wait_busy across drivers. Other drivers might
hang onto a lot of buffers they don't really need, simply because they
evicted them and kept them locked (i915 very likely will do that).
This is one of the questions I don't think we need to solve right away,
but good to have a solution in mind. I think a dma_resv->can_evict flag,
which allows the lru evict code to throw out locked buffers (only locked
by our own ctx ofc) would solve this. But not 100% sure. Also making it
can_evict would make it opt-in as an optimization.
- ->move_notify needs to guarantee that all access stops, or we have a
huge leak between security domains. I think there's three ways to do
that:
- Preempt the entire context right away. This is what amdkfd does
(except with the eviction fence, not the move_notify callback). Then
when you reschedule the context make sure all the pagetables are up to
date again.
- Synchronously punch out the pagetables in ->notify_move, and let
gpu-side page faulting handle the fallout. Not sure anyone is doing
that right now, but we at least discussed that as an idea here at
intel.
- Add an async pagetable update job, which has the current latest job as
a dependency, and adds a new fence to the dma_resv object to signal
when the pagetables are updated. This would all be scheduled from the
->notify_move callback, so would need to make sure this is officially
allowed.
If we don't have any of these then some later batch (which didn't
declare it's going to need the buffer we've evicted) could access
whatever new buffer has been placed at the same locations through the
old pagetables.
I did try to figure out how you solve this in amdgpu right now, but for
normal CS ioctl the only pagetable update code I could find was in the
cs ioctl itself. That's too late I think.
- Related to the above, and since I think the ttm hack to privatize the
resv_object to avoid unecessary stalls: I think a last_acces fence on
the attachment would be really nice. That way a driver could make sure
it's only blocking on its own stuff and not on stuff another gpu is
doing. But not sure it makes sense to have that in dma-buf structures,
drivers can just track this on their own (and ttm would need a big
overhaul since right now it totally ignores that there might be multiple
mappings of the same underlying buffer object, that's all left to
drivers to wrangle).
- Finally, we need to spell out the semantics of when you need to call
dma_buf_attachment_unmap after a notify_move. I'm kinda leaning towards
that you first need to unmap the old mapping, before you create a new
one. But that might be too tough to implement for drivers, and result in
stalls. So probably we need multiple mappings, and then we need to make
it clear whether you can do that on the same attachment, or whether you
need to do something else. Either way this must be really really clear.
I think the above is all the big questions from the past few discussion
rounds that we still have.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2019-11-05 13:46 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-29 10:40 RFC: Unpinned DMA-buf handling Christian König
2019-10-29 10:40 ` [Intel-gfx] " Christian König
2019-10-29 10:40 ` [PATCH 1/5] dma-buf: add dynamic DMA-buf handling v14 Christian König
2019-10-29 10:40 ` [Intel-gfx] " Christian König
2019-11-05 10:20 ` Daniel Vetter
2019-11-05 10:20 ` [Intel-gfx] " Daniel Vetter
2020-02-18 13:20 ` Christian König
2020-02-18 14:14 ` Daniel Vetter
2019-10-29 10:40 ` [PATCH 2/5] drm/ttm: remove the backing store if no placement is given Christian König
2019-10-29 10:40 ` [Intel-gfx] " Christian König
2019-10-29 10:40 ` [PATCH 3/5] drm/amdgpu: use allowed_domains for exported DMA-bufs Christian König
2019-10-29 10:40 ` [Intel-gfx] " Christian König
2019-10-29 10:40 ` [PATCH 4/5] drm/amdgpu: add amdgpu_dma_buf_pin/unpin Christian König
2019-10-29 10:40 ` [Intel-gfx] " Christian König
2019-10-29 10:40 ` [PATCH 5/5] drm/amdgpu: implement amdgpu_gem_prime_move_notify Christian König
2019-10-29 10:40 ` [Intel-gfx] " Christian König
2019-11-05 10:52 ` Daniel Vetter
2019-11-05 10:52 ` [Intel-gfx] " Daniel Vetter
2019-11-05 13:39 ` Christian König
2019-11-05 13:39 ` [Intel-gfx] " Christian König
2019-11-05 13:50 ` Daniel Vetter
2019-11-05 13:50 ` [Intel-gfx] " Daniel Vetter
2019-11-05 15:20 ` Koenig, Christian
2019-11-05 15:20 ` [Intel-gfx] " Koenig, Christian
2019-11-05 15:23 ` Daniel Vetter
2019-11-05 15:23 ` [Intel-gfx] " Daniel Vetter
2019-10-29 17:38 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] dma-buf: add dynamic DMA-buf handling v14 Patchwork
2019-10-29 17:38 ` [Intel-gfx] " Patchwork
2019-10-29 18:09 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-10-29 18:09 ` [Intel-gfx] " Patchwork
2019-11-05 13:46 ` Daniel Vetter [this message]
2019-11-05 13:46 ` [Intel-gfx] RFC: Unpinned DMA-buf handling Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191105134607.GH10326@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=linaro-mm-sig@lists.linaro.org \
--cc=linux-media@vger.kernel.org \
--cc=sumit.semwal@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox