* [PATCH] drm/ttm: remove race and optimise evicting destroyed buffers.
@ 2010-09-29 3:45 Dave Airlie
2010-09-29 10:28 ` Thomas Hellstrom
0 siblings, 1 reply; 2+ messages in thread
From: Dave Airlie @ 2010-09-29 3:45 UTC (permalink / raw)
To: dri-devel
From: Dave Airlie <airlied@redhat.com>
A race condition was identifed that led to a leak of the BOs, when
a bo was on the delayed delete list and was selected for eviction,
the code would enter
thread 1 (evict) -- thread2 (dd)
bo added to delayed destroy list
take lru lock
ttm_mem_evict_first called
- reserve locked
- remove from lru
- drop lru lock
take lru lock
try del from lru - get put_count == 0
try to reserve locked
- drop lru lock to wait unreserved
call bo_evict
unreserve buffer
- take lru lock
- add back to lru
- unreserver
- drop lru lock
take lru lock due to unreserved
unbind ttm
drop put_count references (which is 0)
despite being on the lru/swap lru lists.
leak due to never losing reference
The obvious quick fix is to try the bo from the ddestroy list if we
are going to evict it, however if we are going to evict a buffer that
is going to be destroyed we should just destroy it instead, its more
likely to be a lighter-weight operation than evict + delayed destroy.
This patch check is a bo that we are about to evict is actually on the
destroy list and if it is, it unreserves it (without adding to lru),
and cleans up its references. It should then get destroyed via
normal ref counting means.
proposed fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=615505
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591061
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/ttm/ttm_bo.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index cb4cf7e..60689b1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -689,7 +689,7 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
struct ttm_mem_type_manager *man = &bdev->man[mem_type];
struct ttm_buffer_object *bo;
int ret, put_count = 0;
-
+ bool destroy = false;
retry:
spin_lock(&glob->lru_lock);
if (list_empty(&man->lru)) {
@@ -719,6 +719,13 @@ retry:
}
put_count = ttm_bo_del_from_lru(bo);
+
+ /* is the buffer currently on the delayed destroy list? */
+ if (!list_empty(&bo->ddestroy)) {
+ list_del_init(&bo->ddestroy);
+ destroy = true;
+ put_count++;
+ }
spin_unlock(&glob->lru_lock);
BUG_ON(ret != 0);
@@ -726,8 +733,13 @@ retry:
while (put_count--)
kref_put(&bo->list_kref, ttm_bo_ref_bug);
- ret = ttm_bo_evict(bo, interruptible, no_wait_reserve, no_wait_gpu);
- ttm_bo_unreserve(bo);
+ if (destroy) {
+ atomic_set(&bo->reserved, 0);
+ ret = ttm_bo_cleanup_refs(bo, !no_wait_gpu);
+ } else {
+ ret = ttm_bo_evict(bo, interruptible, no_wait_reserve, no_wait_gpu);
+ ttm_bo_unreserve(bo);
+ }
kref_put(&bo->list_kref, ttm_bo_release_list);
return ret;
--
1.7.2.3
^ permalink raw reply related [flat|nested] 2+ messages in thread* Re: [PATCH] drm/ttm: remove race and optimise evicting destroyed buffers.
2010-09-29 3:45 [PATCH] drm/ttm: remove race and optimise evicting destroyed buffers Dave Airlie
@ 2010-09-29 10:28 ` Thomas Hellstrom
0 siblings, 0 replies; 2+ messages in thread
From: Thomas Hellstrom @ 2010-09-29 10:28 UTC (permalink / raw)
To: Dave Airlie; +Cc: dri-devel
On 09/29/2010 05:45 AM, Dave Airlie wrote:
> From: Dave Airlie<airlied@redhat.com>
>
> A race condition was identifed that led to a leak of the BOs, when
> a bo was on the delayed delete list and was selected for eviction,
> the code would enter
>
> thread 1 (evict) -- thread2 (dd)
> bo added to delayed destroy list
> take lru lock
> ttm_mem_evict_first called
> - reserve locked
> - remove from lru
> - drop lru lock
> take lru lock
> try del from lru - get put_count == 0
> try to reserve locked
> - drop lru lock to wait unreserved
> call bo_evict
> unreserve buffer
> - take lru lock
> - add back to lru
> - unreserver
> - drop lru lock
> take lru lock due to unreserved
> unbind ttm
> drop put_count references (which is 0)
> despite being on the lru/swap lru lists.
> leak due to never losing reference
>
> The obvious quick fix is to try the bo from the ddestroy list if we
> are going to evict it, however if we are going to evict a buffer that
> is going to be destroyed we should just destroy it instead, its more
> likely to be a lighter-weight operation than evict + delayed destroy.
>
> This patch check is a bo that we are about to evict is actually on the
> destroy list and if it is, it unreserves it (without adding to lru),
> and cleans up its references. It should then get destroyed via
> normal ref counting means.
>
>
Dave, this will not work, since we can't guarantee that the buffer will
be destroyed immediatly. There may be other holders of a list_kref, and
if the destruction doesn't happen immediately, we don't get immediate
eviction.
If we take a command submission lock and decide to evict all buffers to
clean up fragmentation, we need to be able to guarantee that all buffers
are evicted synchronously.
Having said that, I think it should be possible to implement this if we
in addition to the above code free the mm_node immediately and don't
rely on the buffer being destroyed to do that.
But that's an optimization rather than a bug fix. The real bug sits in
ttm_bo_cleanup_refs(), in the fact that we remove the buffers from the
lru lists before we reserve.
Let me try to come up with a fix for that. I think there needs to be a
second wait for bo idle as well, since in theory, someone might have
started an eviction using an accelerated path..
/Thomas
> proposed fix for:
> https://bugzilla.redhat.com/show_bug.cgi?id=615505
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591061
>
> Signed-off-by: Dave Airlie<airlied@redhat.com>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 18 +++++++++++++++---
> 1 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index cb4cf7e..60689b1 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -689,7 +689,7 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> struct ttm_mem_type_manager *man =&bdev->man[mem_type];
> struct ttm_buffer_object *bo;
> int ret, put_count = 0;
> -
> + bool destroy = false;
> retry:
> spin_lock(&glob->lru_lock);
> if (list_empty(&man->lru)) {
> @@ -719,6 +719,13 @@ retry:
> }
>
> put_count = ttm_bo_del_from_lru(bo);
> +
> + /* is the buffer currently on the delayed destroy list? */
> + if (!list_empty(&bo->ddestroy)) {
> + list_del_init(&bo->ddestroy);
> + destroy = true;
> + put_count++;
> + }
> spin_unlock(&glob->lru_lock);
>
> BUG_ON(ret != 0);
> @@ -726,8 +733,13 @@ retry:
> while (put_count--)
> kref_put(&bo->list_kref, ttm_bo_ref_bug);
>
> - ret = ttm_bo_evict(bo, interruptible, no_wait_reserve, no_wait_gpu);
> - ttm_bo_unreserve(bo);
> + if (destroy) {
> + atomic_set(&bo->reserved, 0);
> + ret = ttm_bo_cleanup_refs(bo, !no_wait_gpu);
> + } else {
> + ret = ttm_bo_evict(bo, interruptible, no_wait_reserve, no_wait_gpu);
> + ttm_bo_unreserve(bo);
> + }
>
> kref_put(&bo->list_kref, ttm_bo_release_list);
> return ret;
>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2010-09-29 10:28 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-29 3:45 [PATCH] drm/ttm: remove race and optimise evicting destroyed buffers Dave Airlie
2010-09-29 10:28 ` Thomas Hellstrom
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.