All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs
@ 2014-10-28  9:35 Michel Dänzer
  2014-10-28  9:35 ` [PATCH 2/5] drm/ttm: Add DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN Michel Dänzer
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Michel Dänzer @ 2014-10-28  9:35 UTC (permalink / raw)
  To: dri-devel; +Cc: Lauri Kasanen

From: Michel Dänzer <michel.daenzer@amd.com>

I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but
AFAICT it does.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/radeon/radeon_object.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 9c92976..d23b0dd 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -179,9 +179,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 	 * improve fragmentation quality.
 	 * 512kb was measured as the most optimal number.
 	 */
-	if (!((rbo->flags & RADEON_GEM_CPU_ACCESS) &&
-	      (rbo->placements[i].flags & TTM_PL_FLAG_VRAM)) &&
-	    rbo->tbo.mem.size > 512 * 1024) {
+	if (rbo->tbo.mem.size > 512 * 1024) {
 		for (i = 0; i < c; i++) {
 			rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
 		}
-- 
2.1.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] drm/ttm: Add DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN
  2014-10-28  9:35 [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Michel Dänzer
@ 2014-10-28  9:35 ` Michel Dänzer
  2014-10-28  9:35 ` [PATCH 3/5] drm/ttm: Use only " Michel Dänzer
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Michel Dänzer @ 2014-10-28  9:35 UTC (permalink / raw)
  To: dri-devel; +Cc: Lauri Kasanen

From: Michel Dänzer <michel.daenzer@amd.com>

If the BO should be placed at the top of the area, we should start looking
for holes from the top.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 964387f..1e93f6c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
 	struct drm_mm *mm = &rman->mm;
 	struct drm_mm_node *node = NULL;
+	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
 	enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
 	unsigned long lpfn;
 	int ret;
@@ -67,15 +68,16 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	if (!node)
 		return -ENOMEM;
 
-	if (place->flags & TTM_PL_FLAG_TOPDOWN)
+	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
+		sflags |= DRM_MM_SEARCH_BELOW;
 		aflags = DRM_MM_CREATE_TOP;
+	}
 
 	spin_lock(&rman->lock);
 	ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
 					  mem->page_alignment, 0,
 					  place->fpfn, lpfn,
-					  DRM_MM_SEARCH_BEST,
-					  aflags);
+					  sflags, aflags);
 	spin_unlock(&rman->lock);
 
 	if (unlikely(ret)) {
-- 
2.1.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] drm/ttm: Use only DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN
  2014-10-28  9:35 [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Michel Dänzer
  2014-10-28  9:35 ` [PATCH 2/5] drm/ttm: Add DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN Michel Dänzer
@ 2014-10-28  9:35 ` Michel Dänzer
  2014-10-28 11:10   ` Daniel Vetter
  2014-10-28  9:35 ` [PATCH 4/5] drm/ttm: Add TTM_PL_FLAG_BOTTOMUP Michel Dänzer
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Michel Dänzer @ 2014-10-28  9:35 UTC (permalink / raw)
  To: dri-devel; +Cc: Lauri Kasanen

From: Michel Dänzer <michel.daenzer@amd.com>

DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems
against the idea of TTM_PL_FLAG_TOPDOWN:

* The smallest hole may be in the overall bottom of the area
* If the hole isn't much larger than the BO, it doesn't make much
  difference whether the BO is placed at the bottom or at the top of the
  hole

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 1e93f6c..aa0bd05 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -69,7 +69,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 		return -ENOMEM;
 
 	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
-		sflags |= DRM_MM_SEARCH_BELOW;
+		sflags = DRM_MM_SEARCH_BELOW;
 		aflags = DRM_MM_CREATE_TOP;
 	}
 
-- 
2.1.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] drm/ttm: Add TTM_PL_FLAG_BOTTOMUP
  2014-10-28  9:35 [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Michel Dänzer
  2014-10-28  9:35 ` [PATCH 2/5] drm/ttm: Add DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN Michel Dänzer
  2014-10-28  9:35 ` [PATCH 3/5] drm/ttm: Use only " Michel Dänzer
@ 2014-10-28  9:35 ` Michel Dänzer
  2014-10-30  8:36   ` [PATCH 4/4] drm/mm: Remove DRM_MM_SEARCH_BEST Michel Dänzer
  2014-10-28  9:35 ` [PATCH 5/5] drm/radeon: Use TTM_PL_FLAG_BOTTOMUP Michel Dänzer
  2014-10-28 10:46 ` [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Lauri Kasanen
  4 siblings, 1 reply; 14+ messages in thread
From: Michel Dänzer @ 2014-10-28  9:35 UTC (permalink / raw)
  To: dri-devel; +Cc: Lauri Kasanen

From: Michel Dänzer <michel.daenzer@amd.com>

For placing a BO strictly from the bottom up, i.e. in the lowest hole which
can hold it.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 5 +++++
 include/drm/ttm/ttm_placement.h      | 5 ++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index aa0bd05..a64397b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -73,6 +73,11 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 		aflags = DRM_MM_CREATE_TOP;
 	}
 
+	if (place->flags & TTM_PL_FLAG_BOTTOMUP) {
+		sflags = DRM_MM_SEARCH_DEFAULT;
+		aflags = DRM_MM_CREATE_DEFAULT;
+	}
+
 	spin_lock(&rman->lock);
 	ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
 					  mem->page_alignment, 0,
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index 8ed44f9..c9ae86a 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -66,7 +66,9 @@
  * TTM_PL_FLAG_NO_EVICT means that the buffer may never
  * be evicted to make room for other buffers.
  * TTM_PL_FLAG_TOPDOWN requests to be placed from the
- * top of the memory area, instead of the bottom.
+ * top of the memory area.
+ * TTM_PL_FLAG_BOTTOMUP requests to be placed from the
+ * bottom of the memory area.
  */
 
 #define TTM_PL_FLAG_CACHED      (1 << 16)
@@ -75,6 +77,7 @@
 #define TTM_PL_FLAG_SHARED      (1 << 20)
 #define TTM_PL_FLAG_NO_EVICT    (1 << 21)
 #define TTM_PL_FLAG_TOPDOWN     (1 << 22)
+#define TTM_PL_FLAG_BOTTOMUP    (1 << 23)
 
 #define TTM_PL_MASK_CACHING     (TTM_PL_FLAG_CACHED | \
 				 TTM_PL_FLAG_UNCACHED | \
-- 
2.1.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] drm/radeon: Use TTM_PL_FLAG_BOTTOMUP
  2014-10-28  9:35 [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Michel Dänzer
                   ` (2 preceding siblings ...)
  2014-10-28  9:35 ` [PATCH 4/5] drm/ttm: Add TTM_PL_FLAG_BOTTOMUP Michel Dänzer
@ 2014-10-28  9:35 ` Michel Dänzer
  2014-10-28 10:46 ` [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Lauri Kasanen
  4 siblings, 0 replies; 14+ messages in thread
From: Michel Dänzer @ 2014-10-28  9:35 UTC (permalink / raw)
  To: dri-devel; +Cc: Lauri Kasanen

From: Michel Dänzer <michel.daenzer@amd.com>

We want to allocate small BOs strictly from the bottom up, just as large
BOs are allocated strictly from the top down.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/radeon/radeon_object.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index d23b0dd..d9f7a02 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -180,9 +180,11 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 	 * 512kb was measured as the most optimal number.
 	 */
 	if (rbo->tbo.mem.size > 512 * 1024) {
-		for (i = 0; i < c; i++) {
+		for (i = 0; i < c; i++)
 			rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
-		}
+	} else {
+		for (i = 0; i < c; i++)
+			rbo->placements[i].flags |= TTM_PL_FLAG_BOTTOMUP;
 	}
 }
 
-- 
2.1.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs
  2014-10-28  9:35 [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Michel Dänzer
                   ` (3 preceding siblings ...)
  2014-10-28  9:35 ` [PATCH 5/5] drm/radeon: Use TTM_PL_FLAG_BOTTOMUP Michel Dänzer
@ 2014-10-28 10:46 ` Lauri Kasanen
  4 siblings, 0 replies; 14+ messages in thread
From: Lauri Kasanen @ 2014-10-28 10:46 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel

On Tue, 28 Oct 2014 18:35:02 +0900
Michel Dänzer <michel@daenzer.net> wrote:

> From: Michel Dänzer <michel.daenzer@amd.com>
> 
> I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but
> AFAICT it does.
> 
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

This series is

Reviewed-by: Lauri Kasanen <cand@gmx.com>

- Lauri
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] drm/ttm: Use only DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN
  2014-10-28  9:35 ` [PATCH 3/5] drm/ttm: Use only " Michel Dänzer
@ 2014-10-28 11:10   ` Daniel Vetter
  2014-10-30  8:41     ` Michel Dänzer
  2014-10-30 15:07     ` Thomas Hellstrom
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel Vetter @ 2014-10-28 11:10 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: Lauri Kasanen, dri-devel

On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
> 
> DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems
> against the idea of TTM_PL_FLAG_TOPDOWN:
> 
> * The smallest hole may be in the overall bottom of the area
> * If the hole isn't much larger than the BO, it doesn't make much
>   difference whether the BO is placed at the bottom or at the top of the
>   hole
> 
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart
allocations from the same execbuf, and usually those get recycled around
the same time. Which means you'll just fragment your mm even more if you
try to find the best hole instead of just picking one and the stuffing the
entire execbuf into it. So imo we might as well just kill it.

Another one that I've advertised a bunch of times already is the scan
roaster in drm_mm.c: Currently ttm just evicts until there's a big enough
hole, which is fairly awful if you have quasi-segmented memory like with
top-down/bottom-up schemes and different ranges for different units. With
the roaster you just walk the lru and build up potential holes until
there's a suitable one, and then only evict those buffers. Which means if
you have a certain range of memory under very high pressure (e.g. the 256M
which uvd can use or whatever it is), then you wont thrash all the other
vram too.

Cheers, Daniel

> ---
>  drivers/gpu/drm/ttm/ttm_bo_manager.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> index 1e93f6c..aa0bd05 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> @@ -69,7 +69,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
>  		return -ENOMEM;
>  
>  	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
> -		sflags |= DRM_MM_SEARCH_BELOW;
> +		sflags = DRM_MM_SEARCH_BELOW;
>  		aflags = DRM_MM_CREATE_TOP;
>  	}
>  
> -- 
> 2.1.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/4] drm/mm: Remove DRM_MM_SEARCH_BEST
  2014-10-28  9:35 ` [PATCH 4/5] drm/ttm: Add TTM_PL_FLAG_BOTTOMUP Michel Dänzer
@ 2014-10-30  8:36   ` Michel Dänzer
  2014-10-31 18:10     ` Alex Deucher
  0 siblings, 1 reply; 14+ messages in thread
From: Michel Dänzer @ 2014-10-30  8:36 UTC (permalink / raw)
  To: dri-devel

From: Michel Dänzer <michel.daenzer@amd.com>

In the words of Daniel Vetter:

«I think SEARCH_BEST is pretty much always a bad idea - it rips apart
allocations from the same execbuf, and usually those get recycled around
the same time. Which means you'll just fragment your mm even more if you
try to find the best hole instead of just picking one and the stuffing
the entire execbuf into it. So imo we might as well just kill it.»

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---

This patch replaces the previous patches 4 and 5.

Not sure about the function header comment hunk, maybe what it says is
true even with DRM_MM_SEARCH_BELOW?

 drivers/gpu/drm/drm_mm.c             | 36 +++++-------------------------------
 drivers/gpu/drm/ttm/ttm_bo_manager.c |  2 +-
 include/drm/drm_mm.h                 |  3 +--
 3 files changed, 7 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 04a209e..3e0ef8c 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -409,16 +409,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 						      enum drm_mm_search_flags flags)
 {
 	struct drm_mm_node *entry;
-	struct drm_mm_node *best;
 	unsigned long adj_start;
 	unsigned long adj_end;
-	unsigned long best_size;
 
 	BUG_ON(mm->scanned_blocks);
 
-	best = NULL;
-	best_size = ~0UL;
-
 	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
 			       flags & DRM_MM_SEARCH_BELOW) {
 		unsigned long hole_size = adj_end - adj_start;
@@ -429,19 +424,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 				continue;
 		}
 
-		if (!check_free_hole(adj_start, adj_end, size, alignment))
-			continue;
-
-		if (!(flags & DRM_MM_SEARCH_BEST))
+		if (check_free_hole(adj_start, adj_end, size, alignment))
 			return entry;
-
-		if (hole_size < best_size) {
-			best = entry;
-			best_size = hole_size;
-		}
 	}
 
-	return best;
+	return NULL;
 }
 
 static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
@@ -453,16 +440,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 							enum drm_mm_search_flags flags)
 {
 	struct drm_mm_node *entry;
-	struct drm_mm_node *best;
 	unsigned long adj_start;
 	unsigned long adj_end;
-	unsigned long best_size;
 
 	BUG_ON(mm->scanned_blocks);
 
-	best = NULL;
-	best_size = ~0UL;
-
 	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
 			       flags & DRM_MM_SEARCH_BELOW) {
 		unsigned long hole_size = adj_end - adj_start;
@@ -478,19 +460,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 				continue;
 		}
 
-		if (!check_free_hole(adj_start, adj_end, size, alignment))
-			continue;
-
-		if (!(flags & DRM_MM_SEARCH_BEST))
+		if (check_free_hole(adj_start, adj_end, size, alignment))
 			return entry;
-
-		if (hole_size < best_size) {
-			best = entry;
-			best_size = hole_size;
-		}
 	}
 
-	return best;
+	return NULL;
 }
 
 /**
@@ -679,7 +653,7 @@ EXPORT_SYMBOL(drm_mm_scan_add_block);
  * corrupted.
  *
  * When the scan list is empty, the selected memory nodes can be freed. An
- * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BEST will then
+ * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BELOW will then
  * return the just freed block (because its at the top of the free_stack list).
  *
  * Returns:
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index aa0bd05..42c2aa9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,7 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
 	struct drm_mm *mm = &rman->mm;
 	struct drm_mm_node *node = NULL;
-	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
+	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_DEFAULT;
 	enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
 	unsigned long lpfn;
 	int ret;
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index a24addf..f38f7a0 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -46,8 +46,7 @@
 
 enum drm_mm_search_flags {
 	DRM_MM_SEARCH_DEFAULT =		0,
-	DRM_MM_SEARCH_BEST =		1 << 0,
-	DRM_MM_SEARCH_BELOW =		1 << 1,
+	DRM_MM_SEARCH_BELOW =		1 << 0,
 };
 
 enum drm_mm_allocator_flags {
-- 
2.1.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] drm/ttm: Use only DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN
  2014-10-28 11:10   ` Daniel Vetter
@ 2014-10-30  8:41     ` Michel Dänzer
  2014-10-30 15:07     ` Thomas Hellstrom
  1 sibling, 0 replies; 14+ messages in thread
From: Michel Dänzer @ 2014-10-30  8:41 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Lauri Kasanen, dri-devel

On 28.10.2014 20:10, Daniel Vetter wrote:
> On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems
>> against the idea of TTM_PL_FLAG_TOPDOWN:
>>
>> * The smallest hole may be in the overall bottom of the area
>> * If the hole isn't much larger than the BO, it doesn't make much
>>    difference whether the BO is placed at the bottom or at the top of the
>>    hole
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>
> tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart
> allocations from the same execbuf, and usually those get recycled around
> the same time. Which means you'll just fragment your mm even more if you
> try to find the best hole instead of just picking one and the stuffing the
> entire execbuf into it. So imo we might as well just kill it.

Sent out a new patch 4 which nukes it.


> Another one that I've advertised a bunch of times already is the scan
> roaster in drm_mm.c: Currently ttm just evicts until there's a big enough
> hole, which is fairly awful if you have quasi-segmented memory like with
> top-down/bottom-up schemes and different ranges for different units. With
> the roaster you just walk the lru and build up potential holes until
> there's a suitable one, and then only evict those buffers. Which means if
> you have a certain range of memory under very high pressure (e.g. the 256M
> which uvd can use or whatever it is), then you wont thrash all the other
> vram too.

I recently made some changes in TTM and radeon to alleviate that 
particular issue, but the scan roaster does sound useful. Unfortunately, 
modifying TTM to use it is a bit too involved for me right now, any 
volunteers?


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] drm/ttm: Use only DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN
  2014-10-28 11:10   ` Daniel Vetter
  2014-10-30  8:41     ` Michel Dänzer
@ 2014-10-30 15:07     ` Thomas Hellstrom
  2014-10-31 17:19       ` Daniel Vetter
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Hellstrom @ 2014-10-30 15:07 UTC (permalink / raw)
  To: dri-devel

On 10/28/2014 12:10 PM, Daniel Vetter wrote:
> On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems
>> against the idea of TTM_PL_FLAG_TOPDOWN:
>>
>> * The smallest hole may be in the overall bottom of the area
>> * If the hole isn't much larger than the BO, it doesn't make much
>>   difference whether the BO is placed at the bottom or at the top of the
>>   hole
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart
> allocations from the same execbuf, and usually those get recycled around
> the same time. Which means you'll just fragment your mm even more if you
> try to find the best hole instead of just picking one and the stuffing the
> entire execbuf into it. So imo we might as well just kill it.
>
> Another one that I've advertised a bunch of times already is the scan
> roaster in drm_mm.c: Currently ttm just evicts until there's a big enough
> hole, which is fairly awful if you have quasi-segmented memory like with
> top-down/bottom-up schemes and different ranges for different units. With
> the roaster you just walk the lru and build up potential holes until
> there's a suitable one, and then only evict those buffers. Which means if
> you have a certain range of memory under very high pressure (e.g. the 256M
> which uvd can use or whatever it is), then you wont thrash all the other
> vram too.
>
> Cheers, Daniel
>
And I think I've commented each time why that is currently not possible
or at least worthile with TTM ;).

Although the idea is great, even if you make a nice list of suitable
buffers to evict, as soon as you drop the LRU spinlock, that list is
invalid. Worse, while you start evicting your list, other processes may
steal part of the hole you've created.

As some point we should perhaps figure out how to protect each managed
area with a separate mutex, but that would create trouble if, for
example, evicting from VRAM to managed GART.

/Thomas



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] drm/ttm: Use only DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN
  2014-10-30 15:07     ` Thomas Hellstrom
@ 2014-10-31 17:19       ` Daniel Vetter
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Vetter @ 2014-10-31 17:19 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: dri-devel

On Thu, Oct 30, 2014 at 04:07:05PM +0100, Thomas Hellstrom wrote:
> On 10/28/2014 12:10 PM, Daniel Vetter wrote:
> > On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
> >> From: Michel Dänzer <michel.daenzer@amd.com>
> >>
> >> DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems
> >> against the idea of TTM_PL_FLAG_TOPDOWN:
> >>
> >> * The smallest hole may be in the overall bottom of the area
> >> * If the hole isn't much larger than the BO, it doesn't make much
> >>   difference whether the BO is placed at the bottom or at the top of the
> >>   hole
> >>
> >> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> > tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart
> > allocations from the same execbuf, and usually those get recycled around
> > the same time. Which means you'll just fragment your mm even more if you
> > try to find the best hole instead of just picking one and the stuffing the
> > entire execbuf into it. So imo we might as well just kill it.
> >
> > Another one that I've advertised a bunch of times already is the scan
> > roaster in drm_mm.c: Currently ttm just evicts until there's a big enough
> > hole, which is fairly awful if you have quasi-segmented memory like with
> > top-down/bottom-up schemes and different ranges for different units. With
> > the roaster you just walk the lru and build up potential holes until
> > there's a suitable one, and then only evict those buffers. Which means if
> > you have a certain range of memory under very high pressure (e.g. the 256M
> > which uvd can use or whatever it is), then you wont thrash all the other
> > vram too.
> >
> > Cheers, Daniel
> >
> And I think I've commented each time why that is currently not possible
> or at least worthile with TTM ;).
> 
> Although the idea is great, even if you make a nice list of suitable
> buffers to evict, as soon as you drop the LRU spinlock, that list is
> invalid. Worse, while you start evicting your list, other processes may
> steal part of the hole you've created.
> 
> As some point we should perhaps figure out how to protect each managed
> area with a separate mutex, but that would create trouble if, for
> example, evicting from VRAM to managed GART.

Yeah, you need to do the entire scanning/evicting under the lru lock,
which means you need to jump over all the buffers you can't immediately
trylock. Which also means that you need a slowpath fallback that rips out
everything in case the buffer doesn't fit anywhere. Wrt the lru lock you
need to at least overwrite the generic placement/eviction code since you
need to a bit more logic and state-keeping for drm_mm than what's
currently exposed in the super-generic ttm managed area allocator (which
apparently could also manage an idr). But shouldn't be more fuzz than a
new vfunc with the current code as default implementation.

For lru lock recursion (due to vram->gtt eviction or similar) I guess
either rework stuff to allow the entire eviction process under lru. Or
keep the drm_mm allocations on some separate list until you've evicted
everything (dropping the lock meanwhile) and the mass-free the entire list
under the lru lock and immediately do your allocation.

I'm not sure whether the long hold times for the lru lock really would be
a problem (they'll happen even when you drop it for actual eviction since
walking the lru lists is costly when you jump over lots of buffers that
don't get you towards your goal), since evicting all buffers is probably
going to thrash the system much worse anyway. And for the worst case a
regular sched for lru lock contention or preemption should prevent
interactivity disasters for parallel cmd submission.

So yeah some details to take care of, but I don't think there's anything
insurmountable. It's just that I don't have time (nor justification) for
it. And it will be indeed quite a bit of work to get going.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] drm/mm: Remove DRM_MM_SEARCH_BEST
  2014-10-30  8:36   ` [PATCH 4/4] drm/mm: Remove DRM_MM_SEARCH_BEST Michel Dänzer
@ 2014-10-31 18:10     ` Alex Deucher
  2014-11-04  8:47       ` Michel Dänzer
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Deucher @ 2014-10-31 18:10 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: Maling list - DRI developers

On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> In the words of Daniel Vetter:
>
> «I think SEARCH_BEST is pretty much always a bad idea - it rips apart
> allocations from the same execbuf, and usually those get recycled around
> the same time. Which means you'll just fragment your mm even more if you
> try to find the best hole instead of just picking one and the stuffing
> the entire execbuf into it. So imo we might as well just kill it.»
>
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

Applied this series to my -next tree.

Thanks,

Alex

> ---
>
> This patch replaces the previous patches 4 and 5.
>
> Not sure about the function header comment hunk, maybe what it says is
> true even with DRM_MM_SEARCH_BELOW?
>
>  drivers/gpu/drm/drm_mm.c             | 36 +++++-------------------------------
>  drivers/gpu/drm/ttm/ttm_bo_manager.c |  2 +-
>  include/drm/drm_mm.h                 |  3 +--
>  3 files changed, 7 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index 04a209e..3e0ef8c 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -409,16 +409,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
>                                                       enum drm_mm_search_flags flags)
>  {
>         struct drm_mm_node *entry;
> -       struct drm_mm_node *best;
>         unsigned long adj_start;
>         unsigned long adj_end;
> -       unsigned long best_size;
>
>         BUG_ON(mm->scanned_blocks);
>
> -       best = NULL;
> -       best_size = ~0UL;
> -
>         __drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
>                                flags & DRM_MM_SEARCH_BELOW) {
>                 unsigned long hole_size = adj_end - adj_start;
> @@ -429,19 +424,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
>                                 continue;
>                 }
>
> -               if (!check_free_hole(adj_start, adj_end, size, alignment))
> -                       continue;
> -
> -               if (!(flags & DRM_MM_SEARCH_BEST))
> +               if (check_free_hole(adj_start, adj_end, size, alignment))
>                         return entry;
> -
> -               if (hole_size < best_size) {
> -                       best = entry;
> -                       best_size = hole_size;
> -               }
>         }
>
> -       return best;
> +       return NULL;
>  }
>
>  static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
> @@ -453,16 +440,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
>                                                         enum drm_mm_search_flags flags)
>  {
>         struct drm_mm_node *entry;
> -       struct drm_mm_node *best;
>         unsigned long adj_start;
>         unsigned long adj_end;
> -       unsigned long best_size;
>
>         BUG_ON(mm->scanned_blocks);
>
> -       best = NULL;
> -       best_size = ~0UL;
> -
>         __drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
>                                flags & DRM_MM_SEARCH_BELOW) {
>                 unsigned long hole_size = adj_end - adj_start;
> @@ -478,19 +460,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
>                                 continue;
>                 }
>
> -               if (!check_free_hole(adj_start, adj_end, size, alignment))
> -                       continue;
> -
> -               if (!(flags & DRM_MM_SEARCH_BEST))
> +               if (check_free_hole(adj_start, adj_end, size, alignment))
>                         return entry;
> -
> -               if (hole_size < best_size) {
> -                       best = entry;
> -                       best_size = hole_size;
> -               }
>         }
>
> -       return best;
> +       return NULL;
>  }
>
>  /**
> @@ -679,7 +653,7 @@ EXPORT_SYMBOL(drm_mm_scan_add_block);
>   * corrupted.
>   *
>   * When the scan list is empty, the selected memory nodes can be freed. An
> - * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BEST will then
> + * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BELOW will then
>   * return the just freed block (because its at the top of the free_stack list).
>   *
>   * Returns:
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> index aa0bd05..42c2aa9 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> @@ -55,7 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
>         struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
>         struct drm_mm *mm = &rman->mm;
>         struct drm_mm_node *node = NULL;
> -       enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
> +       enum drm_mm_search_flags sflags = DRM_MM_SEARCH_DEFAULT;
>         enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
>         unsigned long lpfn;
>         int ret;
> diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
> index a24addf..f38f7a0 100644
> --- a/include/drm/drm_mm.h
> +++ b/include/drm/drm_mm.h
> @@ -46,8 +46,7 @@
>
>  enum drm_mm_search_flags {
>         DRM_MM_SEARCH_DEFAULT =         0,
> -       DRM_MM_SEARCH_BEST =            1 << 0,
> -       DRM_MM_SEARCH_BELOW =           1 << 1,
> +       DRM_MM_SEARCH_BELOW =           1 << 0,
>  };
>
>  enum drm_mm_allocator_flags {
> --
> 2.1.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] drm/mm: Remove DRM_MM_SEARCH_BEST
  2014-10-31 18:10     ` Alex Deucher
@ 2014-11-04  8:47       ` Michel Dänzer
  2014-11-04 14:30         ` Alex Deucher
  0 siblings, 1 reply; 14+ messages in thread
From: Michel Dänzer @ 2014-11-04  8:47 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Daniel Vetter, Maling list - DRI developers

On 01.11.2014 03:10, Alex Deucher wrote:
> On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> In the words of Daniel Vetter:
>>
>> «I think SEARCH_BEST is pretty much always a bad idea - it rips apart
>> allocations from the same execbuf, and usually those get recycled around
>> the same time. Which means you'll just fragment your mm even more if you
>> try to find the best hole instead of just picking one and the stuffing
>> the entire execbuf into it. So imo we might as well just kill it.»
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>
> Applied this series to my -next tree.

Thanks. Unfortunately, the removal of DRM_MM_SEARCH_BEST caused 
performance regressions, presumably due to excessive eviction from VRAM, 
see https://bugs.freedesktop.org/show_bug.cgi?id=84662#c49 . I can 
reproduce a significant bad effect with the UE4 Elemental demo on my 
development machine. Please drop this patch.

The original patches 4 & 5 have the same effect, so they shouldn't be 
applied either. Maybe we can try this again once TTM is using the MM 
scan roaster for eviction.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] drm/mm: Remove DRM_MM_SEARCH_BEST
  2014-11-04  8:47       ` Michel Dänzer
@ 2014-11-04 14:30         ` Alex Deucher
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Deucher @ 2014-11-04 14:30 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: Daniel Vetter, Maling list - DRI developers

On Tue, Nov 4, 2014 at 3:47 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 01.11.2014 03:10, Alex Deucher wrote:
>>
>> On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
>>>
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> In the words of Daniel Vetter:
>>>
>>> «I think SEARCH_BEST is pretty much always a bad idea - it rips apart
>>> allocations from the same execbuf, and usually those get recycled around
>>> the same time. Which means you'll just fragment your mm even more if you
>>> try to find the best hole instead of just picking one and the stuffing
>>> the entire execbuf into it. So imo we might as well just kill it.»
>>>
>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>
>>
>> Applied this series to my -next tree.
>
>
> Thanks. Unfortunately, the removal of DRM_MM_SEARCH_BEST caused performance
> regressions, presumably due to excessive eviction from VRAM, see
> https://bugs.freedesktop.org/show_bug.cgi?id=84662#c49 . I can reproduce a
> significant bad effect with the UE4 Elemental demo on my development
> machine. Please drop this patch.
>
> The original patches 4 & 5 have the same effect, so they shouldn't be
> applied either. Maybe we can try this again once TTM is using the MM scan
> roaster for eviction.
>

I dropped this patch.

Thanks,

Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-11-04 14:30 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-28  9:35 [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Michel Dänzer
2014-10-28  9:35 ` [PATCH 2/5] drm/ttm: Add DRM_MM_SEARCH_BELOW for TTM_PL_FLAG_TOPDOWN Michel Dänzer
2014-10-28  9:35 ` [PATCH 3/5] drm/ttm: Use only " Michel Dänzer
2014-10-28 11:10   ` Daniel Vetter
2014-10-30  8:41     ` Michel Dänzer
2014-10-30 15:07     ` Thomas Hellstrom
2014-10-31 17:19       ` Daniel Vetter
2014-10-28  9:35 ` [PATCH 4/5] drm/ttm: Add TTM_PL_FLAG_BOTTOMUP Michel Dänzer
2014-10-30  8:36   ` [PATCH 4/4] drm/mm: Remove DRM_MM_SEARCH_BEST Michel Dänzer
2014-10-31 18:10     ` Alex Deucher
2014-11-04  8:47       ` Michel Dänzer
2014-11-04 14:30         ` Alex Deucher
2014-10-28  9:35 ` [PATCH 5/5] drm/radeon: Use TTM_PL_FLAG_BOTTOMUP Michel Dänzer
2014-10-28 10:46 ` [PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs Lauri Kasanen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.