[PATCH v5 0/6] Best effort contiguous VRAM allocation


* [PATCH v5 0/6] Best effort contiguous VRAM allocation
@ 2024-04-23 15:28 Philip Yang
  2024-04-23 15:28 ` [PATCH v5 1/6] drm/amdgpu: Support " Philip Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Philip Yang @ 2024-04-23 15:28 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam,
	Philip Yang

This patch series implements a new KFD memory alloc flag for best-effort contiguous
VRAM allocation, to support peer direct access by RDMA devices with limited
scatter-gather DMA capability.

v2: rebase on patch ("drm/amdgpu: Modify the contiguous flags behaviour")
    to avoid adding the new GEM flag

v3: add patch 2 to handle sg segment size limit (Christian)

v4: remove the buddy block size limit from vram mgr because sg table creation already
    removes the limit, and resource uses u64 to handle block start, size (Christian)

v5: remove patch 7 which is not for upstream, add AMDGPU prefix to the macro name.

Philip Yang (6):
  drm/amdgpu: Support contiguous VRAM allocation
  drm/amdgpu: Handle sg size limit for contiguous allocation
  drm/amdgpu: Evict BOs from same process for contiguous allocation
  drm/amdkfd: Evict BO itself for contiguous allocation
  drm/amdkfd: Increase KFD bo restore wait time
  drm/amdkfd: Bump kfd version for contiguous VRAM allocation

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 20 ++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 12 +++++------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  2 +-
 include/uapi/linux/kfd_ioctl.h                |  4 +++-
 5 files changed, 31 insertions(+), 10 deletions(-)

-- 
2.43.2



* [PATCH v5 1/6] drm/amdgpu: Support contiguous VRAM allocation
  2024-04-23 15:28 [PATCH v5 0/6] Best effort contiguous VRAM allocation Philip Yang
@ 2024-04-23 15:28 ` Philip Yang
  2024-04-23 22:17   ` Felix Kuehling
  2024-04-23 15:28 ` [PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation Philip Yang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Philip Yang @ 2024-04-23 15:28 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam,
	Philip Yang

An RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.

Add a new KFD alloc memory flag and store it as the bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When this bo is pinned for export for
RDMA peer direct access, this sets the TTM_PL_FLAG_CONTIGUOUS flag and
asks the VRAM buddy allocator for contiguous VRAM.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++++
 include/uapi/linux/kfd_ioctl.h                   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ae9fd844623..ef9154043757 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1712,6 +1712,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 			alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
 			alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ?
 			AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
+
+			/* For contiguous VRAM allocation */
+			if (flags & KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT)
+				alloc_flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
 		}
 		xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ?
 					0 : fpriv->xcp_id;
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 2040a470ddb4..c1394c162d4e 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args {
 #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)
 #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED	(1 << 25)
 #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT	(1 << 24)
+#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT	(1 << 23)
 
 /* Allocate memory for later SVM (shared virtual memory) mapping.
  *
-- 
2.43.2



* Re: [PATCH v5 1/6] drm/amdgpu: Support contiguous VRAM allocation
  2024-04-23 15:28 ` [PATCH v5 1/6] drm/amdgpu: Support " Philip Yang
@ 2024-04-23 22:17   ` Felix Kuehling
  2024-04-24 13:58     ` Philip Yang
  0 siblings, 1 reply; 14+ messages in thread
From: Felix Kuehling @ 2024-04-23 22:17 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig, Arunpravin.PaneerSelvam


On 2024-04-23 11:28, Philip Yang wrote:
> An RDMA device with limited scatter-gather ability requires contiguous VRAM
> buffer allocation for RDMA peer direct support.
>
> Add a new KFD alloc memory flag and store it as the bo alloc flag
> AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When this bo is pinned for export for
> RDMA peer direct access, this sets the TTM_PL_FLAG_CONTIGUOUS flag and
> asks the VRAM buddy allocator for contiguous VRAM.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++++
>   include/uapi/linux/kfd_ioctl.h                   | 1 +
>   2 files changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 0ae9fd844623..ef9154043757 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1712,6 +1712,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   			alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>   			alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ?
>   			AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
> +
> +			/* For contiguous VRAM allocation */
> +			if (flags & KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT)
> +				alloc_flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
>   		}
>   		xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ?
>   					0 : fpriv->xcp_id;
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 2040a470ddb4..c1394c162d4e 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args {
>   #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)
>   #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED	(1 << 25)
>   #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT	(1 << 24)
> +#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT	(1 << 23)

If I understand it correctly, AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS was 
redefined to mean "best effort". Maybe we can drop the explicit 
"BEST_EFFORT" from this flag as well to keep the name to a reasonable 
length.

Regards,
   Felix


>   
>   /* Allocate memory for later SVM (shared virtual memory) mapping.
>    *


* Re: [PATCH v5 1/6] drm/amdgpu: Support contiguous VRAM allocation
  2024-04-23 22:17   ` Felix Kuehling
@ 2024-04-24 13:58     ` Philip Yang
  0 siblings, 0 replies; 14+ messages in thread
From: Philip Yang @ 2024-04-24 13:58 UTC (permalink / raw)
  To: Felix Kuehling, Philip Yang, amd-gfx
  Cc: christian.koenig, Arunpravin.PaneerSelvam

[-- Attachment #1: Type: text/html, Size: 5090 bytes --]


* [PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation
  2024-04-23 15:28 [PATCH v5 0/6] Best effort contiguous VRAM allocation Philip Yang
  2024-04-23 15:28 ` [PATCH v5 1/6] drm/amdgpu: Support " Philip Yang
@ 2024-04-23 15:28 ` Philip Yang
  2024-04-24  6:14   ` Christian König
  2024-04-23 15:28 ` [PATCH v5 3/6] drm/amdgpu: Evict BOs from same process " Philip Yang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Philip Yang @ 2024-04-23 15:28 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam,
	Philip Yang

Define the macro AMDGPU_MAX_SG_SEGMENT_SIZE as 2GB, because the struct
scatterlist length is an unsigned int and some users cast it to a signed
int, so every segment of an sg table is limited to 2GB at most.

For contiguous VRAM allocation, don't limit the max buddy block size, in
order to get contiguous VRAM memory. To work around the sg table segment
size limit, allocate multiple segments if the contiguous size is bigger
than AMDGPU_MAX_SG_SEGMENT_SIZE.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 4be8b091099a..ebffb58ea53a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -31,6 +31,8 @@
 #include "amdgpu_atomfirmware.h"
 #include "atom.h"
 
+#define AMDGPU_MAX_SG_SEGMENT_SIZE	(2UL << 30)
+
 struct amdgpu_vram_reservation {
 	u64 start;
 	u64 size;
@@ -532,9 +534,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 
 		BUG_ON(min_block_size < mm->chunk_size);
 
-		/* Limit maximum size to 2GiB due to SG table limitations */
-		size = min(remaining_size, 2ULL << 30);
-
+		size = remaining_size;
 		if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
 				!(size & (((u64)pages_per_block << PAGE_SHIFT) - 1)))
 			min_block_size = (u64)pages_per_block << PAGE_SHIFT;
@@ -675,7 +675,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
 	amdgpu_res_first(res, offset, length, &cursor);
 	while (cursor.remaining) {
 		num_entries++;
-		amdgpu_res_next(&cursor, cursor.size);
+		amdgpu_res_next(&cursor, min(cursor.size, AMDGPU_MAX_SG_SEGMENT_SIZE));
 	}
 
 	r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);
@@ -695,7 +695,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
 	amdgpu_res_first(res, offset, length, &cursor);
 	for_each_sgtable_sg((*sgt), sg, i) {
 		phys_addr_t phys = cursor.start + adev->gmc.aper_base;
-		size_t size = cursor.size;
+		unsigned long size = min(cursor.size, AMDGPU_MAX_SG_SEGMENT_SIZE);
 		dma_addr_t addr;
 
 		addr = dma_map_resource(dev, phys, size, dir,
@@ -708,7 +708,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
 		sg_dma_address(sg) = addr;
 		sg_dma_len(sg) = size;
 
-		amdgpu_res_next(&cursor, cursor.size);
+		amdgpu_res_next(&cursor, size);
 	}
 
 	return 0;
-- 
2.43.2
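
Editor's note: the segment splitting described in this patch can be
sketched in user-space C as follows. This is a minimal illustration, not
the driver code: sketch_split_block() and SKETCH_MAX_SG_SEGMENT_SIZE are
hypothetical names, and the sketch handles a single contiguous block,
whereas the driver walks a resource cursor that may span several blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's AMDGPU_MAX_SG_SEGMENT_SIZE (2 GiB). */
#define SKETCH_MAX_SG_SEGMENT_SIZE (2ULL << 30)

/*
 * Split one contiguous block of `size` bytes into segments of at most
 * `max_seg` bytes. Up to `cap` segment lengths are written to lens[];
 * the return value is the total number of segments needed, so a caller
 * can first count (cap == 0) and then fill, as the patch does in two
 * passes over the cursor.
 */
static size_t sketch_split_block(uint64_t size, uint64_t max_seg,
                                 uint64_t *lens, size_t cap)
{
    size_t n = 0;

    assert(max_seg > 0);
    while (size) {
        /* Same clamp as min(cursor.size, AMDGPU_MAX_SG_SEGMENT_SIZE). */
        uint64_t step = size < max_seg ? size : max_seg;

        if (n < cap)
            lens[n] = step;
        n++;
        size -= step;
    }
    return n;
}
```

Under these assumptions, a 5 GiB contiguous block yields three segments
of 2 GiB, 2 GiB and 1 GiB, while a block of exactly 2 GiB still fits in
a single segment.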



* Re: [PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation
  2024-04-23 15:28 ` [PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation Philip Yang
@ 2024-04-24  6:14   ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2024-04-24  6:14 UTC (permalink / raw)
  To: Philip Yang, amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam

On 23.04.24 at 17:28, Philip Yang wrote:
> Define the macro AMDGPU_MAX_SG_SEGMENT_SIZE as 2GB, because the struct
> scatterlist length is an unsigned int and some users cast it to a signed
> int, so every segment of an sg table is limited to 2GB at most.
>
> For contiguous VRAM allocation, don't limit the max buddy block size, in
> order to get contiguous VRAM memory. To work around the sg table segment
> size limit, allocate multiple segments if the contiguous size is bigger
> than AMDGPU_MAX_SG_SEGMENT_SIZE.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 4be8b091099a..ebffb58ea53a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -31,6 +31,8 @@
>   #include "amdgpu_atomfirmware.h"
>   #include "atom.h"
>   
> +#define AMDGPU_MAX_SG_SEGMENT_SIZE	(2UL << 30)
> +
>   struct amdgpu_vram_reservation {
>   	u64 start;
>   	u64 size;
> @@ -532,9 +534,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>   
>   		BUG_ON(min_block_size < mm->chunk_size);
>   
> -		/* Limit maximum size to 2GiB due to SG table limitations */
> -		size = min(remaining_size, 2ULL << 30);
> -
> +		size = remaining_size;
>   		if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
>   				!(size & (((u64)pages_per_block << PAGE_SHIFT) - 1)))
>   			min_block_size = (u64)pages_per_block << PAGE_SHIFT;
> @@ -675,7 +675,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
>   	amdgpu_res_first(res, offset, length, &cursor);
>   	while (cursor.remaining) {
>   		num_entries++;
> -		amdgpu_res_next(&cursor, cursor.size);
> +		amdgpu_res_next(&cursor, min(cursor.size, AMDGPU_MAX_SG_SEGMENT_SIZE));
>   	}
>   
>   	r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);
> @@ -695,7 +695,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
>   	amdgpu_res_first(res, offset, length, &cursor);
>   	for_each_sgtable_sg((*sgt), sg, i) {
>   		phys_addr_t phys = cursor.start + adev->gmc.aper_base;
> -		size_t size = cursor.size;
> +		unsigned long size = min(cursor.size, AMDGPU_MAX_SG_SEGMENT_SIZE);
>   		dma_addr_t addr;
>   
>   		addr = dma_map_resource(dev, phys, size, dir,
> @@ -708,7 +708,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
>   		sg_dma_address(sg) = addr;
>   		sg_dma_len(sg) = size;
>   
> -		amdgpu_res_next(&cursor, cursor.size);
> +		amdgpu_res_next(&cursor, size);
>   	}
>   
>   	return 0;



* [PATCH v5 3/6] drm/amdgpu: Evict BOs from same process for contiguous allocation
  2024-04-23 15:28 [PATCH v5 0/6] Best effort contiguous VRAM allocation Philip Yang
  2024-04-23 15:28 ` [PATCH v5 1/6] drm/amdgpu: Support " Philip Yang
  2024-04-23 15:28 ` [PATCH v5 2/6] drm/amdgpu: Handle sg size limit for contiguous allocation Philip Yang
@ 2024-04-23 15:28 ` Philip Yang
  2024-04-23 22:16   ` Felix Kuehling
  2024-04-23 15:28 ` [PATCH v5 4/6] drm/amdkfd: Evict BO itself " Philip Yang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Philip Yang @ 2024-04-23 15:28 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam,
	Philip Yang

When TTM fails to allocate VRAM, it tries to evict BOs from VRAM to system
memory and then retries the allocation. This skips KFD BOs from the same
process, because KFD requires all BOs to be resident for user queues.

If TTM allocates contiguous VRAM with the TTM_PL_FLAG_CONTIGUOUS flag,
allow TTM to evict KFD BOs from the same process. This evicts the user
queues first and restores them later, after the contiguous VRAM allocation.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 851509c6e90e..c907d6005641 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1398,7 +1398,8 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	 */
 	dma_resv_for_each_fence(&resv_cursor, bo->base.resv,
 				DMA_RESV_USAGE_BOOKKEEP, f) {
-		if (amdkfd_fence_check_mm(f, current->mm))
+		if (amdkfd_fence_check_mm(f, current->mm) &&
+		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
 			return false;
 	}
 
-- 
2.43.2
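
Editor's note: the changed check can be read as a small predicate. The
sketch below is a user-space illustration under stated assumptions:
sketch_kfd_check_passes() and SKETCH_PL_FLAG_CONTIGUOUS are hypothetical
names, the flag value is a stand-in, and the boolean parameter stands
for the result of amdkfd_fence_check_mm() on one fence. In the driver, a
BO that passes this check still goes through the remaining eviction
checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for TTM_PL_FLAG_CONTIGUOUS; the value is illustrative only. */
#define SKETCH_PL_FLAG_CONTIGUOUS (1u << 0)

/*
 * Returns false when the BO must be skipped as an eviction candidate:
 * it carries a KFD fence from the requesting process and the placement
 * being allocated is not a contiguous one. Returns true otherwise,
 * meaning only that this particular check does not reject the BO.
 * (assert.h is included for the checks a caller may add around this.)
 */
static bool sketch_kfd_check_passes(bool fence_from_same_process,
                                    uint32_t place_flags)
{
    if (fence_from_same_process &&
        !(place_flags & SKETCH_PL_FLAG_CONTIGUOUS))
        return false;
    return true;
}
```

Of the four input combinations, only "same process, non-contiguous
request" rejects the BO. Before the patch, both "same process" cases
rejected it.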



* Re: [PATCH v5 3/6] drm/amdgpu: Evict BOs from same process for contiguous allocation
  2024-04-23 15:28 ` [PATCH v5 3/6] drm/amdgpu: Evict BOs from same process " Philip Yang
@ 2024-04-23 22:16   ` Felix Kuehling
  0 siblings, 0 replies; 14+ messages in thread
From: Felix Kuehling @ 2024-04-23 22:16 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig, Arunpravin.PaneerSelvam

On 2024-04-23 11:28, Philip Yang wrote:
> When TTM fails to allocate VRAM, it tries to evict BOs from VRAM to system
> memory and then retries the allocation. This skips KFD BOs from the same
> process, because KFD requires all BOs to be resident for user queues.
>
> If TTM allocates contiguous VRAM with the TTM_PL_FLAG_CONTIGUOUS flag,
> allow TTM to evict KFD BOs from the same process. This evicts the user
> queues first and restores them later, after the contiguous VRAM allocation.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 851509c6e90e..c907d6005641 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1398,7 +1398,8 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>   	 */
>   	dma_resv_for_each_fence(&resv_cursor, bo->base.resv,
>   				DMA_RESV_USAGE_BOOKKEEP, f) {
> -		if (amdkfd_fence_check_mm(f, current->mm))
> +		if (amdkfd_fence_check_mm(f, current->mm) &&
> +		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
>   			return false;
>   	}
>   


* [PATCH v5 4/6] drm/amdkfd: Evict BO itself for contiguous allocation
  2024-04-23 15:28 [PATCH v5 0/6] Best effort contiguous VRAM allocation Philip Yang
                   ` (2 preceding siblings ...)
  2024-04-23 15:28 ` [PATCH v5 3/6] drm/amdgpu: Evict BOs from same process " Philip Yang
@ 2024-04-23 15:28 ` Philip Yang
  2024-04-23 22:15   ` Felix Kuehling
  2024-04-23 15:28 ` [PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time Philip Yang
  2024-04-23 15:29 ` [PATCH v5 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation Philip Yang
  5 siblings, 1 reply; 14+ messages in thread
From: Philip Yang @ 2024-04-23 15:28 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam,
	Philip Yang

If the BO pages pinned for RDMA are not contiguous in VRAM, evict the BO
to system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index ef9154043757..5d118e5580ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1470,13 +1470,27 @@ static int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain)
 	if (unlikely(ret))
 		return ret;
 
+	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
+		/*
+		 * If bo is not contiguous on VRAM, move to system memory first to ensure
+		 * we can get contiguous VRAM space after evicting other BOs.
+		 */
+		if (!(bo->tbo.resource->placement & TTM_PL_FLAG_CONTIGUOUS)) {
+			ret = amdgpu_amdkfd_bo_validate(bo, AMDGPU_GEM_DOMAIN_GTT, false);
+			if (unlikely(ret)) {
+				pr_debug("validate bo 0x%p to GTT failed %d\n", &bo->tbo, ret);
+				goto out;
+			}
+		}
+	}
+
 	ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
 	if (ret)
 		pr_err("Error in Pinning BO to domain: %d\n", domain);
 
 	amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
+out:
 	amdgpu_bo_unreserve(bo);
-
 	return ret;
 }
 
-- 
2.43.2



* Re: [PATCH v5 4/6] drm/amdkfd: Evict BO itself for contiguous allocation
  2024-04-23 15:28 ` [PATCH v5 4/6] drm/amdkfd: Evict BO itself " Philip Yang
@ 2024-04-23 22:15   ` Felix Kuehling
  2024-04-24 13:41     ` Philip Yang
  0 siblings, 1 reply; 14+ messages in thread
From: Felix Kuehling @ 2024-04-23 22:15 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig, Arunpravin.PaneerSelvam

On 2024-04-23 11:28, Philip Yang wrote:
> If the BO pages pinned for RDMA are not contiguous in VRAM, evict the BO
> to system memory first to free the VRAM space, then allocate contiguous
> VRAM space, and then move it from system memory back to VRAM.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index ef9154043757..5d118e5580ce 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1470,13 +1470,27 @@ static int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain)
>   	if (unlikely(ret))
>   		return ret;
>   
> +	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
> +		/*
> +		 * If bo is not contiguous on VRAM, move to system memory first to ensure
> +		 * we can get contiguous VRAM space after evicting other BOs.
> +		 */
> +		if (!(bo->tbo.resource->placement & TTM_PL_FLAG_CONTIGUOUS)) {
> +			ret = amdgpu_amdkfd_bo_validate(bo, AMDGPU_GEM_DOMAIN_GTT, false);

amdgpu_amdkfd_bo_validate is meant for use in kernel threads. It always 
runs uninterruptible. I believe pin_bo runs in the context of ioctls 
from user mode. So it should be interruptible.

Regards,
   Felix


> +			if (unlikely(ret)) {
> +				pr_debug("validate bo 0x%p to GTT failed %d\n", &bo->tbo, ret);
> +				goto out;
> +			}
> +		}
> +	}
> +
>   	ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
>   	if (ret)
>   		pr_err("Error in Pinning BO to domain: %d\n", domain);
>   
>   	amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
> +out:
>   	amdgpu_bo_unreserve(bo);
> -
>   	return ret;
>   }
>   


* Re: [PATCH v5 4/6] drm/amdkfd: Evict BO itself for contiguous allocation
  2024-04-23 22:15   ` Felix Kuehling
@ 2024-04-24 13:41     ` Philip Yang
  0 siblings, 0 replies; 14+ messages in thread
From: Philip Yang @ 2024-04-24 13:41 UTC (permalink / raw)
  To: Felix Kuehling, Philip Yang, amd-gfx
  Cc: christian.koenig, Arunpravin.PaneerSelvam

[-- Attachment #1: Type: text/html, Size: 4934 bytes --]


* [PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time
  2024-04-23 15:28 [PATCH v5 0/6] Best effort contiguous VRAM allocation Philip Yang
                   ` (3 preceding siblings ...)
  2024-04-23 15:28 ` [PATCH v5 4/6] drm/amdkfd: Evict BO itself " Philip Yang
@ 2024-04-23 15:28 ` Philip Yang
  2024-04-23 22:06   ` Felix Kuehling
  2024-04-23 15:29 ` [PATCH v5 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation Philip Yang
  5 siblings, 1 reply; 14+ messages in thread
From: Philip Yang @ 2024-04-23 15:28 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam,
	Philip Yang

When TTM allocates contiguous VRAM, it may take more than 1 second to
evict BOs for a larger RDMA buffer. Because the KFD restore bo worker
reserves all KFD BOs, TTM cannot take the locks of the remaining KFD BOs
to evict them, which causes TTM to fail to allocate contiguous VRAM.

Increase the KFD restore BO wait time to 2 seconds, long enough for the
RDMA pin BO path to allocate the contiguous VRAM.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index a81ef232fdef..c205e2d3acf9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -698,7 +698,7 @@ struct qcm_process_device {
 /* KFD Memory Eviction */
 
 /* Approx. wait time before attempting to restore evicted BOs */
-#define PROCESS_RESTORE_TIME_MS 100
+#define PROCESS_RESTORE_TIME_MS 2000
 /* Approx. back off time if restore fails due to lack of memory */
 #define PROCESS_BACK_OFF_TIME_MS 100
 /* Approx. time before evicting the process again */
-- 
2.43.2



* Re: [PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time
  2024-04-23 15:28 ` [PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time Philip Yang
@ 2024-04-23 22:06   ` Felix Kuehling
  0 siblings, 0 replies; 14+ messages in thread
From: Felix Kuehling @ 2024-04-23 22:06 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig, Arunpravin.PaneerSelvam

On 2024-04-23 11:28, Philip Yang wrote:
> When TTM allocates contiguous VRAM, it may take more than 1 second to
> evict BOs for a larger RDMA buffer. Because the KFD restore bo worker
> reserves all KFD BOs, TTM cannot take the locks of the remaining KFD BOs
> to evict them, which causes TTM to fail to allocate contiguous VRAM.
>
> Increase the KFD restore BO wait time to 2 seconds, long enough for the
> RDMA pin BO path to allocate the contiguous VRAM.

Two seconds is a very long time that the GPU will be idle whenever 
memory gets evicted. Maybe we need to look for a solution where the 
restore gets scheduled in response to a fence when the migration completes.

With the most recent changes I made to the eviction fence handling, I 
think we can decouple the scheduling of the restore work from the evict 
work. So we could schedule the delayed restore worker in a fence 
callback set up in amdgpu_bo_move or somewhere around there, and keep a 
short delay that starts counting at the end of the eviction move blit.

Regards,
   Felix


>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index a81ef232fdef..c205e2d3acf9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -698,7 +698,7 @@ struct qcm_process_device {
>   /* KFD Memory Eviction */
>   
>   /* Approx. wait time before attempting to restore evicted BOs */
> -#define PROCESS_RESTORE_TIME_MS 100
> +#define PROCESS_RESTORE_TIME_MS 2000
>   /* Approx. back off time if restore fails due to lack of memory */
>   #define PROCESS_BACK_OFF_TIME_MS 100
>   /* Approx. time before evicting the process again */


* [PATCH v5 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation
  2024-04-23 15:28 [PATCH v5 0/6] Best effort contiguous VRAM allocation Philip Yang
                   ` (4 preceding siblings ...)
  2024-04-23 15:28 ` [PATCH v5 5/6] drm/amdkfd: Increase KFD bo restore wait time Philip Yang
@ 2024-04-23 15:29 ` Philip Yang
  5 siblings, 0 replies; 14+ messages in thread
From: Philip Yang @ 2024-04-23 15:29 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam,
	Philip Yang

Bump the kfd ioctl minor version to declare support for the contiguous
VRAM allocation flag.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index c1394c162d4e..a5ebbe98ff7f 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -41,9 +41,10 @@
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
  * - 1.15 - Enable managing mappings in compute VMs with GEM_VA ioctl
+ * - 1.16 - Add contiguous VRAM allocation flag
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 15
+#define KFD_IOCTL_MINOR_VERSION 16
 
 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
-- 
2.43.2
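
Editor's note: user space could gate use of the new flag on the reported
ioctl version roughly as sketched below. This is an illustration under
stated assumptions: the sketch_* names are hypothetical, the major and
minor values would come from the get-version ioctl (struct
kfd_ioctl_get_version_args), treating any major version other than 1 as
unsupported is a choice made for this sketch, and the flag name is the
one used in v5 of this series, which review feedback suggests may be
shortened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bit as defined in patch 1/6 of this series (v5 name). */
#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT (1u << 23)

/* Minor version at which this series advertises the flag. */
#define SKETCH_KFD_MINOR_CONTIGUOUS 16u

/* Assumption: this sketch only understands major version 1. */
static bool sketch_kfd_has_contiguous_flag(uint32_t major, uint32_t minor)
{
    if (major != 1)
        return false;
    return minor >= SKETCH_KFD_MINOR_CONTIGUOUS;
}

/*
 * Add the contiguous flag to base_flags only when the reported version
 * says the kernel knows about it; otherwise return base_flags unchanged.
 * (assert.h is included for the checks a caller may add around this.)
 */
static uint32_t sketch_pick_alloc_flags(uint32_t major, uint32_t minor,
                                        uint32_t base_flags)
{
    if (sketch_kfd_has_contiguous_flag(major, minor))
        base_flags |= KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT;
    return base_flags;
}
```

With version 1.15 the base flags pass through untouched, and with 1.16
or later the contiguous bit is added. Since the allocation is best
effort, the caller should not assume the resulting buffer is actually
contiguous.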

