* [PATCH v2 1/3] drm/amdkfd: Relax size checking during queue buffer get
2026-01-12 14:06 [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1 Donet Tom
@ 2026-01-12 14:06 ` Donet Tom
2026-01-12 14:06 ` [PATCH v2 2/3] drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes Donet Tom
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Donet Tom @ 2026-01-12 14:06 UTC (permalink / raw)
To: amd-gfx, Felix Kuehling, Alex Deucher, Alex Deucher,
christian.koenig, Philip Yang
Cc: David.YatSin, Kent.Russell, Ritesh Harjani,
Vaidyanathan Srinivasan, Mukesh Kumar Chaurasiya, donettom
HW-supported EOP buffer sizes are 4K and 32K. On systems that do not
use 4K pages, the minimum buffer object (BO) allocation size is
PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver
currently checks the allocated BO size against the supported EOP buffer
size. Since the allocated BO is larger than the expected size, this check
fails, preventing queue creation.
Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used.
Only the required 4K region of the buffer is used as the EOP buffer, which
avoids queue creation failures on non-4K page systems.
Suggested-by: Philip Yang <yangp@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index 80c4fa2b0975..2822c90bd7be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -275,8 +275,8 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
/* EOP buffer is not required for all ASICs */
if (properties->eop_ring_buffer_address) {
- if (properties->eop_ring_buffer_size != topo_dev->node_props.eop_buffer_size) {
- pr_debug("queue eop bo size 0x%x not equal to node eop buf size 0x%x\n",
+ if (properties->eop_ring_buffer_size < topo_dev->node_props.eop_buffer_size) {
+ pr_debug("queue eop bo size 0x%x is less than node eop buf size 0x%x\n",
properties->eop_ring_buffer_size,
topo_dev->node_props.eop_buffer_size);
err = -EINVAL;
@@ -284,7 +284,7 @@ int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_prope
}
err = kfd_queue_buffer_get(vm, (void *)properties->eop_ring_buffer_address,
&properties->eop_buf_bo,
- properties->eop_ring_buffer_size);
+ ALIGN(properties->eop_ring_buffer_size, PAGE_SIZE));
if (err)
goto out_err_unreserve;
}
--
2.52.0
* [PATCH v2 2/3] drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes
2026-01-12 14:06 [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1 Donet Tom
2026-01-12 14:06 ` [PATCH v2 1/3] drm/amdkfd: Relax size checking during queue buffer get Donet Tom
@ 2026-01-12 14:06 ` Donet Tom
2026-01-12 14:06 ` [PATCH v2 3/3] drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map() Donet Tom
2026-01-12 20:28 ` [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1 Felix Kuehling
3 siblings, 0 replies; 7+ messages in thread
From: Donet Tom @ 2026-01-12 14:06 UTC (permalink / raw)
To: amd-gfx, Felix Kuehling, Alex Deucher, Alex Deucher,
christian.koenig, Philip Yang
Cc: David.YatSin, Kent.Russell, Ritesh Harjani,
Vaidyanathan Srinivasan, Mukesh Kumar Chaurasiya, donettom,
Philip Yang
SVM range size is tracked using the system page size. The range start and
end are aligned to system page-sized PFNs, so the total SVM range size
equals the total number of pages in the SVM range multiplied by the system
page size.
The SVM range map/unmap functions pass these system-page-based PFNs to
amdgpu_vm_update_range(), which expects PFNs based on the GPU page size
(4K). On non-4K page systems, this mismatch causes only part of the SVM
range to be mapped in the GPU page table, while the rest remains unmapped.
If the GPU accesses an unmapped address within the same range, it results
in a GPU page fault.
To fix this, the required conversion has been added in both
svm_range_map_to_gpu() and svm_range_unmap_from_gpu(), ensuring that all
pages in the SVM range are correctly mapped on non-4K systems.
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 29 ++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 79ea138897fc..afaec58e1a21 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1332,11 +1332,16 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
struct dma_fence **fence)
{
uint64_t init_pte_value = 0;
+ uint64_t gpu_start, gpu_end;
- pr_debug("[0x%llx 0x%llx]\n", start, last);
+ /* Convert CPU page range to GPU page range */
+ gpu_start = start * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
+ gpu_end = (last + 1) * AMDGPU_GPU_PAGES_IN_CPU_PAGE - 1;
- return amdgpu_vm_update_range(adev, vm, false, true, true, false, NULL, start,
- last, init_pte_value, 0, 0, NULL, NULL,
+ pr_debug("CPU[0x%llx 0x%llx] -> GPU[0x%llx 0x%llx]\n", start, last,
+ gpu_start, gpu_end);
+ return amdgpu_vm_update_range(adev, vm, false, true, true, false, NULL, gpu_start,
+ gpu_end, init_pte_value, 0, 0, NULL, NULL,
fence);
}
@@ -1416,6 +1421,9 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
last_start, last_start + npages - 1, readonly);
for (i = offset; i < offset + npages; i++) {
+ uint64_t gpu_start;
+ uint64_t gpu_end;
+
last_domain = dma_addr[i] & SVM_RANGE_VRAM_DOMAIN;
dma_addr[i] &= ~SVM_RANGE_VRAM_DOMAIN;
@@ -1433,17 +1441,22 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
if (readonly)
pte_flags &= ~AMDGPU_PTE_WRITEABLE;
- pr_debug("svms 0x%p map [0x%lx 0x%llx] vram %d PTE 0x%llx\n",
- prange->svms, last_start, prange->start + i,
- (last_domain == SVM_RANGE_VRAM_DOMAIN) ? 1 : 0,
- pte_flags);
/* For dGPU mode, we use same vm_manager to allocate VRAM for
* different memory partition based on fpfn/lpfn, we should use
* same vm_manager.vram_base_offset regardless memory partition.
*/
+ gpu_start = last_start * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
+ gpu_end = (prange->start + i + 1) * AMDGPU_GPU_PAGES_IN_CPU_PAGE - 1;
+
+ pr_debug("svms 0x%p map CPU[0x%lx 0x%llx] GPU[0x%llx 0x%llx] vram %d PTE 0x%llx\n",
+ prange->svms, last_start, prange->start + i,
+ gpu_start, gpu_end,
+ (last_domain == SVM_RANGE_VRAM_DOMAIN) ? 1 : 0,
+ pte_flags);
+
r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, true,
- NULL, last_start, prange->start + i,
+ NULL, gpu_start, gpu_end,
pte_flags,
(last_start - prange->start) << PAGE_SHIFT,
bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
--
2.52.0
* [PATCH v2 3/3] drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()
2026-01-12 14:06 [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1 Donet Tom
2026-01-12 14:06 ` [PATCH v2 1/3] drm/amdkfd: Relax size checking during queue buffer get Donet Tom
2026-01-12 14:06 ` [PATCH v2 2/3] drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes Donet Tom
@ 2026-01-12 14:06 ` Donet Tom
2026-01-12 20:28 ` [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1 Felix Kuehling
3 siblings, 0 replies; 7+ messages in thread
From: Donet Tom @ 2026-01-12 14:06 UTC (permalink / raw)
To: amd-gfx, Felix Kuehling, Alex Deucher, Alex Deucher,
christian.koenig, Philip Yang
Cc: David.YatSin, Kent.Russell, Ritesh Harjani,
Vaidyanathan Srinivasan, Mukesh Kumar Chaurasiya, donettom,
Philip Yang
In svm_migrate_gart_map(), when setting up the GART mapping for migration,
the number of bytes copied for the GART table only accounts for CPU pages.
On non-4K systems, each CPU page can contain multiple GPU pages, and the
GART requires one 8-byte PTE per GPU page. As a result, an incorrect size
is passed to the DMA engine, causing only a partial update of the GART table.
Fix this function to work correctly on non-4K page-size systems by
accounting for the number of GPU pages per CPU page when calculating the
number of bytes to be copied.
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index af53e796ea1b..fd7721e3333a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -61,7 +61,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
*gart_addr = adev->gmc.gart_start;
num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
- num_bytes = npages * 8;
+ num_bytes = npages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
AMDGPU_FENCE_OWNER_UNDEFINED,
--
2.52.0
* Re: [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1
2026-01-12 14:06 [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1 Donet Tom
` (2 preceding siblings ...)
2026-01-12 14:06 ` [PATCH v2 3/3] drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map() Donet Tom
@ 2026-01-12 20:28 ` Felix Kuehling
2026-01-12 20:39 ` Alex Deucher
3 siblings, 1 reply; 7+ messages in thread
From: Felix Kuehling @ 2026-01-12 20:28 UTC (permalink / raw)
To: Donet Tom, amd-gfx, Alex Deucher, Alex Deucher, christian.koenig,
Philip Yang
Cc: David.YatSin, Kent.Russell, Ritesh Harjani,
Vaidyanathan Srinivasan, Mukesh Kumar Chaurasiya
On 2026-01-12 09:06, Donet Tom wrote:
> RFC -> v2
> =========
>
> In RFC patch v1 [1], there were 8 patches. From that series, patches 1–3 are
> required to enable minimal support for 64K pages in AMDGPU. I have added those
> 3 patches in this series.
>
> With these three patches applied, all RCCL tests and the rocr-debug-agent tests
> pass on a ppc64le system with 64K page size on 2 GPUs. However, on systems with
> more than 2 GPUs and with XNACK enabled, we require additional patches [4-8],
> which were posted earlier as part of the RFC [1]. Since those require a bit of
> additional work and discussion, we will post v2 of them later as Part 2.
>
> 1. Patch 1 was updated to only relax the EOP buffer size check, based on Philip Yang’s comment.
>
> 2. Philip’s review comments on Patch 2 were addressed, and Reviewed-by tags were added to
> Patch 2 and Patch 3.
>
> [1] https://lore.kernel.org/all/cover.1765519875.git.donettom@linux.ibm.com/
>
> If this looks good, could we pull these changes into v6.20?
The series looks good to me.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Alex, what does it take to get this into 6.20? I guess you'll want to
include this in a pull-request for drm-fixes ASAP?
Regards,
Felix
>
> This patch series addresses a few issues which we encountered while running the
> rocr-debug-agent and rccl unit tests with an AMD GPU on Power10 (ppc64le),
> using a 64K system page size.
>
> Note that we don't observe any of these issues while booting with a 4K system
> page size on Power. With the 64K system page size, what we observed so far is
> that, in a few places, the conversion between GPU PFN and CPU PFN (or vice
> versa) may not be done correctly (due to the different page sizes of the AMD
> GPU (4K) vs. the CPU (64K)), which causes issues like GPU page faults or GPU
> hangs while running these tests.
>
> Changes so far in this series:
> =============================
> 1. For now, during kfd queue creation, this patch lifts the restriction that
> the EOP buffer size must be the same as the buffer object mapping size.
>
> 2. Fix SVM range map/unmap operations to convert CPU page numbers to GPU page
> numbers before calling amdgpu_vm_update_range(), which expects 4K GPU pages.
> Without this, the rocr-debug-agent tests and rccl unit tests were failing.
>
> 3. Fix GART PTE allocation in the migration code to account for multiple GPU
> pages per CPU page. The current code only allocates PTEs based on the number
> of CPU pages, but the GART may need one PTE per 4K GPU page.
>
> Setup details:
> ============
> System details: Power10 LPAR using 64K pagesize.
> AMD GPU:
> Name: gfx90a
> Marketing Name: AMD Instinct MI210
>
> Donet Tom (3):
> drm/amdkfd: Relax size checking during queue buffer get
> drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes
> drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()
>
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
> drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 6 ++---
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 29 +++++++++++++++++-------
> 3 files changed, 25 insertions(+), 12 deletions(-)
>
* Re: [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1
2026-01-12 20:28 ` [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1 Felix Kuehling
@ 2026-01-12 20:39 ` Alex Deucher
2026-01-13 8:10 ` Christian König
0 siblings, 1 reply; 7+ messages in thread
From: Alex Deucher @ 2026-01-12 20:39 UTC (permalink / raw)
To: Felix Kuehling
Cc: Donet Tom, amd-gfx, Alex Deucher, christian.koenig, Philip Yang,
David.YatSin, Kent.Russell, Ritesh Harjani,
Vaidyanathan Srinivasan, Mukesh Kumar Chaurasiya
On Mon, Jan 12, 2026 at 3:28 PM Felix Kuehling <felix.kuehling@amd.com> wrote:
>
>
> On 2026-01-12 09:06, Donet Tom wrote:
> > RFC -> v2
> > =========
> >
> > In RFC patch v1 [1], there were 8 patches. From that series, patches 1–3 are
> > required to enable minimal support for 64K pages in AMDGPU. I have added those
> > 3 patches in this series.
> >
> > With these three patches applied, all RCCL tests and the rocr-debug-agent tests
> > pass on a ppc64le system with 64K page size on 2 GPUs. However, on systems with
> > more than 2 GPUs and with XNACK enabled, we require additional patches [4-8],
> > which were posted earlier as part of the RFC [1]. Since those require a bit of
> > additional work and discussion, we will post v2 of them later as Part 2.
> >
> > 1. Patch 1 was updated to only relax the EOP buffer size check, based on Philip Yang’s comment.
> >
> > 2. Philip’s review comments on Patch 2 were addressed, and Reviewed-by tags were added to
> > Patch 2 and Patch 3.
> >
> > [1] https://lore.kernel.org/all/cover.1765519875.git.donettom@linux.ibm.com/
> >
> > If this looks good, could we pull these changes into v6.20?
>
> The series looks good to me.
>
> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
>
> Alex, what does it take to get this into 6.20? I guess you'll want to
> include this in a pull-request for drm-fixes ASAP?
Yes, if you can land it in amd-staging-drm-next ASAP, I'll include it
in this week's PR.
Alex
>
> Regards,
> Felix
>
>
> >
> > This patch series addresses a few issues which we encountered while running the
> > rocr-debug-agent and rccl unit tests with an AMD GPU on Power10 (ppc64le),
> > using a 64K system page size.
> >
> > Note that we don't observe any of these issues while booting with a 4K system
> > page size on Power. With the 64K system page size, what we observed so far is
> > that, in a few places, the conversion between GPU PFN and CPU PFN (or vice
> > versa) may not be done correctly (due to the different page sizes of the AMD
> > GPU (4K) vs. the CPU (64K)), which causes issues like GPU page faults or GPU
> > hangs while running these tests.
> >
> > Changes so far in this series:
> > =============================
> > 1. For now, during kfd queue creation, this patch lifts the restriction that
> > the EOP buffer size must be the same as the buffer object mapping size.
> >
> > 2. Fix SVM range map/unmap operations to convert CPU page numbers to GPU page
> > numbers before calling amdgpu_vm_update_range(), which expects 4K GPU pages.
> > Without this, the rocr-debug-agent tests and rccl unit tests were failing.
> >
> > 3. Fix GART PTE allocation in the migration code to account for multiple GPU
> > pages per CPU page. The current code only allocates PTEs based on the number
> > of CPU pages, but the GART may need one PTE per 4K GPU page.
> >
> > Setup details:
> > ============
> > System details: Power10 LPAR using 64K pagesize.
> > AMD GPU:
> > Name: gfx90a
> > Marketing Name: AMD Instinct MI210
> >
> > Donet Tom (3):
> > drm/amdkfd: Relax size checking during queue buffer get
> > drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes
> > drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()
> >
> > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
> > drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 6 ++---
> > drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 29 +++++++++++++++++-------
> > 3 files changed, 25 insertions(+), 12 deletions(-)
> >
* Re: [PATCH v2 0/3] drm/amdkfd: Add support for non-4K page size systems - part1
2026-01-12 20:39 ` Alex Deucher
@ 2026-01-13 8:10 ` Christian König
0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2026-01-13 8:10 UTC (permalink / raw)
To: Alex Deucher, Felix Kuehling
Cc: Donet Tom, amd-gfx, Alex Deucher, Philip Yang, David.YatSin,
Kent.Russell, Ritesh Harjani, Vaidyanathan Srinivasan,
Mukesh Kumar Chaurasiya
On 1/12/26 21:39, Alex Deucher wrote:
> On Mon, Jan 12, 2026 at 3:28 PM Felix Kuehling <felix.kuehling@amd.com> wrote:
>>
>>
>> On 2026-01-12 09:06, Donet Tom wrote:
>>> RFC -> v2
>>> =========
>>>
>>> In RFC patch v1 [1], there were 8 patches. From that series, patches 1–3 are
>>> required to enable minimal support for 64K pages in AMDGPU. I have added those
>>> 3 patches in this series.
>>>
>>> With these three patches applied, all RCCL tests and the rocr-debug-agent tests
>>> pass on a ppc64le system with 64K page size on 2 GPUs. However, on systems with
>>> more than 2 GPUs and with XNACK enabled, we require additional patches [4-8],
>>> which were posted earlier as part of the RFC [1]. Since those require a bit of
>>> additional work and discussion, we will post v2 of them later as Part 2.
>>>
>>> 1. Patch 1 was updated to only relax the EOP buffer size check, based on Philip Yang’s comment.
>>>
>>> 2. Philip’s review comments on Patch 2 were addressed, and Reviewed-by tags were added to
>>> Patch 2 and Patch 3.
>>>
>>> [1] https://lore.kernel.org/all/cover.1765519875.git.donettom@linux.ibm.com/
>>>
>>> If this looks good, could we pull these changes into v6.20?
>>
>> The series looks good to me.
>>
>> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
>>
>> Alex, what does it take to get this into 6.20? I guess you'll want to
>> include this in a pull-request for drm-fixes ASAP?
>
> Yes, if you can land it in amd-staging-drm-next ASAP, I'll include it
> in this week's PR.
If possible feel free to add an Acked-by: Christian König <christian.koenig@amd.com>.
I will try to work with Pierre-Eric to get the DMA window patches upstream so that it is possible to base the rest of the work on top of that.
Regards,
Christian.
>
> Alex
>
>>
>> Regards,
>> Felix
>>
>>
>>>
>>> This patch series addresses a few issues which we encountered while running the
>>> rocr-debug-agent and rccl unit tests with an AMD GPU on Power10 (ppc64le),
>>> using a 64K system page size.
>>>
>>> Note that we don't observe any of these issues while booting with a 4K system
>>> page size on Power. With the 64K system page size, what we observed so far is
>>> that, in a few places, the conversion between GPU PFN and CPU PFN (or vice
>>> versa) may not be done correctly (due to the different page sizes of the AMD
>>> GPU (4K) vs. the CPU (64K)), which causes issues like GPU page faults or GPU
>>> hangs while running these tests.
>>>
>>> Changes so far in this series:
>>> =============================
>>> 1. For now, during kfd queue creation, this patch lifts the restriction that
>>> the EOP buffer size must be the same as the buffer object mapping size.
>>>
>>> 2. Fix SVM range map/unmap operations to convert CPU page numbers to GPU page
>>> numbers before calling amdgpu_vm_update_range(), which expects 4K GPU pages.
>>> Without this, the rocr-debug-agent tests and rccl unit tests were failing.
>>>
>>> 3. Fix GART PTE allocation in the migration code to account for multiple GPU
>>> pages per CPU page. The current code only allocates PTEs based on the number
>>> of CPU pages, but the GART may need one PTE per 4K GPU page.
>>>
>>> Setup details:
>>> ============
>>> System details: Power10 LPAR using 64K pagesize.
>>> AMD GPU:
>>> Name: gfx90a
>>> Marketing Name: AMD Instinct MI210
>>>
>>> Donet Tom (3):
>>> drm/amdkfd: Relax size checking during queue buffer get
>>> drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes
>>> drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()
>>>
>>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
>>> drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 6 ++---
>>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 29 +++++++++++++++++-------
>>> 3 files changed, 25 insertions(+), 12 deletions(-)
>>>