* [PATCH 1/7] drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM
2026-04-16 20:26 [PATCH 0/7] Various SI fixes, fix Radeon HD 7870 XT Timur Kristóf
@ 2026-04-16 20:26 ` Timur Kristóf
2026-04-17 10:58 ` Christian König
2026-04-16 20:26 ` [PATCH 2/7] drm/amdgpu/vce: Align VCPU BO to a power of two address Timur Kristóf
` (5 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Timur Kristóf @ 2026-04-16 20:26 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, christian.koenig; +Cc: Timur Kristóf
When the GART placement is set to AMDGPU_GART_PLACEMENT_LOW:
Make sure that GART does not overlap with VRAM when
VRAM is configured to be in the low address space.
Solve this according to the following logic:
- When GART fits before VRAM, use zero address for GART
- Otherwise, put GART after the end of VRAM, aligned to 4 GiB
Previously, I had assumed this was not possible
so it was OK to not handle it, but now we got a report
from a user who has a board that is configured this way.
Fixes: 917f91d8d8e8 ("drm/amdgpu/gmc: add a way to force a particular placement for GART")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 1daf2546d3b26..b454b463bcb2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -314,7 +314,10 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc,
mc->gart_start = max_mc_address - mc->gart_size + 1;
break;
case AMDGPU_GART_PLACEMENT_LOW:
- mc->gart_start = 0;
+ if (size_bf >= mc->gart_size)
+ mc->gart_start = 0;
+ else
+ mc->gart_start = ALIGN(mc->fb_end, four_gb);
break;
case AMDGPU_GART_PLACEMENT_BEST_FIT:
default:
--
2.53.0
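
The placement rule described in the commit message above can be sketched in plain userspace C. This is only an illustration, not the driver code: `gart_start_low()` and `align_up()` are hypothetical names, and the sketch assumes that the free space below VRAM (`size_bf` in the patch) equals the VRAM start address and that `fb_end` is the last VRAM byte (inclusive).

```c
#include <stdint.h>

#define FOUR_GB (1ULL << 32)

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper mirroring the AMDGPU_GART_PLACEMENT_LOW case.
 * fb_start / fb_end: first and last byte address of VRAM (inclusive).
 * Assumption: the space before VRAM ("size_bf") is simply fb_start.
 */
static uint64_t gart_start_low(uint64_t fb_start, uint64_t fb_end,
			       uint64_t gart_size)
{
	uint64_t size_bf = fb_start;

	if (size_bf >= gart_size)
		return 0;	/* GART fits below VRAM */

	/* Otherwise use the first 4 GiB boundary at or above fb_end. */
	return align_up(fb_end, FOUR_GB);
}
```

For example, with 2 GiB of VRAM mapped at address 0 and a 1 GiB GART, the helper returns 0x100000000; with VRAM mapped high (say at 0xF400000000), it returns 0.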
* Re: [PATCH 1/7] drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM
2026-04-16 20:26 ` [PATCH 1/7] drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM Timur Kristóf
@ 2026-04-17 10:58 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2026-04-17 10:58 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, alexander.deucher
On 4/16/26 22:26, Timur Kristóf wrote:
> When the GART placement is set to AMDGPU_GART_PLACEMENT_LOW:
> Make sure that GART does not overlap with VRAM when
> VRAM is configured to be in the low address space.
>
> Solve this according to the following logic:
> - When GART fits before VRAM, use zero address for GART
> - Otherwise, put GART after the end of VRAM, aligned to 4 GiB
>
> Previously, I had assumed this was not possible
> so it was OK to not handle it, but now we got a report
> from a user who has a board that is configured this way.
>
> Fixes: 917f91d8d8e8 ("drm/amdgpu/gmc: add a way to force a particular placement for GART")
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 1daf2546d3b26..b454b463bcb2e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -314,7 +314,10 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc,
> mc->gart_start = max_mc_address - mc->gart_size + 1;
> break;
> case AMDGPU_GART_PLACEMENT_LOW:
> - mc->gart_start = 0;
> + if (size_bf >= mc->gart_size)
> + mc->gart_start = 0;
> + else
> + mc->gart_start = ALIGN(mc->fb_end, four_gb);
> break;
> case AMDGPU_GART_PLACEMENT_BEST_FIT:
> default:
* [PATCH 2/7] drm/amdgpu/vce: Align VCPU BO to a power of two address
2026-04-16 20:26 [PATCH 0/7] Various SI fixes, fix Radeon HD 7870 XT Timur Kristóf
2026-04-16 20:26 ` [PATCH 1/7] drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM Timur Kristóf
@ 2026-04-16 20:26 ` Timur Kristóf
2026-04-17 7:08 ` Christian König
2026-04-16 20:26 ` [PATCH 3/7] drm/amdgpu: Add alignment to amdgpu_gtt_mgr_alloc_entries() Timur Kristóf
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Timur Kristóf @ 2026-04-16 20:26 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, christian.koenig; +Cc: Timur Kristóf
This is done to prevent the VCPU BO from crossing
a 256 MiB boundary.
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index efdebd9c0a1f3..ac25355539cb2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -218,7 +218,8 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
if (!adev->vce.fw)
return -ENOENT;
- r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
+ r = amdgpu_bo_create_kernel(adev, size,
+ roundup_pow_of_two(ALIGN(size, PAGE_SIZE)),
AMDGPU_GEM_DOMAIN_VRAM |
AMDGPU_GEM_DOMAIN_GTT,
&adev->vce.vcpu_bo,
--
2.53.0
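
Why a power-of-two alignment keeps a buffer from crossing a segment boundary can be shown with a small userspace sketch. A buffer whose start is aligned to a power of two that is at least its size cannot straddle any larger power-of-two boundary. The names below (`vcpu_bo_alignment()`, `crosses_boundary()`, `PAGE_SZ`) are hypothetical, and a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL	/* assumption: 4 KiB pages */

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Smallest power of two >= x (x must be non-zero and at most 2^63). */
static uint64_t roundup_pow2(uint64_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Alignment computed the same way as in the patch, in bytes. */
static uint64_t vcpu_bo_alignment(uint64_t size)
{
	return roundup_pow2(align_up(size, PAGE_SZ));
}

/* True if [start, start + size) spans a multiple of boundary. */
static bool crosses_boundary(uint64_t start, uint64_t size, uint64_t boundary)
{
	return (start / boundary) != ((start + size - 1) / boundary);
}
```

A 300 KiB buffer gets a 512 KiB alignment here. Placed at the last 512 KiB-aligned address below 4 GiB it stays inside the segment, whereas the same buffer placed at the last page below 4 GiB would cross it.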
* Re: [PATCH 2/7] drm/amdgpu/vce: Align VCPU BO to a power of two address
2026-04-16 20:26 ` [PATCH 2/7] drm/amdgpu/vce: Align VCPU BO to a power of two address Timur Kristóf
@ 2026-04-17 7:08 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2026-04-17 7:08 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, alexander.deucher
On 4/16/26 22:26, Timur Kristóf wrote:
> This is done to prevent the VCPU BO from crossing
> a 256 MiB boundary.
UVD was the one with the 256MiB segments, VCE had an issue with 4GiB segments.
>
> Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index efdebd9c0a1f3..ac25355539cb2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -218,7 +218,8 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> if (!adev->vce.fw)
> return -ENOENT;
>
> - r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> + r = amdgpu_bo_create_kernel(adev, size,
> + roundup_pow_of_two(ALIGN(size, PAGE_SIZE)),
> AMDGPU_GEM_DOMAIN_VRAM |
> AMDGPU_GEM_DOMAIN_GTT,
Oh that is a really interesting find.
For kernel BOs the alignment of VRAM allocations is always the power of two of the size because of the backend allocator.
So that only matters for GTT allocation, but GTT allocated FW should only be used on VCE 4 and there the segments don't matter either.
We should probably adjust that so that VCE version < 4 always uses only VRAM.
And by the way PAGE_SIZE alignment is nonsense as well, that parameter should either be 0 or AMDGPU_PAGE_SIZE.
Regards,
Christian.
> &adev->vce.vcpu_bo,
* [PATCH 3/7] drm/amdgpu: Add alignment to amdgpu_gtt_mgr_alloc_entries()
2026-04-16 20:26 [PATCH 0/7] Various SI fixes, fix Radeon HD 7870 XT Timur Kristóf
2026-04-16 20:26 ` [PATCH 1/7] drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM Timur Kristóf
2026-04-16 20:26 ` [PATCH 2/7] drm/amdgpu/vce: Align VCPU BO to a power of two address Timur Kristóf
@ 2026-04-16 20:26 ` Timur Kristóf
2026-04-16 20:26 ` [PATCH 4/7] drm/amdgpu/vce1: Fix workaround to ensure low 32-bit VCPU address Timur Kristóf
` (3 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Timur Kristóf @ 2026-04-16 20:26 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, christian.koenig; +Cc: Timur Kristóf
Add an argument to amdgpu_gtt_mgr_alloc_entries() so that
the caller can specify an alignment.
This is a pre-requisite for fixing the workaround for
ensuring that the VCE1 VCPU BO has a low 32-bit address.
Fixes: 66a80158aa2a ("amdgpu/vce: use amdgpu_gtt_mgr_alloc_entries")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +-
drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 2 +-
4 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 9b0bcf6aca445..4fea81479264f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -188,6 +188,7 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man,
* @mgr: The GTT manager object
* @mm_node: The drm mm node to return the new allocation node information
* @num_pages: The number of pages for the new allocation
+ * @alignment: Alignment of the allocation (in pages)
* @mode: The new allocation mode
*
* Helper to dynamic alloc GART entries to map memory not accociated with
@@ -195,7 +196,7 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man,
*/
int amdgpu_gtt_mgr_alloc_entries(struct amdgpu_gtt_mgr *mgr,
struct drm_mm_node *mm_node,
- u64 num_pages,
+ u64 num_pages, u64 alignment,
enum drm_mm_insert_mode mode)
{
struct amdgpu_device *adev = container_of(mgr, typeof(*adev), mman.gtt_mgr);
@@ -203,7 +204,7 @@ int amdgpu_gtt_mgr_alloc_entries(struct amdgpu_gtt_mgr *mgr,
spin_lock(&mgr->lock);
r = drm_mm_insert_node_in_range(&mgr->mm, mm_node, num_pages,
- 0, GART_ENTRY_WITHOUT_BO_COLOR, 0,
+ alignment, GART_ENTRY_WITHOUT_BO_COLOR, 0,
adev->gmc.gart_size >> PAGE_SHIFT,
mode);
spin_unlock(&mgr->lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b69e29e7cfc9b..de85bbb0a1efc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2029,7 +2029,7 @@ static int amdgpu_ttm_buffer_entity_init(struct amdgpu_gtt_mgr *mgr,
return 0;
num_pages = num_gart_windows * AMDGPU_GTT_MAX_TRANSFER_SIZE;
- r = amdgpu_gtt_mgr_alloc_entries(mgr, &entity->gart_node, num_pages,
+ r = amdgpu_gtt_mgr_alloc_entries(mgr, &entity->gart_node, num_pages, 0,
DRM_MM_INSERT_BEST);
if (r) {
drm_sched_entity_destroy(&entity->base);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index f2f23a42b3cc4..f3b214502c1c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -147,7 +147,7 @@ void amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr);
int amdgpu_gtt_mgr_alloc_entries(struct amdgpu_gtt_mgr *mgr,
struct drm_mm_node *mm_node,
- u64 num_pages,
+ u64 num_pages, u64 alignment,
enum drm_mm_insert_mode mode);
void amdgpu_gtt_mgr_free_entries(struct amdgpu_gtt_mgr *mgr,
struct drm_mm_node *mm_node);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
index 5b7b46d242c6d..2fe931366985a 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -539,7 +539,7 @@ static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
int r;
r = amdgpu_gtt_mgr_alloc_entries(&adev->mman.gtt_mgr,
- &adev->vce.gart_node, num_pages,
+ &adev->vce.gart_node, num_pages, 0,
DRM_MM_INSERT_LOW);
if (r)
return r;
--
2.53.0
* [PATCH 4/7] drm/amdgpu/vce1: Fix workaround to ensure low 32-bit VCPU address
2026-04-16 20:26 [PATCH 0/7] Various SI fixes, fix Radeon HD 7870 XT Timur Kristóf
` (2 preceding siblings ...)
2026-04-16 20:26 ` [PATCH 3/7] drm/amdgpu: Add alignment to amdgpu_gtt_mgr_alloc_entries() Timur Kristóf
@ 2026-04-16 20:26 ` Timur Kristóf
2026-04-17 7:21 ` Christian König
2026-04-16 20:26 ` [PATCH 5/7] drm/amdgpu/uvd3.1: Don't validate the firmware when already validated Timur Kristóf
` (2 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Timur Kristóf @ 2026-04-16 20:26 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, christian.koenig; +Cc: Timur Kristóf
Fix a few issues, some of which were inadvertently
exposed by starting to use amdgpu_gtt_mgr_alloc_entries()
for the VCE1 workaround:
1. When the VCPU BO is already located in a low 32-bit address
in VRAM (eg. when VRAM is mapped to the low address space),
don't do the workaround.
Previously, I had assumed this was not possible
so it was OK to not handle it, but now we got a report
from a user who has a board that is configured this way.
2. Only allocate entries from the GTT manager when the
VCE GTT node is not allocated yet. This prevents the
possibility of allocating them multiple times, which
causes issues during GPU reset and suspend/resume.
3. Align the GTT address of the VCPU BO to a power-of-two,
ensuring that it doesn't cross a 256 MiB boundary.
4. Remove a useless check at the end of the function,
which is superfluous because the same thing is already
checked above.
5. Change maximum address limit to 0x7fffffff in order to
reflect how vce_v1_0_mc_resume() works.
Fixes: 66a80158aa2a ("amdgpu/vce: use amdgpu_gtt_mgr_alloc_entries")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
index 2fe931366985a..55ea6765c03b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -531,18 +531,29 @@ static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
{
u64 bo_size = amdgpu_bo_size(adev->vce.vcpu_bo);
- u64 max_vcpu_bo_addr = 0xffffffff - bo_size;
+ u64 vcpu_gart_alignment = roundup_pow_of_two(ALIGN(bo_size, PAGE_SIZE));
+ u64 max_vcpu_bo_addr = 0x7fffffff - vcpu_gart_alignment;
u64 num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
u64 pa = amdgpu_gmc_vram_pa(adev, adev->vce.vcpu_bo);
u64 flags = AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID;
u64 vce_gart_start_offs;
int r;
- r = amdgpu_gtt_mgr_alloc_entries(&adev->mman.gtt_mgr,
- &adev->vce.gart_node, num_pages, 0,
- DRM_MM_INSERT_LOW);
- if (r)
- return r;
+ /*
+ * Check if the VCPU BO already has a 32-bit address in VRAM.
+ * Eg. if MC is configured to put VRAM in the low address range.
+ */
+ if (amdgpu_bo_gpu_offset(adev->vce.vcpu_bo) <= max_vcpu_bo_addr)
+ return 0;
+
+ if (!drm_mm_node_allocated(&adev->vce.gart_node)) {
+ r = amdgpu_gtt_mgr_alloc_entries(&adev->mman.gtt_mgr,
+ &adev->vce.gart_node, num_pages,
+ vcpu_gart_alignment / PAGE_SIZE,
+ DRM_MM_INSERT_LOW);
+ if (r)
+ return r;
+ }
vce_gart_start_offs = amdgpu_gtt_node_to_byte_offset(&adev->vce.gart_node);
@@ -553,8 +564,6 @@ static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
amdgpu_gart_map_vram_range(adev, pa, adev->vce.gart_node.start,
num_pages, flags, adev->gart.ptr);
adev->vce.gpu_addr = adev->gmc.gart_start + vce_gart_start_offs;
- if (adev->vce.gpu_addr > max_vcpu_bo_addr)
- return -EINVAL;
return 0;
}
--
2.53.0
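
The control flow this patch introduces (skip the workaround when the address is already low, and allocate the GART node only once) can be modelled in userspace C. This is a sketch under assumptions, not the driver code: `needs_gart_workaround()`, `ensure_gart_node()` and `struct gart_node` are hypothetical stand-ins, and a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL	/* assumption: 4 KiB pages */

static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Smallest power of two >= x (x must be non-zero and at most 2^63). */
static uint64_t roundup_pow2(uint64_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Highest start address accepted for the VCPU BO, as in the patch. */
static uint64_t max_vcpu_bo_addr(uint64_t bo_size)
{
	return 0x7fffffffULL - roundup_pow2(align_up(bo_size, PAGE_SZ));
}

/* True when the BO's current GPU address is too high to be used directly. */
static bool needs_gart_workaround(uint64_t bo_gpu_addr, uint64_t bo_size)
{
	return bo_gpu_addr > max_vcpu_bo_addr(bo_size);
}

/* Hypothetical stand-in for the drm_mm node that backs the GART mapping. */
struct gart_node {
	bool allocated;
	unsigned int alloc_count;
};

/* Allocate only on the first call so reset and resume reuse the node. */
static void ensure_gart_node(struct gart_node *node)
{
	if (!node->allocated) {
		node->allocated = true;
		node->alloc_count++;
	}
}
```

With a 256 KiB BO, an address of 0x10000000 needs no workaround, while 0xF400000000 does; calling `ensure_gart_node()` twice on the same node still results in a single allocation.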
* Re: [PATCH 4/7] drm/amdgpu/vce1: Fix workaround to ensure low 32-bit VCPU address
2026-04-16 20:26 ` [PATCH 4/7] drm/amdgpu/vce1: Fix workaround to ensure low 32-bit VCPU address Timur Kristóf
@ 2026-04-17 7:21 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2026-04-17 7:21 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, alexander.deucher
On 4/16/26 22:26, Timur Kristóf wrote:
> Fix a few issues, some of which were inadvertently
> exposed by starting to use amdgpu_gtt_mgr_alloc_entries()
> for the VCE1 workaround:
>
> 1. When the VCPU BO is already located in a low 32-bit address
> in VRAM (eg. when VRAM is mapped to the low address space),
> don't do the workaround.
> Previously, I had assumed this was not possible
> so it was OK to not handle it, but now we got a report
> from a user who has a board that is configured this way.
>
> 2. Only allocate entries from the GTT manager when the
> VCE GTT node is not allocated yet. This prevents the
> possibility of allocating them multiple times, which
> causes issues during GPU reset and suspend/resume.
>
> 3. Align the GTT address of the VCPU BO to a power-of-two,
> ensuring that it doesn't cross a 256 MiB boundary.
Mhm, where did you get that requirement for the 256MiB boundary from?
That is a UVD thing, but for VCE it should be irrelevant.
Regards,
Christian.
>
> 4. Remove a useless check at the end of the function,
> which is superfluous because the same thing is already
> checked above.
>
> 5. Change maximum address limit to 0x7fffffff in order to
> reflect how vce_v1_0_mc_resume() works.
>
> Fixes: 66a80158aa2a ("amdgpu/vce: use amdgpu_gtt_mgr_alloc_entries")
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
> drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> index 2fe931366985a..55ea6765c03b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> @@ -531,18 +531,29 @@ static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
> static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
> {
> u64 bo_size = amdgpu_bo_size(adev->vce.vcpu_bo);
> - u64 max_vcpu_bo_addr = 0xffffffff - bo_size;
> + u64 vcpu_gart_alignment = roundup_pow_of_two(ALIGN(bo_size, PAGE_SIZE));
> + u64 max_vcpu_bo_addr = 0x7fffffff - vcpu_gart_alignment;
> u64 num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
> u64 pa = amdgpu_gmc_vram_pa(adev, adev->vce.vcpu_bo);
> u64 flags = AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID;
> u64 vce_gart_start_offs;
> int r;
>
> - r = amdgpu_gtt_mgr_alloc_entries(&adev->mman.gtt_mgr,
> - &adev->vce.gart_node, num_pages, 0,
> - DRM_MM_INSERT_LOW);
> - if (r)
> - return r;
> + /*
> + * Check if the VCPU BO already has a 32-bit address in VRAM.
> + * Eg. if MC is configured to put VRAM in the low address range.
> + */
> + if (amdgpu_bo_gpu_offset(adev->vce.vcpu_bo) <= max_vcpu_bo_addr)
> + return 0;
> +
> + if (!drm_mm_node_allocated(&adev->vce.gart_node)) {
> + r = amdgpu_gtt_mgr_alloc_entries(&adev->mman.gtt_mgr,
> + &adev->vce.gart_node, num_pages,
> + vcpu_gart_alignment / PAGE_SIZE,
> + DRM_MM_INSERT_LOW);
> + if (r)
> + return r;
> + }
>
> vce_gart_start_offs = amdgpu_gtt_node_to_byte_offset(&adev->vce.gart_node);
>
> @@ -553,8 +564,6 @@ static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
> amdgpu_gart_map_vram_range(adev, pa, adev->vce.gart_node.start,
> num_pages, flags, adev->gart.ptr);
> adev->vce.gpu_addr = adev->gmc.gart_start + vce_gart_start_offs;
> - if (adev->vce.gpu_addr > max_vcpu_bo_addr)
> - return -EINVAL;
>
> return 0;
> }
* [PATCH 5/7] drm/amdgpu/uvd3.1: Don't validate the firmware when already validated
2026-04-16 20:26 [PATCH 0/7] Various SI fixes, fix Radeon HD 7870 XT Timur Kristóf
` (3 preceding siblings ...)
2026-04-16 20:26 ` [PATCH 4/7] drm/amdgpu/vce1: Fix workaround to ensure low 32-bit VCPU address Timur Kristóf
@ 2026-04-16 20:26 ` Timur Kristóf
2026-04-17 7:22 ` Christian König
2026-04-16 20:26 ` [PATCH 6/7] Documentation/gpu: Add TCC, update TCP in amdgpu glossary Timur Kristóf
2026-04-16 20:26 ` [PATCH 7/7] drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs Timur Kristóf
6 siblings, 1 reply; 18+ messages in thread
From: Timur Kristóf @ 2026-04-16 20:26 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, christian.koenig; +Cc: Timur Kristóf
UVD 3.1 firmware validation seems to always fail after
attempting it when it had already been validated.
(This works similarly with the VCE 1.0 as well.)
Don't attempt repeating the validation when it's already done.
This caused issues in situations when the system isn't able
to suspend the GPU properly and so the GPU isn't actually
powered down. Then amdgpu would fail when calling the IP
block resume function.
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2887
Fixes: bb7978111dd3 ("drm/amdgpu: fix SI UVD firmware validate resume fail")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
index fea576a7f397f..efb3fde919ee3 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
@@ -242,6 +242,10 @@ static void uvd_v3_1_mc_resume(struct amdgpu_device *adev)
uint64_t addr;
uint32_t size;
+ /* When the keyselect is already set, don't perturb it. */
+ if (RREG32(mmUVD_FW_START))
+ return;
+
/* program the VCPU memory controller bits 0-27 */
addr = (adev->uvd.inst->gpu_addr + AMDGPU_UVD_FIRMWARE_OFFSET) >> 3;
size = AMDGPU_UVD_FIRMWARE_SIZE(adev) >> 3;
@@ -284,6 +288,12 @@ static int uvd_v3_1_fw_validate(struct amdgpu_device *adev)
int i;
uint32_t keysel = adev->uvd.keyselect;
+ if (RREG32(mmUVD_FW_START) & UVD_FW_STATUS__PASS_MASK) {
+ dev_dbg(adev->dev, "UVD keyselect already set: 0x%x (on CPU: 0x%x)\n",
+ RREG32(mmUVD_FW_START), adev->uvd.keyselect);
+ return 0;
+ }
+
WREG32(mmUVD_FW_START, keysel);
for (i = 0; i < 10; ++i) {
--
2.53.0
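
The "validate only once" guard can be illustrated with a toy model. Everything here is hypothetical: `struct fake_uvd` stands in for the hardware register, `FW_PASS_MASK` is an invented bit position, and the model simply pretends that the hardware accepts the key on the first attempt.

```c
#include <stdint.h>

#define FW_PASS_MASK 0x1u	/* hypothetical bit position, not the real mask */

/* Toy stand-in for the device: one register plus an attempt counter. */
struct fake_uvd {
	uint32_t fw_reg;
	unsigned int validate_attempts;
};

/* Returns 0 on success; skips the hardware access when already validated. */
static int fw_validate(struct fake_uvd *uvd, uint32_t keysel)
{
	if (uvd->fw_reg & FW_PASS_MASK)
		return 0;	/* already validated, leave the hardware alone */

	uvd->validate_attempts++;
	/* Pretend the hardware accepted the key and set the pass bit. */
	uvd->fw_reg = keysel | FW_PASS_MASK;
	return 0;
}
```

Calling `fw_validate()` a second time, as would happen on a resume where the GPU was never powered down, leaves `validate_attempts` at 1.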
* Re: [PATCH 5/7] drm/amdgpu/uvd3.1: Don't validate the firmware when already validated
2026-04-16 20:26 ` [PATCH 5/7] drm/amdgpu/uvd3.1: Don't validate the firmware when already validated Timur Kristóf
@ 2026-04-17 7:22 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2026-04-17 7:22 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, alexander.deucher
On 4/16/26 22:26, Timur Kristóf wrote:
> UVD 3.1 firmware validation seems to always fail after
> attempting it when it had already been validated.
> (This works similarly with the VCE 1.0 as well.)
>
> Don't attempt repeating the validation when it's already done.
>
> This caused issues in situations when the system isn't able
> to suspend the GPU properly and so the GPU isn't actually
> powered down. Then amdgpu would fail when calling the IP
> block resume function.
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2887
> Fixes: bb7978111dd3 ("drm/amdgpu: fix SI UVD firmware validate resume fail")
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
> index fea576a7f397f..efb3fde919ee3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c
> @@ -242,6 +242,10 @@ static void uvd_v3_1_mc_resume(struct amdgpu_device *adev)
> uint64_t addr;
> uint32_t size;
>
> + /* When the keyselect is already set, don't perturb it. */
> + if (RREG32(mmUVD_FW_START))
> + return;
> +
> /* program the VCPU memory controller bits 0-27 */
> addr = (adev->uvd.inst->gpu_addr + AMDGPU_UVD_FIRMWARE_OFFSET) >> 3;
> size = AMDGPU_UVD_FIRMWARE_SIZE(adev) >> 3;
> @@ -284,6 +288,12 @@ static int uvd_v3_1_fw_validate(struct amdgpu_device *adev)
> int i;
> uint32_t keysel = adev->uvd.keyselect;
>
> + if (RREG32(mmUVD_FW_START) & UVD_FW_STATUS__PASS_MASK) {
> + dev_dbg(adev->dev, "UVD keyselect already set: 0x%x (on CPU: 0x%x)\n",
> + RREG32(mmUVD_FW_START), adev->uvd.keyselect);
> + return 0;
> + }
> +
> WREG32(mmUVD_FW_START, keysel);
>
> for (i = 0; i < 10; ++i) {
* [PATCH 6/7] Documentation/gpu: Add TCC, update TCP in amdgpu glossary
2026-04-16 20:26 [PATCH 0/7] Various SI fixes, fix Radeon HD 7870 XT Timur Kristóf
` (4 preceding siblings ...)
2026-04-16 20:26 ` [PATCH 5/7] drm/amdgpu/uvd3.1: Don't validate the firmware when already validated Timur Kristóf
@ 2026-04-16 20:26 ` Timur Kristóf
2026-04-17 7:24 ` Christian König
2026-04-16 20:26 ` [PATCH 7/7] drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs Timur Kristóf
6 siblings, 1 reply; 18+ messages in thread
From: Timur Kristóf @ 2026-04-16 20:26 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, christian.koenig; +Cc: Timur Kristóf
These are the L2 and L1 cache on some AMD GPU architectures.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
Documentation/gpu/amdgpu/amdgpu-glossary.rst | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
index 033167025fcca..d553dd599c966 100644
--- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
+++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
@@ -233,8 +233,15 @@ we have a dedicated glossary for Display Core at
TC
Texture Cache
+ TCC
+ Texture Cache per Channel - L2 cache attached to the memory channels.
+ May be used when shader cores are accessing memory.
+ Despite "Texture" in the name, this is used by any kind of memory access.
+ TCCs may be mapped to TCPs, depending on the architecture.
+
TCP (AMDGPU)
- Texture Cache per Pipe. Even though the name "Texture" is part of this
+ Texture Cache per Pipe - L1 cache attached to each CU.
+ Even though the name "Texture" is part of this
acronym, the TCP represents the path to memory shaders; i.e., it is not
related to texture. The name is a leftover from older designs where shader
stages had different cache designs; it refers to the L1 cache in older
--
2.53.0
* Re: [PATCH 6/7] Documentation/gpu: Add TCC, update TCP in amdgpu glossary
2026-04-16 20:26 ` [PATCH 6/7] Documentation/gpu: Add TCC, update TCP in amdgpu glossary Timur Kristóf
@ 2026-04-17 7:24 ` Christian König
2026-04-17 8:36 ` Timur Kristóf
0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2026-04-17 7:24 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, alexander.deucher
On 4/16/26 22:26, Timur Kristóf wrote:
> These are the L2 and L1 cache on some AMD GPU architectures.
>
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
> Documentation/gpu/amdgpu/amdgpu-glossary.rst | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> index 033167025fcca..d553dd599c966 100644
> --- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> +++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> @@ -233,8 +233,15 @@ we have a dedicated glossary for Display Core at
> TC
> Texture Cache
>
> + TCC
> + Texture Cache per Channel - L2 cache attached to the memory channels.
> + May be used when shader cores are accessing memory.
> + Despite "Texture" in the name, this is used by any kind of memory access.
> + TCCs may be mapped to TCPs, depending on the architecture.
> +
Good to have, but maybe put that below TCP. E.g. L1 first and then L2.
Apart from that nit Reviewed-by: Christian König <christian.koenig@amd.com>
Regards,
Christian.
> TCP (AMDGPU)
> - Texture Cache per Pipe. Even though the name "Texture" is part of this
> + Texture Cache per Pipe - L1 cache attached to each CU.
> + Even though the name "Texture" is part of this
> acronym, the TCP represents the path to memory shaders; i.e., it is not
> related to texture. The name is a leftover from older designs where shader
> stages had different cache designs; it refers to the L1 cache in older
* Re: [PATCH 6/7] Documentation/gpu: Add TCC, update TCP in amdgpu glossary
2026-04-17 7:24 ` Christian König
@ 2026-04-17 8:36 ` Timur Kristóf
2026-04-17 9:03 ` Christian König
0 siblings, 1 reply; 18+ messages in thread
From: Timur Kristóf @ 2026-04-17 8:36 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, Christian König
On Friday, April 17, 2026 9:24:55 AM Central European Summer Time Christian
König wrote:
> On 4/16/26 22:26, Timur Kristóf wrote:
> > These are the L2 and L1 cache on some AMD GPU architectures.
> >
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >
> > Documentation/gpu/amdgpu/amdgpu-glossary.rst | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > index 033167025fcca..d553dd599c966 100644
> > --- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > +++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > @@ -233,8 +233,15 @@ we have a dedicated glossary for Display Core at
> >
> > TC
> >
> > Texture Cache
> >
> > + TCC
> > + Texture Cache per Channel - L2 cache attached to the memory channels.
> > + May be used when shader cores are accessing memory.
> > + Despite "Texture" in the name, this is used by any kind of memory access.
> > + TCCs may be mapped to TCPs, depending on the architecture.
> > +
>
> Good to have, but maybe put that below TCP. E.g. L1 first and then L2.
I prefer to keep the alphabetical order for consistency with the rest of the
glossary.
* Re: [PATCH 6/7] Documentation/gpu: Add TCC, update TCP in amdgpu glossary
2026-04-17 8:36 ` Timur Kristóf
@ 2026-04-17 9:03 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2026-04-17 9:03 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, alexander.deucher
On 4/17/26 10:36, Timur Kristóf wrote:
> On Friday, April 17, 2026 9:24:55 AM Central European Summer Time Christian
> König wrote:
>> On 4/16/26 22:26, Timur Kristóf wrote:
>>> These are the L2 and L1 cache on some AMD GPU architectures.
>>>
>>> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
>>> ---
>>>
>>> Documentation/gpu/amdgpu/amdgpu-glossary.rst | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
>>> index 033167025fcca..d553dd599c966 100644
>>> --- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
>>> +++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
>>> @@ -233,8 +233,15 @@ we have a dedicated glossary for Display Core at
>>>
>>> TC
>>>
>>> Texture Cache
>>>
>>> + TCC
>>> + Texture Cache per Channel - L2 cache attached to the memory channels.
>>> + May be used when shader cores are accessing memory.
>>> + Despite "Texture" in the name, this is used by any kind of memory access.
>>> + TCCs may be mapped to TCPs, depending on the architecture.
>>> +
>>
>> Good to have, but maybe put that below TCP. E.g. L1 first and then L2.
>
> I prefer to keep the alphabetical order for consistency with the rest of the
> glossary.
Good argument as well, feel free to add my rb to the patch as it is.
Regards,
Christian
>
>
>
* [PATCH 7/7] drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs
2026-04-16 20:26 [PATCH 0/7] Various SI fixes, fix Radeon HD 7870 XT Timur Kristóf
` (5 preceding siblings ...)
2026-04-16 20:26 ` [PATCH 6/7] Documentation/gpu: Add TCC, update TCP in amdgpu glossary Timur Kristóf
@ 2026-04-16 20:26 ` Timur Kristóf
2026-04-17 12:56 ` Christian König
6 siblings, 1 reply; 18+ messages in thread
From: Timur Kristóf @ 2026-04-16 20:26 UTC (permalink / raw)
To: amd-gfx, alexander.deucher, christian.koenig; +Cc: Timur Kristóf
This commit fixes amdgpu to work on the Radeon HD 7870 XT
which has never worked with the Linux open source drivers before.
Some boards have "harvested" chips, meaning that some parts of
the chip are fused off (disabled), and the chip is sold at a lower
price under a different marketing name.
On a harvested chip, any of the following can be disabled:
- CUs (Compute Units)
- RBs (Render Backends, a.k.a. ROPs)
- Memory channels (i.e. the chip has lower memory bandwidth)
- TCCs (i.e. less L2 cache)
Handle chips with harvested TCCs by patching the registers
that configure how TCCs are mapped.
If some TCCs are disabled, we need to make sure that
the disabled TCCs are not used, and the remaining TCCs
are used optimally.
TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.
Note that the TCC configuration is highly relevant to performance.
A suboptimal configuration (e.g. CHAN_STEER=0) can significantly
reduce gaming performance.
For optimal performance:
- Rely on the CHAN_STEER from the golden registers table,
  only skip disabled TCCs but keep the mapping order.
- Limit NUM_TCC_BANKS to the number of active TCCs to avoid thrashing,
  which performs better than using the same TCC twice.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=60879
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2664
Fixes: 2cd46ad22383 ("drm/amdgpu: add graphic pipeline implementation for si v8")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 63 +++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 73223d97a87f5..baddb3aa7fa3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -1571,6 +1571,68 @@ static void gfx_v6_0_setup_spi(struct amdgpu_device *adev)
 	mutex_unlock(&adev->grbm_idx_mutex);
 }

+/**
+ * gfx_v6_0_setup_tcc() - setup which TCCs are used
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Verify whether the current GPU has any TCCs disabled,
+ * which can happen when the GPU is harvested and some
+ * memory channels are disabled, reducing the memory bus width.
+ * For example, on the Radeon HD 7870 XT (Tahiti LE).
+ *
+ * If some TCCs are disabled, we need to make sure that
+ * the disabled TCCs are not used, and the remaining TCCs
+ * are used optimally.
+ *
+ * TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
+ * TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.
+ *
+ * For optimal performance:
+ * - Rely on the CHAN_STEER from the golden registers table,
+ *   only skip disabled TCCs but keep the mapping order.
+ * - Limit NUM_TCC_BANKS to number of active TCCs to avoid thrashing,
+ *   which performs better than using the same TCC twice.
+ */
+static void gfx_v6_0_setup_tcc(struct amdgpu_device *adev)
+{
+	u32 i, tcc, tcp_addr_config, num_active_tcc = 0;
+	u64 chan_steer, patched_chan_steer = 0;
+	const u32 num_max_tcc = adev->gfx.config.max_texture_channel_caches;
+	const u32 dis_tcc_mask = amdgpu_gfx_create_bitmask(num_max_tcc) &
+				 REG_GET_FIELD(RREG32(mmCGTS_TCC_DISABLE),
+					       CGTS_TCC_DISABLE, TCC_DISABLE);
+
+	/* When no TCC is disabled, the golden registers table already has optimal TCC setup */
+	if (!dis_tcc_mask)
+		return;
+
+	/* Each 4-bit nibble contains the index of a TCC used by all TCPs */
+	chan_steer = RREG32(mmTCP_CHAN_STEER_LO) | ((u64)RREG32(mmTCP_CHAN_STEER_HI) << 32ull);
+
+	/* Patch the TCP to TCC mapping to skip disabled TCCs */
+	for (i = 0; i < num_max_tcc; ++i) {
+		tcc = (chan_steer >> (u64)(4 * i)) & 0xf;
+
+		if (!((1 << tcc) & dis_tcc_mask)) {
+			/* Copy enabled TCC indices to the patched register value. */
+			patched_chan_steer |= (u64)tcc << (u64)(4 * num_active_tcc);
+			++num_active_tcc;
+		}
+	}
+
+	WARN_ON(num_active_tcc != num_max_tcc - hweight32(dis_tcc_mask));
+
+	/* Patch number of TCCs used by TCPs */
+	tcp_addr_config = REG_SET_FIELD(RREG32(mmTCP_ADDR_CONFIG),
+					TCP_ADDR_CONFIG, NUM_TCC_BANKS,
+					num_active_tcc - 1);
+
+	WREG32(mmTCP_ADDR_CONFIG, tcp_addr_config);
+	WREG32(mmTCP_CHAN_STEER_HI, upper_32_bits(patched_chan_steer));
+	WREG32(mmTCP_CHAN_STEER_LO, lower_32_bits(patched_chan_steer));
+}
+
 static void gfx_v6_0_config_init(struct amdgpu_device *adev)
 {
 	adev->gfx.config.double_offchip_lds_buf = 0;
@@ -1729,6 +1791,7 @@ static void gfx_v6_0_constants_init(struct amdgpu_device *adev)
 	gfx_v6_0_tiling_mode_table_init(adev);

 	gfx_v6_0_setup_rb(adev);
+	gfx_v6_0_setup_tcc(adev);

 	gfx_v6_0_setup_spi(adev);

--
2.53.0
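
For readers who want to check the nibble compaction without hardware,
here is a minimal userspace sketch of the loop in gfx_v6_0_setup_tcc().
It is not part of the patch: struct tcc_setup and patch_chan_steer() are
hypothetical names, and the register reads and writes are replaced by
plain arguments and a return value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patched register state; not a driver type. */
struct tcc_setup {
	uint64_t chan_steer;	/* value for TCP_CHAN_STEER_HI:LO */
	uint32_t num_active;	/* number of enabled TCCs that were kept */
};

/*
 * Walk the first num_max_tcc 4-bit entries of chan_steer. Entries whose
 * TCC index has its bit set in dis_tcc_mask are dropped; the remaining
 * entries are packed towards bit 0 in their original order.
 */
static struct tcc_setup patch_chan_steer(uint64_t chan_steer,
					 uint32_t num_max_tcc,
					 uint32_t dis_tcc_mask)
{
	struct tcc_setup out = { 0, 0 };
	uint32_t i;

	/* A 64-bit value holds at most 16 nibbles. */
	assert(num_max_tcc <= 16);

	for (i = 0; i < num_max_tcc; ++i) {
		uint32_t tcc = (uint32_t)(chan_steer >> (4 * i)) & 0xf;

		if (!((1u << tcc) & dis_tcc_mask)) {
			out.chan_steer |= (uint64_t)tcc << (4 * out.num_active);
			++out.num_active;
		}
	}

	return out;
}
```

With made-up inputs (12 TCCs steered in identity order, TCCs 2, 5, 8 and
11 disabled), patch_chan_steer(0xba9876543210ULL, 12, 0x924) yields
chan_steer 0xa9764310 and num_active 8, so NUM_TCC_BANKS would be
programmed to 7.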
* Re: [PATCH 7/7] drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs
2026-04-16 20:26 ` [PATCH 7/7] drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs Timur Kristóf
@ 2026-04-17 12:56 ` Christian König
2026-04-17 13:36 ` Alex Deucher
0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2026-04-17 12:56 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, alexander.deucher
On 4/16/26 22:26, Timur Kristóf wrote:
> This commit fixes amdgpu to work on the Radeon HD 7870 XT
> which has never worked with the Linux open source drivers before.
>
> Some boards have "harvested" chips, meaning that some parts of
> the chip are disabled and fused, and it's sold for cheaper and
> under a different marketing name.
> On a harvested chip, any of the following can be disabled:
> - CUs (Compute Units)
> - RBs (Render Backend, aka. ROP)
> - Memory channels (ie. the chip has a lower bandwidth)
> - TCCs (ie. less L2 cache)
>
> Handle chips with harvested TCCs by patching the registers
> that configure how TCCs are mapped.
>
> If some TCCs are disabled, we need to make sure that
> the disabled TCCs are not used, and the remaining TCCs
> are used optimally.
>
> TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
> TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.
>
> Note that the TCC configuration is highly relevant to performance.
> Suboptimal configuration (eg. CHAN_STEER=0) can significantly
> reduce gaming performance.
>
> For optimal performance:
> - Rely on the CHAN_STEER from the golden registers table,
> only skip disabled TCCs but keep the mapping order.
> - Limit NUM_TCC_BANKS to number of active TCCs to avoid thrashing,
> which performs better than using the same TCC twice.
>
> Link: https://bugs.freedesktop.org/show_bug.cgi?id=60879
> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2664
> Fixes: 2cd46ad22383 ("drm/amdgpu: add graphic pipeline implementation for si v8")
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 63 +++++++++++++++++++++++++++
> 1 file changed, 63 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> index 73223d97a87f5..baddb3aa7fa3c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> @@ -1571,6 +1571,68 @@ static void gfx_v6_0_setup_spi(struct amdgpu_device *adev)
> mutex_unlock(&adev->grbm_idx_mutex);
> }
>
> +/**
> + * gfx_v6_0_setup_tcc() - setup which TCCs are used
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Verify whether the current GPU has any TCCs disabled,
> + * which can happen when the GPU is harvested and some
> + * memory channels are disabled, reducing the memory bus width.
> + * For example, on the Radeon HD 7870 XT (Tahiti LE).
> + *
> + * If some TCCs are disabled, we need to make sure that
> + * the disabled TCCs are not used, and the remaining TCCs
> + * are used optimally.
> + *
> + * TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
> + * TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.
> + *
> + * For optimal performance:
> + * - Rely on the CHAN_STEER from the golden registers table,
> + * only skip disabled TCCs but keep the mapping order.
> + * - Limit NUM_TCC_BANKS to number of active TCCs to avoid thrashing,
> + * which performs better than using the same TCC twice.
> + */
> +static void gfx_v6_0_setup_tcc(struct amdgpu_device *adev)
> +{
> + u32 i, tcc, tcp_addr_config, num_active_tcc = 0;
> + u64 chan_steer, patched_chan_steer = 0;
> + const u32 num_max_tcc = adev->gfx.config.max_texture_channel_caches;
> + const u32 dis_tcc_mask = amdgpu_gfx_create_bitmask(num_max_tcc) &
> + REG_GET_FIELD(RREG32(mmCGTS_TCC_DISABLE),
> + CGTS_TCC_DISABLE, TCC_DISABLE);
> +
> + /* When no TCC is disabled, the golden registers table already has optimal TCC setup */
> + if (!dis_tcc_mask)
> + return;
> +
> + /* Each 4-bit nibble contains the index of a TCC used by all TCPs */
> + chan_steer = RREG32(mmTCP_CHAN_STEER_LO) | ((u64)RREG32(mmTCP_CHAN_STEER_HI) << 32ull);
> +
> + /* Patch the TCP to TCC mapping to skip disabled TCCs */
> + for (i = 0; i < num_max_tcc; ++i) {
> + tcc = (chan_steer >> (u64)(4 * i)) & 0xf;
> +
> + if (!((1 << tcc) & dis_tcc_mask)) {
> + /* Copy enabled TCC indices to the patched register value. */
> + patched_chan_steer |= (u64)tcc << (u64)(4 * num_active_tcc);
> + ++num_active_tcc;
> + }
> + }
> +
> + WARN_ON(num_active_tcc != num_max_tcc - hweight32(dis_tcc_mask));
> +
> + /* Patch number of TCCs used by TCPs */
> + tcp_addr_config = REG_SET_FIELD(RREG32(mmTCP_ADDR_CONFIG),
> + TCP_ADDR_CONFIG, NUM_TCC_BANKS,
> + num_active_tcc - 1);
> +
> + WREG32(mmTCP_ADDR_CONFIG, tcp_addr_config);
> + WREG32(mmTCP_CHAN_STEER_HI, upper_32_bits(patched_chan_steer));
> + WREG32(mmTCP_CHAN_STEER_LO, lower_32_bits(patched_chan_steer));
> +}
> +
> static void gfx_v6_0_config_init(struct amdgpu_device *adev)
> {
> adev->gfx.config.double_offchip_lds_buf = 0;
> @@ -1729,6 +1791,7 @@ static void gfx_v6_0_constants_init(struct amdgpu_device *adev)
> gfx_v6_0_tiling_mode_table_init(adev);
>
> gfx_v6_0_setup_rb(adev);
> + gfx_v6_0_setup_tcc(adev);
>
> gfx_v6_0_setup_spi(adev);
>
* Re: [PATCH 7/7] drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs
2026-04-17 12:56 ` Christian König
@ 2026-04-17 13:36 ` Alex Deucher
2026-04-17 14:09 ` Timur Kristóf
0 siblings, 1 reply; 18+ messages in thread
From: Alex Deucher @ 2026-04-17 13:36 UTC (permalink / raw)
To: Christian König; +Cc: Timur Kristóf, amd-gfx, alexander.deucher
On Fri, Apr 17, 2026 at 9:04 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 4/16/26 22:26, Timur Kristóf wrote:
> > This commit fixes amdgpu to work on the Radeon HD 7870 XT
> > which has never worked with the Linux open source drivers before.
> >
> > Some boards have "harvested" chips, meaning that some parts of
> > the chip are disabled and fused, and it's sold for cheaper and
> > under a different marketing name.
> > On a harvested chip, any of the following can be disabled:
> > - CUs (Compute Units)
> > - RBs (Render Backend, aka. ROP)
> > - Memory channels (ie. the chip has a lower bandwidth)
> > - TCCs (ie. less L2 cache)
> >
> > Handle chips with harvested TCCs by patching the registers
> > that configure how TCCs are mapped.
> >
> > If some TCCs are disabled, we need to make sure that
> > the disabled TCCs are not used, and the remaining TCCs
> > are used optimally.
> >
> > TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
> > TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.
> >
> > Note that the TCC configuration is highly relevant to performance.
> > Suboptimal configuration (eg. CHAN_STEER=0) can significantly
> > reduce gaming performance.
> >
> > For optimal performance:
> > - Rely on the CHAN_STEER from the golden registers table,
> > only skip disabled TCCs but keep the mapping order.
> > - Limit NUM_TCC_BANKS to number of active TCCs to avoid thrashing,
> > which performs better than using the same TCC twice.
> >
> > Link: https://bugs.freedesktop.org/show_bug.cgi?id=60879
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2664
> > Fixes: 2cd46ad22383 ("drm/amdgpu: add graphic pipeline implementation for si v8")
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> > ---
> > drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 63 +++++++++++++++++++++++++++
> > 1 file changed, 63 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> > index 73223d97a87f5..baddb3aa7fa3c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> > @@ -1571,6 +1571,68 @@ static void gfx_v6_0_setup_spi(struct amdgpu_device *adev)
> > mutex_unlock(&adev->grbm_idx_mutex);
> > }
> >
> > +/**
> > + * gfx_v6_0_setup_tcc() - setup which TCCs are used
> > + *
> > + * @adev: amdgpu_device pointer
> > + *
> > + * Verify whether the current GPU has any TCCs disabled,
> > + * which can happen when the GPU is harvested and some
> > + * memory channels are disabled, reducing the memory bus width.
> > + * For example, on the Radeon HD 7870 XT (Tahiti LE).
> > + *
> > + * If some TCCs are disabled, we need to make sure that
> > + * the disabled TCCs are not used, and the remaining TCCs
> > + * are used optimally.
> > + *
> > + * TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
> > + * TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.
> > + *
> > + * For optimal performance:
> > + * - Rely on the CHAN_STEER from the golden registers table,
> > + * only skip disabled TCCs but keep the mapping order.
> > + * - Limit NUM_TCC_BANKS to number of active TCCs to avoid thrashing,
> > + * which performs better than using the same TCC twice.
> > + */
> > +static void gfx_v6_0_setup_tcc(struct amdgpu_device *adev)
> > +{
> > + u32 i, tcc, tcp_addr_config, num_active_tcc = 0;
> > + u64 chan_steer, patched_chan_steer = 0;
> > + const u32 num_max_tcc = adev->gfx.config.max_texture_channel_caches;
> > + const u32 dis_tcc_mask = amdgpu_gfx_create_bitmask(num_max_tcc) &
> > + REG_GET_FIELD(RREG32(mmCGTS_TCC_DISABLE),
> > + CGTS_TCC_DISABLE, TCC_DISABLE);
I would OR dis_tcc_mask with mmCGTS_USER_TCC_DISABLE as well in case
someone has set additional TCCs to disable as well. Other than that,
looks good to me.
Alex
> > +
> > + /* When no TCC is disabled, the golden registers table already has optimal TCC setup */
> > + if (!dis_tcc_mask)
> > + return;
> > +
> > + /* Each 4-bit nibble contains the index of a TCC used by all TCPs */
> > + chan_steer = RREG32(mmTCP_CHAN_STEER_LO) | ((u64)RREG32(mmTCP_CHAN_STEER_HI) << 32ull);
> > +
> > + /* Patch the TCP to TCC mapping to skip disabled TCCs */
> > + for (i = 0; i < num_max_tcc; ++i) {
> > + tcc = (chan_steer >> (u64)(4 * i)) & 0xf;
> > +
> > + if (!((1 << tcc) & dis_tcc_mask)) {
> > + /* Copy enabled TCC indices to the patched register value. */
> > + patched_chan_steer |= (u64)tcc << (u64)(4 * num_active_tcc);
> > + ++num_active_tcc;
> > + }
> > + }
> > +
> > + WARN_ON(num_active_tcc != num_max_tcc - hweight32(dis_tcc_mask));
> > +
> > + /* Patch number of TCCs used by TCPs */
> > + tcp_addr_config = REG_SET_FIELD(RREG32(mmTCP_ADDR_CONFIG),
> > + TCP_ADDR_CONFIG, NUM_TCC_BANKS,
> > + num_active_tcc - 1);
> > +
> > + WREG32(mmTCP_ADDR_CONFIG, tcp_addr_config);
> > + WREG32(mmTCP_CHAN_STEER_HI, upper_32_bits(patched_chan_steer));
> > + WREG32(mmTCP_CHAN_STEER_LO, lower_32_bits(patched_chan_steer));
> > +}
> > +
> > static void gfx_v6_0_config_init(struct amdgpu_device *adev)
> > {
> > adev->gfx.config.double_offchip_lds_buf = 0;
> > @@ -1729,6 +1791,7 @@ static void gfx_v6_0_constants_init(struct amdgpu_device *adev)
> > gfx_v6_0_tiling_mode_table_init(adev);
> >
> > gfx_v6_0_setup_rb(adev);
> > + gfx_v6_0_setup_tcc(adev);
> >
> > gfx_v6_0_setup_spi(adev);
> >
>
* Re: [PATCH 7/7] drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs
2026-04-17 13:36 ` Alex Deucher
@ 2026-04-17 14:09 ` Timur Kristóf
0 siblings, 0 replies; 18+ messages in thread
From: Timur Kristóf @ 2026-04-17 14:09 UTC (permalink / raw)
To: Christian König, Alex Deucher; +Cc: amd-gfx, alexander.deucher
On Friday, April 17, 2026 3:36:20 PM Central European Summer Time Alex Deucher
wrote:
> > > +static void gfx_v6_0_setup_tcc(struct amdgpu_device *adev)
> > > +{
> > > + u32 i, tcc, tcp_addr_config, num_active_tcc = 0;
> > > + u64 chan_steer, patched_chan_steer = 0;
> > > + const u32 num_max_tcc = adev->gfx.config.max_texture_channel_caches;
> > > + const u32 dis_tcc_mask = amdgpu_gfx_create_bitmask(num_max_tcc) &
> > > + REG_GET_FIELD(RREG32(mmCGTS_TCC_DISABLE),
> > > + CGTS_TCC_DISABLE, TCC_DISABLE);
> I would OR dis_tcc_mask with mmCGTS_USER_TCC_DISABLE as well in case
> someone has set additional TCCs to disable as well. Other than that,
> looks good to me.
Thank you, will do.
I'll split off the VCE patches into a separate series and send a second version
of this series with the consideration for CGTS_USER_TCC_DISABLE added.
Timur
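
The follow-up discussed above, also honoring CGTS_USER_TCC_DISABLE, might
look roughly like the sketch below. It only models the mask arithmetic in
userspace: low_bits_mask() and combined_dis_tcc_mask() are hypothetical
helpers, and the sketch assumes the user register exposes a disable field
with the same bit layout as CGTS_TCC_DISABLE, which this thread does not
confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `bits` bits set; hypothetical helper for this sketch. */
static uint32_t low_bits_mask(uint32_t bits)
{
	assert(bits <= 32);
	return bits == 32 ? 0xffffffffu : (1u << bits) - 1u;
}

/*
 * Combine the fused disable field with the user-requested one, then clamp
 * the result to the TCCs that exist on the chip. Both inputs are assumed
 * to be already extracted field values, one bit per TCC.
 */
static uint32_t combined_dis_tcc_mask(uint32_t fused_dis, uint32_t user_dis,
				      uint32_t num_max_tcc)
{
	return (fused_dis | user_dis) & low_bits_mask(num_max_tcc);
}
```

A TCC disabled in either field ends up in the combined mask, and bits
beyond num_max_tcc are ignored, so the rest of the setup code could use
the result the same way it uses dis_tcc_mask today.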