amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/14] drm/amdgpu: Support VCE1 IP block
@ 2025-10-28 22:06 Timur Kristóf
  2025-10-28 22:06 ` [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT Timur Kristóf
                   ` (13 more replies)
  0 siblings, 14 replies; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

VCE1 is the Video Coding Engine 1.0 found in SI GPUs.
Add support for the VCE1 IP block, which is the last
missing piece for fully-featured SI support in amdgpu.
Co-developed by Alexandre Demers and Christian König.

This VCE1 implementation is based on:
VCE2 code in amdgpu
VCE1 code in radeon
Research by Alexandre and Christian

The biggest challenge was getting the firmware
validation mechanism to work correctly. Due to
some limitations in the HW, the VCE1 requires
the VCPU BO to be located at a low 32-bit address.
This was achieved by placing the GART in the
LOW address space and manually mapping the
VCPU BO in the GART page table.

Also hook up the VCE1 to the DPM.

Tested on the following HW:
Radeon R9 280X (Tahiti)
Radeon HD 7990 (Tahiti)
FirePro W9000 (Tahiti)
Radeon R7 450 (Cape Verde)

Looking forward to reviews and feedback!

Timur Kristóf (14):
  drm/amdgpu/gmc: Don't hardcode GART page count before GTT
  drm/amdgpu/gmc6: Place gart at low address range
  drm/amdgpu/gmc6: Add GART space for VCPU BO
  drm/amdgpu/gart: Add helper to bind VRAM BO
  drm/amdgpu/vce: Clear VCPU BO before copying firmware to it
  drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init
  drm/amdgpu/si,cik,vi: Verify IP block when querying video codecs
  drm/amdgpu/vce1: Clean up register definitions
  drm/amdgpu/vce1: Load VCE1 firmware
  drm/amdgpu/vce1: Implement VCE1 IP block
  drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space
  drm/amd/pm/si: Hook up VCE1 to SI DPM
  drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs
  drm/amdgpu/vce1: Tolerate VCE PLL timeout better

 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c      |  41 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h      |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c       | 134 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h       |   2 +
 drivers/gpu/drm/amd/amdgpu/cik.c              |   6 +
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c         |   7 +-
 drivers/gpu/drm/amd/amdgpu/si.c               |  26 +-
 drivers/gpu/drm/amd/amdgpu/sid.h              |  40 -
 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c         | 857 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/vce_v1_0.h         |  32 +
 drivers/gpu/drm/amd/amdgpu/vce_v2_0.c         |   5 +
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c         |   5 +
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c         |   5 +
 drivers/gpu/drm/amd/amdgpu/vi.c               |   6 +
 .../drm/amd/include/asic_reg/vce/vce_1_0_d.h  |   5 +
 .../include/asic_reg/vce/vce_1_0_sh_mask.h    |  10 +
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c    |  18 +-
 22 files changed, 1099 insertions(+), 112 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.h

-- 
2.51.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 10:00   ` Christian König
  2025-10-28 22:06 ` [PATCH 02/14] drm/amdgpu/gmc6: Place gart at low address range Timur Kristóf
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

GART contains some pages in its address space that come before
the GTT and are used for BO copies.

Instead of hardcoding the size of the GART space before GTT,
make it a field in the amdgpu_gmc struct. This allows us to map
more things in GART before GTT.

Split this into a separate patch to make it easier to bisect,
in case there are any errors in the future.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c     | 2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h     | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 97b562a79ea8..bf31bd022d6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -325,6 +325,8 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc,
 		break;
 	}
 
+	mc->num_gart_pages_before_gtt =
+		AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
 	mc->gart_start &= ~(four_gb - 1);
 	mc->gart_end = mc->gart_start + mc->gart_size - 1;
 	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 55097ca10738..568eed3eb557 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -266,6 +266,7 @@ struct amdgpu_gmc {
 	u64			fb_end;
 	unsigned		vram_width;
 	u64			real_vram_size;
+	u32			num_gart_pages_before_gtt;
 	int			vram_mtrr;
 	u64                     mc_mask;
 	const struct firmware   *fw;	/* MC firmware */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 0760e70402ec..4c2563a70c2b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -283,7 +283,7 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
 
 	ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
 
-	start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
+	start = adev->gmc.num_gart_pages_before_gtt;
 	size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
 	drm_mm_init(&mgr->mm, start, size);
 	spin_lock_init(&mgr->lock);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 02/14] drm/amdgpu/gmc6: Place gart at low address range
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
  2025-10-28 22:06 ` [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 10:00   ` Christian König
  2025-10-28 22:06 ` [PATCH 03/14] drm/amdgpu/gmc6: Add GART space for VCPU BO Timur Kristóf
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Instead of using a best-fit algorithm to determine which part
of the VMID 0 address space to use for GART, always use the low
address range.

A subsequent commit will use this to map the VCPU BO in GART
for the VCE1 IP block.

Split this into	a separate patch to make it easier to bisect,
in case	there are any errors in	the future.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index f6ad7911f1e6..499dfd78092d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -213,7 +213,7 @@ static void gmc_v6_0_vram_gtt_location(struct amdgpu_device *adev,
 
 	amdgpu_gmc_set_agp_default(adev, mc);
 	amdgpu_gmc_vram_location(adev, mc, base);
-	amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_BEST_FIT);
+	amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW);
 }
 
 static void gmc_v6_0_mc_program(struct amdgpu_device *adev)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 03/14] drm/amdgpu/gmc6: Add GART space for VCPU BO
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
  2025-10-28 22:06 ` [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT Timur Kristóf
  2025-10-28 22:06 ` [PATCH 02/14] drm/amdgpu/gmc6: Place gart at low address range Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 10:05   ` Christian König
  2025-10-28 22:06 ` [PATCH 04/14] drm/amdgpu/gart: Add helper to bind VRAM BO Timur Kristóf
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Add an extra 16M (4096 pages) to the GART before GTT.
This space is going to be used for the VCE VCPU BO.

Split this into	a separate patch to make it easier to bisect,
in case	there are any errors in	the future.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 499dfd78092d..bfeb60cfbf62 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -214,6 +214,9 @@ static void gmc_v6_0_vram_gtt_location(struct amdgpu_device *adev,
 	amdgpu_gmc_set_agp_default(adev, mc);
 	amdgpu_gmc_vram_location(adev, mc, base);
 	amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW);
+
+	/* Add space for VCE's VCPU BO so that VCE1 can access it. */
+	mc->num_gart_pages_before_gtt += 4096;
 }
 
 static void gmc_v6_0_mc_program(struct amdgpu_device *adev)
@@ -338,7 +341,7 @@ static int gmc_v6_0_mc_init(struct amdgpu_device *adev)
 		case CHIP_TAHITI:   /* UVD, VCE do not support GPUVM */
 		case CHIP_PITCAIRN: /* UVD, VCE do not support GPUVM */
 		case CHIP_OLAND:    /* UVD, VCE do not support GPUVM */
-			adev->gmc.gart_size = 1024ULL << 20;
+			adev->gmc.gart_size = 1040ULL << 20;
 			break;
 		}
 	} else {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 04/14] drm/amdgpu/gart: Add helper to bind VRAM BO
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (2 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 03/14] drm/amdgpu/gmc6: Add GART space for VCPU BO Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 10:16   ` Christian König
  2025-10-28 22:06 ` [PATCH 05/14] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it Timur Kristóf
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Binds a BO that is allocated in VRAM to the GART page table.

Useful when a kernel BO is located in VRAM but
needs to be accessed from the GART address space,
for example to give a kernel BO a 32-bit address
when GART is placed in LOW address space.

Co-developed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 41 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 83f3b94ed975..19b5e72a6a26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -390,6 +390,47 @@ void amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
 	amdgpu_gart_map(adev, offset, pages, dma_addr, flags, adev->gart.ptr);
 }
 
+/**
+ * amdgpu_gart_bind - bind VRAM BO into the GART page table
+ *
+ * @adev: amdgpu_device pointer
+ * @offset: offset into the GPU's gart aperture
+ * @bo: the BO whose pages should be mapped
+ * @flags: page table entry flags
+ *
+ * Binds a BO that is allocated in VRAM to the GART page table
+ * (all ASICs).
+ * Useful when a kernel BO is located in VRAM but
+ * needs to be accessed from the GART address space,
+ * for example to give a kernel BO a 32-bit address
+ * when GART is placed in LOW address space.
+ */
+void amdgpu_gart_bind_vram_bo(struct amdgpu_device *adev, uint64_t offset,
+		     struct amdgpu_bo *bo, uint64_t flags)
+{
+	u64 pa, bo_size;
+	u32 num_pages, start_page, i, idx;
+
+	if (!adev->gart.ptr)
+		return;
+
+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+		return;
+
+	pa = amdgpu_gmc_vram_pa(adev, bo);
+	bo_size = amdgpu_bo_size(bo);
+	num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
+	start_page = offset / AMDGPU_GPU_PAGE_SIZE;
+
+	for (i = 0; i < num_pages; ++i) {
+		amdgpu_gmc_set_pte_pde(adev, adev->gart.ptr,
+			start_page + i, pa + AMDGPU_GPU_PAGE_SIZE * i, flags);
+	}
+
+	amdgpu_gart_invalidate_tlb(adev);
+	drm_dev_exit(idx);
+}
+
 /**
  * amdgpu_gart_invalidate_tlb - invalidate gart TLB
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index 7cc980bf4725..756548d0b520 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -64,5 +64,7 @@ void amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
 		     void *dst);
 void amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
 		      int pages, dma_addr_t *dma_addr, uint64_t flags);
+void amdgpu_gart_bind_vram_bo(struct amdgpu_device *adev, uint64_t offset,
+		     struct amdgpu_bo *bo, uint64_t flags);
 void amdgpu_gart_invalidate_tlb(struct amdgpu_device *adev);
 #endif
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 05/14] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (3 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 04/14] drm/amdgpu/gart: Add helper to bind VRAM BO Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 10:19   ` Christian König
  2025-10-28 22:06 ` [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init Timur Kristóf
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

The VCPU BO doesn't only contain the VCE firmware but also other
ranges that the VCE uses for its stack and data. Let's initialize
this to zero to avoid having garbage in the VCPU BO.

Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index b9060bcd4806..eaa06dbef5c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -310,6 +310,7 @@ int amdgpu_vce_resume(struct amdgpu_device *adev)
 	offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
 
 	if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+		memset32(cpu_addr, 0, amdgpu_bo_size(adev->vce.vcpu_bo) / 4);
 		memcpy_toio(cpu_addr, adev->vce.fw->data + offset,
 			    adev->vce.fw->size - offset);
 		drm_dev_exit(idx);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (4 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 05/14] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 10:26   ` Christian König
  2025-10-29 17:16   ` Liu, Leo
  2025-10-28 22:06 ` [PATCH 07/14] drm/amdgpu/si, cik, vi: Verify IP block when querying video codecs Timur Kristóf
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Try to load the VCE firmware at early_init.

When the correct firmware is not found, return -ENOENT.
This way, the driver initialization will complete even
without VCE, and the GPU will be functional, albeit
without video encoding capabilities.

This is necessary because we are planning to add support
for the VCE1, and AMD hasn't yet publised the correct
firmware for this version. So we need to anticipate that
users will try to boot amdgpu on SI GPUs without the
correct VCE1 firmware present on their system.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 121 +++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h |   1 +
 drivers/gpu/drm/amd/amdgpu/vce_v2_0.c   |   5 +
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c   |   5 +
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c   |   5 +
 5 files changed, 91 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index eaa06dbef5c4..b23a48a1efc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -88,82 +88,87 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
 				      bool direct, struct dma_fence **fence);
 
 /**
- * amdgpu_vce_sw_init - allocate memory, load vce firmware
+ * amdgpu_vce_firmware_name() - determine the firmware file name for VCE
  *
  * @adev: amdgpu_device pointer
- * @size: size for the new BO
  *
- * First step to get VCE online, allocate memory and load the firmware
+ * Each chip that has VCE IP may need a different firmware.
+ * This function returns the name of the VCE firmware file
+ * appropriate for the current chip.
  */
-int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
+static const char *amdgpu_vce_firmware_name(struct amdgpu_device *adev)
 {
-	const char *fw_name;
-	const struct common_firmware_header *hdr;
-	unsigned int ucode_version, version_major, version_minor, binary_id;
-	int i, r;
-
 	switch (adev->asic_type) {
 #ifdef CONFIG_DRM_AMDGPU_CIK
 	case CHIP_BONAIRE:
-		fw_name = FIRMWARE_BONAIRE;
-		break;
+		return FIRMWARE_BONAIRE;
 	case CHIP_KAVERI:
-		fw_name = FIRMWARE_KAVERI;
-		break;
+		return FIRMWARE_KAVERI;
 	case CHIP_KABINI:
-		fw_name = FIRMWARE_KABINI;
-		break;
+		return FIRMWARE_KABINI;
 	case CHIP_HAWAII:
-		fw_name = FIRMWARE_HAWAII;
-		break;
+		return FIRMWARE_HAWAII;
 	case CHIP_MULLINS:
-		fw_name = FIRMWARE_MULLINS;
-		break;
+		return FIRMWARE_MULLINS;
 #endif
 	case CHIP_TONGA:
-		fw_name = FIRMWARE_TONGA;
-		break;
+		return  FIRMWARE_TONGA;
 	case CHIP_CARRIZO:
-		fw_name = FIRMWARE_CARRIZO;
-		break;
+		return  FIRMWARE_CARRIZO;
 	case CHIP_FIJI:
-		fw_name = FIRMWARE_FIJI;
-		break;
+		return  FIRMWARE_FIJI;
 	case CHIP_STONEY:
-		fw_name = FIRMWARE_STONEY;
-		break;
+		return  FIRMWARE_STONEY;
 	case CHIP_POLARIS10:
-		fw_name = FIRMWARE_POLARIS10;
-		break;
+		return  FIRMWARE_POLARIS10;
 	case CHIP_POLARIS11:
-		fw_name = FIRMWARE_POLARIS11;
-		break;
+		return  FIRMWARE_POLARIS11;
 	case CHIP_POLARIS12:
-		fw_name = FIRMWARE_POLARIS12;
-		break;
+		return  FIRMWARE_POLARIS12;
 	case CHIP_VEGAM:
-		fw_name = FIRMWARE_VEGAM;
-		break;
+		return  FIRMWARE_VEGAM;
 	case CHIP_VEGA10:
-		fw_name = FIRMWARE_VEGA10;
-		break;
+		return  FIRMWARE_VEGA10;
 	case CHIP_VEGA12:
-		fw_name = FIRMWARE_VEGA12;
-		break;
+		return  FIRMWARE_VEGA12;
 	case CHIP_VEGA20:
-		fw_name = FIRMWARE_VEGA20;
-		break;
+		return  FIRMWARE_VEGA20;
 
 	default:
-		return -EINVAL;
+		return NULL;
 	}
+}
+
+/**
+ * amdgpu_vce_early_init() - try to load VCE firmware
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Tries to load the VCE firmware.
+ *
+ * When not found, returns ENOENT so that the driver can
+ * still load and initialize the rest of the IP blocks.
+ * The GPU can function just fine without VCE, they will just
+ * not support video encoding.
+ */
+int amdgpu_vce_early_init(struct amdgpu_device *adev)
+{
+	const char *fw_name = amdgpu_vce_firmware_name(adev);
+	const struct common_firmware_header *hdr;
+	unsigned int ucode_version, version_major, version_minor, binary_id;
+	int r;
+
+	if (!fw_name)
+		return -ENOENT;
 
 	r = amdgpu_ucode_request(adev, &adev->vce.fw, AMDGPU_UCODE_REQUIRED, "%s", fw_name);
 	if (r) {
-		dev_err(adev->dev, "amdgpu_vce: Can't validate firmware \"%s\"\n",
-			fw_name);
+		dev_err(adev->dev,
+			"amdgpu_vce: Firmware \"%s\" not found or failed to validate (%d)\n",
+			fw_name, r);
+
 		amdgpu_ucode_release(&adev->vce.fw);
-		return r;
+		return -ENOENT;
 	}
 
 	hdr = (const struct common_firmware_header *)adev->vce.fw->data;
@@ -172,11 +177,35 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
 	version_major = (ucode_version >> 20) & 0xfff;
 	version_minor = (ucode_version >> 8) & 0xfff;
 	binary_id = ucode_version & 0xff;
-	DRM_INFO("Found VCE firmware Version: %d.%d Binary ID: %d\n",
+	dev_info(adev->dev, "Found VCE firmware Version: %d.%d Binary ID: %d\n",
 		version_major, version_minor, binary_id);
 	adev->vce.fw_version = ((version_major << 24) | (version_minor << 16) |
 				(binary_id << 8));
 
+	return 0;
+}
+
+/**
+ * amdgpu_vce_sw_init() - allocate memory for VCE BO
+ *
+ * @adev: amdgpu_device pointer
+ * @size: size for the new BO
+ *
+ * First step to get VCE online: allocate memory for VCE BO.
+ * The VCE firmware binary is copied into the VCE BO later,
+ * in amdgpu_vce_resume. The VCE executes its code from the
+ * VCE BO and also uses the space in this BO for its stack and data.
+ *
+ * Ideally this BO should be placed in VRAM for optimal performance,
+ * although technically it also runs from system RAM (albeit slowly).
+ */
+int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
+{
+	int i, r;
+
+	if (!adev->vce.fw)
+		return -ENOENT;
+
 	r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
 				    AMDGPU_GEM_DOMAIN_VRAM |
 				    AMDGPU_GEM_DOMAIN_GTT,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
index 6e53f872d084..22acd7b35945 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
@@ -53,6 +53,7 @@ struct amdgpu_vce {
 	unsigned		num_rings;
 };
 
+int amdgpu_vce_early_init(struct amdgpu_device *adev);
 int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size);
 int amdgpu_vce_sw_fini(struct amdgpu_device *adev);
 int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
index bee3e904a6bc..8ea8a6193492 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
@@ -407,6 +407,11 @@ static void vce_v2_0_enable_mgcg(struct amdgpu_device *adev, bool enable,
 static int vce_v2_0_early_init(struct amdgpu_ip_block *ip_block)
 {
 	struct amdgpu_device *adev = ip_block->adev;
+	int r;
+
+	r = amdgpu_vce_early_init(adev);
+	if (r)
+		return r;
 
 	adev->vce.num_rings = 2;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 708123899c41..719e9643c43d 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -399,6 +399,7 @@ static unsigned vce_v3_0_get_harvest_config(struct amdgpu_device *adev)
 static int vce_v3_0_early_init(struct amdgpu_ip_block *ip_block)
 {
 	struct amdgpu_device *adev = ip_block->adev;
+	int r;
 
 	adev->vce.harvest_config = vce_v3_0_get_harvest_config(adev);
 
@@ -407,6 +408,10 @@ static int vce_v3_0_early_init(struct amdgpu_ip_block *ip_block)
 	    (AMDGPU_VCE_HARVEST_VCE0 | AMDGPU_VCE_HARVEST_VCE1))
 		return -ENOENT;
 
+	r = amdgpu_vce_early_init(adev);
+	if (r)
+		return r;
+
 	adev->vce.num_rings = 3;
 
 	vce_v3_0_set_ring_funcs(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
index 335bda64ff5b..2d64002bed61 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
@@ -410,6 +410,11 @@ static int vce_v4_0_stop(struct amdgpu_device *adev)
 static int vce_v4_0_early_init(struct amdgpu_ip_block *ip_block)
 {
 	struct amdgpu_device *adev = ip_block->adev;
+	int r;
+
+	r = amdgpu_vce_early_init(adev);
+	if (r)
+		return r;
 
 	if (amdgpu_sriov_vf(adev)) /* currently only VCN0 support SRIOV */
 		adev->vce.num_rings = 1;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 07/14] drm/amdgpu/si, cik, vi: Verify IP block when querying video codecs
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (5 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 10:35   ` Christian König
  2025-10-28 22:06 ` [PATCH 08/14] drm/amdgpu/vce1: Clean up register definitions Timur Kristóf
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Some harvested chips may not have any IP blocks,
or we may not have the firmware for the IP blocks.
In these cases, the query should return that no video
codec is supported.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/cik.c        | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/si.c         | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/vi.c         | 6 ++++++
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index b3e6b3fcdf2c..42b5da59d00f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1263,7 +1263,8 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 			-EFAULT : 0;
 	}
 	case AMDGPU_INFO_VIDEO_CAPS: {
-		const struct amdgpu_video_codecs *codecs;
+		static const struct amdgpu_video_codecs no_codecs = {0};
+		const struct amdgpu_video_codecs *codecs = &no_codecs;
 		struct drm_amdgpu_info_video_caps *caps;
 		int r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index 9cd63b4177bf..b755238c2c3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -130,6 +130,12 @@ static const struct amdgpu_video_codecs cik_video_codecs_decode =
 static int cik_query_video_codecs(struct amdgpu_device *adev, bool encode,
 				  const struct amdgpu_video_codecs **codecs)
 {
+	const enum amd_ip_block_type ip =
+		encode ? AMD_IP_BLOCK_TYPE_VCE : AMD_IP_BLOCK_TYPE_UVD;
+
+	if (!amdgpu_device_ip_is_valid(adev, ip))
+		return 0;
+
 	switch (adev->asic_type) {
 	case CHIP_BONAIRE:
 	case CHIP_HAWAII:
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index e0f139de7991..9468c03bdb1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -1003,6 +1003,12 @@ static const struct amdgpu_video_codecs hainan_video_codecs_decode =
 static int si_query_video_codecs(struct amdgpu_device *adev, bool encode,
 				 const struct amdgpu_video_codecs **codecs)
 {
+	const enum amd_ip_block_type ip =
+		encode ? AMD_IP_BLOCK_TYPE_VCE : AMD_IP_BLOCK_TYPE_UVD;
+
+	if (!amdgpu_device_ip_is_valid(adev, ip))
+		return 0;
+
 	switch (adev->asic_type) {
 	case CHIP_VERDE:
 	case CHIP_TAHITI:
diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index a611a7345125..f0e4193cf722 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -256,6 +256,12 @@ static const struct amdgpu_video_codecs cz_video_codecs_decode =
 static int vi_query_video_codecs(struct amdgpu_device *adev, bool encode,
 				 const struct amdgpu_video_codecs **codecs)
 {
+	const enum amd_ip_block_type ip =
+		encode ? AMD_IP_BLOCK_TYPE_VCE : AMD_IP_BLOCK_TYPE_UVD;
+
+	if (!amdgpu_device_ip_is_valid(adev, ip))
+		return 0;
+
 	switch (adev->asic_type) {
 	case CHIP_TOPAZ:
 		if (encode)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 08/14] drm/amdgpu/vce1: Clean up register definitions
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (6 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 07/14] drm/amdgpu/si, cik, vi: Verify IP block when querying video codecs Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 11:23   ` Christian König
  2025-10-28 22:06 ` [PATCH 09/14] drm/amdgpu/vce1: Load VCE1 firmware Timur Kristóf
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

The sid.h header contained some VCE1 register definitions, but
they were using byte offsets (probably copied from the old radeon
driver). Move all of these to the proper VCE1 headers.

Also add the register definitions that we need for the
firmware validation mechanism in VCE1.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Co-developed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/sid.h              | 40 -------------------
 .../drm/amd/include/asic_reg/vce/vce_1_0_d.h  |  5 +++
 .../include/asic_reg/vce/vce_1_0_sh_mask.h    | 10 +++++
 3 files changed, 15 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sid.h b/drivers/gpu/drm/amd/amdgpu/sid.h
index cbd4f8951cfa..561462a8332e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sid.h
+++ b/drivers/gpu/drm/amd/amdgpu/sid.h
@@ -582,45 +582,6 @@
 #define	DMA_PACKET_NOP					  0xf
 
 /* VCE */
-#define VCE_STATUS					0x20004
-#define VCE_VCPU_CNTL					0x20014
-#define		VCE_CLK_EN				(1 << 0)
-#define VCE_VCPU_CACHE_OFFSET0				0x20024
-#define VCE_VCPU_CACHE_SIZE0				0x20028
-#define VCE_VCPU_CACHE_OFFSET1				0x2002c
-#define VCE_VCPU_CACHE_SIZE1				0x20030
-#define VCE_VCPU_CACHE_OFFSET2				0x20034
-#define VCE_VCPU_CACHE_SIZE2				0x20038
-#define VCE_SOFT_RESET					0x20120
-#define 	VCE_ECPU_SOFT_RESET			(1 << 0)
-#define 	VCE_FME_SOFT_RESET			(1 << 2)
-#define VCE_RB_BASE_LO2					0x2016c
-#define VCE_RB_BASE_HI2					0x20170
-#define VCE_RB_SIZE2					0x20174
-#define VCE_RB_RPTR2					0x20178
-#define VCE_RB_WPTR2					0x2017c
-#define VCE_RB_BASE_LO					0x20180
-#define VCE_RB_BASE_HI					0x20184
-#define VCE_RB_SIZE					0x20188
-#define VCE_RB_RPTR					0x2018c
-#define VCE_RB_WPTR					0x20190
-#define VCE_CLOCK_GATING_A				0x202f8
-#define VCE_CLOCK_GATING_B				0x202fc
-#define VCE_UENC_CLOCK_GATING				0x205bc
-#define VCE_UENC_REG_CLOCK_GATING			0x205c0
-#define VCE_FW_REG_STATUS				0x20e10
-#	define VCE_FW_REG_STATUS_BUSY			(1 << 0)
-#	define VCE_FW_REG_STATUS_PASS			(1 << 3)
-#	define VCE_FW_REG_STATUS_DONE			(1 << 11)
-#define VCE_LMI_FW_START_KEYSEL				0x20e18
-#define VCE_LMI_FW_PERIODIC_CTRL			0x20e20
-#define VCE_LMI_CTRL2					0x20e74
-#define VCE_LMI_CTRL					0x20e98
-#define VCE_LMI_VM_CTRL					0x20ea0
-#define VCE_LMI_SWAP_CNTL				0x20eb4
-#define VCE_LMI_SWAP_CNTL1				0x20eb8
-#define VCE_LMI_CACHE_CTRL				0x20ef4
-
 #define VCE_CMD_NO_OP					0x00000000
 #define VCE_CMD_END					0x00000001
 #define VCE_CMD_IB					0x00000002
@@ -629,7 +590,6 @@
 #define VCE_CMD_IB_AUTO					0x00000005
 #define VCE_CMD_SEMAPHORE				0x00000006
 
-
 //#dce stupp
 /* display controller offsets used for crtc/cur/lut/grph/viewport/etc. */
 #define CRTC0_REGISTER_OFFSET                 (0x1b7c - 0x1b7c) //(0x6df0 - 0x6df0)/4
diff --git a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
index 2176548e9203..9778822dd2a0 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
@@ -60,5 +60,10 @@
 #define mmVCE_VCPU_CACHE_SIZE1 0x800C
 #define mmVCE_VCPU_CACHE_SIZE2 0x800E
 #define mmVCE_VCPU_CNTL 0x8005
+#define mmVCE_VCPU_SCRATCH7 0x8037
+#define mmVCE_FW_REG_STATUS 0x8384
+#define mmVCE_LMI_FW_PERIODIC_CTRL 0x8388
+#define mmVCE_LMI_FW_START_KEYSEL 0x8386
+
 
 #endif
diff --git a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
index ea5b26b11cb1..1f82d6f5abde 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
@@ -61,6 +61,8 @@
 #define VCE_RB_WPTR__RB_WPTR__SHIFT 0x00000004
 #define VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK 0x00000001L
 #define VCE_SOFT_RESET__ECPU_SOFT_RESET__SHIFT 0x00000000
+#define VCE_SOFT_RESET__FME_SOFT_RESET_MASK 0x00000004L
+#define VCE_SOFT_RESET__FME_SOFT_RESET__SHIFT 0x00000002
 #define VCE_STATUS__JOB_BUSY_MASK 0x00000001L
 #define VCE_STATUS__JOB_BUSY__SHIFT 0x00000000
 #define VCE_STATUS__UENC_BUSY_MASK 0x00000100L
@@ -95,5 +97,13 @@
 #define VCE_VCPU_CNTL__CLK_EN__SHIFT 0x00000000
 #define VCE_VCPU_CNTL__RBBM_SOFT_RESET_MASK 0x00040000L
 #define VCE_VCPU_CNTL__RBBM_SOFT_RESET__SHIFT 0x00000012
+#define VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK 0x00010000
+#define VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_SHIFT 0x00000010
+#define VCE_FW_REG_STATUS__BUSY_MASK 0x0000001
+#define VCE_FW_REG_STATUS__BUSY__SHIFT 0x0000001
+#define VCE_FW_REG_STATUS__PASS_MASK 0x0000008
+#define VCE_FW_REG_STATUS__PASS__SHIFT 0x0000003
+#define VCE_FW_REG_STATUS__DONE_MASK 0x0000800
+#define VCE_FW_REG_STATUS__DONE__SHIFT 0x000000b
 
 #endif
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 09/14] drm/amdgpu/vce1: Load VCE1 firmware
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (7 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 08/14] drm/amdgpu/vce1: Clean up register definitions Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 11:28   ` Christian König
  2025-10-28 22:06 ` [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block Timur Kristóf
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Load VCE1 firmware using amdgpu_ucode_request, just like
it is done for other VCE versions.

All SI chips share the same VCE1 firmware file: vce_1_0_0.bin
which will be sent to linux-firmware soon.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Co-developed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index b23a48a1efc1..7fcc27d4453e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -41,6 +41,9 @@
 #define VCE_IDLE_TIMEOUT	msecs_to_jiffies(1000)
 
 /* Firmware Names */
+#ifdef CONFIG_DRM_AMDGPU_SI
+#define FIRMWARE_VCE_V1_0	"amdgpu/vce_1_0_0.bin"
+#endif
 #ifdef CONFIG_DRM_AMDGPU_CIK
 #define FIRMWARE_BONAIRE	"amdgpu/bonaire_vce.bin"
 #define FIRMWARE_KABINI	"amdgpu/kabini_vce.bin"
@@ -61,6 +64,9 @@
 #define FIRMWARE_VEGA12		"amdgpu/vega12_vce.bin"
 #define FIRMWARE_VEGA20		"amdgpu/vega20_vce.bin"
 
+#ifdef CONFIG_DRM_AMDGPU_SI
+MODULE_FIRMWARE(FIRMWARE_VCE_V1_0);
+#endif
 #ifdef CONFIG_DRM_AMDGPU_CIK
 MODULE_FIRMWARE(FIRMWARE_BONAIRE);
 MODULE_FIRMWARE(FIRMWARE_KABINI);
@@ -99,6 +105,12 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
 static const char *amdgpu_vce_firmware_name(struct amdgpu_device *adev)
 {
 	switch (adev->asic_type) {
+#ifdef CONFIG_DRM_AMDGPU_SI
+	case CHIP_PITCAIRN:
+	case CHIP_TAHITI:
+	case CHIP_VERDE:
+		return FIRMWARE_VCE_V1_0;
+#endif
 #ifdef CONFIG_DRM_AMDGPU_CIK
 	case CHIP_BONAIRE:
 		return FIRMWARE_BONAIRE;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (8 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 09/14] drm/amdgpu/vce1: Load VCE1 firmware Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 11:38   ` Christian König
  2025-10-28 22:06 ` [PATCH 11/14] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space Timur Kristóf
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Implement the necessary functionality to support the VCE1.
This implementation is based on:

- VCE2 code from amdgpu
- VCE1 code from radeon (the old driver)
- Some trial and error

A subsequent commit will ensure correct mapping for
the VCPU BO, which will make this actually work.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Co-developed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile     |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h |   1 +
 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c   | 805 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/vce_v1_0.h   |  32 +
 4 files changed, 839 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index ebe08947c5a3..c88760fb52ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -78,7 +78,7 @@ amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o \
 	dce_v8_0.o gfx_v7_0.o cik_sdma.o uvd_v4_2.o vce_v2_0.o
 
 amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o gmc_v6_0.o gfx_v6_0.o si_ih.o si_dma.o dce_v6_0.o \
-	uvd_v3_1.o
+	uvd_v3_1.o vce_v1_0.o
 
 amdgpu-y += \
 	vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o vega10_reg_init.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
index 22acd7b35945..050783802623 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
@@ -51,6 +51,7 @@ struct amdgpu_vce {
 	struct drm_sched_entity	entity;
 	uint32_t                srbm_soft_reset;
 	unsigned		num_rings;
+	uint32_t		keyselect;
 };
 
 int amdgpu_vce_early_init(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
new file mode 100644
index 000000000000..e62fd8ed1992
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -0,0 +1,805 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2013 Advanced Micro Devices, Inc.
+ * Copyright 2025 Valve Corporation
+ * Copyright 2025 Alexandre Demers
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * Authors: Christian König <christian.koenig@amd.com>
+ *          Timur Kristóf <timur.kristof@gmail.com>
+ *          Alexandre Demers <alexandre.f.demers@gmail.com>
+ */
+
+#include <linux/firmware.h>
+
+#include "amdgpu.h"
+#include "amdgpu_vce.h"
+#include "sid.h"
+#include "vce_v1_0.h"
+#include "vce/vce_1_0_d.h"
+#include "vce/vce_1_0_sh_mask.h"
+#include "oss/oss_1_0_d.h"
+#include "oss/oss_1_0_sh_mask.h"
+
+#define VCE_V1_0_FW_SIZE	(256 * 1024)
+#define VCE_V1_0_STACK_SIZE	(64 * 1024)
+#define VCE_V1_0_DATA_SIZE	(7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
+#define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK	0x02
+
+static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
+static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
+
+struct vce_v1_0_fw_signature {
+	int32_t offset;
+	uint32_t length;
+	int32_t number;
+	struct {
+		uint32_t chip_id;
+		uint32_t keyselect;
+		uint32_t nonce[4];
+		uint32_t sigval[4];
+	} val[8];
+};
+
+/**
+ * vce_v1_0_ring_get_rptr - get read pointer
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Returns the current hardware read pointer
+ */
+static uint64_t vce_v1_0_ring_get_rptr(struct amdgpu_ring *ring)
+{
+	struct amdgpu_device *adev = ring->adev;
+
+	if (ring->me == 0)
+		return RREG32(mmVCE_RB_RPTR);
+	else
+		return RREG32(mmVCE_RB_RPTR2);
+}
+
+/**
+ * vce_v1_0_ring_get_wptr - get write pointer
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Returns the current hardware write pointer
+ */
+static uint64_t vce_v1_0_ring_get_wptr(struct amdgpu_ring *ring)
+{
+	struct amdgpu_device *adev = ring->adev;
+
+	if (ring->me == 0)
+		return RREG32(mmVCE_RB_WPTR);
+	else
+		return RREG32(mmVCE_RB_WPTR2);
+}
+
+/**
+ * vce_v1_0_ring_set_wptr - set write pointer
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Commits the write pointer to the hardware
+ */
+static void vce_v1_0_ring_set_wptr(struct amdgpu_ring *ring)
+{
+	struct amdgpu_device *adev = ring->adev;
+
+	if (ring->me == 0)
+		WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
+	else
+		WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
+}
+
+static int vce_v1_0_lmi_clean(struct amdgpu_device *adev)
+{
+	int i, j;
+
+	for (i = 0; i < 10; ++i) {
+		for (j = 0; j < 100; ++j) {
+			if (RREG32(mmVCE_LMI_STATUS) & 0x337f)
+				return 0;
+
+			mdelay(10);
+		}
+	}
+
+	return -ETIMEDOUT;
+}
+
+static int vce_v1_0_firmware_loaded(struct amdgpu_device *adev)
+{
+	int i, j;
+
+	for (i = 0; i < 10; ++i) {
+		for (j = 0; j < 100; ++j) {
+			if (RREG32(mmVCE_STATUS) & VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK)
+				return 0;
+			mdelay(10);
+		}
+
+		dev_err(adev->dev, "VCE not responding, trying to reset the ECPU\n");
+
+		WREG32_P(mmVCE_SOFT_RESET,
+			VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK,
+			~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
+		mdelay(10);
+		WREG32_P(mmVCE_SOFT_RESET, 0,
+			~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
+		mdelay(10);
+	}
+
+	return -ETIMEDOUT;
+}
+
+static void vce_v1_0_init_cg(struct amdgpu_device *adev)
+{
+	u32 tmp;
+
+	tmp = RREG32(mmVCE_CLOCK_GATING_A);
+	tmp |= VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
+	WREG32(mmVCE_CLOCK_GATING_A, tmp);
+
+	tmp = RREG32(mmVCE_CLOCK_GATING_B);
+	tmp |= 0x1e;
+	tmp &= ~0xe100e1;
+	WREG32(mmVCE_CLOCK_GATING_B, tmp);
+
+	tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
+	tmp &= ~0xff9ff000;
+	WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
+
+	tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
+	tmp &= ~0x3ff;
+	WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
+}
+
+/**
+ * vce_v1_0_load_fw_signature - load firmware signature into VCPU BO
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * The VCE1 firmware validation mechanism needs a firmware signature.
+ * This function finds the signature appropriate for the current
+ * ASIC and writes that into the VCPU BO.
+ */
+static int vce_v1_0_load_fw_signature(struct amdgpu_device *adev)
+{
+	const struct common_firmware_header *hdr;
+	struct vce_v1_0_fw_signature *sign;
+	unsigned int ucode_offset;
+	uint32_t chip_id;
+	u32 *cpu_addr;
+	int i, r;
+
+	hdr = (const struct common_firmware_header *)adev->vce.fw->data;
+	ucode_offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
+
+	sign = (void *)adev->vce.fw->data + ucode_offset;
+
+	switch (adev->asic_type) {
+	case CHIP_TAHITI:
+		chip_id = 0x01000014;
+		break;
+	case CHIP_VERDE:
+		chip_id = 0x01000015;
+		break;
+	case CHIP_PITCAIRN:
+		chip_id = 0x01000016;
+		break;
+	default:
+		dev_err(adev->dev, "asic_type %#010x was not found!", adev->asic_type);
+		return -EINVAL;
+	}
+
+	ASSERT(adev->vce.vcpu_bo);
+
+	r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
+	if (r) {
+		dev_err(adev->dev, "%s (%d) failed to reserve VCE bo\n", __func__, r);
+		return r;
+	}
+
+	r = amdgpu_bo_kmap(adev->vce.vcpu_bo, (void **)&cpu_addr);
+	if (r) {
+		amdgpu_bo_unreserve(adev->vce.vcpu_bo);
+		dev_err(adev->dev, "%s (%d) VCE map failed\n", __func__, r);
+		return r;
+	}
+
+	for (i = 0; i < le32_to_cpu(sign->number); ++i) {
+		if (le32_to_cpu(sign->val[i].chip_id) == chip_id)
+			break;
+	}
+
+	if (i == le32_to_cpu(sign->number)) {
+		dev_err(adev->dev, "%s chip_id %#010x was not found for %s in VCE firmware",
+			__func__, chip_id, amdgpu_asic_name[adev->asic_type]);
+		return -EINVAL;
+	}
+
+	cpu_addr += (256 - 64) / 4;
+	cpu_addr[0] = sign->val[i].nonce[0];
+	cpu_addr[1] = sign->val[i].nonce[1];
+	cpu_addr[2] = sign->val[i].nonce[2];
+	cpu_addr[3] = sign->val[i].nonce[3];
+	cpu_addr[4] = cpu_to_le32(le32_to_cpu(sign->length) + 64);
+
+	memset(&cpu_addr[5], 0, 44);
+	memcpy(&cpu_addr[16], &sign[1], hdr->ucode_size_bytes - sizeof(*sign));
+
+	cpu_addr += (le32_to_cpu(sign->length) + 64) / 4;
+	cpu_addr[0] = sign->val[i].sigval[0];
+	cpu_addr[1] = sign->val[i].sigval[1];
+	cpu_addr[2] = sign->val[i].sigval[2];
+	cpu_addr[3] = sign->val[i].sigval[3];
+
+	adev->vce.keyselect = le32_to_cpu(sign->val[i].keyselect);
+
+	amdgpu_bo_kunmap(adev->vce.vcpu_bo);
+	amdgpu_bo_unreserve(adev->vce.vcpu_bo);
+
+	return 0;
+}
+
+static int vce_v1_0_wait_for_fw_validation(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < 10; ++i) {
+		mdelay(10);
+		if (RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__DONE_MASK)
+			break;
+	}
+
+	if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__DONE_MASK)) {
+		dev_err(adev->dev, "%s VCE validation timeout\n", __func__);
+		return -ETIMEDOUT;
+	}
+
+	if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__PASS_MASK)) {
+		dev_err(adev->dev, "%s VCE firmware validation failed\n", __func__);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < 10; ++i) {
+		mdelay(10);
+		if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__BUSY_MASK))
+			break;
+	}
+
+	if (RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__BUSY_MASK) {
+		dev_err(adev->dev, "%s VCE firmware busy timeout\n", __func__);
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
+static int vce_v1_0_mc_resume(struct amdgpu_device *adev)
+{
+	uint32_t offset;
+	uint32_t size;
+
+	/* When the keyselect is already set, don't perturb VCE FW.
+	 * Validation seems to always fail the second time.
+	 */
+	if (RREG32(mmVCE_LMI_FW_START_KEYSEL)) {
+		dev_dbg(adev->dev, "%s keyselect already set: 0x%x (on CPU: 0x%x)\n",
+			__func__, RREG32(mmVCE_LMI_FW_START_KEYSEL), adev->vce.keyselect);
+
+		WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
+		return 0;
+	}
+
+	WREG32_P(mmVCE_CLOCK_GATING_A, 0, ~(1 << 16));
+	WREG32_P(mmVCE_UENC_CLOCK_GATING, 0x1FF000, ~0xFF9FF000);
+	WREG32_P(mmVCE_UENC_REG_CLOCK_GATING, 0x3F, ~0x3F);
+	WREG32(mmVCE_CLOCK_GATING_B, 0);
+
+	WREG32_P(mmVCE_LMI_FW_PERIODIC_CTRL, 0x4, ~0x4);
+
+	WREG32(mmVCE_LMI_CTRL, 0x00398000);
+
+	WREG32_P(mmVCE_LMI_CACHE_CTRL, 0x0, ~0x1);
+	WREG32(mmVCE_LMI_SWAP_CNTL, 0);
+	WREG32(mmVCE_LMI_SWAP_CNTL1, 0);
+	WREG32(mmVCE_LMI_VM_CTRL, 0);
+
+	WREG32(mmVCE_VCPU_SCRATCH7, AMDGPU_MAX_VCE_HANDLES);
+
+	offset =  adev->vce.gpu_addr + AMDGPU_VCE_FIRMWARE_OFFSET;
+	size = VCE_V1_0_FW_SIZE;
+	WREG32(mmVCE_VCPU_CACHE_OFFSET0, offset & 0x7fffffff);
+	WREG32(mmVCE_VCPU_CACHE_SIZE0, size);
+
+	offset += size;
+	size = VCE_V1_0_STACK_SIZE;
+	WREG32(mmVCE_VCPU_CACHE_OFFSET1, offset & 0x7fffffff);
+	WREG32(mmVCE_VCPU_CACHE_SIZE1, size);
+
+	offset += size;
+	size = VCE_V1_0_DATA_SIZE;
+	WREG32(mmVCE_VCPU_CACHE_OFFSET2, offset & 0x7fffffff);
+	WREG32(mmVCE_VCPU_CACHE_SIZE2, size);
+
+	WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
+
+	dev_dbg(adev->dev, "VCE keyselect: %d", adev->vce.keyselect);
+	WREG32(mmVCE_LMI_FW_START_KEYSEL, adev->vce.keyselect);
+
+	return vce_v1_0_wait_for_fw_validation(adev);
+}
+
+/**
+ * vce_v1_0_is_idle() - Check idle status of VCE1 IP block
+ *
+ * @ip_block: amdgpu_ip_block pointer
+ *
+ * Check whether VCE is busy according to VCE_STATUS.
+ * Also check whether the SRBM thinks VCE is busy, although
+ * SRBM_STATUS.VCE_BUSY seems to be bogus because it
+ * appears to mirror the VCE_STATUS.VCPU_REPORT_FW_LOADED bit.
+ */
+static bool vce_v1_0_is_idle(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	bool busy =
+		(RREG32(mmVCE_STATUS) & (VCE_STATUS__JOB_BUSY_MASK | VCE_STATUS__UENC_BUSY_MASK)) ||
+		(RREG32(mmSRBM_STATUS2) & SRBM_STATUS2__VCE_BUSY_MASK);
+
+	return !busy;
+}
+
+static int vce_v1_0_wait_for_idle(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	unsigned int i;
+
+	for (i = 0; i < adev->usec_timeout; i++) {
+		udelay(1);
+		if (vce_v1_0_is_idle(ip_block))
+			return 0;
+	}
+	return -ETIMEDOUT;
+}
+
+/**
+ * vce_v1_0_start - start VCE block
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Setup and start the VCE block
+ */
+static int vce_v1_0_start(struct amdgpu_device *adev)
+{
+	struct amdgpu_ring *ring;
+	int r;
+
+	WREG32_P(mmVCE_STATUS, 1, ~1);
+
+	r = vce_v1_0_mc_resume(adev);
+	if (r)
+		return r;
+
+	ring = &adev->vce.ring[0];
+	WREG32(mmVCE_RB_RPTR, lower_32_bits(ring->wptr));
+	WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
+	WREG32(mmVCE_RB_BASE_LO, lower_32_bits(ring->gpu_addr));
+	WREG32(mmVCE_RB_BASE_HI, upper_32_bits(ring->gpu_addr));
+	WREG32(mmVCE_RB_SIZE, ring->ring_size / 4);
+
+	ring = &adev->vce.ring[1];
+	WREG32(mmVCE_RB_RPTR2, lower_32_bits(ring->wptr));
+	WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
+	WREG32(mmVCE_RB_BASE_LO2, lower_32_bits(ring->gpu_addr));
+	WREG32(mmVCE_RB_BASE_HI2, upper_32_bits(ring->gpu_addr));
+	WREG32(mmVCE_RB_SIZE2, ring->ring_size / 4);
+
+	WREG32_P(mmVCE_VCPU_CNTL, VCE_VCPU_CNTL__CLK_EN_MASK,
+		 ~VCE_VCPU_CNTL__CLK_EN_MASK);
+
+	WREG32_P(mmVCE_SOFT_RESET,
+		VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+		VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
+		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
+
+	mdelay(100);
+
+	WREG32_P(mmVCE_SOFT_RESET, 0,
+		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
+
+	r = vce_v1_0_firmware_loaded(adev);
+
+	/* Clear VCE_STATUS, otherwise SRBM thinks VCE1 is busy. */
+	WREG32(mmVCE_STATUS, 0);
+
+	if (r) {
+		dev_err(adev->dev, "VCE not responding, giving up!!!\n");
+		return r;
+	}
+
+	return 0;
+}
+
+static int vce_v1_0_stop(struct amdgpu_device *adev)
+{
+	struct amdgpu_ip_block *ip_block;
+	int status;
+	int i;
+
+	ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_VCE);
+	if (!ip_block)
+		return -EINVAL;
+
+	if (vce_v1_0_lmi_clean(adev))
+		dev_warn(adev->dev, "%s VCE is not idle\n", __func__);
+
+	if (vce_v1_0_wait_for_idle(ip_block))
+		dev_warn(adev->dev, "VCE is busy: VCE_STATUS=0x%x, SRBM_STATUS2=0x%x\n",
+			RREG32(mmVCE_STATUS), RREG32(mmSRBM_STATUS2));
+
+	/* Stall UMC and register bus before resetting VCPU */
+	WREG32_P(mmVCE_LMI_CTRL2, 1 << 8, ~(1 << 8));
+
+	for (i = 0; i < 100; ++i) {
+		status = RREG32(mmVCE_LMI_STATUS);
+		if (status & 0x240)
+			break;
+		mdelay(1);
+	}
+
+	WREG32_P(mmVCE_VCPU_CNTL, 0, ~VCE_VCPU_CNTL__CLK_EN_MASK);
+
+	WREG32_P(mmVCE_SOFT_RESET,
+		VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+		VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
+		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
+
+	WREG32(mmVCE_STATUS, 0);
+
+	return 0;
+}
+
+static void vce_v1_0_enable_mgcg(struct amdgpu_device *adev, bool enable)
+{
+	u32 tmp;
+
+	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_VCE_MGCG)) {
+		tmp = RREG32(mmVCE_CLOCK_GATING_A);
+		tmp |= VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
+		WREG32(mmVCE_CLOCK_GATING_A, tmp);
+
+		tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
+		tmp &= ~0x1ff000;
+		tmp |= 0xff800000;
+		WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
+
+		tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
+		tmp &= ~0x3ff;
+		WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
+	} else {
+		tmp = RREG32(mmVCE_CLOCK_GATING_A);
+		tmp &= ~VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
+		WREG32(mmVCE_CLOCK_GATING_A, tmp);
+
+		tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
+		tmp |= 0x1ff000;
+		tmp &= ~0xff800000;
+		WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
+
+		tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
+		tmp |= 0x3ff;
+		WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
+	}
+}
+
+static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	int r;
+
+	r = amdgpu_vce_early_init(adev);
+	if (r)
+		return r;
+
+	adev->vce.num_rings = 2;
+
+	vce_v1_0_set_ring_funcs(adev);
+	vce_v1_0_set_irq_funcs(adev);
+
+	return 0;
+}
+
+static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	struct amdgpu_ring *ring;
+	int r, i;
+
+	r = amdgpu_irq_add_id(adev, AMDGPU_IRQ_CLIENTID_LEGACY, 167, &adev->vce.irq);
+	if (r)
+		return r;
+
+	r = amdgpu_vce_sw_init(adev, VCE_V1_0_FW_SIZE +
+		VCE_V1_0_STACK_SIZE + VCE_V1_0_DATA_SIZE);
+	if (r)
+		return r;
+
+	r = amdgpu_vce_resume(adev);
+	if (r)
+		return r;
+	r = vce_v1_0_load_fw_signature(adev);
+	if (r)
+		return r;
+
+	for (i = 0; i < adev->vce.num_rings; i++) {
+		enum amdgpu_ring_priority_level hw_prio = amdgpu_vce_get_ring_prio(i);
+
+		ring = &adev->vce.ring[i];
+		sprintf(ring->name, "vce%d", i);
+		r = amdgpu_ring_init(adev, ring, 512, &adev->vce.irq, 0,
+				     hw_prio, NULL);
+		if (r)
+			return r;
+	}
+
+	return r;
+}
+
+static int vce_v1_0_sw_fini(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	int r;
+
+	r = amdgpu_vce_suspend(adev);
+	if (r)
+		return r;
+
+	return amdgpu_vce_sw_fini(adev);
+}
+
+/**
+ * vce_v1_0_hw_init - start and test VCE block
+ *
+ * @ip_block: Pointer to the amdgpu_ip_block for this hw instance.
+ *
+ * Initialize the hardware, boot up the VCPU and do some testing
+ */
+static int vce_v1_0_hw_init(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	int i, r;
+
+	if (adev->pm.dpm_enabled)
+		amdgpu_dpm_enable_vce(adev, true);
+	else
+		amdgpu_asic_set_vce_clocks(adev, 10000, 10000);
+
+	for (i = 0; i < adev->vce.num_rings; i++) {
+		r = amdgpu_ring_test_helper(&adev->vce.ring[i]);
+		if (r)
+			return r;
+	}
+
+	dev_info(adev->dev, "VCE initialized successfully.\n");
+
+	return 0;
+}
+
+static int vce_v1_0_hw_fini(struct amdgpu_ip_block *ip_block)
+{
+	int r;
+
+	r = vce_v1_0_stop(ip_block->adev);
+	if (r)
+		return r;
+
+	cancel_delayed_work_sync(&ip_block->adev->vce.idle_work);
+	return 0;
+}
+
+static int vce_v1_0_suspend(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	int r;
+
+	/*
+	 * Proper cleanups before halting the HW engine:
+	 *   - cancel the delayed idle work
+	 *   - enable powergating
+	 *   - enable clockgating
+	 *   - disable dpm
+	 *
+	 * TODO: to align with the VCN implementation, move the
+	 * jobs for clockgating/powergating/dpm setting to
+	 * ->set_powergating_state().
+	 */
+	cancel_delayed_work_sync(&adev->vce.idle_work);
+
+	if (adev->pm.dpm_enabled) {
+		amdgpu_dpm_enable_vce(adev, false);
+	} else {
+		amdgpu_asic_set_vce_clocks(adev, 0, 0);
+		amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCE,
+						       AMD_PG_STATE_GATE);
+		amdgpu_device_ip_set_clockgating_state(adev, AMD_IP_BLOCK_TYPE_VCE,
+						       AMD_CG_STATE_GATE);
+	}
+
+	r = vce_v1_0_hw_fini(ip_block);
+	if (r) {
+		dev_err(adev->dev, "vce_v1_0_hw_fini() failed with error %i", r);
+		return r;
+	}
+
+	return amdgpu_vce_suspend(adev);
+}
+
+static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+	int r;
+
+	r = amdgpu_vce_resume(adev);
+	if (r)
+		return r;
+	r = vce_v1_0_load_fw_signature(adev);
+	if (r)
+		return r;
+
+	return vce_v1_0_hw_init(ip_block);
+}
+
+static int vce_v1_0_set_interrupt_state(struct amdgpu_device *adev,
+					struct amdgpu_irq_src *source,
+					unsigned int type,
+					enum amdgpu_interrupt_state state)
+{
+	uint32_t val = 0;
+
+	if (state == AMDGPU_IRQ_STATE_ENABLE)
+		val |= VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK;
+
+	WREG32_P(mmVCE_SYS_INT_EN, val,
+		 ~VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK);
+	return 0;
+}
+
+static int vce_v1_0_process_interrupt(struct amdgpu_device *adev,
+				      struct amdgpu_irq_src *source,
+				      struct amdgpu_iv_entry *entry)
+{
+	dev_dbg(adev->dev, "IH: VCE\n");
+	switch (entry->src_data[0]) {
+	case 0:
+	case 1:
+		amdgpu_fence_process(&adev->vce.ring[entry->src_data[0]]);
+		break;
+	default:
+		dev_err(adev->dev, "Unhandled interrupt: %d %d\n",
+			  entry->src_id, entry->src_data[0]);
+		break;
+	}
+
+	return 0;
+}
+
+static int vce_v1_0_set_clockgating_state(struct amdgpu_ip_block *ip_block,
+					  enum amd_clockgating_state state)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+
+	vce_v1_0_init_cg(adev);
+	vce_v1_0_enable_mgcg(adev, state == AMD_CG_STATE_GATE);
+
+	return 0;
+}
+
+static int vce_v1_0_set_powergating_state(struct amdgpu_ip_block *ip_block,
+					  enum amd_powergating_state state)
+{
+	struct amdgpu_device *adev = ip_block->adev;
+
+	/* This doesn't actually powergate the VCE block.
+	 * That's done in the dpm code via the SMC.  This
+	 * just re-inits the block as necessary.  The actual
+	 * gating still happens in the dpm code.  We should
+	 * revisit this when there is a cleaner line between
+	 * the smc and the hw blocks
+	 */
+	if (state == AMD_PG_STATE_GATE)
+		return vce_v1_0_stop(adev);
+	else
+		return vce_v1_0_start(adev);
+}
+
+static const struct amd_ip_funcs vce_v1_0_ip_funcs = {
+	.name = "vce_v1_0",
+	.early_init = vce_v1_0_early_init,
+	.sw_init = vce_v1_0_sw_init,
+	.sw_fini = vce_v1_0_sw_fini,
+	.hw_init = vce_v1_0_hw_init,
+	.hw_fini = vce_v1_0_hw_fini,
+	.suspend = vce_v1_0_suspend,
+	.resume = vce_v1_0_resume,
+	.is_idle = vce_v1_0_is_idle,
+	.wait_for_idle = vce_v1_0_wait_for_idle,
+	.set_clockgating_state = vce_v1_0_set_clockgating_state,
+	.set_powergating_state = vce_v1_0_set_powergating_state,
+};
+
+static const struct amdgpu_ring_funcs vce_v1_0_ring_funcs = {
+	.type = AMDGPU_RING_TYPE_VCE,
+	.align_mask = 0xf,
+	.nop = VCE_CMD_NO_OP,
+	.support_64bit_ptrs = false,
+	.no_user_fence = true,
+	.get_rptr = vce_v1_0_ring_get_rptr,
+	.get_wptr = vce_v1_0_ring_get_wptr,
+	.set_wptr = vce_v1_0_ring_set_wptr,
+	.parse_cs = amdgpu_vce_ring_parse_cs,
+	.emit_frame_size = 6, /* amdgpu_vce_ring_emit_fence  x1 no user fence */
+	.emit_ib_size = 4, /* amdgpu_vce_ring_emit_ib */
+	.emit_ib = amdgpu_vce_ring_emit_ib,
+	.emit_fence = amdgpu_vce_ring_emit_fence,
+	.test_ring = amdgpu_vce_ring_test_ring,
+	.test_ib = amdgpu_vce_ring_test_ib,
+	.insert_nop = amdgpu_ring_insert_nop,
+	.pad_ib = amdgpu_ring_generic_pad_ib,
+	.begin_use = amdgpu_vce_ring_begin_use,
+	.end_use = amdgpu_vce_ring_end_use,
+};
+
+static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < adev->vce.num_rings; i++) {
+		adev->vce.ring[i].funcs = &vce_v1_0_ring_funcs;
+		adev->vce.ring[i].me = i;
+	}
+};
+
+static const struct amdgpu_irq_src_funcs vce_v1_0_irq_funcs = {
+	.set = vce_v1_0_set_interrupt_state,
+	.process = vce_v1_0_process_interrupt,
+};
+
+static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev)
+{
+	adev->vce.irq.num_types = 1;
+	adev->vce.irq.funcs = &vce_v1_0_irq_funcs;
+};
+
+const struct amdgpu_ip_block_version vce_v1_0_ip_block = {
+	.type = AMD_IP_BLOCK_TYPE_VCE,
+	.major = 1,
+	.minor = 0,
+	.rev = 0,
+	.funcs = &vce_v1_0_ip_funcs,
+};
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
new file mode 100644
index 000000000000..206e7bec897f
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2025 Advanced Micro Devices, Inc.
+ * Copyright 2025 Valve Corporation
+ * Copyright 2025 Alexandre Demers
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __VCE_V1_0_H__
+#define __VCE_V1_0_H__
+
+extern const struct amdgpu_ip_block_version vce_v1_0_ip_block;
+
+#endif
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 11/14] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (9 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 11:41   ` Christian König
  2025-10-28 22:06 ` [PATCH 12/14] drm/amd/pm/si: Hook up VCE1 to SI DPM Timur Kristóf
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Based on research carried out by Alexandre and Christian.

VCE1 actually executes its code from the VCPU BO.
Due to various hardware limitations, the VCE1 requires
the VCPU BO to be in the low 32 bit address range.
However, VRAM is typically mapped at the high address range,
which means the VCPU can't access VRAM through the FB aperture.

To solve this, we write a few page table entries to
map the VCPU BO in the GART address range. And we make sure
that the GART is located at the low address range.
That way the VCE1 can access the VCPU BO.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Co-developed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 44 +++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
index e62fd8ed1992..27f70146293d 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -34,6 +34,7 @@
 
 #include "amdgpu.h"
 #include "amdgpu_vce.h"
+#include "amdgpu_gart.h"
 #include "sid.h"
 #include "vce_v1_0.h"
 #include "vce/vce_1_0_d.h"
@@ -46,6 +47,11 @@
 #define VCE_V1_0_DATA_SIZE	(7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
 #define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK	0x02
 
+#define VCE_V1_0_GART_PAGE_START \
+	(AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS)
+#define VCE_V1_0_GART_ADDR_START \
+	(VCE_V1_0_GART_PAGE_START * AMDGPU_GPU_PAGE_SIZE)
+
 static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
 static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
 
@@ -535,6 +541,38 @@ static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
 	return 0;
 }
 
+/**
+ * vce_v1_0_ensure_vcpu_bo_32bit_addr() - ensure the VCPU BO has a 32-bit address
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Due to various hardware limitations, the VCE1 requires
+ * the VCPU BO to be in the low 32 bit address range.
+ * Ensure that the VCPU BO has a 32-bit GPU address,
+ * or return an error code when that isn't possible.
+ */
+static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
+{
+	const u64 gpu_addr = amdgpu_bo_gpu_offset(adev->vce.vcpu_bo);
+	const u64 bo_size = amdgpu_bo_size(adev->vce.vcpu_bo);
+	const u64 max_vcpu_bo_addr = 0xffffffff - bo_size;
+
+	/* Check if the VCPU BO already has a 32-bit address.
+	 * Eg. if MC is configured to put VRAM in the low address range.
+	 */
+	if (gpu_addr <= max_vcpu_bo_addr)
+		return 0;
+
+	/* Check if we can map the VCPU BO in GART to a 32-bit address. */
+	if (adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START > max_vcpu_bo_addr)
+		return -EINVAL;
+
+	amdgpu_gart_bind_vram_bo(adev, VCE_V1_0_GART_ADDR_START, adev->vce.vcpu_bo,
+		AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID);
+	adev->vce.gpu_addr = adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START;
+		return 0;
+}
+
 static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
 {
 	struct amdgpu_device *adev = ip_block->adev;
@@ -554,6 +592,9 @@ static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
 	r = vce_v1_0_load_fw_signature(adev);
+	if (r)
+		return r;
+	r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
 	if (r)
 		return r;
 
@@ -669,6 +710,9 @@ static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
 	r = vce_v1_0_load_fw_signature(adev);
+	if (r)
+		return r;
+	r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
 	if (r)
 		return r;
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 12/14] drm/amd/pm/si: Hook up VCE1 to SI DPM
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (10 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 11/14] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 11:47   ` Christian König
  2025-10-28 22:06 ` [PATCH 13/14] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs Timur Kristóf
  2025-10-28 22:06 ` [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better Timur Kristóf
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

On SI GPUs, the SMC needs to be aware of whether or not the VCE1
is used. The VCE1 is enabled/disabled through the DPM code.

Also print VCE clocks in amdgpu_pm_info.
Users can inspect the current power state using:
cat /sys/kernel/debug/dri/<card>/amdgpu_pm_info

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index 3a9522c17fee..bf7ab93b265d 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -7051,13 +7051,20 @@ static void si_set_vce_clock(struct amdgpu_device *adev,
 	if ((old_rps->evclk != new_rps->evclk) ||
 	    (old_rps->ecclk != new_rps->ecclk)) {
 		/* Turn the clocks on when encoding, off otherwise */
+		dev_dbg(adev->dev, "set VCE clocks: %u, %u\n", new_rps->evclk, new_rps->ecclk);
+
 		if (new_rps->evclk || new_rps->ecclk) {
-			/* Place holder for future VCE1.0 porting to amdgpu
-			vce_v1_0_enable_mgcg(adev, false, false);*/
+			amdgpu_asic_set_vce_clocks(adev, new_rps->evclk, new_rps->ecclk);
+			amdgpu_device_ip_set_clockgating_state(
+				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_CG_STATE_UNGATE);
+			amdgpu_device_ip_set_powergating_state(
+				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_PG_STATE_UNGATE);
 		} else {
-			/* Place holder for future VCE1.0 porting to amdgpu
-			vce_v1_0_enable_mgcg(adev, true, false);
-			amdgpu_asic_set_vce_clocks(adev, new_rps->evclk, new_rps->ecclk);*/
+			amdgpu_device_ip_set_powergating_state(
+				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_PG_STATE_GATE);
+			amdgpu_device_ip_set_clockgating_state(
+				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_CG_STATE_GATE);
+			amdgpu_asic_set_vce_clocks(adev, 0, 0);
 		}
 	}
 }
@@ -7582,6 +7589,7 @@ static void si_dpm_debugfs_print_current_performance_level(void *handle,
 	} else {
 		pl = &ps->performance_levels[current_index];
 		seq_printf(m, "uvd    vclk: %d dclk: %d\n", rps->vclk, rps->dclk);
+		seq_printf(m, "vce    evclk: %d ecclk: %d\n", rps->evclk, rps->ecclk);
 		seq_printf(m, "power level %d    sclk: %u mclk: %u vddc: %u vddci: %u pcie gen: %u\n",
 			   current_index, pl->sclk, pl->mclk, pl->vddc, pl->vddci, pl->pcie_gen + 1);
 	}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 13/14] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (11 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 12/14] drm/amd/pm/si: Hook up VCE1 to SI DPM Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 11:51   ` Christian König
  2025-10-28 22:06 ` [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better Timur Kristóf
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Add the VCE1 IP block to the SI GPUs that have it.
Advertise the encoder capabilities corresponding to VCE1,
so the userspace applications can detect and use it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Co-developed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/si.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 9468c03bdb1b..f7b35b860ba3 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -45,6 +45,7 @@
 #include "dce_v6_0.h"
 #include "si.h"
 #include "uvd_v3_1.h"
+#include "vce_v1_0.h"
 
 #include "uvd/uvd_4_0_d.h"
 
@@ -921,8 +922,6 @@ static const u32 hainan_mgcg_cgcg_init[] =
 	0x3630, 0xfffffff0, 0x00000100,
 };
 
-/* XXX: update when we support VCE */
-#if 0
 /* tahiti, pitcairn, verde */
 static const struct amdgpu_video_codec_info tahiti_video_codecs_encode_array[] =
 {
@@ -940,13 +939,7 @@ static const struct amdgpu_video_codecs tahiti_video_codecs_encode =
 	.codec_count = ARRAY_SIZE(tahiti_video_codecs_encode_array),
 	.codec_array = tahiti_video_codecs_encode_array,
 };
-#else
-static const struct amdgpu_video_codecs tahiti_video_codecs_encode =
-{
-	.codec_count = 0,
-	.codec_array = NULL,
-};
-#endif
+
 /* oland and hainan don't support encode */
 static const struct amdgpu_video_codecs hainan_video_codecs_encode =
 {
@@ -2723,7 +2716,7 @@ int si_set_ip_blocks(struct amdgpu_device *adev)
 		else
 			amdgpu_device_ip_block_add(adev, &dce_v6_0_ip_block);
 		amdgpu_device_ip_block_add(adev, &uvd_v3_1_ip_block);
-		/* amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block); */
+		amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block);
 		break;
 	case CHIP_OLAND:
 		amdgpu_device_ip_block_add(adev, &si_common_ip_block);
@@ -2741,7 +2734,6 @@ int si_set_ip_blocks(struct amdgpu_device *adev)
 		else
 			amdgpu_device_ip_block_add(adev, &dce_v6_4_ip_block);
 		amdgpu_device_ip_block_add(adev, &uvd_v3_1_ip_block);
-		/* amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block); */
 		break;
 	case CHIP_HAINAN:
 		amdgpu_device_ip_block_add(adev, &si_common_ip_block);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better
  2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
                   ` (12 preceding siblings ...)
  2025-10-28 22:06 ` [PATCH 13/14] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs Timur Kristóf
@ 2025-10-28 22:06 ` Timur Kristóf
  2025-10-29 12:02   ` Christian König
  13 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-28 22:06 UTC (permalink / raw)
  To: amd-gfx, Alex Deucher, Christian König, Timur Kristóf,
	Alexandre Demers, Rodrigo Siqueira

Sometimes the VCE PLL times out while we are programming it.
When it happens, the VCE still works, but much slower.
Observed on some Tahiti boards, but not all:
- FirePro W9000 has the issue
- Radeon R9 280X not affected
- Radeon HD 7990 not affected

Continue the complete VCE PLL programming sequence even when
it timed out. With this, the VCE will work fine and faster
after the timeout happened.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/si.c       |  6 +-----
 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 10 +++++++++-
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index f7b35b860ba3..ed3d4f9bf9d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -1902,7 +1902,7 @@ static int si_vce_send_vcepll_ctlreq(struct amdgpu_device *adev)
 	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, 0, ~UPLL_CTLREQ_MASK);
 
 	if (i == SI_MAX_CTLACKS_ASSERTION_WAIT) {
-		DRM_ERROR("Timeout setting VCE clocks!\n");
+		DRM_WARN("Timeout setting VCE clocks!\n");
 		return -ETIMEDOUT;
 	}
 
@@ -1954,8 +1954,6 @@ static int si_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
 	mdelay(1);
 
 	r = si_vce_send_vcepll_ctlreq(adev);
-	if (r)
-		return r;
 
 	/* Assert VCEPLL_RESET again */
 	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, VCEPLL_RESET_MASK, ~VCEPLL_RESET_MASK);
@@ -1988,8 +1986,6 @@ static int si_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
 	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, 0, ~VCEPLL_BYPASS_EN_MASK);
 
 	r = si_vce_send_vcepll_ctlreq(adev);
-	if (r)
-		return r;
 
 	/* Switch VCLK and DCLK selection */
 	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL_2,
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
index 27f70146293d..fdc455797258 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -401,7 +401,7 @@ static int vce_v1_0_wait_for_idle(struct amdgpu_ip_block *ip_block)
 static int vce_v1_0_start(struct amdgpu_device *adev)
 {
 	struct amdgpu_ring *ring;
-	int r;
+	int r, i;
 
 	WREG32_P(mmVCE_STATUS, 1, ~1);
 
@@ -443,6 +443,14 @@ static int vce_v1_0_start(struct amdgpu_device *adev)
 	/* Clear VCE_STATUS, otherwise SRBM thinks VCE1 is busy. */
 	WREG32(mmVCE_STATUS, 0);
 
+	/* Wait for VCE_STATUS to actually clear.
+	 * This helps when there was a timeout setting the VCE clocks.
+	 */
+	for (i = 0; i < adev->usec_timeout && RREG32(mmVCE_STATUS); ++i) {
+		udelay(1);
+		WREG32(mmVCE_STATUS, 0);
+	}
+
 	if (r) {
 		dev_err(adev->dev, "VCE not responding, giving up!!!\n");
 		return r;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT
  2025-10-28 22:06 ` [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT Timur Kristóf
@ 2025-10-29 10:00   ` Christian König
  2025-10-29 11:41     ` Timur Kristóf
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-29 10:00 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira
  Cc: Pelloux-Prayer, Pierre-Eric

On 10/28/25 23:06, Timur Kristóf wrote:
> GART contains some pages in its address space that come before
> the GTT and are used for BO copies.
> 
> Instead of hardcoding the size of the GART space before GTT,
> make it a field in the amdgpu_gmc struct. This allows us to map
> more things in GART before GTT.
> 
> Split this into a separate patch to make it easier to bisect,
> in case there are any errors in the future.

Pierre-Eric has been working on something similar.

On the newer HW generations we need more transfer windows since we want to utilize more DMA engines for copies and clears.

My suggestion is that we just make AMDGPU_GTT_NUM_TRANSFER_WINDOWS depend on adev and so the HW generation and then reserve one extra transfer window for this workaround on SI.

Regards,
Christian.

> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c     | 2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h     | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 2 +-
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 97b562a79ea8..bf31bd022d6d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -325,6 +325,8 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc,
>  		break;
>  	}
>  
> +	mc->num_gart_pages_before_gtt =
> +		AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
>  	mc->gart_start &= ~(four_gb - 1);
>  	mc->gart_end = mc->gart_start + mc->gart_size - 1;
>  	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index 55097ca10738..568eed3eb557 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -266,6 +266,7 @@ struct amdgpu_gmc {
>  	u64			fb_end;
>  	unsigned		vram_width;
>  	u64			real_vram_size;
> +	u32			num_gart_pages_before_gtt;
>  	int			vram_mtrr;
>  	u64                     mc_mask;
>  	const struct firmware   *fw;	/* MC firmware */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index 0760e70402ec..4c2563a70c2b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -283,7 +283,7 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
>  
>  	ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
>  
> -	start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
> +	start = adev->gmc.num_gart_pages_before_gtt;
>  	size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
>  	drm_mm_init(&mgr->mm, start, size);
>  	spin_lock_init(&mgr->lock);


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 02/14] drm/amdgpu/gmc6: Place gart at low address range
  2025-10-28 22:06 ` [PATCH 02/14] drm/amdgpu/gmc6: Place gart at low address range Timur Kristóf
@ 2025-10-29 10:00   ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-29 10:00 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> Instead of using a best-fit algorithm to determine which part
> of the VMID 0 address space to use for GART, always use the low
> address range.
> 
> A subsequent commit will use this to map the VCPU BO in GART
> for the VCE1 IP block.
> 
> Split this into	a separate patch to make it easier to bisect,
> in case	there are any errors in	the future.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> index f6ad7911f1e6..499dfd78092d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> @@ -213,7 +213,7 @@ static void gmc_v6_0_vram_gtt_location(struct amdgpu_device *adev,
>  
>  	amdgpu_gmc_set_agp_default(adev, mc);
>  	amdgpu_gmc_vram_location(adev, mc, base);
> -	amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_BEST_FIT);
> +	amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW);
>  }
>  
>  static void gmc_v6_0_mc_program(struct amdgpu_device *adev)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/14] drm/amdgpu/gmc6: Add GART space for VCPU BO
  2025-10-28 22:06 ` [PATCH 03/14] drm/amdgpu/gmc6: Add GART space for VCPU BO Timur Kristóf
@ 2025-10-29 10:05   ` Christian König
  2025-10-29 11:26     ` Timur Kristóf
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-29 10:05 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> Add an extra 16M (4096 pages) to the GART before GTT.
> This space is going to be used for the VCE VCPU BO.
> 
> Split this into	a separate patch to make it easier to bisect,
> in case	there are any errors in	the future.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> index 499dfd78092d..bfeb60cfbf62 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> @@ -214,6 +214,9 @@ static void gmc_v6_0_vram_gtt_location(struct amdgpu_device *adev,
>  	amdgpu_gmc_set_agp_default(adev, mc);
>  	amdgpu_gmc_vram_location(adev, mc, base);
>  	amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW);
> +
> +	/* Add space for VCE's VCPU BO so that VCE1 can access it. */
> +	mc->num_gart_pages_before_gtt += 4096;

4096*4KiB=16MiB. Do we really need so much?

>  }
>  
>  static void gmc_v6_0_mc_program(struct amdgpu_device *adev)
> @@ -338,7 +341,7 @@ static int gmc_v6_0_mc_init(struct amdgpu_device *adev)
>  		case CHIP_TAHITI:   /* UVD, VCE do not support GPUVM */
>  		case CHIP_PITCAIRN: /* UVD, VCE do not support GPUVM */
>  		case CHIP_OLAND:    /* UVD, VCE do not support GPUVM */
> -			adev->gmc.gart_size = 1024ULL << 20;
> +			adev->gmc.gart_size = 1040ULL << 20;

Ideally that should be a power of two.

We can in theory increase it in units of 2MiB without wasting memory, but I'm not 100% sure if that is actually tested everywhere.

Regards,
Christian.

>  			break;
>  		}
>  	} else {


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/14] drm/amdgpu/gart: Add helper to bind VRAM BO
  2025-10-28 22:06 ` [PATCH 04/14] drm/amdgpu/gart: Add helper to bind VRAM BO Timur Kristóf
@ 2025-10-29 10:16   ` Christian König
  2025-10-29 10:57     ` Timur Kristóf
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-29 10:16 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira



On 10/28/25 23:06, Timur Kristóf wrote:
> Binds a BO that is allocated in VRAM to the GART page table.
> 
> Useful when a kernel BO is located in VRAM but
> needs to be accessed from the GART address space,
> for example to give a kernel BO a 32-bit address
> when GART is placed in LOW address space.
> 
> Co-developed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 41 ++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h |  2 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index 83f3b94ed975..19b5e72a6a26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -390,6 +390,47 @@ void amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
>  	amdgpu_gart_map(adev, offset, pages, dma_addr, flags, adev->gart.ptr);
>  }
>  
> +/**
> + * amdgpu_gart_bind - bind VRAM BO into the GART page table

That should be the function name or otherwise you get automated warnings.

> + *
> + * @adev: amdgpu_device pointer
> + * @offset: offset into the GPU's gart aperture
> + * @bo: the BO whose pages should be mapped
> + * @flags: page table entry flags
> + *
> + * Binds a BO that is allocated in VRAM to the GART page table
> + * (all ASICs).
> + * Useful when a kernel BO is located in VRAM but
> + * needs to be accessed from the GART address space,
> + * for example to give a kernel BO a 32-bit address
> + * when GART is placed in LOW address space.
> + */
> +void amdgpu_gart_bind_vram_bo(struct amdgpu_device *adev, uint64_t offset,
> +		     struct amdgpu_bo *bo, uint64_t flags)

Please not the BO but just the VRAM pa.

> +{
> +	u64 pa, bo_size;
> +	u32 num_pages, start_page, i, idx;
> +
> +	if (!adev->gart.ptr)
> +		return;
> +
> +	if (!drm_dev_enter(adev_to_drm(adev), &idx))
> +		return;
> +
> +	pa = amdgpu_gmc_vram_pa(adev, bo);
> +	bo_size = amdgpu_bo_size(bo);
> +	num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
> +	start_page = offset / AMDGPU_GPU_PAGE_SIZE;
> +
> +	for (i = 0; i < num_pages; ++i) {
> +		amdgpu_gmc_set_pte_pde(adev, adev->gart.ptr,
> +			start_page + i, pa + AMDGPU_GPU_PAGE_SIZE * i, flags);
> +	}
> +

Ideally amdgpu_gart_map() should be able to take both dma_addr array or VRAM pa (or have two map functions).

This way we could cleanup the code in amdgpu_ttm_map_buffer as well.


> +	amdgpu_gart_invalidate_tlb(adev);

IIRC we moved that out of amdgpu_gart_bind(), probably best to do so here as well.

Regards,
Christian.

> +	drm_dev_exit(idx);
> +}
> +
>  /**
>   * amdgpu_gart_invalidate_tlb - invalidate gart TLB
>   *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> index 7cc980bf4725..756548d0b520 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> @@ -64,5 +64,7 @@ void amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
>  		     void *dst);
>  void amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
>  		      int pages, dma_addr_t *dma_addr, uint64_t flags);
> +void amdgpu_gart_bind_vram_bo(struct amdgpu_device *adev, uint64_t offset,
> +		     struct amdgpu_bo *bo, uint64_t flags);
>  void amdgpu_gart_invalidate_tlb(struct amdgpu_device *adev);
>  #endif


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/14] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it
  2025-10-28 22:06 ` [PATCH 05/14] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it Timur Kristóf
@ 2025-10-29 10:19   ` Christian König
  2025-10-29 10:48     ` Timur Kristóf
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-29 10:19 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> The VCPU BO doesn't only contain the VCE firmware but also other
> ranges that the VCE uses for its stack and data. Let's initialize
> this to zero to avoid having garbage in the VCPU BO.

Absolutely clear NAK.

This is intentionally not initialized on resume to avoid breaking encode sessions which existed before suspend.

Why exactly is that an issue? The VCE FW BO should be cleared to zero after initial allocation?

Regards,
Christian.

> 
> Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index b9060bcd4806..eaa06dbef5c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -310,6 +310,7 @@ int amdgpu_vce_resume(struct amdgpu_device *adev)
>  	offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
>  
>  	if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> +		memset32(cpu_addr, 0, amdgpu_bo_size(adev->vce.vcpu_bo) / 4);
>  		memcpy_toio(cpu_addr, adev->vce.fw->data + offset,
>  			    adev->vce.fw->size - offset);
>  		drm_dev_exit(idx);


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init
  2025-10-28 22:06 ` [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init Timur Kristóf
@ 2025-10-29 10:26   ` Christian König
  2025-10-29 17:16   ` Liu, Leo
  1 sibling, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-29 10:26 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira, Liu, Leo

On 10/28/25 23:06, Timur Kristóf wrote:
> Try to load the VCE firmware at early_init.
> 
> When the correct firmware is not found, return -ENOENT.
> This way, the driver initialization will complete even
> without VCE, and the GPU will be functional, albeit
> without video encoding capabilities.
> 
> This is necessary because we are planning to add support
> for the VCE1, and AMD hasn't yet publised the correct
> firmware for this version. So we need to anticipate that
> users will try to boot amdgpu on SI GPUs without the
> correct VCE1 firmware present on their system.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>

Looks reasonable to me, but Leo and his team should probably take a look as well.

Regards,
Christian.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 121 +++++++++++++++---------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h |   1 +
>  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c   |   5 +
>  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c   |   5 +
>  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c   |   5 +
>  5 files changed, 91 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index eaa06dbef5c4..b23a48a1efc1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -88,82 +88,87 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
>  				      bool direct, struct dma_fence **fence);
>  
>  /**
> - * amdgpu_vce_sw_init - allocate memory, load vce firmware
> + * amdgpu_vce_firmware_name() - determine the firmware file name for VCE
>   *
>   * @adev: amdgpu_device pointer
> - * @size: size for the new BO
>   *
> - * First step to get VCE online, allocate memory and load the firmware
> + * Each chip that has VCE IP may need a different firmware.
> + * This function returns the name of the VCE firmware file
> + * appropriate for the current chip.
>   */
> -int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> +static const char *amdgpu_vce_firmware_name(struct amdgpu_device *adev)
>  {
> -	const char *fw_name;
> -	const struct common_firmware_header *hdr;
> -	unsigned int ucode_version, version_major, version_minor, binary_id;
> -	int i, r;
> -
>  	switch (adev->asic_type) {
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>  	case CHIP_BONAIRE:
> -		fw_name = FIRMWARE_BONAIRE;
> -		break;
> +		return FIRMWARE_BONAIRE;
>  	case CHIP_KAVERI:
> -		fw_name = FIRMWARE_KAVERI;
> -		break;
> +		return FIRMWARE_KAVERI;
>  	case CHIP_KABINI:
> -		fw_name = FIRMWARE_KABINI;
> -		break;
> +		return FIRMWARE_KABINI;
>  	case CHIP_HAWAII:
> -		fw_name = FIRMWARE_HAWAII;
> -		break;
> +		return FIRMWARE_HAWAII;
>  	case CHIP_MULLINS:
> -		fw_name = FIRMWARE_MULLINS;
> -		break;
> +		return FIRMWARE_MULLINS;
>  #endif
>  	case CHIP_TONGA:
> -		fw_name = FIRMWARE_TONGA;
> -		break;
> +		return  FIRMWARE_TONGA;
>  	case CHIP_CARRIZO:
> -		fw_name = FIRMWARE_CARRIZO;
> -		break;
> +		return  FIRMWARE_CARRIZO;
>  	case CHIP_FIJI:
> -		fw_name = FIRMWARE_FIJI;
> -		break;
> +		return  FIRMWARE_FIJI;
>  	case CHIP_STONEY:
> -		fw_name = FIRMWARE_STONEY;
> -		break;
> +		return  FIRMWARE_STONEY;
>  	case CHIP_POLARIS10:
> -		fw_name = FIRMWARE_POLARIS10;
> -		break;
> +		return  FIRMWARE_POLARIS10;
>  	case CHIP_POLARIS11:
> -		fw_name = FIRMWARE_POLARIS11;
> -		break;
> +		return  FIRMWARE_POLARIS11;
>  	case CHIP_POLARIS12:
> -		fw_name = FIRMWARE_POLARIS12;
> -		break;
> +		return  FIRMWARE_POLARIS12;
>  	case CHIP_VEGAM:
> -		fw_name = FIRMWARE_VEGAM;
> -		break;
> +		return  FIRMWARE_VEGAM;
>  	case CHIP_VEGA10:
> -		fw_name = FIRMWARE_VEGA10;
> -		break;
> +		return  FIRMWARE_VEGA10;
>  	case CHIP_VEGA12:
> -		fw_name = FIRMWARE_VEGA12;
> -		break;
> +		return  FIRMWARE_VEGA12;
>  	case CHIP_VEGA20:
> -		fw_name = FIRMWARE_VEGA20;
> -		break;
> +		return  FIRMWARE_VEGA20;
>  
>  	default:
> -		return -EINVAL;
> +		return NULL;
>  	}
> +}
> +
> +/**
> + * amdgpu_vce_early_init() - try to load VCE firmware
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Tries to load the VCE firmware.
> + *
> + * When not found, returns ENOENT so that the driver can
> + * still load and initialize the rest of the IP blocks.
> + * The GPU can function just fine without VCE, they will just
> + * not support video encoding.
> + */
> +int amdgpu_vce_early_init(struct amdgpu_device *adev)
> +{
> +	const char *fw_name = amdgpu_vce_firmware_name(adev);
> +	const struct common_firmware_header *hdr;
> +	unsigned int ucode_version, version_major, version_minor, binary_id;
> +	int r;
> +
> +	if (!fw_name)
> +		return -ENOENT;
>  
>  	r = amdgpu_ucode_request(adev, &adev->vce.fw, AMDGPU_UCODE_REQUIRED, "%s", fw_name);
>  	if (r) {
> -		dev_err(adev->dev, "amdgpu_vce: Can't validate firmware \"%s\"\n",
> -			fw_name);
> +		dev_err(adev->dev,
> +			"amdgpu_vce: Firmware \"%s\" not found or failed to validate (%d)\n",
> +			fw_name, r);
> +
>  		amdgpu_ucode_release(&adev->vce.fw);
> -		return r;
> +		return -ENOENT;
>  	}
>  
>  	hdr = (const struct common_firmware_header *)adev->vce.fw->data;
> @@ -172,11 +177,35 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
>  	version_major = (ucode_version >> 20) & 0xfff;
>  	version_minor = (ucode_version >> 8) & 0xfff;
>  	binary_id = ucode_version & 0xff;
> -	DRM_INFO("Found VCE firmware Version: %d.%d Binary ID: %d\n",
> +	dev_info(adev->dev, "Found VCE firmware Version: %d.%d Binary ID: %d\n",
>  		version_major, version_minor, binary_id);
>  	adev->vce.fw_version = ((version_major << 24) | (version_minor << 16) |
>  				(binary_id << 8));
>  
> +	return 0;
> +}
> +
> +/**
> + * amdgpu_vce_sw_init() - allocate memory for VCE BO
> + *
> + * @adev: amdgpu_device pointer
> + * @size: size for the new BO
> + *
> + * First step to get VCE online: allocate memory for VCE BO.
> + * The VCE firmware binary is copied into the VCE BO later,
> + * in amdgpu_vce_resume. The VCE executes its code from the
> + * VCE BO and also uses the space in this BO for its stack and data.
> + *
> + * Ideally this BO should be placed in VRAM for optimal performance,
> + * although technically it also runs from system RAM (albeit slowly).
> + */
> +int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> +{
> +	int i, r;
> +
> +	if (!adev->vce.fw)
> +		return -ENOENT;
> +
>  	r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>  				    AMDGPU_GEM_DOMAIN_VRAM |
>  				    AMDGPU_GEM_DOMAIN_GTT,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> index 6e53f872d084..22acd7b35945 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> @@ -53,6 +53,7 @@ struct amdgpu_vce {
>  	unsigned		num_rings;
>  };
>  
> +int amdgpu_vce_early_init(struct amdgpu_device *adev);
>  int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size);
>  int amdgpu_vce_sw_fini(struct amdgpu_device *adev);
>  int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> index bee3e904a6bc..8ea8a6193492 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> @@ -407,6 +407,11 @@ static void vce_v2_0_enable_mgcg(struct amdgpu_device *adev, bool enable,
>  static int vce_v2_0_early_init(struct amdgpu_ip_block *ip_block)
>  {
>  	struct amdgpu_device *adev = ip_block->adev;
> +	int r;
> +
> +	r = amdgpu_vce_early_init(adev);
> +	if (r)
> +		return r;
>  
>  	adev->vce.num_rings = 2;
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> index 708123899c41..719e9643c43d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> @@ -399,6 +399,7 @@ static unsigned vce_v3_0_get_harvest_config(struct amdgpu_device *adev)
>  static int vce_v3_0_early_init(struct amdgpu_ip_block *ip_block)
>  {
>  	struct amdgpu_device *adev = ip_block->adev;
> +	int r;
>  
>  	adev->vce.harvest_config = vce_v3_0_get_harvest_config(adev);
>  
> @@ -407,6 +408,10 @@ static int vce_v3_0_early_init(struct amdgpu_ip_block *ip_block)
>  	    (AMDGPU_VCE_HARVEST_VCE0 | AMDGPU_VCE_HARVEST_VCE1))
>  		return -ENOENT;
>  
> +	r = amdgpu_vce_early_init(adev);
> +	if (r)
> +		return r;
> +
>  	adev->vce.num_rings = 3;
>  
>  	vce_v3_0_set_ring_funcs(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> index 335bda64ff5b..2d64002bed61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> @@ -410,6 +410,11 @@ static int vce_v4_0_stop(struct amdgpu_device *adev)
>  static int vce_v4_0_early_init(struct amdgpu_ip_block *ip_block)
>  {
>  	struct amdgpu_device *adev = ip_block->adev;
> +	int r;
> +
> +	r = amdgpu_vce_early_init(adev);
> +	if (r)
> +		return r;
>  
>  	if (amdgpu_sriov_vf(adev)) /* currently only VCN0 support SRIOV */
>  		adev->vce.num_rings = 1;


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 07/14] drm/amdgpu/si, cik, vi: Verify IP block when querying video codecs
  2025-10-28 22:06 ` [PATCH 07/14] drm/amdgpu/si, cik, vi: Verify IP block when querying video codecs Timur Kristóf
@ 2025-10-29 10:35   ` Christian König
  2025-10-29 10:54     ` [PATCH 07/14] drm/amdgpu/si,cik,vi: " Timur Kristóf
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-29 10:35 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira



On 10/28/25 23:06, Timur Kristóf wrote:
> Some harvested chips may not have any IP blocks,
> or we may not have the firmware for the IP blocks.
> In these cases, the query should return that no video
> codec is supported.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 ++-
>  drivers/gpu/drm/amd/amdgpu/cik.c        | 6 ++++++
>  drivers/gpu/drm/amd/amdgpu/si.c         | 6 ++++++
>  drivers/gpu/drm/amd/amdgpu/vi.c         | 6 ++++++
>  4 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index b3e6b3fcdf2c..42b5da59d00f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1263,7 +1263,8 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>  			-EFAULT : 0;
>  	}
>  	case AMDGPU_INFO_VIDEO_CAPS: {
> -		const struct amdgpu_video_codecs *codecs;
> +		static const struct amdgpu_video_codecs no_codecs = {0};

No zero init for static variables please, that will raise you a constant checker warning.

> +		const struct amdgpu_video_codecs *codecs = &no_codecs;
>  		struct drm_amdgpu_info_video_caps *caps;
>  		int r;
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
> index 9cd63b4177bf..b755238c2c3d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik.c
> @@ -130,6 +130,12 @@ static const struct amdgpu_video_codecs cik_video_codecs_decode =
>  static int cik_query_video_codecs(struct amdgpu_device *adev, bool encode,
>  				  const struct amdgpu_video_codecs **codecs)
>  {
> +	const enum amd_ip_block_type ip =
> +		encode ? AMD_IP_BLOCK_TYPE_VCE : AMD_IP_BLOCK_TYPE_UVD;
> +
> +	if (!amdgpu_device_ip_is_valid(adev, ip))
> +		return 0;

I'm wondering if returning EOPNOTSUPP is not more appropriate here than returning an empty cappability list.

Anyway setting the codecs list to empty in the caller is rather bad coding style.

Regards,
Christian.

> +
>  	switch (adev->asic_type) {
>  	case CHIP_BONAIRE:
>  	case CHIP_HAWAII:
> diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
> index e0f139de7991..9468c03bdb1b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si.c
> @@ -1003,6 +1003,12 @@ static const struct amdgpu_video_codecs hainan_video_codecs_decode =
>  static int si_query_video_codecs(struct amdgpu_device *adev, bool encode,
>  				 const struct amdgpu_video_codecs **codecs)
>  {
> +	const enum amd_ip_block_type ip =
> +		encode ? AMD_IP_BLOCK_TYPE_VCE : AMD_IP_BLOCK_TYPE_UVD;
> +
> +	if (!amdgpu_device_ip_is_valid(adev, ip))
> +		return 0;
> +
>  	switch (adev->asic_type) {
>  	case CHIP_VERDE:
>  	case CHIP_TAHITI:
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> index a611a7345125..f0e4193cf722 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -256,6 +256,12 @@ static const struct amdgpu_video_codecs cz_video_codecs_decode =
>  static int vi_query_video_codecs(struct amdgpu_device *adev, bool encode,
>  				 const struct amdgpu_video_codecs **codecs)
>  {
> +	const enum amd_ip_block_type ip =
> +		encode ? AMD_IP_BLOCK_TYPE_VCE : AMD_IP_BLOCK_TYPE_UVD;
> +
> +	if (!amdgpu_device_ip_is_valid(adev, ip))
> +		return 0;
> +
>  	switch (adev->asic_type) {
>  	case CHIP_TOPAZ:
>  		if (encode)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/14] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it
  2025-10-29 10:19   ` Christian König
@ 2025-10-29 10:48     ` Timur Kristóf
  0 siblings, 0 replies; 41+ messages in thread
From: Timur Kristóf @ 2025-10-29 10:48 UTC (permalink / raw)
  To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On Wed, 2025-10-29 at 11:19 +0100, Christian König wrote:
> On 10/28/25 23:06, Timur Kristóf wrote:
> > The VCPU BO doesn't only contain the VCE firmware but also other
> > ranges that the VCE uses for its stack and data. Let's initialize
> > this to zero to avoid having garbage in the VCPU BO.
> 
> Absolutely clear NAK.
> 
> This is intentionally not initialized on resume to avoid breaking
> encode sessions which existed before suspend.

How can there be encode sessions from before suspend?
I think that there can't be.

As far as I see, before suspend we wait for the VCE to go idle, meaning
that we wait for all pending work to finish.
amdgpu_vce_suspend has a comment which says:
suspending running encoding sessions isn't supported

> Why exactly is that an issue?

We need to clear at least some of the BO for the VCE1 firmware
validation mechanism. This is done in a memset in vce_v1_0_load_fw in
the old radeon driver.

Also I think it's a good idea to avoid having garbage in the VCPU BO.

> The VCE FW BO should be cleared to zero after initial allocation?

To clarify, are you suggesting that I move the memset to after the BO
creation, and then never clear it again? Or are you saying that
amdgpu_bo_create_reserved already clears the BO?

> 
> Regards,
> Christian.
> 
> > 
> > Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > index b9060bcd4806..eaa06dbef5c4 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > @@ -310,6 +310,7 @@ int amdgpu_vce_resume(struct amdgpu_device
> > *adev)
> >  	offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
> >  
> >  	if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > +		memset32(cpu_addr, 0, amdgpu_bo_size(adev-
> > >vce.vcpu_bo) / 4);
> >  		memcpy_toio(cpu_addr, adev->vce.fw->data + offset,
> >  			    adev->vce.fw->size - offset);
> >  		drm_dev_exit(idx);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 07/14] drm/amdgpu/si,cik,vi: Verify IP block when querying video codecs
  2025-10-29 10:35   ` Christian König
@ 2025-10-29 10:54     ` Timur Kristóf
  0 siblings, 0 replies; 41+ messages in thread
From: Timur Kristóf @ 2025-10-29 10:54 UTC (permalink / raw)
  To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On Wed, 2025-10-29 at 11:35 +0100, Christian König wrote:
> 
> 
> On 10/28/25 23:06, Timur Kristóf wrote:
> > Some harvested chips may not have any IP blocks,
> > or we may not have the firmware for the IP blocks.
> > In these cases, the query should return that no video
> > codec is supported.
> > 
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 ++-
> >  drivers/gpu/drm/amd/amdgpu/cik.c        | 6 ++++++
> >  drivers/gpu/drm/amd/amdgpu/si.c         | 6 ++++++
> >  drivers/gpu/drm/amd/amdgpu/vi.c         | 6 ++++++
> >  4 files changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > index b3e6b3fcdf2c..42b5da59d00f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > @@ -1263,7 +1263,8 @@ int amdgpu_info_ioctl(struct drm_device *dev,
> > void *data, struct drm_file *filp)
> >  			-EFAULT : 0;
> >  	}
> >  	case AMDGPU_INFO_VIDEO_CAPS: {
> > -		const struct amdgpu_video_codecs *codecs;
> > +		static const struct amdgpu_video_codecs no_codecs
> > = {0};
> 
> No zero init for static variables please, that will raise you a
> constant checker warning.
> 
> > +		const struct amdgpu_video_codecs *codecs =
> > &no_codecs;
> >  		struct drm_amdgpu_info_video_caps *caps;
> >  		int r;
> >  
> > diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c
> > b/drivers/gpu/drm/amd/amdgpu/cik.c
> > index 9cd63b4177bf..b755238c2c3d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/cik.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/cik.c
> > @@ -130,6 +130,12 @@ static const struct amdgpu_video_codecs
> > cik_video_codecs_decode =
> >  static int cik_query_video_codecs(struct amdgpu_device *adev, bool
> > encode,
> >  				  const struct amdgpu_video_codecs
> > **codecs)
> >  {
> > +	const enum amd_ip_block_type ip =
> > +		encode ? AMD_IP_BLOCK_TYPE_VCE :
> > AMD_IP_BLOCK_TYPE_UVD;
> > +
> > +	if (!amdgpu_device_ip_is_valid(adev, ip))
> > +		return 0;
> 
> I'm wondering if returning EOPNOTSUPP is not more appropriate here
> than returning an empty cappability list.

I don't think so.

Returning EOPNOTSUPP would indicate that the operation of querying the
codec support is not supported, and not that the list of supported
codecs is empty.

> 
> Anyway setting the codecs list to empty in the caller is rather bad
> coding style.

Sure, I'll come up with a better way to do this.

> 
> Regards,
> Christian.
> 
> > +
> >  	switch (adev->asic_type) {
> >  	case CHIP_BONAIRE:
> >  	case CHIP_HAWAII:
> > diff --git a/drivers/gpu/drm/amd/amdgpu/si.c
> > b/drivers/gpu/drm/amd/amdgpu/si.c
> > index e0f139de7991..9468c03bdb1b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/si.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/si.c
> > @@ -1003,6 +1003,12 @@ static const struct amdgpu_video_codecs
> > hainan_video_codecs_decode =
> >  static int si_query_video_codecs(struct amdgpu_device *adev, bool
> > encode,
> >  				 const struct amdgpu_video_codecs
> > **codecs)
> >  {
> > +	const enum amd_ip_block_type ip =
> > +		encode ? AMD_IP_BLOCK_TYPE_VCE :
> > AMD_IP_BLOCK_TYPE_UVD;
> > +
> > +	if (!amdgpu_device_ip_is_valid(adev, ip))
> > +		return 0;
> > +
> >  	switch (adev->asic_type) {
> >  	case CHIP_VERDE:
> >  	case CHIP_TAHITI:
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> > b/drivers/gpu/drm/amd/amdgpu/vi.c
> > index a611a7345125..f0e4193cf722 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> > @@ -256,6 +256,12 @@ static const struct amdgpu_video_codecs
> > cz_video_codecs_decode =
> >  static int vi_query_video_codecs(struct amdgpu_device *adev, bool
> > encode,
> >  				 const struct amdgpu_video_codecs
> > **codecs)
> >  {
> > +	const enum amd_ip_block_type ip =
> > +		encode ? AMD_IP_BLOCK_TYPE_VCE :
> > AMD_IP_BLOCK_TYPE_UVD;
> > +
> > +	if (!amdgpu_device_ip_is_valid(adev, ip))
> > +		return 0;
> > +
> >  	switch (adev->asic_type) {
> >  	case CHIP_TOPAZ:
> >  		if (encode)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/14] drm/amdgpu/gart: Add helper to bind VRAM BO
  2025-10-29 10:16   ` Christian König
@ 2025-10-29 10:57     ` Timur Kristóf
  0 siblings, 0 replies; 41+ messages in thread
From: Timur Kristóf @ 2025-10-29 10:57 UTC (permalink / raw)
  To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On Wed, 2025-10-29 at 11:16 +0100, Christian König wrote:
> 
> 
> On 10/28/25 23:06, Timur Kristóf wrote:
> > Binds a BO that is allocated in VRAM to the GART page table.
> > 
> > Useful when a kernel BO is located in VRAM but
> > needs to be accessed from the GART address space,
> > for example to give a kernel BO a 32-bit address
> > when GART is placed in LOW address space.
> > 
> > Co-developed-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 41
> > ++++++++++++++++++++++++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h |  2 ++
> >  2 files changed, 43 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> > index 83f3b94ed975..19b5e72a6a26 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> > @@ -390,6 +390,47 @@ void amdgpu_gart_bind(struct amdgpu_device
> > *adev, uint64_t offset,
> >  	amdgpu_gart_map(adev, offset, pages, dma_addr, flags,
> > adev->gart.ptr);
> >  }
> >  
> > +/**
> > + * amdgpu_gart_bind - bind VRAM BO into the GART page table
> 
> That should be the function name or otherwise you get automated
> warnings.

That's a copy paste mistake on my part. Thanks for catching that.

> 
> > + *
> > + * @adev: amdgpu_device pointer
> > + * @offset: offset into the GPU's gart aperture
> > + * @bo: the BO whose pages should be mapped
> > + * @flags: page table entry flags
> > + *
> > + * Binds a BO that is allocated in VRAM to the GART page table
> > + * (all ASICs).
> > + * Useful when a kernel BO is located in VRAM but
> > + * needs to be accessed from the GART address space,
> > + * for example to give a kernel BO a 32-bit address
> > + * when GART is placed in LOW address space.
> > + */
> > +void amdgpu_gart_bind_vram_bo(struct amdgpu_device *adev, uint64_t
> > offset,
> > +		     struct amdgpu_bo *bo, uint64_t flags)
> 
> Please not the BO but just the VRAM pa.

Sure, will do

> 
> > +{
> > +	u64 pa, bo_size;
> > +	u32 num_pages, start_page, i, idx;
> > +
> > +	if (!adev->gart.ptr)
> > +		return;
> > +
> > +	if (!drm_dev_enter(adev_to_drm(adev), &idx))
> > +		return;
> > +
> > +	pa = amdgpu_gmc_vram_pa(adev, bo);
> > +	bo_size = amdgpu_bo_size(bo);
> > +	num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) /
> > AMDGPU_GPU_PAGE_SIZE;
> > +	start_page = offset / AMDGPU_GPU_PAGE_SIZE;
> > +
> > +	for (i = 0; i < num_pages; ++i) {
> > +		amdgpu_gmc_set_pte_pde(adev, adev->gart.ptr,
> > +			start_page + i, pa + AMDGPU_GPU_PAGE_SIZE
> > * i, flags);
> > +	}
> > +
> 
> Ideally amdgpu_gart_map() should be able to take both dma_addr array
> or VRAM pa (or have two map functions).
> 
> This way we could cleanup the code in amdgpu_ttm_map_buffer as well.

Alright, I'll rework this new function so that it can be reused also by
amdgpu_ttm_map_buffer, does that sound alright to you?

> 
> 
> > +	amdgpu_gart_invalidate_tlb(adev);
> 
> IIRC we moved that out of amdgpu_gart_bind(), probably best to do so
> here as well.

Sure, will do

> 
> Regards,
> Christian.
> 
> > +	drm_dev_exit(idx);
> > +}
> > +
> >  /**
> >   * amdgpu_gart_invalidate_tlb - invalidate gart TLB
> >   *
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> > index 7cc980bf4725..756548d0b520 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> > @@ -64,5 +64,7 @@ void amdgpu_gart_map(struct amdgpu_device *adev,
> > uint64_t offset,
> >  		     void *dst);
> >  void amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
> >  		      int pages, dma_addr_t *dma_addr, uint64_t
> > flags);
> > +void amdgpu_gart_bind_vram_bo(struct amdgpu_device *adev, uint64_t
> > offset,
> > +		     struct amdgpu_bo *bo, uint64_t flags);
> >  void amdgpu_gart_invalidate_tlb(struct amdgpu_device *adev);
> >  #endif

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 08/14] drm/amdgpu/vce1: Clean up register definitions
  2025-10-28 22:06 ` [PATCH 08/14] drm/amdgpu/vce1: Clean up register definitions Timur Kristóf
@ 2025-10-29 11:23   ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-29 11:23 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> The sid.h header contained some VCE1 register definitions, but
> they were using byte offsets (probably copied from the old radeon
> driver). Move all of these to the proper VCE1 headers.
> 
> Also add the register definitions that we need for the
> firmware validation mechanism in VCE1.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Co-developed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/sid.h              | 40 -------------------
>  .../drm/amd/include/asic_reg/vce/vce_1_0_d.h  |  5 +++
>  .../include/asic_reg/vce/vce_1_0_sh_mask.h    | 10 +++++
>  3 files changed, 15 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/sid.h b/drivers/gpu/drm/amd/amdgpu/sid.h
> index cbd4f8951cfa..561462a8332e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sid.h
> +++ b/drivers/gpu/drm/amd/amdgpu/sid.h
> @@ -582,45 +582,6 @@
>  #define	DMA_PACKET_NOP					  0xf
>  
>  /* VCE */
> -#define VCE_STATUS					0x20004
> -#define VCE_VCPU_CNTL					0x20014
> -#define		VCE_CLK_EN				(1 << 0)
> -#define VCE_VCPU_CACHE_OFFSET0				0x20024
> -#define VCE_VCPU_CACHE_SIZE0				0x20028
> -#define VCE_VCPU_CACHE_OFFSET1				0x2002c
> -#define VCE_VCPU_CACHE_SIZE1				0x20030
> -#define VCE_VCPU_CACHE_OFFSET2				0x20034
> -#define VCE_VCPU_CACHE_SIZE2				0x20038
> -#define VCE_SOFT_RESET					0x20120
> -#define 	VCE_ECPU_SOFT_RESET			(1 << 0)
> -#define 	VCE_FME_SOFT_RESET			(1 << 2)
> -#define VCE_RB_BASE_LO2					0x2016c
> -#define VCE_RB_BASE_HI2					0x20170
> -#define VCE_RB_SIZE2					0x20174
> -#define VCE_RB_RPTR2					0x20178
> -#define VCE_RB_WPTR2					0x2017c
> -#define VCE_RB_BASE_LO					0x20180
> -#define VCE_RB_BASE_HI					0x20184
> -#define VCE_RB_SIZE					0x20188
> -#define VCE_RB_RPTR					0x2018c
> -#define VCE_RB_WPTR					0x20190
> -#define VCE_CLOCK_GATING_A				0x202f8
> -#define VCE_CLOCK_GATING_B				0x202fc
> -#define VCE_UENC_CLOCK_GATING				0x205bc
> -#define VCE_UENC_REG_CLOCK_GATING			0x205c0
> -#define VCE_FW_REG_STATUS				0x20e10
> -#	define VCE_FW_REG_STATUS_BUSY			(1 << 0)
> -#	define VCE_FW_REG_STATUS_PASS			(1 << 3)
> -#	define VCE_FW_REG_STATUS_DONE			(1 << 11)
> -#define VCE_LMI_FW_START_KEYSEL				0x20e18
> -#define VCE_LMI_FW_PERIODIC_CTRL			0x20e20
> -#define VCE_LMI_CTRL2					0x20e74
> -#define VCE_LMI_CTRL					0x20e98
> -#define VCE_LMI_VM_CTRL					0x20ea0
> -#define VCE_LMI_SWAP_CNTL				0x20eb4
> -#define VCE_LMI_SWAP_CNTL1				0x20eb8
> -#define VCE_LMI_CACHE_CTRL				0x20ef4
> -
>  #define VCE_CMD_NO_OP					0x00000000
>  #define VCE_CMD_END					0x00000001
>  #define VCE_CMD_IB					0x00000002
> @@ -629,7 +590,6 @@
>  #define VCE_CMD_IB_AUTO					0x00000005
>  #define VCE_CMD_SEMAPHORE				0x00000006
>  
> -
>  //#dce stupp
>  /* display controller offsets used for crtc/cur/lut/grph/viewport/etc. */
>  #define CRTC0_REGISTER_OFFSET                 (0x1b7c - 0x1b7c) //(0x6df0 - 0x6df0)/4
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
> index 2176548e9203..9778822dd2a0 100644
> --- a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
> +++ b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
> @@ -60,5 +60,10 @@
>  #define mmVCE_VCPU_CACHE_SIZE1 0x800C
>  #define mmVCE_VCPU_CACHE_SIZE2 0x800E
>  #define mmVCE_VCPU_CNTL 0x8005
> +#define mmVCE_VCPU_SCRATCH7 0x8037
> +#define mmVCE_FW_REG_STATUS 0x8384
> +#define mmVCE_LMI_FW_PERIODIC_CTRL 0x8388
> +#define mmVCE_LMI_FW_START_KEYSEL 0x8386
> +
>  
>  #endif
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
> index ea5b26b11cb1..1f82d6f5abde 100644
> --- a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
> +++ b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
> @@ -61,6 +61,8 @@
>  #define VCE_RB_WPTR__RB_WPTR__SHIFT 0x00000004
>  #define VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK 0x00000001L
>  #define VCE_SOFT_RESET__ECPU_SOFT_RESET__SHIFT 0x00000000
> +#define VCE_SOFT_RESET__FME_SOFT_RESET_MASK 0x00000004L
> +#define VCE_SOFT_RESET__FME_SOFT_RESET__SHIFT 0x00000002
>  #define VCE_STATUS__JOB_BUSY_MASK 0x00000001L
>  #define VCE_STATUS__JOB_BUSY__SHIFT 0x00000000
>  #define VCE_STATUS__UENC_BUSY_MASK 0x00000100L
> @@ -95,5 +97,13 @@
>  #define VCE_VCPU_CNTL__CLK_EN__SHIFT 0x00000000
>  #define VCE_VCPU_CNTL__RBBM_SOFT_RESET_MASK 0x00040000L
>  #define VCE_VCPU_CNTL__RBBM_SOFT_RESET__SHIFT 0x00000012
> +#define VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK 0x00010000
> +#define VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_SHIFT 0x00000010
> +#define VCE_FW_REG_STATUS__BUSY_MASK 0x0000001
> +#define VCE_FW_REG_STATUS__BUSY__SHIFT 0x0000001
> +#define VCE_FW_REG_STATUS__PASS_MASK 0x0000008
> +#define VCE_FW_REG_STATUS__PASS__SHIFT 0x0000003
> +#define VCE_FW_REG_STATUS__DONE_MASK 0x0000800
> +#define VCE_FW_REG_STATUS__DONE__SHIFT 0x000000b
>  
>  #endif


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/14] drm/amdgpu/gmc6: Add GART space for VCPU BO
  2025-10-29 10:05   ` Christian König
@ 2025-10-29 11:26     ` Timur Kristóf
  0 siblings, 0 replies; 41+ messages in thread
From: Timur Kristóf @ 2025-10-29 11:26 UTC (permalink / raw)
  To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On Wed, 2025-10-29 at 11:05 +0100, Christian König wrote:
> On 10/28/25 23:06, Timur Kristóf wrote:
> > Add an extra 16M (4096 pages) to the GART before GTT.
> > This space is going to be used for the VCE VCPU BO.
> > 
> > Split this into	a separate patch to make it easier to bisect,
> > in case	there are any errors in	the future.
> > 
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> > index 499dfd78092d..bfeb60cfbf62 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> > @@ -214,6 +214,9 @@ static void gmc_v6_0_vram_gtt_location(struct
> > amdgpu_device *adev,
> >  	amdgpu_gmc_set_agp_default(adev, mc);
> >  	amdgpu_gmc_vram_location(adev, mc, base);
> >  	amdgpu_gmc_gart_location(adev, mc,
> > AMDGPU_GART_PLACEMENT_LOW);
> > +
> > +	/* Add space for VCE's VCPU BO so that VCE1 can access it.
> > */
> > +	mc->num_gart_pages_before_gtt += 4096;
> 
> 4096*4KiB=16MiB. Do we really need so much?

Is it enough to have just enough space for the VCPU BO?
In that case, I think we can use just 512 KiB (rounded up) if I
understand the VCPU BO size correctly. That would be 128 pages.

> 
> >  }
> >  
> >  static void gmc_v6_0_mc_program(struct amdgpu_device *adev)
> > @@ -338,7 +341,7 @@ static int gmc_v6_0_mc_init(struct
> > amdgpu_device *adev)
> >  		case CHIP_TAHITI:   /* UVD, VCE do not support
> > GPUVM */
> >  		case CHIP_PITCAIRN: /* UVD, VCE do not support
> > GPUVM */
> >  		case CHIP_OLAND:    /* UVD, VCE do not support
> > GPUVM */
> > -			adev->gmc.gart_size = 1024ULL << 20;
> > +			adev->gmc.gart_size = 1040ULL << 20;
> 
> Ideally that should be a power of two.
> 
> We can in theory increase it in units of 2MiB without wasting memory,
> but I'm not 100% sure if that is actually tested everywhere.
> 
> Regards,
> Christian.
> 
> >  			break;
> >  		}
> >  	} else {

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 09/14] drm/amdgpu/vce1: Load VCE1 firmware
  2025-10-28 22:06 ` [PATCH 09/14] drm/amdgpu/vce1: Load VCE1 firmware Timur Kristóf
@ 2025-10-29 11:28   ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-29 11:28 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> Load VCE1 firmware using amdgpu_ucode_request, just like
> it is done for other VCE versions.
> 
> All SI chips share the same VCE1 firmware file: vce_1_0_0.bin
> which will be sent to linux-firmware soon.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Co-developed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>

You can probably drop Co-developed-by and Signed-off-by for me on most patches.

Especially this one here is not really from me but Alexandre.

Reviewed-by: Christian König <christian.koenig@amd.com> for the patch.

Regards,
Christian.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index b23a48a1efc1..7fcc27d4453e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -41,6 +41,9 @@
>  #define VCE_IDLE_TIMEOUT	msecs_to_jiffies(1000)
>  
>  /* Firmware Names */
> +#ifdef CONFIG_DRM_AMDGPU_SI
> +#define FIRMWARE_VCE_V1_0	"amdgpu/vce_1_0_0.bin"
> +#endif
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>  #define FIRMWARE_BONAIRE	"amdgpu/bonaire_vce.bin"
>  #define FIRMWARE_KABINI	"amdgpu/kabini_vce.bin"
> @@ -61,6 +64,9 @@
>  #define FIRMWARE_VEGA12		"amdgpu/vega12_vce.bin"
>  #define FIRMWARE_VEGA20		"amdgpu/vega20_vce.bin"
>  
> +#ifdef CONFIG_DRM_AMDGPU_SI
> +MODULE_FIRMWARE(FIRMWARE_VCE_V1_0);
> +#endif
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>  MODULE_FIRMWARE(FIRMWARE_BONAIRE);
>  MODULE_FIRMWARE(FIRMWARE_KABINI);
> @@ -99,6 +105,12 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
>  static const char *amdgpu_vce_firmware_name(struct amdgpu_device *adev)
>  {
>  	switch (adev->asic_type) {
> +#ifdef CONFIG_DRM_AMDGPU_SI
> +	case CHIP_PITCAIRN:
> +	case CHIP_TAHITI:
> +	case CHIP_VERDE:
> +		return FIRMWARE_VCE_V1_0;
> +#endif
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>  	case CHIP_BONAIRE:
>  		return FIRMWARE_BONAIRE;


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block
  2025-10-28 22:06 ` [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block Timur Kristóf
@ 2025-10-29 11:38   ` Christian König
  2025-10-29 22:48     ` Timur Kristóf
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-29 11:38 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> Implement the necessary functionality to support the VCE1.
> This implementation is based on:
> 
> - VCE2 code from amdgpu
> - VCE1 code from radeon (the old driver)
> - Some trial and error
> 
> A subsequent commit will ensure correct mapping for
> the VCPU BO, which will make this actually work.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Co-developed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile     |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h |   1 +
>  drivers/gpu/drm/amd/amdgpu/vce_v1_0.c   | 805 ++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/vce_v1_0.h   |  32 +
>  4 files changed, 839 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index ebe08947c5a3..c88760fb52ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -78,7 +78,7 @@ amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o \
>  	dce_v8_0.o gfx_v7_0.o cik_sdma.o uvd_v4_2.o vce_v2_0.o
>  
>  amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o gmc_v6_0.o gfx_v6_0.o si_ih.o si_dma.o dce_v6_0.o \
> -	uvd_v3_1.o
> +	uvd_v3_1.o vce_v1_0.o
>  
>  amdgpu-y += \
>  	vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o vega10_reg_init.o \
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> index 22acd7b35945..050783802623 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> @@ -51,6 +51,7 @@ struct amdgpu_vce {
>  	struct drm_sched_entity	entity;
>  	uint32_t                srbm_soft_reset;
>  	unsigned		num_rings;
> +	uint32_t		keyselect;
>  };
>  
>  int amdgpu_vce_early_init(struct amdgpu_device *adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> new file mode 100644
> index 000000000000..e62fd8ed1992
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> @@ -0,0 +1,805 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2013 Advanced Micro Devices, Inc.
> + * Copyright 2025 Valve Corporation
> + * Copyright 2025 Alexandre Demers
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sub license, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
> + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
> + * USE OR OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * Authors: Christian König <christian.koenig@amd.com>
> + *          Timur Kristóf <timur.kristof@gmail.com>
> + *          Alexandre Demers <alexandre.f.demers@gmail.com>
> + */
> +
> +#include <linux/firmware.h>
> +
> +#include "amdgpu.h"
> +#include "amdgpu_vce.h"
> +#include "sid.h"
> +#include "vce_v1_0.h"
> +#include "vce/vce_1_0_d.h"
> +#include "vce/vce_1_0_sh_mask.h"
> +#include "oss/oss_1_0_d.h"
> +#include "oss/oss_1_0_sh_mask.h"
> +
> +#define VCE_V1_0_FW_SIZE	(256 * 1024)
> +#define VCE_V1_0_STACK_SIZE	(64 * 1024)
> +#define VCE_V1_0_DATA_SIZE	(7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
> +#define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK	0x02
> +
> +static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
> +static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
> +
> +struct vce_v1_0_fw_signature {
> +	int32_t offset;
> +	uint32_t length;
> +	int32_t number;
> +	struct {
> +		uint32_t chip_id;
> +		uint32_t keyselect;
> +		uint32_t nonce[4];
> +		uint32_t sigval[4];
> +	} val[8];
> +};
> +
> +/**
> + * vce_v1_0_ring_get_rptr - get read pointer
> + *
> + * @ring: amdgpu_ring pointer
> + *
> + * Returns the current hardware read pointer
> + */
> +static uint64_t vce_v1_0_ring_get_rptr(struct amdgpu_ring *ring)
> +{
> +	struct amdgpu_device *adev = ring->adev;
> +
> +	if (ring->me == 0)
> +		return RREG32(mmVCE_RB_RPTR);
> +	else
> +		return RREG32(mmVCE_RB_RPTR2);
> +}
> +
> +/**
> + * vce_v1_0_ring_get_wptr - get write pointer
> + *
> + * @ring: amdgpu_ring pointer
> + *
> + * Returns the current hardware write pointer
> + */
> +static uint64_t vce_v1_0_ring_get_wptr(struct amdgpu_ring *ring)
> +{
> +	struct amdgpu_device *adev = ring->adev;
> +
> +	if (ring->me == 0)
> +		return RREG32(mmVCE_RB_WPTR);
> +	else
> +		return RREG32(mmVCE_RB_WPTR2);
> +}
> +
> +/**
> + * vce_v1_0_ring_set_wptr - set write pointer
> + *
> + * @ring: amdgpu_ring pointer
> + *
> + * Commits the write pointer to the hardware
> + */
> +static void vce_v1_0_ring_set_wptr(struct amdgpu_ring *ring)
> +{
> +	struct amdgpu_device *adev = ring->adev;
> +
> +	if (ring->me == 0)
> +		WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
> +	else
> +		WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
> +}
> +
> +static int vce_v1_0_lmi_clean(struct amdgpu_device *adev)
> +{
> +	int i, j;
> +
> +	for (i = 0; i < 10; ++i) {
> +		for (j = 0; j < 100; ++j) {
> +			if (RREG32(mmVCE_LMI_STATUS) & 0x337f)
> +				return 0;
> +
> +			mdelay(10);
> +		}
> +	}
> +
> +	return -ETIMEDOUT;
> +}
> +
> +static int vce_v1_0_firmware_loaded(struct amdgpu_device *adev)
> +{
> +	int i, j;
> +
> +	for (i = 0; i < 10; ++i) {
> +		for (j = 0; j < 100; ++j) {
> +			if (RREG32(mmVCE_STATUS) & VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK)
> +				return 0;
> +			mdelay(10);
> +		}
> +
> +		dev_err(adev->dev, "VCE not responding, trying to reset the ECPU\n");
> +
> +		WREG32_P(mmVCE_SOFT_RESET,
> +			VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK,
> +			~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
> +		mdelay(10);
> +		WREG32_P(mmVCE_SOFT_RESET, 0,
> +			~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
> +		mdelay(10);
> +	}
> +
> +	return -ETIMEDOUT;
> +}
> +
> +static void vce_v1_0_init_cg(struct amdgpu_device *adev)
> +{
> +	u32 tmp;
> +
> +	tmp = RREG32(mmVCE_CLOCK_GATING_A);
> +	tmp |= VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
> +	WREG32(mmVCE_CLOCK_GATING_A, tmp);
> +
> +	tmp = RREG32(mmVCE_CLOCK_GATING_B);
> +	tmp |= 0x1e;
> +	tmp &= ~0xe100e1;
> +	WREG32(mmVCE_CLOCK_GATING_B, tmp);
> +
> +	tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
> +	tmp &= ~0xff9ff000;
> +	WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
> +
> +	tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
> +	tmp &= ~0x3ff;
> +	WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
> +}
> +
> +/**
> + * vce_v1_0_load_fw_signature - load firmware signature into VCPU BO
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * The VCE1 firmware validation mechanism needs a firmware signature.
> + * This function finds the signature appropriate for the current
> + * ASIC and writes that into the VCPU BO.
> + */
> +static int vce_v1_0_load_fw_signature(struct amdgpu_device *adev)
> +{
> +	const struct common_firmware_header *hdr;
> +	struct vce_v1_0_fw_signature *sign;
> +	unsigned int ucode_offset;
> +	uint32_t chip_id;
> +	u32 *cpu_addr;
> +	int i, r;
> +
> +	hdr = (const struct common_firmware_header *)adev->vce.fw->data;
> +	ucode_offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
> +
> +	sign = (void *)adev->vce.fw->data + ucode_offset;
> +
> +	switch (adev->asic_type) {
> +	case CHIP_TAHITI:
> +		chip_id = 0x01000014;
> +		break;
> +	case CHIP_VERDE:
> +		chip_id = 0x01000015;
> +		break;
> +	case CHIP_PITCAIRN:
> +		chip_id = 0x01000016;
> +		break;
> +	default:
> +		dev_err(adev->dev, "asic_type %#010x was not found!", adev->asic_type);
> +		return -EINVAL;
> +	}
> +

> +	ASSERT(adev->vce.vcpu_bo);

Please drop that.

> +
> +	r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
> +	if (r) {
> +		dev_err(adev->dev, "%s (%d) failed to reserve VCE bo\n", __func__, r);
> +		return r;
> +	}
> +
> +	r = amdgpu_bo_kmap(adev->vce.vcpu_bo, (void **)&cpu_addr);
> +	if (r) {
> +		amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> +		dev_err(adev->dev, "%s (%d) VCE map failed\n", __func__, r);
> +		return r;
> +	}

That part is actually pretty pointless the cpu addr is already available as adev->vce.cpu_addr.

> +
> +	for (i = 0; i < le32_to_cpu(sign->number); ++i) {
> +		if (le32_to_cpu(sign->val[i].chip_id) == chip_id)
> +			break;
> +	}
> +
> +	if (i == le32_to_cpu(sign->number)) {
> +		dev_err(adev->dev, "%s chip_id %#010x was not found for %s in VCE firmware",
> +			__func__, chip_id, amdgpu_asic_name[adev->asic_type]);

Drop the __func__ here. It should be obvious where we are fro the message.

> +		return -EINVAL;
> +	}
> +
> +	cpu_addr += (256 - 64) / 4;
> +	cpu_addr[0] = sign->val[i].nonce[0];
> +	cpu_addr[1] = sign->val[i].nonce[1];
> +	cpu_addr[2] = sign->val[i].nonce[2];
> +	cpu_addr[3] = sign->val[i].nonce[3];
> +	cpu_addr[4] = cpu_to_le32(le32_to_cpu(sign->length) + 64);
> +
> +	memset(&cpu_addr[5], 0, 44);
> +	memcpy(&cpu_addr[16], &sign[1], hdr->ucode_size_bytes - sizeof(*sign));

That should probably be memcpy_io() and the direct writes to cpu_addr modified as well.

> +
> +	cpu_addr += (le32_to_cpu(sign->length) + 64) / 4;
> +	cpu_addr[0] = sign->val[i].sigval[0];
> +	cpu_addr[1] = sign->val[i].sigval[1];
> +	cpu_addr[2] = sign->val[i].sigval[2];
> +	cpu_addr[3] = sign->val[i].sigval[3];
> +
> +	adev->vce.keyselect = le32_to_cpu(sign->val[i].keyselect);
> +


> +	amdgpu_bo_kunmap(adev->vce.vcpu_bo);
> +	amdgpu_bo_unreserve(adev->vce.vcpu_bo);

That can be dropped as well.

> +
> +	return 0;
> +}
> +
> +static int vce_v1_0_wait_for_fw_validation(struct amdgpu_device *adev)
> +{
> +	int i;
> +
> +	for (i = 0; i < 10; ++i) {
> +		mdelay(10);
> +		if (RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__DONE_MASK)
> +			break;
> +	}
> +
> +	if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__DONE_MASK)) {
> +		dev_err(adev->dev, "%s VCE validation timeout\n", __func__);
> +		return -ETIMEDOUT;
> +	}
> +
> +	if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__PASS_MASK)) {
> +		dev_err(adev->dev, "%s VCE firmware validation failed\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	for (i = 0; i < 10; ++i) {
> +		mdelay(10);
> +		if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__BUSY_MASK))
> +			break;
> +	}
> +
> +	if (RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__BUSY_MASK) {
> +		dev_err(adev->dev, "%s VCE firmware busy timeout\n", __func__);

Here as well, please drop the __func__ arguments.

> +		return -ETIMEDOUT;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vce_v1_0_mc_resume(struct amdgpu_device *adev)
> +{
> +	uint32_t offset;
> +	uint32_t size;
> +
> +	/* When the keyselect is already set, don't perturb VCE FW.
> +	 * Validation seems to always fail the second time.
> +	 */

Coding style for multi line /* */ comments! checkpatch.pl should point out when that is wrong.

> +	if (RREG32(mmVCE_LMI_FW_START_KEYSEL)) {
> +		dev_dbg(adev->dev, "%s keyselect already set: 0x%x (on CPU: 0x%x)\n",
> +			__func__, RREG32(mmVCE_LMI_FW_START_KEYSEL), adev->vce.keyselect);
> +
> +		WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
> +		return 0;
> +	}
> +
> +	WREG32_P(mmVCE_CLOCK_GATING_A, 0, ~(1 << 16));
> +	WREG32_P(mmVCE_UENC_CLOCK_GATING, 0x1FF000, ~0xFF9FF000);
> +	WREG32_P(mmVCE_UENC_REG_CLOCK_GATING, 0x3F, ~0x3F);
> +	WREG32(mmVCE_CLOCK_GATING_B, 0);
> +
> +	WREG32_P(mmVCE_LMI_FW_PERIODIC_CTRL, 0x4, ~0x4);
> +
> +	WREG32(mmVCE_LMI_CTRL, 0x00398000);
> +
> +	WREG32_P(mmVCE_LMI_CACHE_CTRL, 0x0, ~0x1);
> +	WREG32(mmVCE_LMI_SWAP_CNTL, 0);
> +	WREG32(mmVCE_LMI_SWAP_CNTL1, 0);
> +	WREG32(mmVCE_LMI_VM_CTRL, 0);
> +
> +	WREG32(mmVCE_VCPU_SCRATCH7, AMDGPU_MAX_VCE_HANDLES);
> +
> +	offset =  adev->vce.gpu_addr + AMDGPU_VCE_FIRMWARE_OFFSET;
> +	size = VCE_V1_0_FW_SIZE;
> +	WREG32(mmVCE_VCPU_CACHE_OFFSET0, offset & 0x7fffffff);
> +	WREG32(mmVCE_VCPU_CACHE_SIZE0, size);
> +
> +	offset += size;
> +	size = VCE_V1_0_STACK_SIZE;
> +	WREG32(mmVCE_VCPU_CACHE_OFFSET1, offset & 0x7fffffff);
> +	WREG32(mmVCE_VCPU_CACHE_SIZE1, size);
> +
> +	offset += size;
> +	size = VCE_V1_0_DATA_SIZE;
> +	WREG32(mmVCE_VCPU_CACHE_OFFSET2, offset & 0x7fffffff);
> +	WREG32(mmVCE_VCPU_CACHE_SIZE2, size);
> +
> +	WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
> +
> +	dev_dbg(adev->dev, "VCE keyselect: %d", adev->vce.keyselect);
> +	WREG32(mmVCE_LMI_FW_START_KEYSEL, adev->vce.keyselect);
> +
> +	return vce_v1_0_wait_for_fw_validation(adev);

Maybe inline wait_for_fw_validation here, it doesn't make much sense to write START_KEYSEL outside and then have that in a separate function.

Regards,
Christian.

> +}
> +
> +/**
> + * vce_v1_0_is_idle() - Check idle status of VCE1 IP block
> + *
> + * @ip_block: amdgpu_ip_block pointer
> + *
> + * Check whether VCE is busy according to VCE_STATUS.
> + * Also check whether the SRBM thinks VCE is busy, although
> + * SRBM_STATUS.VCE_BUSY seems to be bogus because it
> + * appears to mirror the VCE_STATUS.VCPU_REPORT_FW_LOADED bit.
> + */
> +static bool vce_v1_0_is_idle(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	bool busy =
> +		(RREG32(mmVCE_STATUS) & (VCE_STATUS__JOB_BUSY_MASK | VCE_STATUS__UENC_BUSY_MASK)) ||
> +		(RREG32(mmSRBM_STATUS2) & SRBM_STATUS2__VCE_BUSY_MASK);
> +
> +	return !busy;
> +}
> +
> +static int vce_v1_0_wait_for_idle(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	unsigned int i;
> +
> +	for (i = 0; i < adev->usec_timeout; i++) {
> +		udelay(1);
> +		if (vce_v1_0_is_idle(ip_block))
> +			return 0;
> +	}
> +	return -ETIMEDOUT;
> +}
> +
> +/**
> + * vce_v1_0_start - start VCE block
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Setup and start the VCE block
> + */
> +static int vce_v1_0_start(struct amdgpu_device *adev)
> +{
> +	struct amdgpu_ring *ring;
> +	int r;
> +
> +	WREG32_P(mmVCE_STATUS, 1, ~1);
> +
> +	r = vce_v1_0_mc_resume(adev);
> +	if (r)
> +		return r;
> +
> +	ring = &adev->vce.ring[0];
> +	WREG32(mmVCE_RB_RPTR, lower_32_bits(ring->wptr));
> +	WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
> +	WREG32(mmVCE_RB_BASE_LO, lower_32_bits(ring->gpu_addr));
> +	WREG32(mmVCE_RB_BASE_HI, upper_32_bits(ring->gpu_addr));
> +	WREG32(mmVCE_RB_SIZE, ring->ring_size / 4);
> +
> +	ring = &adev->vce.ring[1];
> +	WREG32(mmVCE_RB_RPTR2, lower_32_bits(ring->wptr));
> +	WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
> +	WREG32(mmVCE_RB_BASE_LO2, lower_32_bits(ring->gpu_addr));
> +	WREG32(mmVCE_RB_BASE_HI2, upper_32_bits(ring->gpu_addr));
> +	WREG32(mmVCE_RB_SIZE2, ring->ring_size / 4);
> +
> +	WREG32_P(mmVCE_VCPU_CNTL, VCE_VCPU_CNTL__CLK_EN_MASK,
> +		 ~VCE_VCPU_CNTL__CLK_EN_MASK);
> +
> +	WREG32_P(mmVCE_SOFT_RESET,
> +		VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> +		VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
> +		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> +		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
> +
> +	mdelay(100);
> +
> +	WREG32_P(mmVCE_SOFT_RESET, 0,
> +		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> +		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
> +
> +	r = vce_v1_0_firmware_loaded(adev);
> +
> +	/* Clear VCE_STATUS, otherwise SRBM thinks VCE1 is busy. */
> +	WREG32(mmVCE_STATUS, 0);
> +
> +	if (r) {
> +		dev_err(adev->dev, "VCE not responding, giving up!!!\n");
> +		return r;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vce_v1_0_stop(struct amdgpu_device *adev)
> +{
> +	struct amdgpu_ip_block *ip_block;
> +	int status;
> +	int i;
> +
> +	ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_VCE);
> +	if (!ip_block)
> +		return -EINVAL;
> +
> +	if (vce_v1_0_lmi_clean(adev))
> +		dev_warn(adev->dev, "%s VCE is not idle\n", __func__);
> +
> +	if (vce_v1_0_wait_for_idle(ip_block))
> +		dev_warn(adev->dev, "VCE is busy: VCE_STATUS=0x%x, SRBM_STATUS2=0x%x\n",
> +			RREG32(mmVCE_STATUS), RREG32(mmSRBM_STATUS2));
> +
> +	/* Stall UMC and register bus before resetting VCPU */
> +	WREG32_P(mmVCE_LMI_CTRL2, 1 << 8, ~(1 << 8));
> +
> +	for (i = 0; i < 100; ++i) {
> +		status = RREG32(mmVCE_LMI_STATUS);
> +		if (status & 0x240)
> +			break;
> +		mdelay(1);
> +	}
> +
> +	WREG32_P(mmVCE_VCPU_CNTL, 0, ~VCE_VCPU_CNTL__CLK_EN_MASK);
> +
> +	WREG32_P(mmVCE_SOFT_RESET,
> +		VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> +		VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
> +		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> +		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
> +
> +	WREG32(mmVCE_STATUS, 0);
> +
> +	return 0;
> +}
> +
> +static void vce_v1_0_enable_mgcg(struct amdgpu_device *adev, bool enable)
> +{
> +	u32 tmp;
> +
> +	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_VCE_MGCG)) {
> +		tmp = RREG32(mmVCE_CLOCK_GATING_A);
> +		tmp |= VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
> +		WREG32(mmVCE_CLOCK_GATING_A, tmp);
> +
> +		tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
> +		tmp &= ~0x1ff000;
> +		tmp |= 0xff800000;
> +		WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
> +
> +		tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
> +		tmp &= ~0x3ff;
> +		WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
> +	} else {
> +		tmp = RREG32(mmVCE_CLOCK_GATING_A);
> +		tmp &= ~VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
> +		WREG32(mmVCE_CLOCK_GATING_A, tmp);
> +
> +		tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
> +		tmp |= 0x1ff000;
> +		tmp &= ~0xff800000;
> +		WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
> +
> +		tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
> +		tmp |= 0x3ff;
> +		WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
> +	}
> +}
> +
> +static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	int r;
> +
> +	r = amdgpu_vce_early_init(adev);
> +	if (r)
> +		return r;
> +
> +	adev->vce.num_rings = 2;
> +
> +	vce_v1_0_set_ring_funcs(adev);
> +	vce_v1_0_set_irq_funcs(adev);
> +
> +	return 0;
> +}
> +
> +static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	struct amdgpu_ring *ring;
> +	int r, i;
> +
> +	r = amdgpu_irq_add_id(adev, AMDGPU_IRQ_CLIENTID_LEGACY, 167, &adev->vce.irq);
> +	if (r)
> +		return r;
> +
> +	r = amdgpu_vce_sw_init(adev, VCE_V1_0_FW_SIZE +
> +		VCE_V1_0_STACK_SIZE + VCE_V1_0_DATA_SIZE);
> +	if (r)
> +		return r;
> +
> +	r = amdgpu_vce_resume(adev);
> +	if (r)
> +		return r;
> +	r = vce_v1_0_load_fw_signature(adev);
> +	if (r)
> +		return r;
> +
> +	for (i = 0; i < adev->vce.num_rings; i++) {
> +		enum amdgpu_ring_priority_level hw_prio = amdgpu_vce_get_ring_prio(i);
> +
> +		ring = &adev->vce.ring[i];
> +		sprintf(ring->name, "vce%d", i);
> +		r = amdgpu_ring_init(adev, ring, 512, &adev->vce.irq, 0,
> +				     hw_prio, NULL);
> +		if (r)
> +			return r;
> +	}
> +
> +	return r;
> +}
> +
> +static int vce_v1_0_sw_fini(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	int r;
> +
> +	r = amdgpu_vce_suspend(adev);
> +	if (r)
> +		return r;
> +
> +	return amdgpu_vce_sw_fini(adev);
> +}
> +
> +/**
> + * vce_v1_0_hw_init - start and test VCE block
> + *
> + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance.
> + *
> + * Initialize the hardware, boot up the VCPU and do some testing
> + */
> +static int vce_v1_0_hw_init(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	int i, r;
> +
> +	if (adev->pm.dpm_enabled)
> +		amdgpu_dpm_enable_vce(adev, true);
> +	else
> +		amdgpu_asic_set_vce_clocks(adev, 10000, 10000);
> +
> +	for (i = 0; i < adev->vce.num_rings; i++) {
> +		r = amdgpu_ring_test_helper(&adev->vce.ring[i]);
> +		if (r)
> +			return r;
> +	}
> +
> +	dev_info(adev->dev, "VCE initialized successfully.\n");
> +
> +	return 0;
> +}
> +
> +static int vce_v1_0_hw_fini(struct amdgpu_ip_block *ip_block)
> +{
> +	int r;
> +
> +	r = vce_v1_0_stop(ip_block->adev);
> +	if (r)
> +		return r;
> +
> +	cancel_delayed_work_sync(&ip_block->adev->vce.idle_work);
> +	return 0;
> +}
> +
> +static int vce_v1_0_suspend(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	int r;
> +
> +	/*
> +	 * Proper cleanups before halting the HW engine:
> +	 *   - cancel the delayed idle work
> +	 *   - enable powergating
> +	 *   - enable clockgating
> +	 *   - disable dpm
> +	 *
> +	 * TODO: to align with the VCN implementation, move the
> +	 * jobs for clockgating/powergating/dpm setting to
> +	 * ->set_powergating_state().
> +	 */
> +	cancel_delayed_work_sync(&adev->vce.idle_work);
> +
> +	if (adev->pm.dpm_enabled) {
> +		amdgpu_dpm_enable_vce(adev, false);
> +	} else {
> +		amdgpu_asic_set_vce_clocks(adev, 0, 0);
> +		amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCE,
> +						       AMD_PG_STATE_GATE);
> +		amdgpu_device_ip_set_clockgating_state(adev, AMD_IP_BLOCK_TYPE_VCE,
> +						       AMD_CG_STATE_GATE);
> +	}
> +
> +	r = vce_v1_0_hw_fini(ip_block);
> +	if (r) {
> +		dev_err(adev->dev, "vce_v1_0_hw_fini() failed with error %i", r);
> +		return r;
> +	}
> +
> +	return amdgpu_vce_suspend(adev);
> +}
> +
> +static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +	int r;
> +
> +	r = amdgpu_vce_resume(adev);
> +	if (r)
> +		return r;
> +	r = vce_v1_0_load_fw_signature(adev);
> +	if (r)
> +		return r;
> +
> +	return vce_v1_0_hw_init(ip_block);
> +}
> +
> +static int vce_v1_0_set_interrupt_state(struct amdgpu_device *adev,
> +					struct amdgpu_irq_src *source,
> +					unsigned int type,
> +					enum amdgpu_interrupt_state state)
> +{
> +	uint32_t val = 0;
> +
> +	if (state == AMDGPU_IRQ_STATE_ENABLE)
> +		val |= VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK;
> +
> +	WREG32_P(mmVCE_SYS_INT_EN, val,
> +		 ~VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK);
> +	return 0;
> +}
> +
> +static int vce_v1_0_process_interrupt(struct amdgpu_device *adev,
> +				      struct amdgpu_irq_src *source,
> +				      struct amdgpu_iv_entry *entry)
> +{
> +	dev_dbg(adev->dev, "IH: VCE\n");
> +	switch (entry->src_data[0]) {
> +	case 0:
> +	case 1:
> +		amdgpu_fence_process(&adev->vce.ring[entry->src_data[0]]);
> +		break;
> +	default:
> +		dev_err(adev->dev, "Unhandled interrupt: %d %d\n",
> +			  entry->src_id, entry->src_data[0]);
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vce_v1_0_set_clockgating_state(struct amdgpu_ip_block *ip_block,
> +					  enum amd_clockgating_state state)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +
> +	vce_v1_0_init_cg(adev);
> +	vce_v1_0_enable_mgcg(adev, state == AMD_CG_STATE_GATE);
> +
> +	return 0;
> +}
> +
> +static int vce_v1_0_set_powergating_state(struct amdgpu_ip_block *ip_block,
> +					  enum amd_powergating_state state)
> +{
> +	struct amdgpu_device *adev = ip_block->adev;
> +
> +	/* This doesn't actually powergate the VCE block.
> +	 * That's done in the dpm code via the SMC.  This
> +	 * just re-inits the block as necessary.  The actual
> +	 * gating still happens in the dpm code.  We should
> +	 * revisit this when there is a cleaner line between
> +	 * the smc and the hw blocks
> +	 */
> +	if (state == AMD_PG_STATE_GATE)
> +		return vce_v1_0_stop(adev);
> +	else
> +		return vce_v1_0_start(adev);
> +}
> +
> +static const struct amd_ip_funcs vce_v1_0_ip_funcs = {
> +	.name = "vce_v1_0",
> +	.early_init = vce_v1_0_early_init,
> +	.sw_init = vce_v1_0_sw_init,
> +	.sw_fini = vce_v1_0_sw_fini,
> +	.hw_init = vce_v1_0_hw_init,
> +	.hw_fini = vce_v1_0_hw_fini,
> +	.suspend = vce_v1_0_suspend,
> +	.resume = vce_v1_0_resume,
> +	.is_idle = vce_v1_0_is_idle,
> +	.wait_for_idle = vce_v1_0_wait_for_idle,
> +	.set_clockgating_state = vce_v1_0_set_clockgating_state,
> +	.set_powergating_state = vce_v1_0_set_powergating_state,
> +};
> +
> +static const struct amdgpu_ring_funcs vce_v1_0_ring_funcs = {
> +	.type = AMDGPU_RING_TYPE_VCE,
> +	.align_mask = 0xf,
> +	.nop = VCE_CMD_NO_OP,
> +	.support_64bit_ptrs = false,
> +	.no_user_fence = true,
> +	.get_rptr = vce_v1_0_ring_get_rptr,
> +	.get_wptr = vce_v1_0_ring_get_wptr,
> +	.set_wptr = vce_v1_0_ring_set_wptr,
> +	.parse_cs = amdgpu_vce_ring_parse_cs,
> +	.emit_frame_size = 6, /* amdgpu_vce_ring_emit_fence  x1 no user fence */
> +	.emit_ib_size = 4, /* amdgpu_vce_ring_emit_ib */
> +	.emit_ib = amdgpu_vce_ring_emit_ib,
> +	.emit_fence = amdgpu_vce_ring_emit_fence,
> +	.test_ring = amdgpu_vce_ring_test_ring,
> +	.test_ib = amdgpu_vce_ring_test_ib,
> +	.insert_nop = amdgpu_ring_insert_nop,
> +	.pad_ib = amdgpu_ring_generic_pad_ib,
> +	.begin_use = amdgpu_vce_ring_begin_use,
> +	.end_use = amdgpu_vce_ring_end_use,
> +};
> +
> +static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev)
> +{
> +	int i;
> +
> +	for (i = 0; i < adev->vce.num_rings; i++) {
> +		adev->vce.ring[i].funcs = &vce_v1_0_ring_funcs;
> +		adev->vce.ring[i].me = i;
> +	}
> +};
> +
> +static const struct amdgpu_irq_src_funcs vce_v1_0_irq_funcs = {
> +	.set = vce_v1_0_set_interrupt_state,
> +	.process = vce_v1_0_process_interrupt,
> +};
> +
> +static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev)
> +{
> +	adev->vce.irq.num_types = 1;
> +	adev->vce.irq.funcs = &vce_v1_0_irq_funcs;
> +};
> +
> +const struct amdgpu_ip_block_version vce_v1_0_ip_block = {
> +	.type = AMD_IP_BLOCK_TYPE_VCE,
> +	.major = 1,
> +	.minor = 0,
> +	.rev = 0,
> +	.funcs = &vce_v1_0_ip_funcs,
> +};
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
> new file mode 100644
> index 000000000000..206e7bec897f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright 2025 Advanced Micro Devices, Inc.
> + * Copyright 2025 Valve Corporation
> + * Copyright 2025 Alexandre Demers
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef __VCE_V1_0_H__
> +#define __VCE_V1_0_H__
> +
> +extern const struct amdgpu_ip_block_version vce_v1_0_ip_block;
> +
> +#endif


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/14] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space
  2025-10-28 22:06 ` [PATCH 11/14] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space Timur Kristóf
@ 2025-10-29 11:41   ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-29 11:41 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> Based on research carried out by Alexandre and Christian.
> 
> VCE1 actually executes its code from the VCPU BO.
> Due to various hardware limitations, the VCE1 requires
> the VCPU BO to be in the low 32 bit address range.
> However, VRAM is typically mapped at the high address range,
> which means the VCPU can't access VRAM through the FB aperture.
> 
> To solve this, we write a few page table entries to
> map the VCPU BO in the GART address range. And we make sure
> that the GART is located at the low address range.
> That way the VCE1 can access the VCPU BO.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Co-developed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>

Make that a suggested-by and drop co-developed and signed-off-by for me.

The code was solely written by you if I'm not completely mistaken.

Patch itself is Reviewed-by: Christian König <christian.koenig@amd.com>

Regards,
Christian.

> ---
>  drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 44 +++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> index e62fd8ed1992..27f70146293d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> @@ -34,6 +34,7 @@
>  
>  #include "amdgpu.h"
>  #include "amdgpu_vce.h"
> +#include "amdgpu_gart.h"
>  #include "sid.h"
>  #include "vce_v1_0.h"
>  #include "vce/vce_1_0_d.h"
> @@ -46,6 +47,11 @@
>  #define VCE_V1_0_DATA_SIZE	(7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
>  #define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK	0x02
>  
> +#define VCE_V1_0_GART_PAGE_START \
> +	(AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS)
> +#define VCE_V1_0_GART_ADDR_START \
> +	(VCE_V1_0_GART_PAGE_START * AMDGPU_GPU_PAGE_SIZE)
> +
>  static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
>  static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
>  
> @@ -535,6 +541,38 @@ static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
>  	return 0;
>  }
>  
> +/**
> + * vce_v1_0_ensure_vcpu_bo_32bit_addr() - ensure the VCPU BO has a 32-bit address
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Due to various hardware limitations, the VCE1 requires
> + * the VCPU BO to be in the low 32 bit address range.
> + * Ensure that the VCPU BO has a 32-bit GPU address,
> + * or return an error code when that isn't possible.
> + */
> +static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
> +{
> +	const u64 gpu_addr = amdgpu_bo_gpu_offset(adev->vce.vcpu_bo);
> +	const u64 bo_size = amdgpu_bo_size(adev->vce.vcpu_bo);
> +	const u64 max_vcpu_bo_addr = 0xffffffff - bo_size;
> +
> +	/* Check if the VCPU BO already has a 32-bit address.
> +	 * Eg. if MC is configured to put VRAM in the low address range.
> +	 */
> +	if (gpu_addr <= max_vcpu_bo_addr)
> +		return 0;
> +
> +	/* Check if we can map the VCPU BO in GART to a 32-bit address. */
> +	if (adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START > max_vcpu_bo_addr)
> +		return -EINVAL;
> +
> +	amdgpu_gart_bind_vram_bo(adev, VCE_V1_0_GART_ADDR_START, adev->vce.vcpu_bo,
> +		AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID);
> +	adev->vce.gpu_addr = adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START;
> +		return 0;
> +}
> +
>  static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
>  {
>  	struct amdgpu_device *adev = ip_block->adev;
> @@ -554,6 +592,9 @@ static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
>  	if (r)
>  		return r;
>  	r = vce_v1_0_load_fw_signature(adev);
> +	if (r)
> +		return r;
> +	r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
>  	if (r)
>  		return r;
>  
> @@ -669,6 +710,9 @@ static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
>  	if (r)
>  		return r;
>  	r = vce_v1_0_load_fw_signature(adev);
> +	if (r)
> +		return r;
> +	r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
>  	if (r)
>  		return r;
>  


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT
  2025-10-29 10:00   ` Christian König
@ 2025-10-29 11:41     ` Timur Kristóf
  0 siblings, 0 replies; 41+ messages in thread
From: Timur Kristóf @ 2025-10-29 11:41 UTC (permalink / raw)
  To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira
  Cc: Pelloux-Prayer, Pierre-Eric

On Wed, 2025-10-29 at 11:00 +0100, Christian König wrote:
> On 10/28/25 23:06, Timur Kristóf wrote:
> > GART contains some pages in its address space that come before
> > the GTT and are used for BO copies.
> > 
> > Instead of hardcoding the size of the GART space before GTT,
> > make it a field in the amdgpu_gmc struct. This allows us to map
> > more things in GART before GTT.
> > 
> > Split this into a separate patch to make it easier to bisect,
> > in case there are any errors in the future.
> 
> Pierre-Eric has been working on something similar.
> 
> On the newer HW generations we need more transfer windows since we
> want to utilize more DMA engines for copies and clears.
> 
> My suggestion is that we just make AMDGPU_GTT_NUM_TRANSFER_WINDOWS
> depend on adev and so the HW generation and then reserve one extra
> transfer window for this workaround on SI.

I think the best would be to leave this patch as-is to avoid conflicts
with Pierre-Eric's work. After that work lands, we can revisit this
workaround.

Does that sound reasonable to you?

> 
> Regards,
> Christian.
> 
> > 
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c     | 2 ++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h     | 1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 2 +-
> >  3 files changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > index 97b562a79ea8..bf31bd022d6d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -325,6 +325,8 @@ void amdgpu_gmc_gart_location(struct
> > amdgpu_device *adev, struct amdgpu_gmc *mc,
> >  		break;
> >  	}
> >  
> > +	mc->num_gart_pages_before_gtt =
> > +		AMDGPU_GTT_MAX_TRANSFER_SIZE *
> > AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
> >  	mc->gart_start &= ~(four_gb - 1);
> >  	mc->gart_end = mc->gart_start + mc->gart_size - 1;
> >  	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > index 55097ca10738..568eed3eb557 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > @@ -266,6 +266,7 @@ struct amdgpu_gmc {
> >  	u64			fb_end;
> >  	unsigned		vram_width;
> >  	u64			real_vram_size;
> > +	u32			num_gart_pages_before_gtt;
> >  	int			vram_mtrr;
> >  	u64                     mc_mask;
> >  	const struct firmware   *fw;	/* MC firmware */
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > index 0760e70402ec..4c2563a70c2b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > @@ -283,7 +283,7 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device
> > *adev, uint64_t gtt_size)
> >  
> >  	ttm_resource_manager_init(man, &adev->mman.bdev,
> > gtt_size);
> >  
> > -	start = AMDGPU_GTT_MAX_TRANSFER_SIZE *
> > AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
> > +	start = adev->gmc.num_gart_pages_before_gtt;
> >  	size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
> >  	drm_mm_init(&mgr->mm, start, size);
> >  	spin_lock_init(&mgr->lock);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 12/14] drm/amd/pm/si: Hook up VCE1 to SI DPM
  2025-10-28 22:06 ` [PATCH 12/14] drm/amd/pm/si: Hook up VCE1 to SI DPM Timur Kristóf
@ 2025-10-29 11:47   ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-29 11:47 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> On SI GPUs, the SMC needs to be aware of whether or not the VCE1
> is used. The VCE1 is enabled/disabled through the DPM code.
> 
> Also print VCE clocks in amdgpu_pm_info.
> Users can inspect the current power state using:
> cat /sys/kernel/debug/dri/<card>/amdgpu_pm_info
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>

Alex should probably take a look as well, but from my side that sounds reasonable.

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index 3a9522c17fee..bf7ab93b265d 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7051,13 +7051,20 @@ static void si_set_vce_clock(struct amdgpu_device *adev,
>  	if ((old_rps->evclk != new_rps->evclk) ||
>  	    (old_rps->ecclk != new_rps->ecclk)) {
>  		/* Turn the clocks on when encoding, off otherwise */
> +		dev_dbg(adev->dev, "set VCE clocks: %u, %u\n", new_rps->evclk, new_rps->ecclk);
> +
>  		if (new_rps->evclk || new_rps->ecclk) {
> -			/* Place holder for future VCE1.0 porting to amdgpu
> -			vce_v1_0_enable_mgcg(adev, false, false);*/
> +			amdgpu_asic_set_vce_clocks(adev, new_rps->evclk, new_rps->ecclk);
> +			amdgpu_device_ip_set_clockgating_state(
> +				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_CG_STATE_UNGATE);
> +			amdgpu_device_ip_set_powergating_state(
> +				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_PG_STATE_UNGATE);
>  		} else {
> -			/* Place holder for future VCE1.0 porting to amdgpu
> -			vce_v1_0_enable_mgcg(adev, true, false);
> -			amdgpu_asic_set_vce_clocks(adev, new_rps->evclk, new_rps->ecclk);*/
> +			amdgpu_device_ip_set_powergating_state(
> +				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_PG_STATE_GATE);
> +			amdgpu_device_ip_set_clockgating_state(
> +				adev, AMD_IP_BLOCK_TYPE_VCE, AMD_CG_STATE_GATE);
> +			amdgpu_asic_set_vce_clocks(adev, 0, 0);
>  		}
>  	}
>  }
> @@ -7582,6 +7589,7 @@ static void si_dpm_debugfs_print_current_performance_level(void *handle,
>  	} else {
>  		pl = &ps->performance_levels[current_index];
>  		seq_printf(m, "uvd    vclk: %d dclk: %d\n", rps->vclk, rps->dclk);
> +		seq_printf(m, "vce    evclk: %d ecclk: %d\n", rps->evclk, rps->ecclk);
>  		seq_printf(m, "power level %d    sclk: %u mclk: %u vddc: %u vddci: %u pcie gen: %u\n",
>  			   current_index, pl->sclk, pl->mclk, pl->vddc, pl->vddci, pl->pcie_gen + 1);
>  	}


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 13/14] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs
  2025-10-28 22:06 ` [PATCH 13/14] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs Timur Kristóf
@ 2025-10-29 11:51   ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-29 11:51 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/28/25 23:06, Timur Kristóf wrote:
> Add the VCE1 IP block to the SI GPUs that have it.
> Advertise the encoder capabilities corresponding to VCE1,
> so the userspace applications can detect and use it.
> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Co-developed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>

Again I didn't contributed anything to this patch.

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/si.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
> index 9468c03bdb1b..f7b35b860ba3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si.c
> @@ -45,6 +45,7 @@
>  #include "dce_v6_0.h"
>  #include "si.h"
>  #include "uvd_v3_1.h"
> +#include "vce_v1_0.h"
>  
>  #include "uvd/uvd_4_0_d.h"
>  
> @@ -921,8 +922,6 @@ static const u32 hainan_mgcg_cgcg_init[] =
>  	0x3630, 0xfffffff0, 0x00000100,
>  };
>  
> -/* XXX: update when we support VCE */
> -#if 0
>  /* tahiti, pitcairn, verde */
>  static const struct amdgpu_video_codec_info tahiti_video_codecs_encode_array[] =
>  {
> @@ -940,13 +939,7 @@ static const struct amdgpu_video_codecs tahiti_video_codecs_encode =
>  	.codec_count = ARRAY_SIZE(tahiti_video_codecs_encode_array),
>  	.codec_array = tahiti_video_codecs_encode_array,
>  };
> -#else
> -static const struct amdgpu_video_codecs tahiti_video_codecs_encode =
> -{
> -	.codec_count = 0,
> -	.codec_array = NULL,
> -};
> -#endif
> +
>  /* oland and hainan don't support encode */
>  static const struct amdgpu_video_codecs hainan_video_codecs_encode =
>  {
> @@ -2723,7 +2716,7 @@ int si_set_ip_blocks(struct amdgpu_device *adev)
>  		else
>  			amdgpu_device_ip_block_add(adev, &dce_v6_0_ip_block);
>  		amdgpu_device_ip_block_add(adev, &uvd_v3_1_ip_block);
> -		/* amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block); */
> +		amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block);
>  		break;
>  	case CHIP_OLAND:
>  		amdgpu_device_ip_block_add(adev, &si_common_ip_block);
> @@ -2741,7 +2734,6 @@ int si_set_ip_blocks(struct amdgpu_device *adev)
>  		else
>  			amdgpu_device_ip_block_add(adev, &dce_v6_4_ip_block);
>  		amdgpu_device_ip_block_add(adev, &uvd_v3_1_ip_block);
> -		/* amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block); */
>  		break;
>  	case CHIP_HAINAN:
>  		amdgpu_device_ip_block_add(adev, &si_common_ip_block);


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better
  2025-10-28 22:06 ` [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better Timur Kristóf
@ 2025-10-29 12:02   ` Christian König
  2025-10-29 19:46     ` Deucher, Alexander
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-29 12:02 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira, Liu, Leo

On 10/28/25 23:06, Timur Kristóf wrote:
> Sometimes the VCE PLL times out while we are programming it.
> When it happens, the VCE still works, but much slower.
> Observed on some Tahiti boards, but not all:
> - FirePro W9000 has the issue
> - Radeon R9 280X not affected
> - Radeon HD 7990 not affected
> 
> Continue the complete VCE PLL programming sequence even when
> it timed out. With this, the VCE will work fine and faster
> after the timeout happened.

Mhm, interesting. No idea what could be causing this.

Not sure if just ignoring the error is ok or not. @Alex?

Regards,
Christian.

> 
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/si.c       |  6 +-----
>  drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 10 +++++++++-
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
> index f7b35b860ba3..ed3d4f9bf9d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si.c
> @@ -1902,7 +1902,7 @@ static int si_vce_send_vcepll_ctlreq(struct amdgpu_device *adev)
>  	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, 0, ~UPLL_CTLREQ_MASK);
>  
>  	if (i == SI_MAX_CTLACKS_ASSERTION_WAIT) {
> -		DRM_ERROR("Timeout setting VCE clocks!\n");
> +		DRM_WARN("Timeout setting VCE clocks!\n");
>  		return -ETIMEDOUT;
>  	}
>  
> @@ -1954,8 +1954,6 @@ static int si_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
>  	mdelay(1);
>  
>  	r = si_vce_send_vcepll_ctlreq(adev);
> -	if (r)
> -		return r;
>  
>  	/* Assert VCEPLL_RESET again */
>  	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, VCEPLL_RESET_MASK, ~VCEPLL_RESET_MASK);
> @@ -1988,8 +1986,6 @@ static int si_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
>  	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, 0, ~VCEPLL_BYPASS_EN_MASK);
>  
>  	r = si_vce_send_vcepll_ctlreq(adev);
> -	if (r)
> -		return r;
>  
>  	/* Switch VCLK and DCLK selection */
>  	WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL_2,
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> index 27f70146293d..fdc455797258 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> @@ -401,7 +401,7 @@ static int vce_v1_0_wait_for_idle(struct amdgpu_ip_block *ip_block)
>  static int vce_v1_0_start(struct amdgpu_device *adev)
>  {
>  	struct amdgpu_ring *ring;
> -	int r;
> +	int r, i;
>  
>  	WREG32_P(mmVCE_STATUS, 1, ~1);
>  
> @@ -443,6 +443,14 @@ static int vce_v1_0_start(struct amdgpu_device *adev)
>  	/* Clear VCE_STATUS, otherwise SRBM thinks VCE1 is busy. */
>  	WREG32(mmVCE_STATUS, 0);
>  
> +	/* Wait for VCE_STATUS to actually clear.
> +	 * This helps when there was a timeout setting the VCE clocks.
> +	 */
> +	for (i = 0; i < adev->usec_timeout && RREG32(mmVCE_STATUS); ++i) {
> +		udelay(1);
> +		WREG32(mmVCE_STATUS, 0);
> +	}
> +
>  	if (r) {
>  		dev_err(adev->dev, "VCE not responding, giving up!!!\n");
>  		return r;


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init
  2025-10-28 22:06 ` [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init Timur Kristóf
  2025-10-29 10:26   ` Christian König
@ 2025-10-29 17:16   ` Liu, Leo
  1 sibling, 0 replies; 41+ messages in thread
From: Liu, Leo @ 2025-10-29 17:16 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx@lists.freedesktop.org,
	Deucher, Alexander, Koenig, Christian, Alexandre Demers,
	Rodrigo Siqueira

[AMD Official Use Only - AMD Internal Distribution Only]

This patch is:
Reviewed-by: Leo Liu <leo.liu@amd.com>

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Timur Kristóf
> Sent: October 28, 2025 6:06 PM
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Timur Kristóf <timur.kristof@gmail.com>;
> Alexandre Demers <alexandre.f.demers@gmail.com>; Rodrigo Siqueira
> <siqueira@igalia.com>
> Subject: [PATCH 06/14] drm/amdgpu/vce: Move firmware load to
> amdgpu_vce_early_init
>
> Try to load the VCE firmware at early_init.
>
> When the correct firmware is not found, return -ENOENT.
> This way, the driver initialization will complete even
> without VCE, and the GPU will be functional, albeit
> without video encoding capabilities.
>
> This is necessary because we are planning to add support
> for the VCE1, and AMD hasn't yet publised the correct
> firmware for this version. So we need to anticipate that
> users will try to boot amdgpu on SI GPUs without the
> correct VCE1 firmware present on their system.
>
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 121 +++++++++++++++-----
> ----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h |   1 +
>  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c   |   5 +
>  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c   |   5 +
>  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c   |   5 +
>  5 files changed, 91 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index eaa06dbef5c4..b23a48a1efc1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -88,82 +88,87 @@ static int amdgpu_vce_get_destroy_msg(struct
> amdgpu_ring *ring, uint32_t handle,
>                                     bool direct, struct dma_fence **fence);
>
>  /**
> - * amdgpu_vce_sw_init - allocate memory, load vce firmware
> + * amdgpu_vce_firmware_name() - determine the firmware file name for
> VCE
>   *
>   * @adev: amdgpu_device pointer
> - * @size: size for the new BO
>   *
> - * First step to get VCE online, allocate memory and load the firmware
> + * Each chip that has VCE IP may need a different firmware.
> + * This function returns the name of the VCE firmware file
> + * appropriate for the current chip.
>   */
> -int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> +static const char *amdgpu_vce_firmware_name(struct amdgpu_device
> *adev)
>  {
> -     const char *fw_name;
> -     const struct common_firmware_header *hdr;
> -     unsigned int ucode_version, version_major, version_minor,
> binary_id;
> -     int i, r;
> -
>       switch (adev->asic_type) {
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>       case CHIP_BONAIRE:
> -             fw_name = FIRMWARE_BONAIRE;
> -             break;
> +             return FIRMWARE_BONAIRE;
>       case CHIP_KAVERI:
> -             fw_name = FIRMWARE_KAVERI;
> -             break;
> +             return FIRMWARE_KAVERI;
>       case CHIP_KABINI:
> -             fw_name = FIRMWARE_KABINI;
> -             break;
> +             return FIRMWARE_KABINI;
>       case CHIP_HAWAII:
> -             fw_name = FIRMWARE_HAWAII;
> -             break;
> +             return FIRMWARE_HAWAII;
>       case CHIP_MULLINS:
> -             fw_name = FIRMWARE_MULLINS;
> -             break;
> +             return FIRMWARE_MULLINS;
>  #endif
>       case CHIP_TONGA:
> -             fw_name = FIRMWARE_TONGA;
> -             break;
> +             return  FIRMWARE_TONGA;
>       case CHIP_CARRIZO:
> -             fw_name = FIRMWARE_CARRIZO;
> -             break;
> +             return  FIRMWARE_CARRIZO;
>       case CHIP_FIJI:
> -             fw_name = FIRMWARE_FIJI;
> -             break;
> +             return  FIRMWARE_FIJI;
>       case CHIP_STONEY:
> -             fw_name = FIRMWARE_STONEY;
> -             break;
> +             return  FIRMWARE_STONEY;
>       case CHIP_POLARIS10:
> -             fw_name = FIRMWARE_POLARIS10;
> -             break;
> +             return  FIRMWARE_POLARIS10;
>       case CHIP_POLARIS11:
> -             fw_name = FIRMWARE_POLARIS11;
> -             break;
> +             return  FIRMWARE_POLARIS11;
>       case CHIP_POLARIS12:
> -             fw_name = FIRMWARE_POLARIS12;
> -             break;
> +             return  FIRMWARE_POLARIS12;
>       case CHIP_VEGAM:
> -             fw_name = FIRMWARE_VEGAM;
> -             break;
> +             return  FIRMWARE_VEGAM;
>       case CHIP_VEGA10:
> -             fw_name = FIRMWARE_VEGA10;
> -             break;
> +             return  FIRMWARE_VEGA10;
>       case CHIP_VEGA12:
> -             fw_name = FIRMWARE_VEGA12;
> -             break;
> +             return  FIRMWARE_VEGA12;
>       case CHIP_VEGA20:
> -             fw_name = FIRMWARE_VEGA20;
> -             break;
> +             return  FIRMWARE_VEGA20;
>
>       default:
> -             return -EINVAL;
> +             return NULL;
>       }
> +}
> +
> +/**
> + * amdgpu_vce_early_init() - try to load VCE firmware
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Tries to load the VCE firmware.
> + *
> + * When not found, returns ENOENT so that the driver can
> + * still load and initialize the rest of the IP blocks.
> + * The GPU can function just fine without VCE, they will just
> + * not support video encoding.
> + */
> +int amdgpu_vce_early_init(struct amdgpu_device *adev)
> +{
> +     const char *fw_name = amdgpu_vce_firmware_name(adev);
> +     const struct common_firmware_header *hdr;
> +     unsigned int ucode_version, version_major, version_minor,
> binary_id;
> +     int r;
> +
> +     if (!fw_name)
> +             return -ENOENT;
>
>       r = amdgpu_ucode_request(adev, &adev->vce.fw,
> AMDGPU_UCODE_REQUIRED, "%s", fw_name);
>       if (r) {
> -             dev_err(adev->dev, "amdgpu_vce: Can't validate firmware
> \"%s\"\n",
> -                     fw_name);
> +             dev_err(adev->dev,
> +                     "amdgpu_vce: Firmware \"%s\" not found or failed
> to validate (%d)\n",
> +                     fw_name, r);
> +
>               amdgpu_ucode_release(&adev->vce.fw);
> -             return r;
> +             return -ENOENT;
>       }
>
>       hdr = (const struct common_firmware_header *)adev->vce.fw-
> >data;
> @@ -172,11 +177,35 @@ int amdgpu_vce_sw_init(struct amdgpu_device
> *adev, unsigned long size)
>       version_major = (ucode_version >> 20) & 0xfff;
>       version_minor = (ucode_version >> 8) & 0xfff;
>       binary_id = ucode_version & 0xff;
> -     DRM_INFO("Found VCE firmware Version: %d.%d Binary ID: %d\n",
> +     dev_info(adev->dev, "Found VCE firmware Version: %d.%d Binary
> ID: %d\n",
>               version_major, version_minor, binary_id);
>       adev->vce.fw_version = ((version_major << 24) | (version_minor <<
> 16) |
>                               (binary_id << 8));
>
> +     return 0;
> +}
> +
> +/**
> + * amdgpu_vce_sw_init() - allocate memory for VCE BO
> + *
> + * @adev: amdgpu_device pointer
> + * @size: size for the new BO
> + *
> + * First step to get VCE online: allocate memory for VCE BO.
> + * The VCE firmware binary is copied into the VCE BO later,
> + * in amdgpu_vce_resume. The VCE executes its code from the
> + * VCE BO and also uses the space in this BO for its stack and data.
> + *
> + * Ideally this BO should be placed in VRAM for optimal performance,
> + * although technically it also runs from system RAM (albeit slowly).
> + */
> +int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> +{
> +     int i, r;
> +
> +     if (!adev->vce.fw)
> +             return -ENOENT;
> +
>       r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>                                   AMDGPU_GEM_DOMAIN_VRAM |
>                                   AMDGPU_GEM_DOMAIN_GTT,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> index 6e53f872d084..22acd7b35945 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> @@ -53,6 +53,7 @@ struct amdgpu_vce {
>       unsigned                num_rings;
>  };
>
> +int amdgpu_vce_early_init(struct amdgpu_device *adev);
>  int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size);
>  int amdgpu_vce_sw_fini(struct amdgpu_device *adev);
>  int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct
> amdgpu_ring *ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> index bee3e904a6bc..8ea8a6193492 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> @@ -407,6 +407,11 @@ static void vce_v2_0_enable_mgcg(struct
> amdgpu_device *adev, bool enable,
>  static int vce_v2_0_early_init(struct amdgpu_ip_block *ip_block)
>  {
>       struct amdgpu_device *adev = ip_block->adev;
> +     int r;
> +
> +     r = amdgpu_vce_early_init(adev);
> +     if (r)
> +             return r;
>
>       adev->vce.num_rings = 2;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> index 708123899c41..719e9643c43d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> @@ -399,6 +399,7 @@ static unsigned vce_v3_0_get_harvest_config(struct
> amdgpu_device *adev)
>  static int vce_v3_0_early_init(struct amdgpu_ip_block *ip_block)
>  {
>       struct amdgpu_device *adev = ip_block->adev;
> +     int r;
>
>       adev->vce.harvest_config = vce_v3_0_get_harvest_config(adev);
>
> @@ -407,6 +408,10 @@ static int vce_v3_0_early_init(struct
> amdgpu_ip_block *ip_block)
>           (AMDGPU_VCE_HARVEST_VCE0 |
> AMDGPU_VCE_HARVEST_VCE1))
>               return -ENOENT;
>
> +     r = amdgpu_vce_early_init(adev);
> +     if (r)
> +             return r;
> +
>       adev->vce.num_rings = 3;
>
>       vce_v3_0_set_ring_funcs(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> index 335bda64ff5b..2d64002bed61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> @@ -410,6 +410,11 @@ static int vce_v4_0_stop(struct amdgpu_device
> *adev)
>  static int vce_v4_0_early_init(struct amdgpu_ip_block *ip_block)
>  {
>       struct amdgpu_device *adev = ip_block->adev;
> +     int r;
> +
> +     r = amdgpu_vce_early_init(adev);
> +     if (r)
> +             return r;
>
>       if (amdgpu_sriov_vf(adev)) /* currently only VCN0 support SRIOV */
>               adev->vce.num_rings = 1;
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better
  2025-10-29 12:02   ` Christian König
@ 2025-10-29 19:46     ` Deucher, Alexander
  2025-11-03 16:01       ` timur.kristof
  0 siblings, 1 reply; 41+ messages in thread
From: Deucher, Alexander @ 2025-10-29 19:46 UTC (permalink / raw)
  To: Koenig, Christian, Timur Kristóf,
	amd-gfx@lists.freedesktop.org, Alexandre Demers, Rodrigo Siqueira,
	Liu, Leo

[Public]

> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Wednesday, October 29, 2025 8:02 AM
> To: Timur Kristóf <timur.kristof@gmail.com>; amd-gfx@lists.freedesktop.org;
> Deucher, Alexander <Alexander.Deucher@amd.com>; Alexandre Demers
> <alexandre.f.demers@gmail.com>; Rodrigo Siqueira <siqueira@igalia.com>; Liu,
> Leo <Leo.Liu@amd.com>
> Subject: Re: [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better
>
> On 10/28/25 23:06, Timur Kristóf wrote:
> > Sometimes the VCE PLL times out while we are programming it.
> > When it happens, the VCE still works, but much slower.
> > Observed on some Tahiti boards, but not all:
> > - FirePro W9000 has the issue
> > - Radeon R9 280X not affected
> > - Radeon HD 7990 not affected
> >
> > Continue the complete VCE PLL programming sequence even when it timed
> > out. With this, the VCE will work fine and faster after the timeout
> > happened.
>
> Mhm, interesting. No idea what could be causing this.
>
> Not sure if just ignoring the error is ok or not. @Alex?

Looks like these registers can also be accessed indirectly via a different index/data accessor besides SMC.  I don't know whether it matters or not.  The other indirect accessors are:

#define mmCG_IND_ADDR                                   0x023C
#define mmCG_IND_DATA                                   0x023D

And the indirect indexes are:

#define ixCG_VCEPLL_FUNC_CNTL                      0x0600
#define ixCG_VCEPLL_FUNC_CNTL_2                    0x0601
#define ixCG_VCEPLL_FUNC_CNTL_3                    0x0602
#define ixCG_VCEPLL_FUNC_CNTL_4                    0x0603
#define ixCG_VCEPLL_FUNC_CNTL_5                    0x0604
#define ixCG_VCEPLL_STATUS                         0x0605
#define ixCG_VCEPLL_SPREAD_SPECTRUM                0x0606
#define ixCG_VCEPLL_SPREAD_SPECTRUM_2              0x0607

// CG_VCEPLL_FUNC_CNTL
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_RESET_MASK             0x00000001L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_SLEEP_MASK             0x00000002L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_BYPASS_EN_MASK         0x00000004L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CTLREQ_MASK            0x00000008L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_REFCLK_SEL_MASK        0x00000030L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CLKF_UPDATE_MASK       0x00000040L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CLKR_UPDATE_MASK       0x00000080L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_RESET_EN_MASK          0x00000100L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_VCO_MODE_MASK          0x00000600L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_REF_DIV_MASK           0x003f0000L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CTLACK_MASK            0x40000000L
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CTLACK2_MASK           0x80000000L

// CG_VCEPLL_FUNC_CNTL_2
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_PDIV_A_MASK          0x0000007fL
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_PDIV_B_MASK          0x00007f00L
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_LEGACY_PDIV_MASK     0x00020000L
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_FASTEN_MASK          0x00040000L
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_ENSAT_MASK           0x00080000L
#define CG_VCEPLL_FUNC_CNTL_2__EVCLK_SRC_SEL_MASK          0x01f00000L
#define CG_VCEPLL_FUNC_CNTL_2__ECCLK_SRC_SEL_MASK          0x3e000000L
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_TEST_MASK            0x40000000L
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_UNLOCK_CLEAR_MASK    0x80000000L

// CG_VCEPLL_FUNC_CNTL_3
#define CG_VCEPLL_FUNC_CNTL_3__VCEPLL_FB_DIV_MASK          0x03ffffffL
#define CG_VCEPLL_FUNC_CNTL_3__VCEPLL_DITHEN_MASK          0x10000000L

// CG_VCEPLL_FUNC_CNTL_4
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_TEST_SEL_MASK   0x00000007L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_EXT_SEL_MASK    0x00000030L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_EN_MASK         0x000000c0L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SPARE_MASK           0x0003ff00L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_REG_BIAS_MASK        0x001c0000L
#define CG_VCEPLL_FUNC_CNTL_4__TEST_FRAC_BYPASS_MASK       0x00200000L
#define CG_VCEPLL_FUNC_CNTL_4__BG_PDN_MASK                 0x00400000L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_ILOCK_MASK           0x00800000L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_FBCLK_SEL_MASK       0x01000000L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_VCTRLADC_EN_MASK     0x02000000L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_EXT_MASK        0x0c000000L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SPARE_EXT_MASK       0x70000000L
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_VTOI_BIAS_CNTL_MASK  0x80000000L

// CG_VCEPLL_FUNC_CNTL_5
#define CG_VCEPLL_FUNC_CNTL_5__FBDIV_SSC_BYPASS_MASK       0x00000001L
#define CG_VCEPLL_FUNC_CNTL_5__RISEFBVCO_EN_MASK           0x00000002L
#define CG_VCEPLL_FUNC_CNTL_5__PFD_RESET_CNTRL_MASK        0x0000000cL
#define CG_VCEPLL_FUNC_CNTL_5__RESET_TIMER_MASK            0x00000030L
#define CG_VCEPLL_FUNC_CNTL_5__FAST_LOCK_CNTRL_MASK        0x000000c0L
#define CG_VCEPLL_FUNC_CNTL_5__FAST_LOCK_EN_MASK           0x00000100L
#define CG_VCEPLL_FUNC_CNTL_5__RESET_ANTI_MUX_MASK         0x00000200L

// CG_VCEPLL_STATUS
#define CG_VCEPLL_STATUS__VCEPLL_CTLACK_A_MASK             0x00000001L
#define CG_VCEPLL_STATUS__VCEPLL_CTLACK_B_MASK             0x00000002L
#define CG_VCEPLL_STATUS__VCEPLL_CLKF_ACK_MASK             0x00000004L
#define CG_VCEPLL_STATUS__VCEPLL_CLKR_ACK_MASK             0x00000008L
#define CG_VCEPLL_STATUS__VCEPLL_VCTRLADC_MASK             0x000000f0L
#define CG_VCEPLL_STATUS__VCEPLL_OSPARE_MASK               0x00000f00L
#define CG_VCEPLL_STATUS__VCEPLL_INTRESET_MASK             0x00001000L
#define CG_VCEPLL_STATUS__VCEPLL_UNLOCK_MASK               0x00010000L
#define CG_VCEPLL_STATUS__VCEPLL_UNLOCK_STICKY_MASK        0x00020000L

// CG_VCEPLL_SPREAD_SPECTRUM
#define CG_VCEPLL_SPREAD_SPECTRUM__SSEN_MASK               0x00000003L
#define CG_VCEPLL_SPREAD_SPECTRUM__SPARE_MASK              0x0000000cL
#define CG_VCEPLL_SPREAD_SPECTRUM__CLKS_MASK               0x0000fff0L
#define CG_VCEPLL_SPREAD_SPECTRUM__BWADJ_MASK              0x0fff0000L

// CG_VCEPLL_SPREAD_SPECTRUM_2
#define CG_VCEPLL_SPREAD_SPECTRUM_2__CLKV_MASK             0x03ffffffL


// CG_VCEPLL_FUNC_CNTL
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_RESET__SHIFT           0x00000000
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_SLEEP__SHIFT           0x00000001
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_BYPASS_EN__SHIFT       0x00000002
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CTLREQ__SHIFT          0x00000003
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_REFCLK_SEL__SHIFT      0x00000004
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CLKF_UPDATE__SHIFT     0x00000006
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CLKR_UPDATE__SHIFT     0x00000007
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_RESET_EN__SHIFT        0x00000008
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_VCO_MODE__SHIFT        0x00000009
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_REF_DIV__SHIFT         0x00000010
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CTLACK__SHIFT          0x0000001e
#define CG_VCEPLL_FUNC_CNTL__VCEPLL_CTLACK2__SHIFT         0x0000001f

// CG_VCEPLL_FUNC_CNTL_2
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_PDIV_A__SHIFT        0x00000000
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_PDIV_B__SHIFT        0x00000008
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_LEGACY_PDIV__SHIFT   0x00000011
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_FASTEN__SHIFT        0x00000012
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_ENSAT__SHIFT         0x00000013
#define CG_VCEPLL_FUNC_CNTL_2__EVCLK_SRC_SEL__SHIFT        0x00000014
#define CG_VCEPLL_FUNC_CNTL_2__ECCLK_SRC_SEL__SHIFT        0x00000019
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_TEST__SHIFT          0x0000001e
#define CG_VCEPLL_FUNC_CNTL_2__VCEPLL_UNLOCK_CLEAR__SHIFT  0x0000001f

// CG_VCEPLL_FUNC_CNTL_3
#define CG_VCEPLL_FUNC_CNTL_3__VCEPLL_FB_DIV__SHIFT        0x00000000
#define CG_VCEPLL_FUNC_CNTL_3__VCEPLL_DITHEN__SHIFT        0x0000001c

// CG_VCEPLL_FUNC_CNTL_4
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_TEST_SEL__SHIFT 0x00000000
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_EXT_SEL__SHIFT  0x00000004
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_EN__SHIFT       0x00000006
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SPARE__SHIFT         0x00000008
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_REG_BIAS__SHIFT      0x00000012
#define CG_VCEPLL_FUNC_CNTL_4__TEST_FRAC_BYPASS__SHIFT     0x00000015
#define CG_VCEPLL_FUNC_CNTL_4__BG_PDN__SHIFT               0x00000016
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_ILOCK__SHIFT         0x00000017
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_FBCLK_SEL__SHIFT     0x00000018
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_VCTRLADC_EN__SHIFT   0x00000019
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SCLK_EXT__SHIFT      0x0000001a
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_SPARE_EXT__SHIFT     0x0000001c
#define CG_VCEPLL_FUNC_CNTL_4__VCEPLL_VTOI_BIAS_CNTL__SHIFT 0x0000001f

// CG_VCEPLL_FUNC_CNTL_5
#define CG_VCEPLL_FUNC_CNTL_5__FBDIV_SSC_BYPASS__SHIFT     0x00000000
#define CG_VCEPLL_FUNC_CNTL_5__RISEFBVCO_EN__SHIFT         0x00000001
#define CG_VCEPLL_FUNC_CNTL_5__PFD_RESET_CNTRL__SHIFT      0x00000002
#define CG_VCEPLL_FUNC_CNTL_5__RESET_TIMER__SHIFT          0x00000004
#define CG_VCEPLL_FUNC_CNTL_5__FAST_LOCK_CNTRL__SHIFT      0x00000006
#define CG_VCEPLL_FUNC_CNTL_5__FAST_LOCK_EN__SHIFT         0x00000008
#define CG_VCEPLL_FUNC_CNTL_5__RESET_ANTI_MUX__SHIFT       0x00000009

// CG_VCEPLL_STATUS
#define CG_VCEPLL_STATUS__VCEPLL_CTLACK_A__SHIFT           0x00000000
#define CG_VCEPLL_STATUS__VCEPLL_CTLACK_B__SHIFT           0x00000001
#define CG_VCEPLL_STATUS__VCEPLL_CLKF_ACK__SHIFT           0x00000002
#define CG_VCEPLL_STATUS__VCEPLL_CLKR_ACK__SHIFT           0x00000003
#define CG_VCEPLL_STATUS__VCEPLL_VCTRLADC__SHIFT           0x00000004
#define CG_VCEPLL_STATUS__VCEPLL_OSPARE__SHIFT             0x00000008
#define CG_VCEPLL_STATUS__VCEPLL_INTRESET__SHIFT           0x0000000c
#define CG_VCEPLL_STATUS__VCEPLL_UNLOCK__SHIFT             0x00000010
#define CG_VCEPLL_STATUS__VCEPLL_UNLOCK_STICKY__SHIFT      0x00000011

// CG_VCEPLL_SPREAD_SPECTRUM
#define CG_VCEPLL_SPREAD_SPECTRUM__SSEN__SHIFT             0x00000000
#define CG_VCEPLL_SPREAD_SPECTRUM__SPARE__SHIFT            0x00000002
#define CG_VCEPLL_SPREAD_SPECTRUM__CLKS__SHIFT             0x00000004
#define CG_VCEPLL_SPREAD_SPECTRUM__BWADJ__SHIFT            0x00000010

// CG_VCEPLL_SPREAD_SPECTRUM_2
#define CG_VCEPLL_SPREAD_SPECTRUM_2__CLKV__SHIFT           0x00000000

>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/si.c       |  6 +-----
> >  drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 10 +++++++++-
> >  2 files changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/si.c
> > b/drivers/gpu/drm/amd/amdgpu/si.c index f7b35b860ba3..ed3d4f9bf9d9
> > 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/si.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/si.c
> > @@ -1902,7 +1902,7 @@ static int si_vce_send_vcepll_ctlreq(struct
> amdgpu_device *adev)
> >     WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, 0,
> ~UPLL_CTLREQ_MASK);
> >
> >     if (i == SI_MAX_CTLACKS_ASSERTION_WAIT) {
> > -           DRM_ERROR("Timeout setting VCE clocks!\n");
> > +           DRM_WARN("Timeout setting VCE clocks!\n");
> >             return -ETIMEDOUT;
> >     }
> >
> > @@ -1954,8 +1954,6 @@ static int si_set_vce_clocks(struct amdgpu_device
> *adev, u32 evclk, u32 ecclk)
> >     mdelay(1);
> >
> >     r = si_vce_send_vcepll_ctlreq(adev);
> > -   if (r)
> > -           return r;
> >
> >     /* Assert VCEPLL_RESET again */
> >     WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, VCEPLL_RESET_MASK,
> > ~VCEPLL_RESET_MASK); @@ -1988,8 +1986,6 @@ static int
> si_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
> >     WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, 0,
> ~VCEPLL_BYPASS_EN_MASK);
> >
> >     r = si_vce_send_vcepll_ctlreq(adev);
> > -   if (r)
> > -           return r;
> >
> >     /* Switch VCLK and DCLK selection */
> >     WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL_2,
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> > index 27f70146293d..fdc455797258 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> > @@ -401,7 +401,7 @@ static int vce_v1_0_wait_for_idle(struct
> > amdgpu_ip_block *ip_block)  static int vce_v1_0_start(struct
> > amdgpu_device *adev)  {
> >     struct amdgpu_ring *ring;
> > -   int r;
> > +   int r, i;
> >
> >     WREG32_P(mmVCE_STATUS, 1, ~1);
> >
> > @@ -443,6 +443,14 @@ static int vce_v1_0_start(struct amdgpu_device *adev)
> >     /* Clear VCE_STATUS, otherwise SRBM thinks VCE1 is busy. */
> >     WREG32(mmVCE_STATUS, 0);
> >
> > +   /* Wait for VCE_STATUS to actually clear.
> > +    * This helps when there was a timeout setting the VCE clocks.
> > +    */
> > +   for (i = 0; i < adev->usec_timeout && RREG32(mmVCE_STATUS); ++i) {
> > +           udelay(1);
> > +           WREG32(mmVCE_STATUS, 0);
> > +   }
> > +
> >     if (r) {
> >             dev_err(adev->dev, "VCE not responding, giving up!!!\n");
> >             return r;


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block
  2025-10-29 11:38   ` Christian König
@ 2025-10-29 22:48     ` Timur Kristóf
  2025-10-30 11:12       ` Christian König
  0 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-29 22:48 UTC (permalink / raw)
  To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On Wed, 2025-10-29 at 12:38 +0100, Christian König wrote:
> On 10/28/25 23:06, Timur Kristóf wrote:
> > Implement the necessary functionality to support the VCE1.
> > This implementation is based on:
> > 
> > - VCE2 code from amdgpu
> > - VCE1 code from radeon (the old driver)
> > - Some trial and error
> > 
> > A subsequent commit will ensure correct mapping for
> > the VCPU BO, which will make this actually work.
> > 
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> > Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> > Co-developed-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/Makefile     |   2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h |   1 +
> >  drivers/gpu/drm/amd/amdgpu/vce_v1_0.c   | 805
> > ++++++++++++++++++++++++
> >  drivers/gpu/drm/amd/amdgpu/vce_v1_0.h   |  32 +
> >  4 files changed, 839 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> >  create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> > b/drivers/gpu/drm/amd/amdgpu/Makefile
> > index ebe08947c5a3..c88760fb52ea 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> > +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> > @@ -78,7 +78,7 @@ amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o
> > \
> >  	dce_v8_0.o gfx_v7_0.o cik_sdma.o uvd_v4_2.o vce_v2_0.o
> >  
> >  amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o gmc_v6_0.o gfx_v6_0.o
> > si_ih.o si_dma.o dce_v6_0.o \
> > -	uvd_v3_1.o
> > +	uvd_v3_1.o vce_v1_0.o
> >  
> >  amdgpu-y += \
> >  	vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o
> > nbio_v7_0.o vega10_reg_init.o \
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> > index 22acd7b35945..050783802623 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> > @@ -51,6 +51,7 @@ struct amdgpu_vce {
> >  	struct drm_sched_entity	entity;
> >  	uint32_t                srbm_soft_reset;
> >  	unsigned		num_rings;
> > +	uint32_t		keyselect;
> >  };
> >  
> >  int amdgpu_vce_early_init(struct amdgpu_device *adev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> > new file mode 100644
> > index 000000000000..e62fd8ed1992
> > --- /dev/null
> > +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> > @@ -0,0 +1,805 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright 2013 Advanced Micro Devices, Inc.
> > + * Copyright 2025 Valve Corporation
> > + * Copyright 2025 Alexandre Demers
> > + * All Rights Reserved.
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> > obtaining a
> > + * copy of this software and associated documentation files (the
> > + * "Software"), to deal in the Software without restriction,
> > including
> > + * without limitation the rights to use, copy, modify, merge,
> > publish,
> > + * distribute, sub license, and/or sell copies of the Software,
> > and to
> > + * permit persons to whom the Software is furnished to do so,
> > subject to
> > + * the following conditions:
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> > EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO
> > EVENT SHALL
> > + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE
> > FOR ANY CLAIM,
> > + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> > TORT OR
> > + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> > SOFTWARE OR THE
> > + * USE OR OTHER DEALINGS IN THE SOFTWARE.
> > + *
> > + * The above copyright notice and this permission notice
> > (including the
> > + * next paragraph) shall be included in all copies or substantial
> > portions
> > + * of the Software.
> > + *
> > + * Authors: Christian König <christian.koenig@amd.com>
> > + *          Timur Kristóf <timur.kristof@gmail.com>
> > + *          Alexandre Demers <alexandre.f.demers@gmail.com>
> > + */
> > +
> > +#include <linux/firmware.h>
> > +
> > +#include "amdgpu.h"
> > +#include "amdgpu_vce.h"
> > +#include "sid.h"
> > +#include "vce_v1_0.h"
> > +#include "vce/vce_1_0_d.h"
> > +#include "vce/vce_1_0_sh_mask.h"
> > +#include "oss/oss_1_0_d.h"
> > +#include "oss/oss_1_0_sh_mask.h"
> > +
> > +#define VCE_V1_0_FW_SIZE	(256 * 1024)
> > +#define VCE_V1_0_STACK_SIZE	(64 * 1024)
> > +#define VCE_V1_0_DATA_SIZE	(7808 * (AMDGPU_MAX_VCE_HANDLES +
> > 1))
> > +#define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK	0x02
> > +
> > +static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
> > +static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
> > +
> > +struct vce_v1_0_fw_signature {
> > +	int32_t offset;
> > +	uint32_t length;
> > +	int32_t number;
> > +	struct {
> > +		uint32_t chip_id;
> > +		uint32_t keyselect;
> > +		uint32_t nonce[4];
> > +		uint32_t sigval[4];
> > +	} val[8];
> > +};
> > +
> > +/**
> > + * vce_v1_0_ring_get_rptr - get read pointer
> > + *
> > + * @ring: amdgpu_ring pointer
> > + *
> > + * Returns the current hardware read pointer
> > + */
> > +static uint64_t vce_v1_0_ring_get_rptr(struct amdgpu_ring *ring)
> > +{
> > +	struct amdgpu_device *adev = ring->adev;
> > +
> > +	if (ring->me == 0)
> > +		return RREG32(mmVCE_RB_RPTR);
> > +	else
> > +		return RREG32(mmVCE_RB_RPTR2);
> > +}
> > +
> > +/**
> > + * vce_v1_0_ring_get_wptr - get write pointer
> > + *
> > + * @ring: amdgpu_ring pointer
> > + *
> > + * Returns the current hardware write pointer
> > + */
> > +static uint64_t vce_v1_0_ring_get_wptr(struct amdgpu_ring *ring)
> > +{
> > +	struct amdgpu_device *adev = ring->adev;
> > +
> > +	if (ring->me == 0)
> > +		return RREG32(mmVCE_RB_WPTR);
> > +	else
> > +		return RREG32(mmVCE_RB_WPTR2);
> > +}
> > +
> > +/**
> > + * vce_v1_0_ring_set_wptr - set write pointer
> > + *
> > + * @ring: amdgpu_ring pointer
> > + *
> > + * Commits the write pointer to the hardware
> > + */
> > +static void vce_v1_0_ring_set_wptr(struct amdgpu_ring *ring)
> > +{
> > +	struct amdgpu_device *adev = ring->adev;
> > +
> > +	if (ring->me == 0)
> > +		WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
> > +	else
> > +		WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
> > +}
> > +
> > +static int vce_v1_0_lmi_clean(struct amdgpu_device *adev)
> > +{
> > +	int i, j;
> > +
> > +	for (i = 0; i < 10; ++i) {
> > +		for (j = 0; j < 100; ++j) {
> > +			if (RREG32(mmVCE_LMI_STATUS) & 0x337f)
> > +				return 0;
> > +
> > +			mdelay(10);
> > +		}
> > +	}
> > +
> > +	return -ETIMEDOUT;
> > +}
> > +
> > +static int vce_v1_0_firmware_loaded(struct amdgpu_device *adev)
> > +{
> > +	int i, j;
> > +
> > +	for (i = 0; i < 10; ++i) {
> > +		for (j = 0; j < 100; ++j) {
> > +			if (RREG32(mmVCE_STATUS) &
> > VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK)
> > +				return 0;
> > +			mdelay(10);
> > +		}
> > +
> > +		dev_err(adev->dev, "VCE not responding, trying to
> > reset the ECPU\n");
> > +
> > +		WREG32_P(mmVCE_SOFT_RESET,
> > +			VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK,
> > +			~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
> > +		mdelay(10);
> > +		WREG32_P(mmVCE_SOFT_RESET, 0,
> > +			~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
> > +		mdelay(10);
> > +	}
> > +
> > +	return -ETIMEDOUT;
> > +}
> > +
> > +static void vce_v1_0_init_cg(struct amdgpu_device *adev)
> > +{
> > +	u32 tmp;
> > +
> > +	tmp = RREG32(mmVCE_CLOCK_GATING_A);
> > +	tmp |= VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
> > +	WREG32(mmVCE_CLOCK_GATING_A, tmp);
> > +
> > +	tmp = RREG32(mmVCE_CLOCK_GATING_B);
> > +	tmp |= 0x1e;
> > +	tmp &= ~0xe100e1;
> > +	WREG32(mmVCE_CLOCK_GATING_B, tmp);
> > +
> > +	tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
> > +	tmp &= ~0xff9ff000;
> > +	WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
> > +
> > +	tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
> > +	tmp &= ~0x3ff;
> > +	WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
> > +}
> > +
> > +/**
> > + * vce_v1_0_load_fw_signature - load firmware signature into VCPU
> > BO
> > + *
> > + * @adev: amdgpu_device pointer
> > + *
> > + * The VCE1 firmware validation mechanism needs a firmware
> > signature.
> > + * This function finds the signature appropriate for the current
> > + * ASIC and writes that into the VCPU BO.
> > + */
> > +static int vce_v1_0_load_fw_signature(struct amdgpu_device *adev)
> > +{
> > +	const struct common_firmware_header *hdr;
> > +	struct vce_v1_0_fw_signature *sign;
> > +	unsigned int ucode_offset;
> > +	uint32_t chip_id;
> > +	u32 *cpu_addr;
> > +	int i, r;
> > +
> > +	hdr = (const struct common_firmware_header *)adev->vce.fw-
> > >data;
> > +	ucode_offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
> > +
> > +	sign = (void *)adev->vce.fw->data + ucode_offset;
> > +
> > +	switch (adev->asic_type) {
> > +	case CHIP_TAHITI:
> > +		chip_id = 0x01000014;
> > +		break;
> > +	case CHIP_VERDE:
> > +		chip_id = 0x01000015;
> > +		break;
> > +	case CHIP_PITCAIRN:
> > +		chip_id = 0x01000016;
> > +		break;
> > +	default:
> > +		dev_err(adev->dev, "asic_type %#010x was not
> > found!", adev->asic_type);
> > +		return -EINVAL;
> > +	}
> > +
> 
> > +	ASSERT(adev->vce.vcpu_bo);
> 
> Please drop that.

Sure, but can you say why?

> 
> > +
> > +	r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
> > +	if (r) {
> > +		dev_err(adev->dev, "%s (%d) failed to reserve VCE
> > bo\n", __func__, r);
> > +		return r;
> > +	}
> > +
> > +	r = amdgpu_bo_kmap(adev->vce.vcpu_bo, (void **)&cpu_addr);
> > +	if (r) {
> > +		amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > +		dev_err(adev->dev, "%s (%d) VCE map failed\n",
> > __func__, r);
> > +		return r;
> > +	}
> 
> That part is actually pretty pointless the cpu addr is already
> available as adev->vce.cpu_addr.

I don't think so. amdgpu_vce_resume actually unmaps and unreserves the
VCE BO, so I think we need to map and reserve it again if we want to
access it again. Am I misunderstanding something?

> 
> > +
> > +	for (i = 0; i < le32_to_cpu(sign->number); ++i) {
> > +		if (le32_to_cpu(sign->val[i].chip_id) == chip_id)
> > +			break;
> > +	}
> > +
> > +	if (i == le32_to_cpu(sign->number)) {
> > +		dev_err(adev->dev, "%s chip_id %#010x was not
> > found for %s in VCE firmware",
> > +			__func__, chip_id, amdgpu_asic_name[adev-
> > >asic_type]);
> 
> Drop the __func__ here. It should be obvious where we are fro the
> message.

Sure.

> 
> > +		return -EINVAL;
> > +	}
> > +
> > +	cpu_addr += (256 - 64) / 4;
> > +	cpu_addr[0] = sign->val[i].nonce[0];
> > +	cpu_addr[1] = sign->val[i].nonce[1];
> > +	cpu_addr[2] = sign->val[i].nonce[2];
> > +	cpu_addr[3] = sign->val[i].nonce[3];
> > +	cpu_addr[4] = cpu_to_le32(le32_to_cpu(sign->length) + 64);
> > +
> > +	memset(&cpu_addr[5], 0, 44);
> > +	memcpy(&cpu_addr[16], &sign[1], hdr->ucode_size_bytes -
> > sizeof(*sign));
> 
> That should probably be memcpy_io() and the direct writes to cpu_addr
> modified as well.

Sure, I can do that but can you explain why?

> 
> > +
> > +	cpu_addr += (le32_to_cpu(sign->length) + 64) / 4;
> > +	cpu_addr[0] = sign->val[i].sigval[0];
> > +	cpu_addr[1] = sign->val[i].sigval[1];
> > +	cpu_addr[2] = sign->val[i].sigval[2];
> > +	cpu_addr[3] = sign->val[i].sigval[3];
> > +
> > +	adev->vce.keyselect = le32_to_cpu(sign->val[i].keyselect);
> > +
> 
> 
> > +	amdgpu_bo_kunmap(adev->vce.vcpu_bo);
> > +	amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> 
> That can be dropped as well.
> 
> > +
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_wait_for_fw_validation(struct amdgpu_device
> > *adev)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < 10; ++i) {
> > +		mdelay(10);
> > +		if (RREG32(mmVCE_FW_REG_STATUS) &
> > VCE_FW_REG_STATUS__DONE_MASK)
> > +			break;
> > +	}
> > +
> > +	if (!(RREG32(mmVCE_FW_REG_STATUS) &
> > VCE_FW_REG_STATUS__DONE_MASK)) {
> > +		dev_err(adev->dev, "%s VCE validation timeout\n",
> > __func__);
> > +		return -ETIMEDOUT;
> > +	}
> > +
> > +	if (!(RREG32(mmVCE_FW_REG_STATUS) &
> > VCE_FW_REG_STATUS__PASS_MASK)) {
> > +		dev_err(adev->dev, "%s VCE firmware validation
> > failed\n", __func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	for (i = 0; i < 10; ++i) {
> > +		mdelay(10);
> > +		if (!(RREG32(mmVCE_FW_REG_STATUS) &
> > VCE_FW_REG_STATUS__BUSY_MASK))
> > +			break;
> > +	}
> > +
> > +	if (RREG32(mmVCE_FW_REG_STATUS) &
> > VCE_FW_REG_STATUS__BUSY_MASK) {
> > +		dev_err(adev->dev, "%s VCE firmware busy
> > timeout\n", __func__);
> 
> Here as well, please drop the __func__ arguments.
> 
> > +		return -ETIMEDOUT;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_mc_resume(struct amdgpu_device *adev)
> > +{
> > +	uint32_t offset;
> > +	uint32_t size;
> > +
> > +	/* When the keyselect is already set, don't perturb VCE
> > FW.
> > +	 * Validation seems to always fail the second time.
> > +	 */
> 
> Coding style for multi line /* */ comments! checkpatch.pl should
> point out when that is wrong.

Please note that I check every patch with the check patch script before
I send them to the mailing list, and it didn't raise any issues with
this comment.

That being said, sure I can change the comment style to whatever you
prefer.

> 
> > +	if (RREG32(mmVCE_LMI_FW_START_KEYSEL)) {
> > +		dev_dbg(adev->dev, "%s keyselect already set: 0x%x
> > (on CPU: 0x%x)\n",
> > +			__func__,
> > RREG32(mmVCE_LMI_FW_START_KEYSEL), adev->vce.keyselect);
> > +
> > +		WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
> > +		return 0;
> > +	}
> > +
> > +	WREG32_P(mmVCE_CLOCK_GATING_A, 0, ~(1 << 16));
> > +	WREG32_P(mmVCE_UENC_CLOCK_GATING, 0x1FF000, ~0xFF9FF000);
> > +	WREG32_P(mmVCE_UENC_REG_CLOCK_GATING, 0x3F, ~0x3F);
> > +	WREG32(mmVCE_CLOCK_GATING_B, 0);
> > +
> > +	WREG32_P(mmVCE_LMI_FW_PERIODIC_CTRL, 0x4, ~0x4);
> > +
> > +	WREG32(mmVCE_LMI_CTRL, 0x00398000);
> > +
> > +	WREG32_P(mmVCE_LMI_CACHE_CTRL, 0x0, ~0x1);
> > +	WREG32(mmVCE_LMI_SWAP_CNTL, 0);
> > +	WREG32(mmVCE_LMI_SWAP_CNTL1, 0);
> > +	WREG32(mmVCE_LMI_VM_CTRL, 0);
> > +
> > +	WREG32(mmVCE_VCPU_SCRATCH7, AMDGPU_MAX_VCE_HANDLES);
> > +
> > +	offset =  adev->vce.gpu_addr + AMDGPU_VCE_FIRMWARE_OFFSET;
> > +	size = VCE_V1_0_FW_SIZE;
> > +	WREG32(mmVCE_VCPU_CACHE_OFFSET0, offset & 0x7fffffff);
> > +	WREG32(mmVCE_VCPU_CACHE_SIZE0, size);
> > +
> > +	offset += size;
> > +	size = VCE_V1_0_STACK_SIZE;
> > +	WREG32(mmVCE_VCPU_CACHE_OFFSET1, offset & 0x7fffffff);
> > +	WREG32(mmVCE_VCPU_CACHE_SIZE1, size);
> > +
> > +	offset += size;
> > +	size = VCE_V1_0_DATA_SIZE;
> > +	WREG32(mmVCE_VCPU_CACHE_OFFSET2, offset & 0x7fffffff);
> > +	WREG32(mmVCE_VCPU_CACHE_SIZE2, size);
> > +
> > +	WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
> > +
> > +	dev_dbg(adev->dev, "VCE keyselect: %d", adev-
> > >vce.keyselect);
> > +	WREG32(mmVCE_LMI_FW_START_KEYSEL, adev->vce.keyselect);
> > +
> > +	return vce_v1_0_wait_for_fw_validation(adev);
> 
> Maybe inline wait_for_fw_validation here, it doesn't make much sense
> to write START_KEYSEL outside and then have that in a separate
> function.

OK.

> 
> 
> > +}
> > +
> > +/**
> > + * vce_v1_0_is_idle() - Check idle status of VCE1 IP block
> > + *
> > + * @ip_block: amdgpu_ip_block pointer
> > + *
> > + * Check whether VCE is busy according to VCE_STATUS.
> > + * Also check whether the SRBM thinks VCE is busy, although
> > + * SRBM_STATUS.VCE_BUSY seems to be bogus because it
> > + * appears to mirror the VCE_STATUS.VCPU_REPORT_FW_LOADED bit.
> > + */
> > +static bool vce_v1_0_is_idle(struct amdgpu_ip_block *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	bool busy =
> > +		(RREG32(mmVCE_STATUS) & (VCE_STATUS__JOB_BUSY_MASK
> > | VCE_STATUS__UENC_BUSY_MASK)) ||
> > +		(RREG32(mmSRBM_STATUS2) &
> > SRBM_STATUS2__VCE_BUSY_MASK);
> > +
> > +	return !busy;
> > +}
> > +
> > +static int vce_v1_0_wait_for_idle(struct amdgpu_ip_block
> > *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < adev->usec_timeout; i++) {
> > +		udelay(1);
> > +		if (vce_v1_0_is_idle(ip_block))
> > +			return 0;
> > +	}
> > +	return -ETIMEDOUT;
> > +}
> > +
> > +/**
> > + * vce_v1_0_start - start VCE block
> > + *
> > + * @adev: amdgpu_device pointer
> > + *
> > + * Setup and start the VCE block
> > + */
> > +static int vce_v1_0_start(struct amdgpu_device *adev)
> > +{
> > +	struct amdgpu_ring *ring;
> > +	int r;
> > +
> > +	WREG32_P(mmVCE_STATUS, 1, ~1);
> > +
> > +	r = vce_v1_0_mc_resume(adev);
> > +	if (r)
> > +		return r;
> > +
> > +	ring = &adev->vce.ring[0];
> > +	WREG32(mmVCE_RB_RPTR, lower_32_bits(ring->wptr));
> > +	WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
> > +	WREG32(mmVCE_RB_BASE_LO, lower_32_bits(ring->gpu_addr));
> > +	WREG32(mmVCE_RB_BASE_HI, upper_32_bits(ring->gpu_addr));
> > +	WREG32(mmVCE_RB_SIZE, ring->ring_size / 4);
> > +
> > +	ring = &adev->vce.ring[1];
> > +	WREG32(mmVCE_RB_RPTR2, lower_32_bits(ring->wptr));
> > +	WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
> > +	WREG32(mmVCE_RB_BASE_LO2, lower_32_bits(ring->gpu_addr));
> > +	WREG32(mmVCE_RB_BASE_HI2, upper_32_bits(ring->gpu_addr));
> > +	WREG32(mmVCE_RB_SIZE2, ring->ring_size / 4);
> > +
> > +	WREG32_P(mmVCE_VCPU_CNTL, VCE_VCPU_CNTL__CLK_EN_MASK,
> > +		 ~VCE_VCPU_CNTL__CLK_EN_MASK);
> > +
> > +	WREG32_P(mmVCE_SOFT_RESET,
> > +		VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> > +		VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
> > +		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> > +		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
> > +
> > +	mdelay(100);
> > +
> > +	WREG32_P(mmVCE_SOFT_RESET, 0,
> > +		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> > +		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
> > +
> > +	r = vce_v1_0_firmware_loaded(adev);
> > +
> > +	/* Clear VCE_STATUS, otherwise SRBM thinks VCE1 is busy.
> > */
> > +	WREG32(mmVCE_STATUS, 0);
> > +
> > +	if (r) {
> > +		dev_err(adev->dev, "VCE not responding, giving
> > up!!!\n");
> > +		return r;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_stop(struct amdgpu_device *adev)
> > +{
> > +	struct amdgpu_ip_block *ip_block;
> > +	int status;
> > +	int i;
> > +
> > +	ip_block = amdgpu_device_ip_get_ip_block(adev,
> > AMD_IP_BLOCK_TYPE_VCE);
> > +	if (!ip_block)
> > +		return -EINVAL;
> > +
> > +	if (vce_v1_0_lmi_clean(adev))
> > +		dev_warn(adev->dev, "%s VCE is not idle\n",
> > __func__);
> > +
> > +	if (vce_v1_0_wait_for_idle(ip_block))
> > +		dev_warn(adev->dev, "VCE is busy: VCE_STATUS=0x%x,
> > SRBM_STATUS2=0x%x\n",
> > +			RREG32(mmVCE_STATUS),
> > RREG32(mmSRBM_STATUS2));
> > +
> > +	/* Stall UMC and register bus before resetting VCPU */
> > +	WREG32_P(mmVCE_LMI_CTRL2, 1 << 8, ~(1 << 8));
> > +
> > +	for (i = 0; i < 100; ++i) {
> > +		status = RREG32(mmVCE_LMI_STATUS);
> > +		if (status & 0x240)
> > +			break;
> > +		mdelay(1);
> > +	}
> > +
> > +	WREG32_P(mmVCE_VCPU_CNTL, 0, ~VCE_VCPU_CNTL__CLK_EN_MASK);
> > +
> > +	WREG32_P(mmVCE_SOFT_RESET,
> > +		VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> > +		VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
> > +		~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
> > +		  VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
> > +
> > +	WREG32(mmVCE_STATUS, 0);
> > +
> > +	return 0;
> > +}
> > +
> > +static void vce_v1_0_enable_mgcg(struct amdgpu_device *adev, bool
> > enable)
> > +{
> > +	u32 tmp;
> > +
> > +	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_VCE_MGCG))
> > {
> > +		tmp = RREG32(mmVCE_CLOCK_GATING_A);
> > +		tmp |=
> > VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
> > +		WREG32(mmVCE_CLOCK_GATING_A, tmp);
> > +
> > +		tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
> > +		tmp &= ~0x1ff000;
> > +		tmp |= 0xff800000;
> > +		WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
> > +
> > +		tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
> > +		tmp &= ~0x3ff;
> > +		WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
> > +	} else {
> > +		tmp = RREG32(mmVCE_CLOCK_GATING_A);
> > +		tmp &=
> > ~VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
> > +		WREG32(mmVCE_CLOCK_GATING_A, tmp);
> > +
> > +		tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
> > +		tmp |= 0x1ff000;
> > +		tmp &= ~0xff800000;
> > +		WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
> > +
> > +		tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
> > +		tmp |= 0x3ff;
> > +		WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
> > +	}
> > +}
> > +
> > +static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	int r;
> > +
> > +	r = amdgpu_vce_early_init(adev);
> > +	if (r)
> > +		return r;
> > +
> > +	adev->vce.num_rings = 2;
> > +
> > +	vce_v1_0_set_ring_funcs(adev);
> > +	vce_v1_0_set_irq_funcs(adev);
> > +
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	struct amdgpu_ring *ring;
> > +	int r, i;
> > +
> > +	r = amdgpu_irq_add_id(adev, AMDGPU_IRQ_CLIENTID_LEGACY,
> > 167, &adev->vce.irq);
> > +	if (r)
> > +		return r;
> > +
> > +	r = amdgpu_vce_sw_init(adev, VCE_V1_0_FW_SIZE +
> > +		VCE_V1_0_STACK_SIZE + VCE_V1_0_DATA_SIZE);
> > +	if (r)
> > +		return r;
> > +
> > +	r = amdgpu_vce_resume(adev);
> > +	if (r)
> > +		return r;
> > +	r = vce_v1_0_load_fw_signature(adev);
> > +	if (r)
> > +		return r;
> > +
> > +	for (i = 0; i < adev->vce.num_rings; i++) {
> > +		enum amdgpu_ring_priority_level hw_prio =
> > amdgpu_vce_get_ring_prio(i);
> > +
> > +		ring = &adev->vce.ring[i];
> > +		sprintf(ring->name, "vce%d", i);
> > +		r = amdgpu_ring_init(adev, ring, 512, &adev-
> > >vce.irq, 0,
> > +				     hw_prio, NULL);
> > +		if (r)
> > +			return r;
> > +	}
> > +
> > +	return r;
> > +}
> > +
> > +static int vce_v1_0_sw_fini(struct amdgpu_ip_block *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	int r;
> > +
> > +	r = amdgpu_vce_suspend(adev);
> > +	if (r)
> > +		return r;
> > +
> > +	return amdgpu_vce_sw_fini(adev);
> > +}
> > +
> > +/**
> > + * vce_v1_0_hw_init - start and test VCE block
> > + *
> > + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance.
> > + *
> > + * Initialize the hardware, boot up the VCPU and do some testing
> > + */
> > +static int vce_v1_0_hw_init(struct amdgpu_ip_block *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	int i, r;
> > +
> > +	if (adev->pm.dpm_enabled)
> > +		amdgpu_dpm_enable_vce(adev, true);
> > +	else
> > +		amdgpu_asic_set_vce_clocks(adev, 10000, 10000);
> > +
> > +	for (i = 0; i < adev->vce.num_rings; i++) {
> > +		r = amdgpu_ring_test_helper(&adev->vce.ring[i]);
> > +		if (r)
> > +			return r;
> > +	}
> > +
> > +	dev_info(adev->dev, "VCE initialized successfully.\n");
> > +
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_hw_fini(struct amdgpu_ip_block *ip_block)
> > +{
> > +	int r;
> > +
> > +	r = vce_v1_0_stop(ip_block->adev);
> > +	if (r)
> > +		return r;
> > +
> > +	cancel_delayed_work_sync(&ip_block->adev->vce.idle_work);
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_suspend(struct amdgpu_ip_block *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	int r;
> > +
> > +	/*
> > +	 * Proper cleanups before halting the HW engine:
> > +	 *   - cancel the delayed idle work
> > +	 *   - enable powergating
> > +	 *   - enable clockgating
> > +	 *   - disable dpm
> > +	 *
> > +	 * TODO: to align with the VCN implementation, move the
> > +	 * jobs for clockgating/powergating/dpm setting to
> > +	 * ->set_powergating_state().
> > +	 */
> > +	cancel_delayed_work_sync(&adev->vce.idle_work);
> > +
> > +	if (adev->pm.dpm_enabled) {
> > +		amdgpu_dpm_enable_vce(adev, false);
> > +	} else {
> > +		amdgpu_asic_set_vce_clocks(adev, 0, 0);
> > +		amdgpu_device_ip_set_powergating_state(adev,
> > AMD_IP_BLOCK_TYPE_VCE,
> > +						      
> > AMD_PG_STATE_GATE);
> > +		amdgpu_device_ip_set_clockgating_state(adev,
> > AMD_IP_BLOCK_TYPE_VCE,
> > +						      
> > AMD_CG_STATE_GATE);
> > +	}
> > +
> > +	r = vce_v1_0_hw_fini(ip_block);
> > +	if (r) {
> > +		dev_err(adev->dev, "vce_v1_0_hw_fini() failed with
> > error %i", r);
> > +		return r;
> > +	}
> > +
> > +	return amdgpu_vce_suspend(adev);
> > +}
> > +
> > +static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +	int r;
> > +
> > +	r = amdgpu_vce_resume(adev);
> > +	if (r)
> > +		return r;
> > +	r = vce_v1_0_load_fw_signature(adev);
> > +	if (r)
> > +		return r;
> > +
> > +	return vce_v1_0_hw_init(ip_block);
> > +}
> > +
> > +static int vce_v1_0_set_interrupt_state(struct amdgpu_device
> > *adev,
> > +					struct amdgpu_irq_src
> > *source,
> > +					unsigned int type,
> > +					enum
> > amdgpu_interrupt_state state)
> > +{
> > +	uint32_t val = 0;
> > +
> > +	if (state == AMDGPU_IRQ_STATE_ENABLE)
> > +		val |=
> > VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK;
> > +
> > +	WREG32_P(mmVCE_SYS_INT_EN, val,
> > +		
> > ~VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK);
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_process_interrupt(struct amdgpu_device *adev,
> > +				      struct amdgpu_irq_src
> > *source,
> > +				      struct amdgpu_iv_entry
> > *entry)
> > +{
> > +	dev_dbg(adev->dev, "IH: VCE\n");
> > +	switch (entry->src_data[0]) {
> > +	case 0:
> > +	case 1:
> > +		amdgpu_fence_process(&adev->vce.ring[entry-
> > >src_data[0]]);
> > +		break;
> > +	default:
> > +		dev_err(adev->dev, "Unhandled interrupt: %d %d\n",
> > +			  entry->src_id, entry->src_data[0]);
> > +		break;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_set_clockgating_state(struct amdgpu_ip_block
> > *ip_block,
> > +					  enum
> > amd_clockgating_state state)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +
> > +	vce_v1_0_init_cg(adev);
> > +	vce_v1_0_enable_mgcg(adev, state == AMD_CG_STATE_GATE);
> > +
> > +	return 0;
> > +}
> > +
> > +static int vce_v1_0_set_powergating_state(struct amdgpu_ip_block
> > *ip_block,
> > +					  enum
> > amd_powergating_state state)
> > +{
> > +	struct amdgpu_device *adev = ip_block->adev;
> > +
> > +	/* This doesn't actually powergate the VCE block.
> > +	 * That's done in the dpm code via the SMC.  This
> > +	 * just re-inits the block as necessary.  The actual
> > +	 * gating still happens in the dpm code.  We should
> > +	 * revisit this when there is a cleaner line between
> > +	 * the smc and the hw blocks
> > +	 */
> > +	if (state == AMD_PG_STATE_GATE)
> > +		return vce_v1_0_stop(adev);
> > +	else
> > +		return vce_v1_0_start(adev);
> > +}
> > +
> > +static const struct amd_ip_funcs vce_v1_0_ip_funcs = {
> > +	.name = "vce_v1_0",
> > +	.early_init = vce_v1_0_early_init,
> > +	.sw_init = vce_v1_0_sw_init,
> > +	.sw_fini = vce_v1_0_sw_fini,
> > +	.hw_init = vce_v1_0_hw_init,
> > +	.hw_fini = vce_v1_0_hw_fini,
> > +	.suspend = vce_v1_0_suspend,
> > +	.resume = vce_v1_0_resume,
> > +	.is_idle = vce_v1_0_is_idle,
> > +	.wait_for_idle = vce_v1_0_wait_for_idle,
> > +	.set_clockgating_state = vce_v1_0_set_clockgating_state,
> > +	.set_powergating_state = vce_v1_0_set_powergating_state,
> > +};
> > +
> > +static const struct amdgpu_ring_funcs vce_v1_0_ring_funcs = {
> > +	.type = AMDGPU_RING_TYPE_VCE,
> > +	.align_mask = 0xf,
> > +	.nop = VCE_CMD_NO_OP,
> > +	.support_64bit_ptrs = false,
> > +	.no_user_fence = true,
> > +	.get_rptr = vce_v1_0_ring_get_rptr,
> > +	.get_wptr = vce_v1_0_ring_get_wptr,
> > +	.set_wptr = vce_v1_0_ring_set_wptr,
> > +	.parse_cs = amdgpu_vce_ring_parse_cs,
> > +	.emit_frame_size = 6, /* amdgpu_vce_ring_emit_fence  x1 no
> > user fence */
> > +	.emit_ib_size = 4, /* amdgpu_vce_ring_emit_ib */
> > +	.emit_ib = amdgpu_vce_ring_emit_ib,
> > +	.emit_fence = amdgpu_vce_ring_emit_fence,
> > +	.test_ring = amdgpu_vce_ring_test_ring,
> > +	.test_ib = amdgpu_vce_ring_test_ib,
> > +	.insert_nop = amdgpu_ring_insert_nop,
> > +	.pad_ib = amdgpu_ring_generic_pad_ib,
> > +	.begin_use = amdgpu_vce_ring_begin_use,
> > +	.end_use = amdgpu_vce_ring_end_use,
> > +};
> > +
> > +static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < adev->vce.num_rings; i++) {
> > +		adev->vce.ring[i].funcs = &vce_v1_0_ring_funcs;
> > +		adev->vce.ring[i].me = i;
> > +	}
> > +};
> > +
> > +static const struct amdgpu_irq_src_funcs vce_v1_0_irq_funcs = {
> > +	.set = vce_v1_0_set_interrupt_state,
> > +	.process = vce_v1_0_process_interrupt,
> > +};
> > +
> > +static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev)
> > +{
> > +	adev->vce.irq.num_types = 1;
> > +	adev->vce.irq.funcs = &vce_v1_0_irq_funcs;
> > +};
> > +
> > +const struct amdgpu_ip_block_version vce_v1_0_ip_block = {
> > +	.type = AMD_IP_BLOCK_TYPE_VCE,
> > +	.major = 1,
> > +	.minor = 0,
> > +	.rev = 0,
> > +	.funcs = &vce_v1_0_ip_funcs,
> > +};
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
> > b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
> > new file mode 100644
> > index 000000000000..206e7bec897f
> > --- /dev/null
> > +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
> > @@ -0,0 +1,32 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright 2025 Advanced Micro Devices, Inc.
> > + * Copyright 2025 Valve Corporation
> > + * Copyright 2025 Alexandre Demers
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> > obtaining a
> > + * copy of this software and associated documentation files (the
> > "Software"),
> > + * to deal in the Software without restriction, including without
> > limitation
> > + * the rights to use, copy, modify, merge, publish, distribute,
> > sublicense,
> > + * and/or sell copies of the Software, and to permit persons to
> > whom the
> > + * Software is furnished to do so, subject to the following
> > conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be
> > included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> > EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
> > EVENT SHALL
> > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
> > DAMAGES OR
> > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> > OTHERWISE,
> > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
> > USE OR
> > + * OTHER DEALINGS IN THE SOFTWARE.
> > + *
> > + */
> > +
> > +#ifndef __VCE_V1_0_H__
> > +#define __VCE_V1_0_H__
> > +
> > +extern const struct amdgpu_ip_block_version vce_v1_0_ip_block;
> > +
> > +#endif

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block
  2025-10-29 22:48     ` Timur Kristóf
@ 2025-10-30 11:12       ` Christian König
  2025-10-30 13:47         ` Timur Kristóf
  0 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2025-10-30 11:12 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/29/25 23:48, Timur Kristóf wrote:
>>> +	ASSERT(adev->vce.vcpu_bo);
>>
>> Please drop that.
> 
> Sure, but can you say why?

ASSERT either uses BUG_ON() or WARN_ON().

BUG_ON() will crash the kernel immediately and WARN_ON will warn, continue and then crash.

The justification for a BUG_ON() is to prevent further data corruption and that is not the case here.

What you can do is to use something like "if (WARN_ON(...)) return -EINVAL;".

>>
>>> +
>>> +	r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
>>> +	if (r) {
>>> +		dev_err(adev->dev, "%s (%d) failed to reserve VCE
>>> bo\n", __func__, r);
>>> +		return r;
>>> +	}
>>> +
>>> +	r = amdgpu_bo_kmap(adev->vce.vcpu_bo, (void **)&cpu_addr);
>>> +	if (r) {
>>> +		amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>> +		dev_err(adev->dev, "%s (%d) VCE map failed\n",
>>> __func__, r);
>>> +		return r;
>>> +	}
>>
>> That part is actually pretty pointless the cpu addr is already
>> available as adev->vce.cpu_addr.
> 
> I don't think so. amdgpu_vce_resume actually unmaps and unreserves the
> VCE BO, so I think we need to map and reserve it again if we want to
> access it again. Am I misunderstanding something?

Yeah, I see. But that is a totally pointless leftover from radeon as well which we should probably be removed.

The VCE BO needs to stay at the same location before and after resume since the FW code is not relocateable once started.

So we need to keep it pinned all the time and so can keep it CPU mapped all the time as well.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block
  2025-10-30 11:12       ` Christian König
@ 2025-10-30 13:47         ` Timur Kristóf
  2025-10-30 13:56           ` Christian König
  0 siblings, 1 reply; 41+ messages in thread
From: Timur Kristóf @ 2025-10-30 13:47 UTC (permalink / raw)
  To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On Thu, 2025-10-30 at 12:12 +0100, Christian König wrote:
> On 10/29/25 23:48, Timur Kristóf wrote:
> > > > +	ASSERT(adev->vce.vcpu_bo);
> > > 
> > > Please drop that.
> > 
> > Sure, but can you say why?
> 
> ASSERT either uses BUG_ON() or WARN_ON().
> 
> BUG_ON() will crash the kernel immediately and WARN_ON will warn,
> continue and then crash.
> 
> The justification for a BUG_ON() is to prevent further data
> corruption and that is not the case here.

Thanks for explaining that. Technically the vcpu_bo should never be
NULL, so I think I'll just go with your original suggestion and remove
the assertion.

> 
> What you can do is to use something like "if (WARN_ON(...)) return -
> EINVAL;".
> 
> > > 
> > > > +
> > > > +	r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
> > > > +	if (r) {
> > > > +		dev_err(adev->dev, "%s (%d) failed to reserve
> > > > VCE
> > > > bo\n", __func__, r);
> > > > +		return r;
> > > > +	}
> > > > +
> > > > +	r = amdgpu_bo_kmap(adev->vce.vcpu_bo, (void
> > > > **)&cpu_addr);
> > > > +	if (r) {
> > > > +		amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > > > +		dev_err(adev->dev, "%s (%d) VCE map failed\n",
> > > > __func__, r);
> > > > +		return r;
> > > > +	}
> > > 
> > > That part is actually pretty pointless the cpu addr is already
> > > available as adev->vce.cpu_addr.
> > 
> > I don't think so. amdgpu_vce_resume actually unmaps and unreserves
> > the
> > VCE BO, so I think we need to map and reserve it again if we want
> > to
> > access it again. Am I misunderstanding something?
> 
> Yeah, I see. But that is a totally pointless leftover from radeon as
> well which we should probably be removed.
> 
> The VCE BO needs to stay at the same location before and after resume
> since the FW code is not relocateable once started.
> 
> So we need to keep it pinned all the time and so can keep it CPU
> mapped all the time as well.

Right, that makes a lot of sense. I can do it, but I'd like to be
careful about it because it sounds like this would affect all VCE
versions and not just VCE1.

Do you prefer that I add a patch to this series to deal with that, or
would it be better to do that after this series lands?

Thanks & best regards,
Timur


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block
  2025-10-30 13:47         ` Timur Kristóf
@ 2025-10-30 13:56           ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2025-10-30 13:56 UTC (permalink / raw)
  To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
	Rodrigo Siqueira

On 10/30/25 14:47, Timur Kristóf wrote:
> On Thu, 2025-10-30 at 12:12 +0100, Christian König wrote:
>> On 10/29/25 23:48, Timur Kristóf wrote:
>>>>> +	ASSERT(adev->vce.vcpu_bo);
>>>>
>>>> Please drop that.
>>>
>>> Sure, but can you say why?
>>
>> ASSERT either uses BUG_ON() or WARN_ON().
>>
>> BUG_ON() will crash the kernel immediately and WARN_ON will warn,
>> continue and then crash.
>>
>> The justification for a BUG_ON() is to prevent further data
>> corruption and that is not the case here.
> 
> Thanks for explaining that. Technically the vcpu_bo should never be
> NULL, so I think I'll just go with your original suggestion and remove
> the assertion.
> 
>>
>> What you can do is to use something like "if (WARN_ON(...)) return -
>> EINVAL;".
>>
>>>>
>>>>> +
>>>>> +	r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
>>>>> +	if (r) {
>>>>> +		dev_err(adev->dev, "%s (%d) failed to reserve
>>>>> VCE
>>>>> bo\n", __func__, r);
>>>>> +		return r;
>>>>> +	}
>>>>> +
>>>>> +	r = amdgpu_bo_kmap(adev->vce.vcpu_bo, (void
>>>>> **)&cpu_addr);
>>>>> +	if (r) {
>>>>> +		amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>>>> +		dev_err(adev->dev, "%s (%d) VCE map failed\n",
>>>>> __func__, r);
>>>>> +		return r;
>>>>> +	}
>>>>
>>>> That part is actually pretty pointless the cpu addr is already
>>>> available as adev->vce.cpu_addr.
>>>
>>> I don't think so. amdgpu_vce_resume actually unmaps and unreserves
>>> the
>>> VCE BO, so I think we need to map and reserve it again if we want
>>> to
>>> access it again. Am I misunderstanding something?
>>
>> Yeah, I see. But that is a totally pointless leftover from radeon as
>> well which we should probably be removed.
>>
>> The VCE BO needs to stay at the same location before and after resume
>> since the FW code is not relocateable once started.
>>
>> So we need to keep it pinned all the time and so can keep it CPU
>> mapped all the time as well.
> 
> Right, that makes a lot of sense. I can do it, but I'd like to be
> careful about it because it sounds like this would affect all VCE
> versions and not just VCE1.
> 
> Do you prefer that I add a patch to this series to deal with that, or
> would it be better to do that after this series lands?

Add a patch early into the series to clean that up.

It should be a pretty straight forward change and I can throw it into our CI system which should have plenty of HW still using VCE (Polaris/Vega/MI*).

Thanks,
Christian.

> 
> Thanks & best regards,
> Timur
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better
  2025-10-29 19:46     ` Deucher, Alexander
@ 2025-11-03 16:01       ` timur.kristof
  0 siblings, 0 replies; 41+ messages in thread
From: timur.kristof @ 2025-11-03 16:01 UTC (permalink / raw)
  To: Deucher, Alexander, Koenig, Christian,
	amd-gfx@lists.freedesktop.org, Alexandre Demers, Rodrigo Siqueira,
	Liu, Leo

On Wed, 2025-10-29 at 19:46 +0000, Deucher, Alexander wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Koenig, Christian <Christian.Koenig@amd.com>
> > Sent: Wednesday, October 29, 2025 8:02 AM
> > To: Timur Kristóf <timur.kristof@gmail.com>;
> > amd-gfx@lists.freedesktop.org;
> > Deucher, Alexander <Alexander.Deucher@amd.com>; Alexandre Demers
> > <alexandre.f.demers@gmail.com>; Rodrigo Siqueira
> > <siqueira@igalia.com>; Liu,
> > Leo <Leo.Liu@amd.com>
> > Subject: Re: [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL
> > timeout better
> > 
> > On 10/28/25 23:06, Timur Kristóf wrote:
> > > Sometimes the VCE PLL times out while we are programming it.
> > > When it happens, the VCE still works, but much slower.
> > > Observed on some Tahiti boards, but not all:
> > > - FirePro W9000 has the issue
> > > - Radeon R9 280X not affected
> > > - Radeon HD 7990 not affected
> > > 
> > > Continue the complete VCE PLL programming sequence even when it
> > > timed
> > > out. With this, the VCE will work fine and faster after the
> > > timeout
> > > happened.
> > 
> > Mhm, interesting. No idea what could be causing this.
> > 
> > Not sure if just ignoring the error is ok or not. @Alex?
> 
> Looks like these registers can also be accessed indirectly via a
> different index/data accessor besides SMC.  I don't know whether it
> matters or not.  


I've tried various things to work around this issue, including the
indirect accessors that Alex suggested, but they didn't help with the
timeout.

After some trial and error I think that on this one specific chip, the
PLL takes forever to wake from sleep mode. Unfortunately, just
increasing the delays or adding an extra timeout does not solve it.

However, if I never put the PLL into sleep mode in si_set_vce_clocks
then the timeout never happens. It works if I either just leave the PLL
in bypass mode or if I put it in reset mode when turning the VCE clocks
off.

I would lean towards leaving it in bypass mode, because that seems to
be the least risky and simplest solution. Does that sound OK to you
guys?

Thanks,
Timur


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2025-11-03 16:01 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-28 22:06 [PATCH 00/14] drm/amdgpu: Support VCE1 IP block Timur Kristóf
2025-10-28 22:06 ` [PATCH 01/14] drm/amdgpu/gmc: Don't hardcode GART page count before GTT Timur Kristóf
2025-10-29 10:00   ` Christian König
2025-10-29 11:41     ` Timur Kristóf
2025-10-28 22:06 ` [PATCH 02/14] drm/amdgpu/gmc6: Place gart at low address range Timur Kristóf
2025-10-29 10:00   ` Christian König
2025-10-28 22:06 ` [PATCH 03/14] drm/amdgpu/gmc6: Add GART space for VCPU BO Timur Kristóf
2025-10-29 10:05   ` Christian König
2025-10-29 11:26     ` Timur Kristóf
2025-10-28 22:06 ` [PATCH 04/14] drm/amdgpu/gart: Add helper to bind VRAM BO Timur Kristóf
2025-10-29 10:16   ` Christian König
2025-10-29 10:57     ` Timur Kristóf
2025-10-28 22:06 ` [PATCH 05/14] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it Timur Kristóf
2025-10-29 10:19   ` Christian König
2025-10-29 10:48     ` Timur Kristóf
2025-10-28 22:06 ` [PATCH 06/14] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init Timur Kristóf
2025-10-29 10:26   ` Christian König
2025-10-29 17:16   ` Liu, Leo
2025-10-28 22:06 ` [PATCH 07/14] drm/amdgpu/si, cik, vi: Verify IP block when querying video codecs Timur Kristóf
2025-10-29 10:35   ` Christian König
2025-10-29 10:54     ` [PATCH 07/14] drm/amdgpu/si,cik,vi: " Timur Kristóf
2025-10-28 22:06 ` [PATCH 08/14] drm/amdgpu/vce1: Clean up register definitions Timur Kristóf
2025-10-29 11:23   ` Christian König
2025-10-28 22:06 ` [PATCH 09/14] drm/amdgpu/vce1: Load VCE1 firmware Timur Kristóf
2025-10-29 11:28   ` Christian König
2025-10-28 22:06 ` [PATCH 10/14] drm/amdgpu/vce1: Implement VCE1 IP block Timur Kristóf
2025-10-29 11:38   ` Christian König
2025-10-29 22:48     ` Timur Kristóf
2025-10-30 11:12       ` Christian König
2025-10-30 13:47         ` Timur Kristóf
2025-10-30 13:56           ` Christian König
2025-10-28 22:06 ` [PATCH 11/14] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space Timur Kristóf
2025-10-29 11:41   ` Christian König
2025-10-28 22:06 ` [PATCH 12/14] drm/amd/pm/si: Hook up VCE1 to SI DPM Timur Kristóf
2025-10-29 11:47   ` Christian König
2025-10-28 22:06 ` [PATCH 13/14] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs Timur Kristóf
2025-10-29 11:51   ` Christian König
2025-10-28 22:06 ` [PATCH 14/14] drm/amdgpu/vce1: Tolerate VCE PLL timeout better Timur Kristóf
2025-10-29 12:02   ` Christian König
2025-10-29 19:46     ` Deucher, Alexander
2025-11-03 16:01       ` timur.kristof

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).