* [PATCH 01/13] drm/amdgpu/gmc6: Place gart at low address range
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 02/13] drm/amdgpu/gart: Add helper to bind VRAM pages (v2) Timur Kristóf
` (11 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Instead of using a best-fit algorithm to determine which part
of the VMID 0 address space to use for GART, always use the low
address range.
A subsequent commit will use this to map the VCPU BO in GART
for the VCE1 IP block.
Split this into a separate patch to make it easier to bisect,
in case there are any errors in the future.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index f6ad7911f1e6..499dfd78092d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -213,7 +213,7 @@ static void gmc_v6_0_vram_gtt_location(struct amdgpu_device *adev,
amdgpu_gmc_set_agp_default(adev, mc);
amdgpu_gmc_vram_location(adev, mc, base);
- amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_BEST_FIT);
+ amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW);
}
static void gmc_v6_0_mc_program(struct amdgpu_device *adev)
--
2.51.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 02/13] drm/amdgpu/gart: Add helper to bind VRAM pages (v2)
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
2025-11-06 18:44 ` [PATCH 01/13] drm/amdgpu/gmc6: Place gart at low address range Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 03/13] drm/amdgpu/ttm: Use GART helper to map " Timur Kristóf
` (10 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Binds pages that are located in VRAM to the GART page table.
Useful when a kernel BO is located in VRAM but
needs to be accessed from the GART address space,
for example to give a kernel BO a 32-bit address
when GART is placed in the LOW address range.
v2:
- Refactor function to be more reusable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 36 ++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 3 ++
2 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 83f3b94ed975..d2237ce9da70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -367,6 +367,42 @@ void amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
drm_dev_exit(idx);
}
+/**
+ * amdgpu_gart_map_vram_range - map VRAM pages into the GART page table
+ *
+ * @adev: amdgpu_device pointer
+ * @pa: physical address of the first page to be mapped
+ * @start_page: first page to map in the GART aperture
+ * @num_pages: number of pages to be mapped
+ * @flags: page table entry flags
+ * @dst: CPU address of the GART table
+ *
+ * Binds a BO that is allocated in VRAM to the GART page table
+ * (all ASICs).
+ *
+ * Useful when a kernel BO is located in VRAM but
+ * needs to be accessed from the GART address space.
+ */
+void amdgpu_gart_map_vram_range(struct amdgpu_device *adev, uint64_t pa,
+ uint64_t start_page, uint64_t num_pages,
+ uint64_t flags, void *dst)
+{
+ u32 i, idx;
+
+ /* The SYSTEM flag indicates the pages aren't in VRAM. */
+ WARN_ON_ONCE(flags & AMDGPU_PTE_SYSTEM);
+
+ if (!drm_dev_enter(adev_to_drm(adev), &idx))
+ return;
+
+ for (i = 0; i < num_pages; ++i) {
+ amdgpu_gmc_set_pte_pde(adev, adev->gart.ptr,
+ start_page + i, pa + AMDGPU_GPU_PAGE_SIZE * i, flags);
+ }
+
+ drm_dev_exit(idx);
+}
+
/**
* amdgpu_gart_bind - bind pages into the gart page table
*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index 7cc980bf4725..d3118275ddae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -64,5 +64,8 @@ void amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
void *dst);
void amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
int pages, dma_addr_t *dma_addr, uint64_t flags);
+void amdgpu_gart_map_vram_range(struct amdgpu_device *adev, uint64_t pa,
+ uint64_t start_page, uint64_t num_pages,
+ uint64_t flags, void *dst);
void amdgpu_gart_invalidate_tlb(struct amdgpu_device *adev);
#endif
--
2.51.0
* [PATCH 03/13] drm/amdgpu/ttm: Use GART helper to map VRAM pages (v2)
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
2025-11-06 18:44 ` [PATCH 01/13] drm/amdgpu/gmc6: Place gart at low address range Timur Kristóf
2025-11-06 18:44 ` [PATCH 02/13] drm/amdgpu/gart: Add helper to bind VRAM pages (v2) Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 04/13] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it (v2) Timur Kristóf
` (9 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Use the GART helper function introduced in the previous commit
to map the VRAM pages of the transfer window to GART.
No functional changes, just code cleanup.
Split this into a separate commit to make it easier to bisect,
in case there are problems in the future.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e226c3aff7d7..84f9d5a57d03 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -188,7 +188,6 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
struct amdgpu_job *job;
void *cpu_addr;
uint64_t flags;
- unsigned int i;
int r;
BUG_ON(adev->mman.buffer_funcs->copy_max_bytes <
@@ -254,16 +253,9 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
dma_addr = &bo->ttm->dma_address[mm_cur->start >> PAGE_SHIFT];
amdgpu_gart_map(adev, 0, num_pages, dma_addr, flags, cpu_addr);
} else {
- dma_addr_t dma_address;
-
- dma_address = mm_cur->start;
- dma_address += adev->vm_manager.vram_base_offset;
+ u64 pa = mm_cur->start + adev->vm_manager.vram_base_offset;
- for (i = 0; i < num_pages; ++i) {
- amdgpu_gart_map(adev, i << PAGE_SHIFT, 1, &dma_address,
- flags, cpu_addr);
- dma_address += PAGE_SIZE;
- }
+ amdgpu_gart_map_vram_range(adev, pa, 0, num_pages, flags, cpu_addr);
}
dma_fence_put(amdgpu_job_submit(job));
--
2.51.0
* [PATCH 04/13] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it (v2)
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (2 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 03/13] drm/amdgpu/ttm: Use GART helper to map " Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-07 9:25 ` Christian König
2025-11-06 18:44 ` [PATCH 05/13] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init Timur Kristóf
` (8 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
The VCPU BO doesn't only contain the VCE firmware but also other
ranges that the VCE uses for its stack and data. Let's initialize
this to zero to avoid having garbage in the VCPU BO.
v2:
- Only clear BO after creation, not on resume.
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index b9060bcd4806..e028ad0d3b7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -187,6 +187,8 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
return r;
}
+ memset_io(adev->vce.cpu_addr, 0, size);
+
for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
atomic_set(&adev->vce.handles[i], 0);
adev->vce.filp[i] = NULL;
--
2.51.0
* Re: [PATCH 04/13] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it (v2)
2025-11-06 18:44 ` [PATCH 04/13] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it (v2) Timur Kristóf
@ 2025-11-07 9:25 ` Christian König
2025-11-07 9:39 ` Timur Kristóf
0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2025-11-07 9:25 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu
On 11/6/25 19:44, Timur Kristóf wrote:
> The VCPU BO doesn't only contain the VCE firmware but also other
> ranges that the VCE uses for its stack and data. Let's initialize
> this to zero to avoid having garbage in the VCPU BO.
>
> v2:
> - Only clear BO after creation, not on resume.
>
> Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
For now this patch here is Reviewed-by: Christian König <christian.koenig@amd.com> since it addresses a clear problem and potentially even needs to be back-ported to older kernels.
But I think we should clean that up more fully after we've landed VCE1 support.
Assuming it holds true that VCE1-3 can't continue sessions after suspend/resume, we should do something like this:
1. Remove all amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr).
As kernel BO the VCE FW BO is pinned and mapped on creation time.
2. Rename amdgpu_vce_resume() into amdgpu_vce_reload_fw() and add the memset_io() there like you originally planned.
3. Also add resetting the VCE FW handles into amdgpu_vce_reload_fw().
E.g. something like this:
for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
atomic_set(&adev->vce.handles[i], 0);
adev->vce.filp[i] = NULL;
}
This way the kernel will reject submissions when userspace tries to use the same FW handles as before the suspend/resume and prevent the HW from crashing.
Does that sound like a plan to you?
Regards,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index b9060bcd4806..e028ad0d3b7a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -187,6 +187,8 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> return r;
> }
>
> + memset_io(adev->vce.cpu_addr, 0, size);
> +
> for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
> atomic_set(&adev->vce.handles[i], 0);
> adev->vce.filp[i] = NULL;
* Re: [PATCH 04/13] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it (v2)
2025-11-07 9:25 ` Christian König
@ 2025-11-07 9:39 ` Timur Kristóf
0 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-07 9:39 UTC (permalink / raw)
To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu
On Fri, 2025-11-07 at 10:25 +0100, Christian König wrote:
> On 11/6/25 19:44, Timur Kristóf wrote:
> > The VCPU BO doesn't only contain the VCE firmware but also other
> > ranges that the VCE uses for its stack and data. Let's initialize
> > this to zero to avoid having garbage in the VCPU BO.
> >
> > v2:
> > - Only clear BO after creation, not on resume.
> >
> > Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
>
> For now this patch here is Reviewed-by: Christian König
> <christian.koenig@amd.com> since it addresses a clear problem and
> potentially even need to be back-ported to older kernels.
Thank you, I agree.
>
> But I think we should clean that up more full after we landed VCE1
> support.
Yes, I'm happy to continue this work after the VCE 1 support lands.
>
> Assuming it holds true that VCE1-3 can't continue sessions
> after suspend/resume, we should do something like this:
>
> 1. Remove all amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr).
> As kernel BO the VCE FW BO is pinned and mapped on creation
> time.
This is already done by patch 6 of this series:
"Save/restore and pin VCPU BO for all VCE (v2)"
>
> 2. Rename amdgpu_vce_resume() into amdgpu_vce_reload_fw() and add the
> memset_io() there like you originally planned.
Also done by patch 6 of this series, except for the rename.
>
> 3. Also add resetting the VCE FW handles into amdgpu_vce_reload_fw().
>
> E.g. something like this:
> for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
> atomic_set(&adev->vce.handles[i], 0);
> adev->vce.filp[i] = NULL;
> }
>
> This way the kernel will reject submissions when userspace tries
> to use the same FW handles as before the suspend/resume and prevent
> the HW from crashing.
>
> Does that sounds like a plan to you?
Yes, that sounds like a good plan to me.
>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > index b9060bcd4806..e028ad0d3b7a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > @@ -187,6 +187,8 @@ int amdgpu_vce_sw_init(struct amdgpu_device
> > *adev, unsigned long size)
> > return r;
> > }
> >
> > + memset_io(adev->vce.cpu_addr, 0, size);
> > +
> > for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
> > atomic_set(&adev->vce.handles[i], 0);
> > adev->vce.filp[i] = NULL;
* [PATCH 05/13] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (3 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 04/13] drm/amdgpu/vce: Clear VCPU BO before copying firmware to it (v2) Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2) Timur Kristóf
` (7 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Cc: Leo Liu
Try to load the VCE firmware at early_init.
When the correct firmware is not found, return -ENOENT.
This way, the driver initialization will complete even
without VCE, and the GPU will be functional, albeit
without video encoding capabilities.
This is necessary because we are planning to add support
for the VCE1, and AMD hasn't yet published the correct
firmware for this version. So we need to anticipate that
users will try to boot amdgpu on SI GPUs without the
correct VCE1 firmware present on their system.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 121 +++++++++++++++---------
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h | 1 +
drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 5 +
drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 5 +
drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 5 +
5 files changed, 91 insertions(+), 46 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index e028ad0d3b7a..2297608c5191 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -88,82 +88,87 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
bool direct, struct dma_fence **fence);
/**
- * amdgpu_vce_sw_init - allocate memory, load vce firmware
+ * amdgpu_vce_firmware_name() - determine the firmware file name for VCE
*
* @adev: amdgpu_device pointer
- * @size: size for the new BO
*
- * First step to get VCE online, allocate memory and load the firmware
+ * Each chip that has VCE IP may need a different firmware.
+ * This function returns the name of the VCE firmware file
+ * appropriate for the current chip.
*/
-int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
+static const char *amdgpu_vce_firmware_name(struct amdgpu_device *adev)
{
- const char *fw_name;
- const struct common_firmware_header *hdr;
- unsigned int ucode_version, version_major, version_minor, binary_id;
- int i, r;
-
switch (adev->asic_type) {
#ifdef CONFIG_DRM_AMDGPU_CIK
case CHIP_BONAIRE:
- fw_name = FIRMWARE_BONAIRE;
- break;
+ return FIRMWARE_BONAIRE;
case CHIP_KAVERI:
- fw_name = FIRMWARE_KAVERI;
- break;
+ return FIRMWARE_KAVERI;
case CHIP_KABINI:
- fw_name = FIRMWARE_KABINI;
- break;
+ return FIRMWARE_KABINI;
case CHIP_HAWAII:
- fw_name = FIRMWARE_HAWAII;
- break;
+ return FIRMWARE_HAWAII;
case CHIP_MULLINS:
- fw_name = FIRMWARE_MULLINS;
- break;
+ return FIRMWARE_MULLINS;
#endif
case CHIP_TONGA:
- fw_name = FIRMWARE_TONGA;
- break;
+ return FIRMWARE_TONGA;
case CHIP_CARRIZO:
- fw_name = FIRMWARE_CARRIZO;
- break;
+ return FIRMWARE_CARRIZO;
case CHIP_FIJI:
- fw_name = FIRMWARE_FIJI;
- break;
+ return FIRMWARE_FIJI;
case CHIP_STONEY:
- fw_name = FIRMWARE_STONEY;
- break;
+ return FIRMWARE_STONEY;
case CHIP_POLARIS10:
- fw_name = FIRMWARE_POLARIS10;
- break;
+ return FIRMWARE_POLARIS10;
case CHIP_POLARIS11:
- fw_name = FIRMWARE_POLARIS11;
- break;
+ return FIRMWARE_POLARIS11;
case CHIP_POLARIS12:
- fw_name = FIRMWARE_POLARIS12;
- break;
+ return FIRMWARE_POLARIS12;
case CHIP_VEGAM:
- fw_name = FIRMWARE_VEGAM;
- break;
+ return FIRMWARE_VEGAM;
case CHIP_VEGA10:
- fw_name = FIRMWARE_VEGA10;
- break;
+ return FIRMWARE_VEGA10;
case CHIP_VEGA12:
- fw_name = FIRMWARE_VEGA12;
- break;
+ return FIRMWARE_VEGA12;
case CHIP_VEGA20:
- fw_name = FIRMWARE_VEGA20;
- break;
+ return FIRMWARE_VEGA20;
default:
- return -EINVAL;
+ return NULL;
}
+}
+
+/**
+ * amdgpu_vce_early_init() - try to load VCE firmware
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Tries to load the VCE firmware.
+ *
+ * When not found, returns -ENOENT so that the driver can
+ * still load and initialize the rest of the IP blocks.
+ * The GPU can function just fine without VCE, it will just
+ * not support video encoding.
+ */
+int amdgpu_vce_early_init(struct amdgpu_device *adev)
+{
+ const char *fw_name = amdgpu_vce_firmware_name(adev);
+ const struct common_firmware_header *hdr;
+ unsigned int ucode_version, version_major, version_minor, binary_id;
+ int r;
+
+ if (!fw_name)
+ return -ENOENT;
r = amdgpu_ucode_request(adev, &adev->vce.fw, AMDGPU_UCODE_REQUIRED, "%s", fw_name);
if (r) {
- dev_err(adev->dev, "amdgpu_vce: Can't validate firmware \"%s\"\n",
- fw_name);
+ dev_err(adev->dev,
+ "amdgpu_vce: Firmware \"%s\" not found or failed to validate (%d)\n",
+ fw_name, r);
+
amdgpu_ucode_release(&adev->vce.fw);
- return r;
+ return -ENOENT;
}
hdr = (const struct common_firmware_header *)adev->vce.fw->data;
@@ -172,11 +177,35 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
version_major = (ucode_version >> 20) & 0xfff;
version_minor = (ucode_version >> 8) & 0xfff;
binary_id = ucode_version & 0xff;
- DRM_INFO("Found VCE firmware Version: %d.%d Binary ID: %d\n",
+ dev_info(adev->dev, "Found VCE firmware Version: %d.%d Binary ID: %d\n",
version_major, version_minor, binary_id);
adev->vce.fw_version = ((version_major << 24) | (version_minor << 16) |
(binary_id << 8));
+ return 0;
+}
+
+/**
+ * amdgpu_vce_sw_init() - allocate memory for VCE BO
+ *
+ * @adev: amdgpu_device pointer
+ * @size: size for the new BO
+ *
+ * First step to get VCE online: allocate memory for VCE BO.
+ * The VCE firmware binary is copied into the VCE BO later,
+ * in amdgpu_vce_resume. The VCE executes its code from the
+ * VCE BO and also uses the space in this BO for its stack and data.
+ *
+ * Ideally this BO should be placed in VRAM for optimal performance,
+ * although technically it can also run from system RAM (albeit slowly).
+ */
+int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
+{
+ int i, r;
+
+ if (!adev->vce.fw)
+ return -ENOENT;
+
r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
AMDGPU_GEM_DOMAIN_VRAM |
AMDGPU_GEM_DOMAIN_GTT,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
index 6e53f872d084..22acd7b35945 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
@@ -53,6 +53,7 @@ struct amdgpu_vce {
unsigned num_rings;
};
+int amdgpu_vce_early_init(struct amdgpu_device *adev);
int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size);
int amdgpu_vce_sw_fini(struct amdgpu_device *adev);
int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
index bee3e904a6bc..8ea8a6193492 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
@@ -407,6 +407,11 @@ static void vce_v2_0_enable_mgcg(struct amdgpu_device *adev, bool enable,
static int vce_v2_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
+ int r;
+
+ r = amdgpu_vce_early_init(adev);
+ if (r)
+ return r;
adev->vce.num_rings = 2;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 708123899c41..719e9643c43d 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -399,6 +399,7 @@ static unsigned vce_v3_0_get_harvest_config(struct amdgpu_device *adev)
static int vce_v3_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
+ int r;
adev->vce.harvest_config = vce_v3_0_get_harvest_config(adev);
@@ -407,6 +408,10 @@ static int vce_v3_0_early_init(struct amdgpu_ip_block *ip_block)
(AMDGPU_VCE_HARVEST_VCE0 | AMDGPU_VCE_HARVEST_VCE1))
return -ENOENT;
+ r = amdgpu_vce_early_init(adev);
+ if (r)
+ return r;
+
adev->vce.num_rings = 3;
vce_v3_0_set_ring_funcs(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
index 335bda64ff5b..2d64002bed61 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
@@ -410,6 +410,11 @@ static int vce_v4_0_stop(struct amdgpu_device *adev)
static int vce_v4_0_early_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
+ int r;
+
+ r = amdgpu_vce_early_init(adev);
+ if (r)
+ return r;
if (amdgpu_sriov_vf(adev)) /* currently only VCN0 support SRIOV */
adev->vce.num_rings = 1;
--
2.51.0
* [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (4 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 05/13] drm/amdgpu/vce: Move firmware load to amdgpu_vce_early_init Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-07 9:49 ` Christian König
2025-11-06 18:44 ` [PATCH 07/13] drm/amdgpu/vce1: Clean up register definitions Timur Kristóf
` (6 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
VCE uses the VCPU BO to keep track of currently active
encoding sessions. To make sure the VCE functions correctly
after suspend/resume, save the VCPU BO to system RAM on
suspend and restore it on resume.
Additionally, make sure to keep the VCPU BO pinned.
The VCPU BO needs to stay at the same location before and after
sleep/resume because the FW code is not relocatable once it's
started.
Unfortunately this is not enough to make VCE suspend work when
there are encoding sessions active, so don't allow that yet.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++-------------
drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++---------------------
2 files changed, 24 insertions(+), 72 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 2297608c5191..4beec5b56c4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
if (!adev->vce.fw)
return -ENOENT;
+ adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
+ if (!adev->vce.saved_bo)
+ return -ENOMEM;
+
r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
AMDGPU_GEM_DOMAIN_VRAM |
AMDGPU_GEM_DOMAIN_GTT,
@@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct amdgpu_device *adev)
amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev->vce.gpu_addr,
(void **)&adev->vce.cpu_addr);
+ kvfree(adev->vce.saved_bo);
+ adev->vce.saved_bo = NULL;
+
return 0;
}
@@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring)
*/
int amdgpu_vce_suspend(struct amdgpu_device *adev)
{
- int i;
+ int i, idx;
cancel_delayed_work_sync(&adev->vce.idle_work);
if (adev->vce.vcpu_bo == NULL)
return 0;
+ if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+ memcpy_fromio(adev->vce.saved_bo, adev->vce.cpu_addr,
+ amdgpu_bo_size(adev->vce.vcpu_bo));
+ drm_dev_exit(idx);
+ }
+
for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
if (atomic_read(&adev->vce.handles[i]))
break;
@@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct amdgpu_device *adev)
*/
int amdgpu_vce_resume(struct amdgpu_device *adev)
{
- void *cpu_addr;
- const struct common_firmware_header *hdr;
- unsigned int offset;
- int r, idx;
+ int idx;
if (adev->vce.vcpu_bo == NULL)
return -EINVAL;
- r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
- if (r) {
- dev_err(adev->dev, "(%d) failed to reserve VCE bo\n", r);
- return r;
- }
-
- r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
- if (r) {
- amdgpu_bo_unreserve(adev->vce.vcpu_bo);
- dev_err(adev->dev, "(%d) VCE map failed\n", r);
- return r;
- }
-
- hdr = (const struct common_firmware_header *)adev->vce.fw->data;
- offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
-
if (drm_dev_enter(adev_to_drm(adev), &idx)) {
- memcpy_toio(cpu_addr, adev->vce.fw->data + offset,
- adev->vce.fw->size - offset);
+ memcpy_toio(adev->vce.cpu_addr, adev->vce.saved_bo,
+ amdgpu_bo_size(adev->vce.vcpu_bo));
drm_dev_exit(idx);
}
- amdgpu_bo_kunmap(adev->vce.vcpu_bo);
-
- amdgpu_bo_unreserve(adev->vce.vcpu_bo);
-
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
index 2d64002bed61..21b6656b2f41 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
@@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct amdgpu_ip_block *ip_block)
return r;
if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
- const struct common_firmware_header *hdr;
- unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
-
- adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
- if (!adev->vce.saved_bo)
- return -ENOMEM;
-
- hdr = (const struct common_firmware_header *)adev->vce.fw->data;
+ const struct common_firmware_header *hdr =
+ (const struct common_firmware_header *)adev->vce.fw->data;
adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id = AMDGPU_UCODE_ID_VCE;
adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].fw = adev->vce.fw;
adev->firmware.fw_size +=
@@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct amdgpu_ip_block *ip_block)
/* free MM table */
amdgpu_virt_free_mm_table(adev);
- if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
- kvfree(adev->vce.saved_bo);
- adev->vce.saved_bo = NULL;
- }
-
r = amdgpu_vce_suspend(adev);
if (r)
return r;
@@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct amdgpu_ip_block *ip_block)
static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
- int r, idx;
-
- if (adev->vce.vcpu_bo == NULL)
- return 0;
-
- if (drm_dev_enter(adev_to_drm(adev), &idx)) {
- if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
- unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
- void *ptr = adev->vce.cpu_addr;
-
- memcpy_fromio(adev->vce.saved_bo, ptr, size);
- }
- drm_dev_exit(idx);
- }
+ int r;
/*
* Proper cleanups before halting the HW engine:
@@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
static int vce_v4_0_resume(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
- int r, idx;
-
- if (adev->vce.vcpu_bo == NULL)
- return -EINVAL;
-
- if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
-
- if (drm_dev_enter(adev_to_drm(adev), &idx)) {
- unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
- void *ptr = adev->vce.cpu_addr;
+ int r;
- memcpy_toio(ptr, adev->vce.saved_bo, size);
- drm_dev_exit(idx);
- }
- } else {
- r = amdgpu_vce_resume(adev);
- if (r)
- return r;
- }
+ r = amdgpu_vce_resume(adev);
+ if (r)
+ return r;
return vce_v4_0_hw_init(ip_block);
}
--
2.51.0
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-06 18:44 ` [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2) Timur Kristóf
@ 2025-11-07 9:49 ` Christian König
2025-11-07 9:53 ` Timur Kristóf
0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2025-11-07 9:49 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On 11/6/25 19:44, Timur Kristóf wrote:
> VCE uses the VCPU BO to keep track of currently active
> encoding sessions. To make sure the VCE functions correctly
> after suspend/resume, save the VCPU BO to system RAM on
> suspend and restore it on resume.
>
> Additionally, make sure to keep the VCPU BO pinned.
> The VCPU BO needs to stay at the same location before and after
> sleep/resume because the FW code is not relocatable once it's
> started.
>
> Unfortunately this is not enough to make VCE suspend work when
> there are encoding sessions active, so don't allow that yet.
The question whether this is the right approach can only be answered by Leo and Ruijing.
If we can get VCE1-3 to keep sessions working after suspend/resume, we should make this change; otherwise we should rather invalidate all old sessions after suspend/resume and only re-load the FW.
Anyway I think you should make the removal of "amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);" a separate patch, because that seems to be a good cleanup no matter what.
Regards,
Christian.
>
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++-------------
> drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++---------------------
> 2 files changed, 24 insertions(+), 72 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index 2297608c5191..4beec5b56c4f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> if (!adev->vce.fw)
> return -ENOENT;
>
> + adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
> + if (!adev->vce.saved_bo)
> + return -ENOMEM;
> +
> r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> AMDGPU_GEM_DOMAIN_VRAM |
> AMDGPU_GEM_DOMAIN_GTT,
> @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct amdgpu_device *adev)
> amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev->vce.gpu_addr,
> (void **)&adev->vce.cpu_addr);
>
> + kvfree(adev->vce.saved_bo);
> + adev->vce.saved_bo = NULL;
> +
> return 0;
> }
>
> @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring)
> */
> int amdgpu_vce_suspend(struct amdgpu_device *adev)
> {
> - int i;
> + int i, idx;
>
> cancel_delayed_work_sync(&adev->vce.idle_work);
>
> if (adev->vce.vcpu_bo == NULL)
> return 0;
>
> + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> + memcpy_fromio(adev->vce.saved_bo, adev->vce.cpu_addr,
> + amdgpu_bo_size(adev->vce.vcpu_bo));
> + drm_dev_exit(idx);
> + }
> +
> for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
> if (atomic_read(&adev->vce.handles[i]))
> break;
> @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct amdgpu_device *adev)
> */
> int amdgpu_vce_resume(struct amdgpu_device *adev)
> {
> - void *cpu_addr;
> - const struct common_firmware_header *hdr;
> - unsigned int offset;
> - int r, idx;
> + int idx;
>
> if (adev->vce.vcpu_bo == NULL)
> return -EINVAL;
>
> - r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
> - if (r) {
> - dev_err(adev->dev, "(%d) failed to reserve VCE bo\n", r);
> - return r;
> - }
> -
> - r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
> - if (r) {
> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> - dev_err(adev->dev, "(%d) VCE map failed\n", r);
> - return r;
> - }
> -
> - hdr = (const struct common_firmware_header *)adev->vce.fw->data;
> - offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
> -
> if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> - memcpy_toio(cpu_addr, adev->vce.fw->data + offset,
> - adev->vce.fw->size - offset);
> + memcpy_toio(adev->vce.cpu_addr, adev->vce.saved_bo,
> + amdgpu_bo_size(adev->vce.vcpu_bo));
> drm_dev_exit(idx);
> }
>
> - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
> -
> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> -
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> index 2d64002bed61..21b6656b2f41 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct amdgpu_ip_block *ip_block)
> return r;
>
> if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> - const struct common_firmware_header *hdr;
> - unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
> -
> - adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
> - if (!adev->vce.saved_bo)
> - return -ENOMEM;
> -
> - hdr = (const struct common_firmware_header *)adev->vce.fw->data;
> + const struct common_firmware_header *hdr =
> + (const struct common_firmware_header *)adev->vce.fw->data;
> adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id = AMDGPU_UCODE_ID_VCE;
> adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].fw = adev->vce.fw;
> adev->firmware.fw_size +=
> @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct amdgpu_ip_block *ip_block)
> /* free MM table */
> amdgpu_virt_free_mm_table(adev);
>
> - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> - kvfree(adev->vce.saved_bo);
> - adev->vce.saved_bo = NULL;
> - }
> -
> r = amdgpu_vce_suspend(adev);
> if (r)
> return r;
> @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct amdgpu_ip_block *ip_block)
> static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> - int r, idx;
> -
> - if (adev->vce.vcpu_bo == NULL)
> - return 0;
> -
> - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> - unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
> - void *ptr = adev->vce.cpu_addr;
> -
> - memcpy_fromio(adev->vce.saved_bo, ptr, size);
> - }
> - drm_dev_exit(idx);
> - }
> + int r;
>
> /*
> * Proper cleanups before halting the HW engine:
> @@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
> static int vce_v4_0_resume(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> - int r, idx;
> -
> - if (adev->vce.vcpu_bo == NULL)
> - return -EINVAL;
> -
> - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> -
> - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> - unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
> - void *ptr = adev->vce.cpu_addr;
> + int r;
>
> - memcpy_toio(ptr, adev->vce.saved_bo, size);
> - drm_dev_exit(idx);
> - }
> - } else {
> - r = amdgpu_vce_resume(adev);
> - if (r)
> - return r;
> - }
> + r = amdgpu_vce_resume(adev);
> + if (r)
> + return r;
>
> return vce_v4_0_hw_init(ip_block);
> }
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 9:49 ` Christian König
@ 2025-11-07 9:53 ` Timur Kristóf
2025-11-07 10:01 ` Christian König
0 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-07 9:53 UTC (permalink / raw)
To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
> On 11/6/25 19:44, Timur Kristóf wrote:
> > VCE uses the VCPU BO to keep track of currently active
> > encoding sessions. To make sure the VCE functions correctly
> > after suspend/resume, save the VCPU BO to system RAM on
> > suspend and restore it on resume.
> >
> > Additionally, make sure to keep the VCPU BO pinned.
> > The VCPU BO needs to stay at the same location before and after
> > sleep/resume because the FW code is not relocatable once it's
> > started.
> >
> > Unfortunately this is not enough to make VCE suspend work when
> > there are encoding sessions active, so don't allow that yet.
>
> The question if this is the right approach or not can only Leo and
> Ruijing answer.
>
> If we can get VCE1-3 to keep session working after suspend/resume we
> should make this change otherwise we should rather make all old
> sessions invalid after suspend/resume and only re-load the FW.
>
> Anyway I think you should make the removal of "amdgpu_bo_kmap(adev-
> >vce.vcpu_bo, &cpu_addr);" a separate patch, cause that seems to be a
> good cleanup no matter what.
>
This change is necessary for the VCE1 code when it loads the firmware
signature. Without this patch, we would need to call reserve(), kmap(),
kunmap(), unreserve() in vce_v1_0_load_fw_signature().
Let me know which approach you prefer.
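The boilerplate Timur refers to looks roughly like this. It is a hedged sketch with mock stand-ins for the amdgpu_bo_* helpers — the real functions take a struct amdgpu_bo, can fail with kernel error codes, and the copy would be memcpy_toio(); keeping a permanent kernel mapping (as amdgpu_bo_create_kernel() provides) makes the whole dance unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for amdgpu_bo_reserve()/amdgpu_bo_kmap() and
 * friends; here they just expose a backing buffer so the access
 * pattern is visible. */
struct mock_bo {
    unsigned char mem[64];
    int reserved;
    int mapped;
};

static int mock_bo_reserve(struct mock_bo *bo) { bo->reserved = 1; return 0; }
static void mock_bo_unreserve(struct mock_bo *bo) { bo->reserved = 0; }
static int mock_bo_kmap(struct mock_bo *bo, void **cpu_addr)
{
    bo->mapped = 1;
    *cpu_addr = bo->mem;
    return 0;
}
static void mock_bo_kunmap(struct mock_bo *bo) { bo->mapped = 0; }

/* Without a permanent mapping, every write to the VCPU BO needs the
 * full reserve/kmap/kunmap/unreserve sequence, with error unwinding. */
static int write_fw_sig_with_dance(struct mock_bo *bo,
                                   const void *sig, size_t len)
{
    void *cpu_addr;
    int r = mock_bo_reserve(bo);

    if (r)
        return r;
    r = mock_bo_kmap(bo, &cpu_addr);
    if (r) {
        mock_bo_unreserve(bo);
        return r;
    }
    memcpy(cpu_addr, sig, len);   /* stands in for memcpy_toio() */
    mock_bo_kunmap(bo);
    mock_bo_unreserve(bo);
    return 0;
}

/* One round trip: returns 0 when the signature landed in the BO and
 * both locks were dropped again. */
static int demo(void)
{
    struct mock_bo bo = {0};
    const char sig[4] = "sig";

    if (write_fw_sig_with_dance(&bo, sig, sizeof(sig)))
        return -1;
    if (memcmp(bo.mem, "sig", 4) != 0)
        return -1;
    if (bo.reserved || bo.mapped)
        return -1;
    return 0;
}
```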
>
> >
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++-------------
> > drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++-----------------
> > ----
> > 2 files changed, 24 insertions(+), 72 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > index 2297608c5191..4beec5b56c4f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct amdgpu_device
> > *adev, unsigned long size)
> > if (!adev->vce.fw)
> > return -ENOENT;
> >
> > + adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
> > + if (!adev->vce.saved_bo)
> > + return -ENOMEM;
> > +
> > r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> > AMDGPU_GEM_DOMAIN_VRAM |
> > AMDGPU_GEM_DOMAIN_GTT,
> > @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct amdgpu_device
> > *adev)
> > amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev-
> > >vce.gpu_addr,
> > (void **)&adev->vce.cpu_addr);
> >
> > + kvfree(adev->vce.saved_bo);
> > + adev->vce.saved_bo = NULL;
> > +
> > return 0;
> > }
> >
> > @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct
> > amdgpu_device *adev, struct amdgpu_ring *ring)
> > */
> > int amdgpu_vce_suspend(struct amdgpu_device *adev)
> > {
> > - int i;
> > + int i, idx;
> >
> > cancel_delayed_work_sync(&adev->vce.idle_work);
> >
> > if (adev->vce.vcpu_bo == NULL)
> > return 0;
> >
> > + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > + memcpy_fromio(adev->vce.saved_bo, adev-
> > >vce.cpu_addr,
> > + amdgpu_bo_size(adev->vce.vcpu_bo));
> > + drm_dev_exit(idx);
> > + }
> > +
> > for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
> > if (atomic_read(&adev->vce.handles[i]))
> > break;
> > @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct amdgpu_device
> > *adev)
> > */
> > int amdgpu_vce_resume(struct amdgpu_device *adev)
> > {
> > - void *cpu_addr;
> > - const struct common_firmware_header *hdr;
> > - unsigned int offset;
> > - int r, idx;
> > + int idx;
> >
> > if (adev->vce.vcpu_bo == NULL)
> > return -EINVAL;
> >
> > - r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
> > - if (r) {
> > - dev_err(adev->dev, "(%d) failed to reserve VCE
> > bo\n", r);
> > - return r;
> > - }
> > -
> > - r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
> > - if (r) {
> > - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > - dev_err(adev->dev, "(%d) VCE map failed\n", r);
> > - return r;
> > - }
> > -
> > - hdr = (const struct common_firmware_header *)adev->vce.fw-
> > >data;
> > - offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
> > -
> > if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > - memcpy_toio(cpu_addr, adev->vce.fw->data + offset,
> > - adev->vce.fw->size - offset);
> > + memcpy_toio(adev->vce.cpu_addr, adev-
> > >vce.saved_bo,
> > + amdgpu_bo_size(adev->vce.vcpu_bo));
> > drm_dev_exit(idx);
> > }
> >
> > - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
> > -
> > - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > -
> > return 0;
> > }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > index 2d64002bed61..21b6656b2f41 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct
> > amdgpu_ip_block *ip_block)
> > return r;
> >
> > if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> > - const struct common_firmware_header *hdr;
> > - unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
> > -
> > - adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
> > - if (!adev->vce.saved_bo)
> > - return -ENOMEM;
> > -
> > - hdr = (const struct common_firmware_header *)adev-
> > >vce.fw->data;
> > + const struct common_firmware_header *hdr =
> > + (const struct common_firmware_header
> > *)adev->vce.fw->data;
> > adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id
> > = AMDGPU_UCODE_ID_VCE;
> > adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].fw =
> > adev->vce.fw;
> > adev->firmware.fw_size +=
> > @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct
> > amdgpu_ip_block *ip_block)
> > /* free MM table */
> > amdgpu_virt_free_mm_table(adev);
> >
> > - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> > - kvfree(adev->vce.saved_bo);
> > - adev->vce.saved_bo = NULL;
> > - }
> > -
> > r = amdgpu_vce_suspend(adev);
> > if (r)
> > return r;
> > @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct
> > amdgpu_ip_block *ip_block)
> > static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
> > {
> > struct amdgpu_device *adev = ip_block->adev;
> > - int r, idx;
> > -
> > - if (adev->vce.vcpu_bo == NULL)
> > - return 0;
> > -
> > - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > - if (adev->firmware.load_type ==
> > AMDGPU_FW_LOAD_PSP) {
> > - unsigned size = amdgpu_bo_size(adev-
> > >vce.vcpu_bo);
> > - void *ptr = adev->vce.cpu_addr;
> > -
> > - memcpy_fromio(adev->vce.saved_bo, ptr,
> > size);
> > - }
> > - drm_dev_exit(idx);
> > - }
> > + int r;
> >
> > /*
> > * Proper cleanups before halting the HW engine:
> > @@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct
> > amdgpu_ip_block *ip_block)
> > static int vce_v4_0_resume(struct amdgpu_ip_block *ip_block)
> > {
> > struct amdgpu_device *adev = ip_block->adev;
> > - int r, idx;
> > -
> > - if (adev->vce.vcpu_bo == NULL)
> > - return -EINVAL;
> > -
> > - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> > -
> > - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > - unsigned size = amdgpu_bo_size(adev-
> > >vce.vcpu_bo);
> > - void *ptr = adev->vce.cpu_addr;
> > + int r;
> >
> > - memcpy_toio(ptr, adev->vce.saved_bo,
> > size);
> > - drm_dev_exit(idx);
> > - }
> > - } else {
> > - r = amdgpu_vce_resume(adev);
> > - if (r)
> > - return r;
> > - }
> > + r = amdgpu_vce_resume(adev);
> > + if (r)
> > + return r;
> >
> > return vce_v4_0_hw_init(ip_block);
> > }
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 9:53 ` Timur Kristóf
@ 2025-11-07 10:01 ` Christian König
2025-11-07 10:47 ` Timur Kristóf
0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2025-11-07 10:01 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On 11/7/25 10:53, Timur Kristóf wrote:
> On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
>> On 11/6/25 19:44, Timur Kristóf wrote:
>>> VCE uses the VCPU BO to keep track of currently active
>>> encoding sessions. To make sure the VCE functions correctly
>>> after suspend/resume, save the VCPU BO to system RAM on
>>> suspend and restore it on resume.
>>>
>>> Additionally, make sure to keep the VCPU BO pinned.
>>> The VCPU BO needs to stay at the same location before and after
>>> sleep/resume because the FW code is not relocatable once it's
>>> started.
>>>
>>> Unfortunately this is not enough to make VCE suspend work when
>>> there are encoding sessions active, so don't allow that yet.
>>
>> The question if this is the right approach or not can only Leo and
>> Ruijing answer.
>>
>> If we can get VCE1-3 to keep session working after suspend/resume we
>> should make this change otherwise we should rather make all old
>> sessions invalid after suspend/resume and only re-load the FW.
>>
>> Anyway I think you should make the removal of "amdgpu_bo_kmap(adev-
>>> vce.vcpu_bo, &cpu_addr);" a separate patch, cause that seems to be a
>> good cleanup no matter what.
>>
>
> This change is necessary for the VCE1 code when it loads the firmware
> signature. Without this patch, we would need to call reserve(), kmap(),
> kunmap(), kunreserve() in vce_v1_0_load_fw_signature().
>
> Let me know which approach you prefer.
In this case, definitely make the removal of amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr) a separate patch.
I want to get initial VCE1 working and landed independently of what Leo/Ruijing say about suspend/resume.
It may be that suspend/resume is still broken then, but that is also the case for VCE2-3.
Regards,
Christian.
>
>>
>>>
>>> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++-------------
>>> drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++-----------------
>>> ----
>>> 2 files changed, 24 insertions(+), 72 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>> index 2297608c5191..4beec5b56c4f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>> @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct amdgpu_device
>>> *adev, unsigned long size)
>>> if (!adev->vce.fw)
>>> return -ENOENT;
>>>
>>> + adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
>>> + if (!adev->vce.saved_bo)
>>> + return -ENOMEM;
>>> +
>>> r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>>> AMDGPU_GEM_DOMAIN_VRAM |
>>> AMDGPU_GEM_DOMAIN_GTT,
>>> @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct amdgpu_device
>>> *adev)
>>> amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev-
>>>> vce.gpu_addr,
>>> (void **)&adev->vce.cpu_addr);
>>>
>>> + kvfree(adev->vce.saved_bo);
>>> + adev->vce.saved_bo = NULL;
>>> +
>>> return 0;
>>> }
>>>
>>> @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct
>>> amdgpu_device *adev, struct amdgpu_ring *ring)
>>> */
>>> int amdgpu_vce_suspend(struct amdgpu_device *adev)
>>> {
>>> - int i;
>>> + int i, idx;
>>>
>>> cancel_delayed_work_sync(&adev->vce.idle_work);
>>>
>>> if (adev->vce.vcpu_bo == NULL)
>>> return 0;
>>>
>>> + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>> + memcpy_fromio(adev->vce.saved_bo, adev-
>>>> vce.cpu_addr,
>>> + amdgpu_bo_size(adev->vce.vcpu_bo));
>>> + drm_dev_exit(idx);
>>> + }
>>> +
>>> for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
>>> if (atomic_read(&adev->vce.handles[i]))
>>> break;
>>> @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct amdgpu_device
>>> *adev)
>>> */
>>> int amdgpu_vce_resume(struct amdgpu_device *adev)
>>> {
>>> - void *cpu_addr;
>>> - const struct common_firmware_header *hdr;
>>> - unsigned int offset;
>>> - int r, idx;
>>> + int idx;
>>>
>>> if (adev->vce.vcpu_bo == NULL)
>>> return -EINVAL;
>>>
>>> - r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
>>> - if (r) {
>>> - dev_err(adev->dev, "(%d) failed to reserve VCE
>>> bo\n", r);
>>> - return r;
>>> - }
>>> -
>>> - r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
>>> - if (r) {
>>> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>> - dev_err(adev->dev, "(%d) VCE map failed\n", r);
>>> - return r;
>>> - }
>>> -
>>> - hdr = (const struct common_firmware_header *)adev->vce.fw-
>>>> data;
>>> - offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
>>> -
>>> if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>> - memcpy_toio(cpu_addr, adev->vce.fw->data + offset,
>>> - adev->vce.fw->size - offset);
>>> + memcpy_toio(adev->vce.cpu_addr, adev-
>>>> vce.saved_bo,
>>> + amdgpu_bo_size(adev->vce.vcpu_bo));
>>> drm_dev_exit(idx);
>>> }
>>>
>>> - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
>>> -
>>> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>> -
>>> return 0;
>>> }
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>> index 2d64002bed61..21b6656b2f41 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>> @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct
>>> amdgpu_ip_block *ip_block)
>>> return r;
>>>
>>> if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
>>> - const struct common_firmware_header *hdr;
>>> - unsigned size = amdgpu_bo_size(adev->vce.vcpu_bo);
>>> -
>>> - adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
>>> - if (!adev->vce.saved_bo)
>>> - return -ENOMEM;
>>> -
>>> - hdr = (const struct common_firmware_header *)adev-
>>>> vce.fw->data;
>>> + const struct common_firmware_header *hdr =
>>> + (const struct common_firmware_header
>>> *)adev->vce.fw->data;
>>> adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id
>>> = AMDGPU_UCODE_ID_VCE;
>>> adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].fw =
>>> adev->vce.fw;
>>> adev->firmware.fw_size +=
>>> @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct
>>> amdgpu_ip_block *ip_block)
>>> /* free MM table */
>>> amdgpu_virt_free_mm_table(adev);
>>>
>>> - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
>>> - kvfree(adev->vce.saved_bo);
>>> - adev->vce.saved_bo = NULL;
>>> - }
>>> -
>>> r = amdgpu_vce_suspend(adev);
>>> if (r)
>>> return r;
>>> @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct
>>> amdgpu_ip_block *ip_block)
>>> static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
>>> {
>>> struct amdgpu_device *adev = ip_block->adev;
>>> - int r, idx;
>>> -
>>> - if (adev->vce.vcpu_bo == NULL)
>>> - return 0;
>>> -
>>> - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>> - if (adev->firmware.load_type ==
>>> AMDGPU_FW_LOAD_PSP) {
>>> - unsigned size = amdgpu_bo_size(adev-
>>>> vce.vcpu_bo);
>>> - void *ptr = adev->vce.cpu_addr;
>>> -
>>> - memcpy_fromio(adev->vce.saved_bo, ptr,
>>> size);
>>> - }
>>> - drm_dev_exit(idx);
>>> - }
>>> + int r;
>>>
>>> /*
>>> * Proper cleanups before halting the HW engine:
>>> @@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct
>>> amdgpu_ip_block *ip_block)
>>> static int vce_v4_0_resume(struct amdgpu_ip_block *ip_block)
>>> {
>>> struct amdgpu_device *adev = ip_block->adev;
>>> - int r, idx;
>>> -
>>> - if (adev->vce.vcpu_bo == NULL)
>>> - return -EINVAL;
>>> -
>>> - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
>>> -
>>> - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>> - unsigned size = amdgpu_bo_size(adev-
>>>> vce.vcpu_bo);
>>> - void *ptr = adev->vce.cpu_addr;
>>> + int r;
>>>
>>> - memcpy_toio(ptr, adev->vce.saved_bo,
>>> size);
>>> - drm_dev_exit(idx);
>>> - }
>>> - } else {
>>> - r = amdgpu_vce_resume(adev);
>>> - if (r)
>>> - return r;
>>> - }
>>> + r = amdgpu_vce_resume(adev);
>>> + if (r)
>>> + return r;
>>>
>>> return vce_v4_0_hw_init(ip_block);
>>> }
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 10:01 ` Christian König
@ 2025-11-07 10:47 ` Timur Kristóf
2025-11-07 13:14 ` Christian König
0 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-07 10:47 UTC (permalink / raw)
To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On Fri, 2025-11-07 at 11:01 +0100, Christian König wrote:
> On 11/7/25 10:53, Timur Kristóf wrote:
> > On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
> > > On 11/6/25 19:44, Timur Kristóf wrote:
> > > > VCE uses the VCPU BO to keep track of currently active
> > > > encoding sessions. To make sure the VCE functions correctly
> > > > after suspend/resume, save the VCPU BO to system RAM on
> > > > suspend and restore it on resume.
> > > >
> > > > Additionally, make sure to keep the VCPU BO pinned.
> > > > The VCPU BO needs to stay at the same location before and after
> > > > sleep/resume because the FW code is not relocatable once it's
> > > > started.
> > > >
> > > > Unfortunately this is not enough to make VCE suspend work when
> > > > there are encoding sessions active, so don't allow that yet.
> > >
> > > The question if this is the right approach or not can only Leo
> > > and
> > > Ruijing answer.
> > >
> > > If we can get VCE1-3 to keep session working after suspend/resume
> > > we
> > > should make this change otherwise we should rather make all old
> > > sessions invalid after suspend/resume and only re-load the FW.
> > >
> > > Anyway I think you should make the removal of
> > > "amdgpu_bo_kmap(adev-
> > > > vce.vcpu_bo, &cpu_addr);" a separate patch, cause that seems to
> > > > be a
> > > good cleanup no matter what.
> > >
> >
> > This change is necessary for the VCE1 code when it loads the
> > firmware
> > signature. Without this patch, we would need to call reserve(),
> > kmap(),
> > kunmap(), kunreserve() in vce_v1_0_load_fw_signature().
> >
> > Let me know which approach you prefer.
>
> In this case definately make removal of amdgpu_bo_kmap(adev-
> >vce.vcpu_bo, &cpu_addr) a separate patch.
Sorry, can you clarify what you mean?
Should I just go back to the way things were in the first version of
the series, and then clean it up later?
>
> I want to get initial VCE1 working and landed independent of what
> Leo/Ruijing say to suspend/resume.
Agreed.
>
> Can be that suspend/resume is then still broken, but that is also the
> case for VCE2-3 then.
Yes, some extra work is going to be needed on top of this patch to make
suspend/resume work (if it's possible at all).
>
>
> >
> > >
> > > >
> > > > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > > > ---
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++---------
> > > > ----
> > > > drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++-------------
> > > > ----
> > > > ----
> > > > 2 files changed, 24 insertions(+), 72 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > index 2297608c5191..4beec5b56c4f 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct
> > > > amdgpu_device
> > > > *adev, unsigned long size)
> > > > if (!adev->vce.fw)
> > > > return -ENOENT;
> > > >
> > > > + adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
> > > > + if (!adev->vce.saved_bo)
> > > > + return -ENOMEM;
> > > > +
> > > > r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> > > > AMDGPU_GEM_DOMAIN_VRAM |
> > > > AMDGPU_GEM_DOMAIN_GTT,
> > > > @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct amdgpu_device
> > > > *adev)
> > > > amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev-
> > > > > vce.gpu_addr,
> > > > (void **)&adev->vce.cpu_addr);
> > > >
> > > > + kvfree(adev->vce.saved_bo);
> > > > + adev->vce.saved_bo = NULL;
> > > > +
> > > > return 0;
> > > > }
> > > >
> > > > @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct
> > > > amdgpu_device *adev, struct amdgpu_ring *ring)
> > > > */
> > > > int amdgpu_vce_suspend(struct amdgpu_device *adev)
> > > > {
> > > > - int i;
> > > > + int i, idx;
> > > >
> > > > cancel_delayed_work_sync(&adev->vce.idle_work);
> > > >
> > > > if (adev->vce.vcpu_bo == NULL)
> > > > return 0;
> > > >
> > > > + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > + memcpy_fromio(adev->vce.saved_bo, adev-
> > > > > vce.cpu_addr,
> > > > + amdgpu_bo_size(adev-
> > > > >vce.vcpu_bo));
> > > > + drm_dev_exit(idx);
> > > > + }
> > > > +
> > > > for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
> > > > if (atomic_read(&adev->vce.handles[i]))
> > > > break;
> > > > @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct
> > > > amdgpu_device
> > > > *adev)
> > > > */
> > > > int amdgpu_vce_resume(struct amdgpu_device *adev)
> > > > {
> > > > - void *cpu_addr;
> > > > - const struct common_firmware_header *hdr;
> > > > - unsigned int offset;
> > > > - int r, idx;
> > > > + int idx;
> > > >
> > > > if (adev->vce.vcpu_bo == NULL)
> > > > return -EINVAL;
> > > >
> > > > - r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
> > > > - if (r) {
> > > > - dev_err(adev->dev, "(%d) failed to reserve VCE
> > > > bo\n", r);
> > > > - return r;
> > > > - }
> > > > -
> > > > - r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
> > > > - if (r) {
> > > > - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > > > - dev_err(adev->dev, "(%d) VCE map failed\n",
> > > > r);
> > > > - return r;
> > > > - }
> > > > -
> > > > - hdr = (const struct common_firmware_header *)adev-
> > > > >vce.fw-
> > > > > data;
> > > > - offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
> > > > -
> > > > if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > - memcpy_toio(cpu_addr, adev->vce.fw->data +
> > > > offset,
> > > > - adev->vce.fw->size - offset);
> > > > + memcpy_toio(adev->vce.cpu_addr, adev-
> > > > > vce.saved_bo,
> > > > + amdgpu_bo_size(adev-
> > > > >vce.vcpu_bo));
> > > > drm_dev_exit(idx);
> > > > }
> > > >
> > > > - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
> > > > -
> > > > - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > > > -
> > > > return 0;
> > > > }
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > index 2d64002bed61..21b6656b2f41 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct
> > > > amdgpu_ip_block *ip_block)
> > > > return r;
> > > >
> > > > if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> > > > - const struct common_firmware_header *hdr;
> > > > - unsigned size = amdgpu_bo_size(adev-
> > > > >vce.vcpu_bo);
> > > > -
> > > > - adev->vce.saved_bo = kvmalloc(size,
> > > > GFP_KERNEL);
> > > > - if (!adev->vce.saved_bo)
> > > > - return -ENOMEM;
> > > > -
> > > > - hdr = (const struct common_firmware_header
> > > > *)adev-
> > > > > vce.fw->data;
> > > > + const struct common_firmware_header *hdr =
> > > > + (const struct common_firmware_header
> > > > *)adev->vce.fw->data;
> > > > adev-
> > > > >firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id
> > > > = AMDGPU_UCODE_ID_VCE;
> > > > adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].fw =
> > > > adev->vce.fw;
> > > > adev->firmware.fw_size +=
> > > > @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct
> > > > amdgpu_ip_block *ip_block)
> > > > /* free MM table */
> > > > amdgpu_virt_free_mm_table(adev);
> > > >
> > > > - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> > > > - kvfree(adev->vce.saved_bo);
> > > > - adev->vce.saved_bo = NULL;
> > > > - }
> > > > -
> > > > r = amdgpu_vce_suspend(adev);
> > > > if (r)
> > > > return r;
> > > > @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct
> > > > amdgpu_ip_block *ip_block)
> > > > static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
> > > > {
> > > > struct amdgpu_device *adev = ip_block->adev;
> > > > - int r, idx;
> > > > -
> > > > - if (adev->vce.vcpu_bo == NULL)
> > > > - return 0;
> > > > -
> > > > - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > - if (adev->firmware.load_type ==
> > > > AMDGPU_FW_LOAD_PSP) {
> > > > - unsigned size = amdgpu_bo_size(adev-
> > > > > vce.vcpu_bo);
> > > > - void *ptr = adev->vce.cpu_addr;
> > > > -
> > > > - memcpy_fromio(adev->vce.saved_bo, ptr,
> > > > size);
> > > > - }
> > > > - drm_dev_exit(idx);
> > > > - }
> > > > + int r;
> > > >
> > > > /*
> > > > * Proper cleanups before halting the HW engine:
> > > > @@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct
> > > > amdgpu_ip_block *ip_block)
> > > > static int vce_v4_0_resume(struct amdgpu_ip_block *ip_block)
> > > > {
> > > > struct amdgpu_device *adev = ip_block->adev;
> > > > - int r, idx;
> > > > -
> > > > - if (adev->vce.vcpu_bo == NULL)
> > > > - return -EINVAL;
> > > > -
> > > > - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
> > > > -
> > > > - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > - unsigned size = amdgpu_bo_size(adev-
> > > > > vce.vcpu_bo);
> > > > - void *ptr = adev->vce.cpu_addr;
> > > > + int r;
> > > >
> > > > - memcpy_toio(ptr, adev->vce.saved_bo,
> > > > size);
> > > > - drm_dev_exit(idx);
> > > > - }
> > > > - } else {
> > > > - r = amdgpu_vce_resume(adev);
> > > > - if (r)
> > > > - return r;
> > > > - }
> > > > + r = amdgpu_vce_resume(adev);
> > > > + if (r)
> > > > + return r;
> > > >
> > > > return vce_v4_0_hw_init(ip_block);
> > > > }
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 10:47 ` Timur Kristóf
@ 2025-11-07 13:14 ` Christian König
2025-11-07 13:31 ` Timur Kristóf
0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2025-11-07 13:14 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On 11/7/25 11:47, Timur Kristóf wrote:
> On Fri, 2025-11-07 at 11:01 +0100, Christian König wrote:
>> On 11/7/25 10:53, Timur Kristóf wrote:
>>> On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
>>>> On 11/6/25 19:44, Timur Kristóf wrote:
>>>>> VCE uses the VCPU BO to keep track of currently active
>>>>> encoding sessions. To make sure the VCE functions correctly
>>>>> after suspend/resume, save the VCPU BO to system RAM on
>>>>> suspend and restore it on resume.
>>>>>
>>>>> Additionally, make sure to keep the VCPU BO pinned.
>>>>> The VCPU BO needs to stay at the same location before and after
>>>>> sleep/resume because the FW code is not relocatable once it's
>>>>> started.
>>>>>
>>>>> Unfortunately this is not enough to make VCE suspend work when
>>>>> there are encoding sessions active, so don't allow that yet.
>>>>
>>>> The question if this is the right approach or not can only Leo
>>>> and
>>>> Ruijing answer.
>>>>
>>>> If we can get VCE1-3 to keep session working after suspend/resume
>>>> we
>>>> should make this change otherwise we should rather make all old
>>>> sessions invalid after suspend/resume and only re-load the FW.
>>>>
>>>> Anyway I think you should make the removal of
>>>> "amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);" a separate patch,
>>>> cause that seems to be a good cleanup no matter what.
>>>>
>>>
>>> This change is necessary for the VCE1 code when it loads the
>>> firmware
>>> signature. Without this patch, we would need to call reserve(),
>>> kmap(),
>>> kunmap(), kunreserve() in vce_v1_0_load_fw_signature().
>>>
>>> Let me know which approach you prefer.
>>
>> In this case definitely make the removal of
>> amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr) a separate patch.
>
> Sorry, can you clarify what you mean?
> Should I just go back to the way things were on the first version of
> the series, and then clean it up later?
Just add a patch (early in the series, probably as the first patch) to remove amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr).
Then put the memset_io() into amdgpu_vce_resume() like you had in the first version of the series so that VCE1 behaves the same as VCE2-3.
And when the series has landed we can clean everything up depending on what Leo/Ruijing have decided to do with suspend/resume on VCE1-3.
Regards,
Christian.
>
>>
>> I want to get initial VCE1 working and landed independent of what
>> Leo/Ruijing say about suspend/resume.
>
> Agreed.
>
>>
>> It may be that suspend/resume is then still broken, but that is also the
>> case for VCE2-3 then.
>
> Yes, some extra work is going to be needed on top of this patch to make
> suspend/resume work (if it's possible at all).
>
>>
>>
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++---------
>>>>> ----
>>>>> drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++-------------
>>>>> ----
>>>>> ----
>>>>> 2 files changed, 24 insertions(+), 72 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> index 2297608c5191..4beec5b56c4f 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct
>>>>> amdgpu_device
>>>>> *adev, unsigned long size)
>>>>> if (!adev->vce.fw)
>>>>> return -ENOENT;
>>>>>
>>>>> + adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
>>>>> + if (!adev->vce.saved_bo)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>>>>> AMDGPU_GEM_DOMAIN_VRAM |
>>>>> AMDGPU_GEM_DOMAIN_GTT,
>>>>> @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct amdgpu_device
>>>>> *adev)
>>>>> amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev-
>>>>>> vce.gpu_addr,
>>>>> (void **)&adev->vce.cpu_addr);
>>>>>
>>>>> + kvfree(adev->vce.saved_bo);
>>>>> + adev->vce.saved_bo = NULL;
>>>>> +
>>>>> return 0;
>>>>> }
>>>>>
>>>>> @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct
>>>>> amdgpu_device *adev, struct amdgpu_ring *ring)
>>>>> */
>>>>> int amdgpu_vce_suspend(struct amdgpu_device *adev)
>>>>> {
>>>>> - int i;
>>>>> + int i, idx;
>>>>>
>>>>> cancel_delayed_work_sync(&adev->vce.idle_work);
>>>>>
>>>>> if (adev->vce.vcpu_bo == NULL)
>>>>> return 0;
>>>>>
>>>>> + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>>>> + memcpy_fromio(adev->vce.saved_bo, adev-
>>>>>> vce.cpu_addr,
>>>>> + amdgpu_bo_size(adev-
>>>>>> vce.vcpu_bo));
>>>>> + drm_dev_exit(idx);
>>>>> + }
>>>>> +
>>>>> for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
>>>>> if (atomic_read(&adev->vce.handles[i]))
>>>>> break;
>>>>> @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct
>>>>> amdgpu_device
>>>>> *adev)
>>>>> */
>>>>> int amdgpu_vce_resume(struct amdgpu_device *adev)
>>>>> {
>>>>> - void *cpu_addr;
>>>>> - const struct common_firmware_header *hdr;
>>>>> - unsigned int offset;
>>>>> - int r, idx;
>>>>> + int idx;
>>>>>
>>>>> if (adev->vce.vcpu_bo == NULL)
>>>>> return -EINVAL;
>>>>>
>>>>> - r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
>>>>> - if (r) {
>>>>> - dev_err(adev->dev, "(%d) failed to reserve VCE
>>>>> bo\n", r);
>>>>> - return r;
>>>>> - }
>>>>> -
>>>>> - r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
>>>>> - if (r) {
>>>>> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>>>> - dev_err(adev->dev, "(%d) VCE map failed\n",
>>>>> r);
>>>>> - return r;
>>>>> - }
>>>>> -
>>>>> - hdr = (const struct common_firmware_header *)adev-
>>>>>> vce.fw-
>>>>>> data;
>>>>> - offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
>>>>> -
>>>>> if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>>>> - memcpy_toio(cpu_addr, adev->vce.fw->data +
>>>>> offset,
>>>>> - adev->vce.fw->size - offset);
>>>>> + memcpy_toio(adev->vce.cpu_addr, adev-
>>>>>> vce.saved_bo,
>>>>> + amdgpu_bo_size(adev-
>>>>>> vce.vcpu_bo));
>>>>> drm_dev_exit(idx);
>>>>> }
>>>>>
>>>>> - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
>>>>> -
>>>>> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>>>> -
>>>>> return 0;
>>>>> }
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>> index 2d64002bed61..21b6656b2f41 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>> @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct
>>>>> amdgpu_ip_block *ip_block)
>>>>> return r;
>>>>>
>>>>> if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
>>>>> - const struct common_firmware_header *hdr;
>>>>> - unsigned size = amdgpu_bo_size(adev-
>>>>>> vce.vcpu_bo);
>>>>> -
>>>>> - adev->vce.saved_bo = kvmalloc(size,
>>>>> GFP_KERNEL);
>>>>> - if (!adev->vce.saved_bo)
>>>>> - return -ENOMEM;
>>>>> -
>>>>> - hdr = (const struct common_firmware_header
>>>>> *)adev-
>>>>>> vce.fw->data;
>>>>> + const struct common_firmware_header *hdr =
>>>>> + (const struct common_firmware_header
>>>>> *)adev->vce.fw->data;
>>>>> adev-
>>>>>> firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id
>>>>> = AMDGPU_UCODE_ID_VCE;
>>>>> adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].fw =
>>>>> adev->vce.fw;
>>>>> adev->firmware.fw_size +=
>>>>> @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct
>>>>> amdgpu_ip_block *ip_block)
>>>>> /* free MM table */
>>>>> amdgpu_virt_free_mm_table(adev);
>>>>>
>>>>> - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
>>>>> - kvfree(adev->vce.saved_bo);
>>>>> - adev->vce.saved_bo = NULL;
>>>>> - }
>>>>> -
>>>>> r = amdgpu_vce_suspend(adev);
>>>>> if (r)
>>>>> return r;
>>>>> @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct
>>>>> amdgpu_ip_block *ip_block)
>>>>> static int vce_v4_0_suspend(struct amdgpu_ip_block *ip_block)
>>>>> {
>>>>> struct amdgpu_device *adev = ip_block->adev;
>>>>> - int r, idx;
>>>>> -
>>>>> - if (adev->vce.vcpu_bo == NULL)
>>>>> - return 0;
>>>>> -
>>>>> - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>>>> - if (adev->firmware.load_type ==
>>>>> AMDGPU_FW_LOAD_PSP) {
>>>>> - unsigned size = amdgpu_bo_size(adev-
>>>>>> vce.vcpu_bo);
>>>>> - void *ptr = adev->vce.cpu_addr;
>>>>> -
>>>>> - memcpy_fromio(adev->vce.saved_bo, ptr,
>>>>> size);
>>>>> - }
>>>>> - drm_dev_exit(idx);
>>>>> - }
>>>>> + int r;
>>>>>
>>>>> /*
>>>>> * Proper cleanups before halting the HW engine:
>>>>> @@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct
>>>>> amdgpu_ip_block *ip_block)
>>>>> static int vce_v4_0_resume(struct amdgpu_ip_block *ip_block)
>>>>> {
>>>>> struct amdgpu_device *adev = ip_block->adev;
>>>>> - int r, idx;
>>>>> -
>>>>> - if (adev->vce.vcpu_bo == NULL)
>>>>> - return -EINVAL;
>>>>> -
>>>>> - if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
>>>>> -
>>>>> - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>>>> - unsigned size = amdgpu_bo_size(adev-
>>>>>> vce.vcpu_bo);
>>>>> - void *ptr = adev->vce.cpu_addr;
>>>>> + int r;
>>>>>
>>>>> - memcpy_toio(ptr, adev->vce.saved_bo,
>>>>> size);
>>>>> - drm_dev_exit(idx);
>>>>> - }
>>>>> - } else {
>>>>> - r = amdgpu_vce_resume(adev);
>>>>> - if (r)
>>>>> - return r;
>>>>> - }
>>>>> + r = amdgpu_vce_resume(adev);
>>>>> + if (r)
>>>>> + return r;
>>>>>
>>>>> return vce_v4_0_hw_init(ip_block);
>>>>> }
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 13:14 ` Christian König
@ 2025-11-07 13:31 ` Timur Kristóf
2025-11-07 13:33 ` Christian König
0 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-07 13:31 UTC (permalink / raw)
To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On Fri, 2025-11-07 at 14:14 +0100, Christian König wrote:
> On 11/7/25 11:47, Timur Kristóf wrote:
> > On Fri, 2025-11-07 at 11:01 +0100, Christian König wrote:
> > > On 11/7/25 10:53, Timur Kristóf wrote:
> > > > On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
> > > > > On 11/6/25 19:44, Timur Kristóf wrote:
> > > > > > VCE uses the VCPU BO to keep track of currently active
> > > > > > encoding sessions. To make sure the VCE functions correctly
> > > > > > after suspend/resume, save the VCPU BO to system RAM on
> > > > > > suspend and restore it on resume.
> > > > > >
> > > > > > Additionally, make sure to keep the VCPU BO pinned.
> > > > > > The VCPU BO needs to stay at the same location before and
> > > > > > after
> > > > > > sleep/resume because the FW code is not relocatable once
> > > > > > it's
> > > > > > started.
> > > > > >
> > > > > > Unfortunately this is not enough to make VCE suspend work
> > > > > > when
> > > > > > there are encoding sessions active, so don't allow that
> > > > > > yet.
> > > > >
> > > > > The question of whether this is the right approach or not can
> > > > > only be answered by Leo and Ruijing.
> > > > >
> > > > > If we can get VCE1-3 to keep sessions working after
> > > > > suspend/resume
> > > > > we
> > > > > should make this change otherwise we should rather make all
> > > > > old
> > > > > sessions invalid after suspend/resume and only re-load the
> > > > > FW.
> > > > >
> > > > > Anyway I think you should make the removal of
> > > > > "amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);" a separate
> > > > > patch, cause that seems to be a good cleanup no matter what.
> > > > >
> > > >
> > > > This change is necessary for the VCE1 code when it loads the
> > > > firmware
> > > > signature. Without this patch, we would need to call reserve(),
> > > > kmap(),
> > > > kunmap(), kunreserve() in vce_v1_0_load_fw_signature().
> > > >
> > > > Let me know which approach you prefer.
> > >
> > > In this case definitely make the removal of
> > > amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr) a separate patch.
> >
> > Sorry, can you clarify what you mean?
> > Should I just go back to the way things were on the first version
> > of
> > the series, and then clean it up later?
>
> Just add a patch (early in the series, probably as the first patch) to
> remove amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr).
Is it allowed to keep a BO mapped when it is unreserved?
>
> Then put the memset_io() into amdgpu_vce_resume() like you had in the
> first version of the series so that VCE1 behaves the same as VCE2-3.
Ok
>
> And when the series has landed we can clean everything up depending
> > on what Leo/Ruijing have decided to do with suspend/resume on VCE1-3.
Sounds good.
>
>
> >
> > >
> > > I want to get initial VCE1 working and landed independent of what
> > > Leo/Ruijing say about suspend/resume.
> >
> > Agreed.
> >
> > >
> > > It may be that suspend/resume is then still broken, but that is also
> > > the
> > > case for VCE2-3 then.
> >
> > Yes, some extra work is going to be needed on top of this patch to
> > make
> > suspend/resume work (if it's possible at all).
> >
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++-----
> > > > > > ----
> > > > > > ----
> > > > > > drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++---------
> > > > > > ----
> > > > > > ----
> > > > > > ----
> > > > > > 2 files changed, 24 insertions(+), 72 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > index 2297608c5191..4beec5b56c4f 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct
> > > > > > amdgpu_device
> > > > > > *adev, unsigned long size)
> > > > > > if (!adev->vce.fw)
> > > > > > return -ENOENT;
> > > > > >
> > > > > > + adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
> > > > > > + if (!adev->vce.saved_bo)
> > > > > > + return -ENOMEM;
> > > > > > +
> > > > > > r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> > > > > > AMDGPU_GEM_DOMAIN_VRAM
> > > > > > |
> > > > > > AMDGPU_GEM_DOMAIN_GTT,
> > > > > > @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct
> > > > > > amdgpu_device
> > > > > > *adev)
> > > > > > amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev-
> > > > > > > vce.gpu_addr,
> > > > > > (void **)&adev->vce.cpu_addr);
> > > > > >
> > > > > > + kvfree(adev->vce.saved_bo);
> > > > > > + adev->vce.saved_bo = NULL;
> > > > > > +
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct
> > > > > > amdgpu_device *adev, struct amdgpu_ring *ring)
> > > > > > */
> > > > > > int amdgpu_vce_suspend(struct amdgpu_device *adev)
> > > > > > {
> > > > > > - int i;
> > > > > > + int i, idx;
> > > > > >
> > > > > > cancel_delayed_work_sync(&adev->vce.idle_work);
> > > > > >
> > > > > > if (adev->vce.vcpu_bo == NULL)
> > > > > > return 0;
> > > > > >
> > > > > > + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > > > + memcpy_fromio(adev->vce.saved_bo, adev-
> > > > > > > vce.cpu_addr,
> > > > > > + amdgpu_bo_size(adev-
> > > > > > > vce.vcpu_bo));
> > > > > > + drm_dev_exit(idx);
> > > > > > + }
> > > > > > +
> > > > > > for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
> > > > > > if (atomic_read(&adev->vce.handles[i]))
> > > > > > break;
> > > > > > @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct
> > > > > > amdgpu_device
> > > > > > *adev)
> > > > > > */
> > > > > > int amdgpu_vce_resume(struct amdgpu_device *adev)
> > > > > > {
> > > > > > - void *cpu_addr;
> > > > > > - const struct common_firmware_header *hdr;
> > > > > > - unsigned int offset;
> > > > > > - int r, idx;
> > > > > > + int idx;
> > > > > >
> > > > > > if (adev->vce.vcpu_bo == NULL)
> > > > > > return -EINVAL;
> > > > > >
> > > > > > - r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
> > > > > > - if (r) {
> > > > > > - dev_err(adev->dev, "(%d) failed to reserve
> > > > > > VCE
> > > > > > bo\n", r);
> > > > > > - return r;
> > > > > > - }
> > > > > > -
> > > > > > - r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
> > > > > > - if (r) {
> > > > > > - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > > > > > - dev_err(adev->dev, "(%d) VCE map
> > > > > > failed\n",
> > > > > > r);
> > > > > > - return r;
> > > > > > - }
> > > > > > -
> > > > > > - hdr = (const struct common_firmware_header *)adev-
> > > > > > > vce.fw-
> > > > > > > data;
> > > > > > - offset = le32_to_cpu(hdr-
> > > > > > >ucode_array_offset_bytes);
> > > > > > -
> > > > > > if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > > > - memcpy_toio(cpu_addr, adev->vce.fw->data +
> > > > > > offset,
> > > > > > - adev->vce.fw->size - offset);
> > > > > > + memcpy_toio(adev->vce.cpu_addr, adev-
> > > > > > > vce.saved_bo,
> > > > > > + amdgpu_bo_size(adev-
> > > > > > > vce.vcpu_bo));
> > > > > > drm_dev_exit(idx);
> > > > > > }
> > > > > >
> > > > > > - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
> > > > > > -
> > > > > > - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > > > > > -
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > index 2d64002bed61..21b6656b2f41 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct
> > > > > > amdgpu_ip_block *ip_block)
> > > > > > return r;
> > > > > >
> > > > > > if (adev->firmware.load_type ==
> > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > - const struct common_firmware_header *hdr;
> > > > > > - unsigned size = amdgpu_bo_size(adev-
> > > > > > > vce.vcpu_bo);
> > > > > > -
> > > > > > - adev->vce.saved_bo = kvmalloc(size,
> > > > > > GFP_KERNEL);
> > > > > > - if (!adev->vce.saved_bo)
> > > > > > - return -ENOMEM;
> > > > > > -
> > > > > > - hdr = (const struct common_firmware_header
> > > > > > *)adev-
> > > > > > > vce.fw->data;
> > > > > > + const struct common_firmware_header *hdr =
> > > > > > + (const struct
> > > > > > common_firmware_header
> > > > > > *)adev->vce.fw->data;
> > > > > > adev-
> > > > > > > firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id
> > > > > > = AMDGPU_UCODE_ID_VCE;
> > > > > > adev-
> > > > > > >firmware.ucode[AMDGPU_UCODE_ID_VCE].fw =
> > > > > > adev->vce.fw;
> > > > > > adev->firmware.fw_size +=
> > > > > > @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct
> > > > > > amdgpu_ip_block *ip_block)
> > > > > > /* free MM table */
> > > > > > amdgpu_virt_free_mm_table(adev);
> > > > > >
> > > > > > - if (adev->firmware.load_type ==
> > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > - kvfree(adev->vce.saved_bo);
> > > > > > - adev->vce.saved_bo = NULL;
> > > > > > - }
> > > > > > -
> > > > > > r = amdgpu_vce_suspend(adev);
> > > > > > if (r)
> > > > > > return r;
> > > > > > @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct
> > > > > > amdgpu_ip_block *ip_block)
> > > > > > static int vce_v4_0_suspend(struct amdgpu_ip_block
> > > > > > *ip_block)
> > > > > > {
> > > > > > struct amdgpu_device *adev = ip_block->adev;
> > > > > > - int r, idx;
> > > > > > -
> > > > > > - if (adev->vce.vcpu_bo == NULL)
> > > > > > - return 0;
> > > > > > -
> > > > > > - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > > > - if (adev->firmware.load_type ==
> > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > - unsigned size =
> > > > > > amdgpu_bo_size(adev-
> > > > > > > vce.vcpu_bo);
> > > > > > - void *ptr = adev->vce.cpu_addr;
> > > > > > -
> > > > > > - memcpy_fromio(adev->vce.saved_bo,
> > > > > > ptr,
> > > > > > size);
> > > > > > - }
> > > > > > - drm_dev_exit(idx);
> > > > > > - }
> > > > > > + int r;
> > > > > >
> > > > > > /*
> > > > > > * Proper cleanups before halting the HW engine:
> > > > > > @@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct
> > > > > > amdgpu_ip_block *ip_block)
> > > > > > static int vce_v4_0_resume(struct amdgpu_ip_block
> > > > > > *ip_block)
> > > > > > {
> > > > > > struct amdgpu_device *adev = ip_block->adev;
> > > > > > - int r, idx;
> > > > > > -
> > > > > > - if (adev->vce.vcpu_bo == NULL)
> > > > > > - return -EINVAL;
> > > > > > -
> > > > > > - if (adev->firmware.load_type ==
> > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > -
> > > > > > - if (drm_dev_enter(adev_to_drm(adev),
> > > > > > &idx)) {
> > > > > > - unsigned size =
> > > > > > amdgpu_bo_size(adev-
> > > > > > > vce.vcpu_bo);
> > > > > > - void *ptr = adev->vce.cpu_addr;
> > > > > > + int r;
> > > > > >
> > > > > > - memcpy_toio(ptr, adev-
> > > > > > >vce.saved_bo,
> > > > > > size);
> > > > > > - drm_dev_exit(idx);
> > > > > > - }
> > > > > > - } else {
> > > > > > - r = amdgpu_vce_resume(adev);
> > > > > > - if (r)
> > > > > > - return r;
> > > > > > - }
> > > > > > + r = amdgpu_vce_resume(adev);
> > > > > > + if (r)
> > > > > > + return r;
> > > > > >
> > > > > > return vce_v4_0_hw_init(ip_block);
> > > > > > }
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 13:31 ` Timur Kristóf
@ 2025-11-07 13:33 ` Christian König
2025-11-07 13:39 ` Timur Kristóf
0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2025-11-07 13:33 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On 11/7/25 14:31, Timur Kristóf wrote:
> On Fri, 2025-11-07 at 14:14 +0100, Christian König wrote:
>> On 11/7/25 11:47, Timur Kristóf wrote:
>>> On Fri, 2025-11-07 at 11:01 +0100, Christian König wrote:
>>>> On 11/7/25 10:53, Timur Kristóf wrote:
>>>>> On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
>>>>>> On 11/6/25 19:44, Timur Kristóf wrote:
>>>>>>> VCE uses the VCPU BO to keep track of currently active
>>>>>>> encoding sessions. To make sure the VCE functions correctly
>>>>>>> after suspend/resume, save the VCPU BO to system RAM on
>>>>>>> suspend and restore it on resume.
>>>>>>>
>>>>>>> Additionally, make sure to keep the VCPU BO pinned.
>>>>>>> The VCPU BO needs to stay at the same location before and
>>>>>>> after
>>>>>>> sleep/resume because the FW code is not relocatable once
>>>>>>> it's
>>>>>>> started.
>>>>>>>
>>>>>>> Unfortunately this is not enough to make VCE suspend work
>>>>>>> when
>>>>>>> there are encoding sessions active, so don't allow that
>>>>>>> yet.
>>>>>>
>>>>>> The question if this is the right approach or not can only
>>>>>> Leo
>>>>>> and
>>>>>> Ruijing answer.
>>>>>>
>>>>>> If we can get VCE1-3 to keep sessions working after
>>>>>> suspend/resume
>>>>>> we
>>>>>> should make this change otherwise we should rather make all
>>>>>> old
>>>>>> sessions invalid after suspend/resume and only re-load the
>>>>>> FW.
>>>>>>
>>>>>> Anyway I think you should make the removal of
>>>>>> "amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);" a separate
>>>>>> patch, cause that seems to be a good cleanup no matter what.
>>>>>>
>>>>>
>>>>> This change is necessary for the VCE1 code when it loads the
>>>>> firmware
>>>>> signature. Without this patch, we would need to call reserve(),
>>>>> kmap(),
>>>>> kunmap(), kunreserve() in vce_v1_0_load_fw_signature().
>>>>>
>>>>> Let me know which approach you prefer.
>>>>
>>>> In this case definitely make the removal of
>>>> amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr) a separate patch.
>>>
>>> Sorry, can you clarify what you mean?
>>> Should I just go back to the way things were on the first version
>>> of
>>> the series, and then clean it up later?
>>
>> Just add a patch (early in the series, probably as the first patch) to
>> remove amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr).
>
> Is it allowed to keep a BO mapped when it is unreserved?
Yes, as long as it is also pinned.
IIRC the VCE BO is allocated by amdgpu_bo_create_kernel() and that both pins and maps the BO.
Regards,
Christian.
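[Editor's note: the two BO access patterns being contrasted here can be sketched as follows. This is a kernel-style illustration only, not compilable on its own; the amdgpu_bo_create_kernel() call mirrors the one quoted in the patch above, while the surrounding scaffolding is a simplified stand-in.]

```c
/*
 * Pattern 1: amdgpu_bo_create_kernel() pins the BO and kmaps it once,
 * so the returned CPU address stays valid for the BO's whole lifetime
 * and can later be dereferenced without reserving the BO:
 */
r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
			    AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT,
			    &adev->vce.vcpu_bo, &adev->vce.gpu_addr,
			    (void **)&adev->vce.cpu_addr);
/* ... later, e.g. on resume, the mapping is used directly: */
memcpy_toio(adev->vce.cpu_addr, adev->vce.saved_bo,
	    amdgpu_bo_size(adev->vce.vcpu_bo));

/*
 * Pattern 2: for a BO that is not pinned, a kernel mapping is only
 * safe to use while the BO is reserved, hence the dance the patch
 * removes:
 */
r = amdgpu_bo_reserve(bo, false);
if (!r) {
	r = amdgpu_bo_kmap(bo, &cpu_addr);
	/* ... access cpu_addr while the BO cannot move ... */
	amdgpu_bo_kunmap(bo);
	amdgpu_bo_unreserve(bo);
}
```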
>
>>
>> Then put the memset_io() into amdgpu_vce_resume() like you had in the
>> first version of the series so that VCE1 behaves the same as VCE2-3.
>
> Ok
>
>>
>> And when the series has landed we can clean everything up depending
>> on what Leo/Ruijing have decided to do with suspend/resume on VCE1-3.
>
> Sounds good.
>
>
>
>>
>>
>>>
>>>>
>>>> I want to get initial VCE1 working and landed independent of what
>>>> Leo/Ruijing say about suspend/resume.
>>>
>>> Agreed.
>>>
>>>>
>>>> It may be that suspend/resume is then still broken, but that is also
>>>> the
>>>> case for VCE2-3 then.
>>>
>>> Yes, some extra work is going to be needed on top of this patch to
>>> make
>>> suspend/resume work (if it's possible at all).
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++-----
>>>>>>> ----
>>>>>>> ----
>>>>>>> drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++---------
>>>>>>> ----
>>>>>>> ----
>>>>>>> ----
>>>>>>> 2 files changed, 24 insertions(+), 72 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>>>> index 2297608c5191..4beec5b56c4f 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>>>> @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct
>>>>>>> amdgpu_device
>>>>>>> *adev, unsigned long size)
>>>>>>> if (!adev->vce.fw)
>>>>>>> return -ENOENT;
>>>>>>>
>>>>>>> + adev->vce.saved_bo = kvmalloc(size, GFP_KERNEL);
>>>>>>> + if (!adev->vce.saved_bo)
>>>>>>> + return -ENOMEM;
>>>>>>> +
>>>>>>> r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>>>>>>> AMDGPU_GEM_DOMAIN_VRAM
>>>>>>> |
>>>>>>> AMDGPU_GEM_DOMAIN_GTT,
>>>>>>> @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct
>>>>>>> amdgpu_device
>>>>>>> *adev)
>>>>>>> amdgpu_bo_free_kernel(&adev->vce.vcpu_bo, &adev-
>>>>>>>> vce.gpu_addr,
>>>>>>> (void **)&adev->vce.cpu_addr);
>>>>>>>
>>>>>>> + kvfree(adev->vce.saved_bo);
>>>>>>> + adev->vce.saved_bo = NULL;
>>>>>>> +
>>>>>>> return 0;
>>>>>>> }
>>>>>>>
>>>>>>> @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct
>>>>>>> amdgpu_device *adev, struct amdgpu_ring *ring)
>>>>>>> */
>>>>>>> int amdgpu_vce_suspend(struct amdgpu_device *adev)
>>>>>>> {
>>>>>>> - int i;
>>>>>>> + int i, idx;
>>>>>>>
>>>>>>> cancel_delayed_work_sync(&adev->vce.idle_work);
>>>>>>>
>>>>>>> if (adev->vce.vcpu_bo == NULL)
>>>>>>> return 0;
>>>>>>>
>>>>>>> + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>>>>>> + memcpy_fromio(adev->vce.saved_bo, adev-
>>>>>>>> vce.cpu_addr,
>>>>>>> + amdgpu_bo_size(adev-
>>>>>>>> vce.vcpu_bo));
>>>>>>> + drm_dev_exit(idx);
>>>>>>> + }
>>>>>>> +
>>>>>>> for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
>>>>>>> if (atomic_read(&adev->vce.handles[i]))
>>>>>>> break;
>>>>>>> @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct
>>>>>>> amdgpu_device
>>>>>>> *adev)
>>>>>>> */
>>>>>>> int amdgpu_vce_resume(struct amdgpu_device *adev)
>>>>>>> {
>>>>>>> - void *cpu_addr;
>>>>>>> - const struct common_firmware_header *hdr;
>>>>>>> - unsigned int offset;
>>>>>>> - int r, idx;
>>>>>>> + int idx;
>>>>>>>
>>>>>>> if (adev->vce.vcpu_bo == NULL)
>>>>>>> return -EINVAL;
>>>>>>>
>>>>>>> - r = amdgpu_bo_reserve(adev->vce.vcpu_bo, false);
>>>>>>> - if (r) {
>>>>>>> - dev_err(adev->dev, "(%d) failed to reserve
>>>>>>> VCE
>>>>>>> bo\n", r);
>>>>>>> - return r;
>>>>>>> - }
>>>>>>> -
>>>>>>> - r = amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr);
>>>>>>> - if (r) {
>>>>>>> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>>>>>> - dev_err(adev->dev, "(%d) VCE map
>>>>>>> failed\n",
>>>>>>> r);
>>>>>>> - return r;
>>>>>>> - }
>>>>>>> -
>>>>>>> - hdr = (const struct common_firmware_header *)adev-
>>>>>>>> vce.fw-
>>>>>>>> data;
>>>>>>> - offset = le32_to_cpu(hdr-
>>>>>>>> ucode_array_offset_bytes);
>>>>>>> -
>>>>>>> if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>>>>>> - memcpy_toio(cpu_addr, adev->vce.fw->data +
>>>>>>> offset,
>>>>>>> - adev->vce.fw->size - offset);
>>>>>>> + memcpy_toio(adev->vce.cpu_addr, adev-
>>>>>>>> vce.saved_bo,
>>>>>>> + amdgpu_bo_size(adev-
>>>>>>>> vce.vcpu_bo));
>>>>>>> drm_dev_exit(idx);
>>>>>>> }
>>>>>>>
>>>>>>> - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
>>>>>>> -
>>>>>>> - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
>>>>>>> -
>>>>>>> return 0;
>>>>>>> }
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>>>> index 2d64002bed61..21b6656b2f41 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
>>>>>>> @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct
>>>>>>> amdgpu_ip_block *ip_block)
>>>>>>> return r;
>>>>>>>
>>>>>>> if (adev->firmware.load_type ==
>>>>>>> AMDGPU_FW_LOAD_PSP) {
>>>>>>> - const struct common_firmware_header *hdr;
>>>>>>> - unsigned size = amdgpu_bo_size(adev-
>>>>>>>> vce.vcpu_bo);
>>>>>>> -
>>>>>>> - adev->vce.saved_bo = kvmalloc(size,
>>>>>>> GFP_KERNEL);
>>>>>>> - if (!adev->vce.saved_bo)
>>>>>>> - return -ENOMEM;
>>>>>>> -
>>>>>>> - hdr = (const struct common_firmware_header
>>>>>>> *)adev-
>>>>>>>> vce.fw->data;
>>>>>>> + const struct common_firmware_header *hdr =
>>>>>>> + (const struct
>>>>>>> common_firmware_header
>>>>>>> *)adev->vce.fw->data;
>>>>>>> adev-
>>>>>>>> firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id
>>>>>>> = AMDGPU_UCODE_ID_VCE;
>>>>>>> adev-
>>>>>>>> firmware.ucode[AMDGPU_UCODE_ID_VCE].fw =
>>>>>>> adev->vce.fw;
>>>>>>> adev->firmware.fw_size +=
>>>>>>> @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct
>>>>>>> amdgpu_ip_block *ip_block)
>>>>>>> /* free MM table */
>>>>>>> amdgpu_virt_free_mm_table(adev);
>>>>>>>
>>>>>>> - if (adev->firmware.load_type ==
>>>>>>> AMDGPU_FW_LOAD_PSP) {
>>>>>>> - kvfree(adev->vce.saved_bo);
>>>>>>> - adev->vce.saved_bo = NULL;
>>>>>>> - }
>>>>>>> -
>>>>>>> r = amdgpu_vce_suspend(adev);
>>>>>>> if (r)
>>>>>>> return r;
>>>>>>> @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct
>>>>>>> amdgpu_ip_block *ip_block)
>>>>>>> static int vce_v4_0_suspend(struct amdgpu_ip_block
>>>>>>> *ip_block)
>>>>>>> {
>>>>>>> struct amdgpu_device *adev = ip_block->adev;
>>>>>>> - int r, idx;
>>>>>>> -
>>>>>>> - if (adev->vce.vcpu_bo == NULL)
>>>>>>> - return 0;
>>>>>>> -
>>>>>>> - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>>>>>>> - if (adev->firmware.load_type ==
>>>>>>> AMDGPU_FW_LOAD_PSP) {
>>>>>>> - unsigned size =
>>>>>>> amdgpu_bo_size(adev-
>>>>>>>> vce.vcpu_bo);
>>>>>>> - void *ptr = adev->vce.cpu_addr;
>>>>>>> -
>>>>>>> - memcpy_fromio(adev->vce.saved_bo,
>>>>>>> ptr,
>>>>>>> size);
>>>>>>> - }
>>>>>>> - drm_dev_exit(idx);
>>>>>>> - }
>>>>>>> + int r;
>>>>>>>
>>>>>>> /*
>>>>>>> * Proper cleanups before halting the HW engine:
>>>>>>> @@ -609,25 +585,11 @@ static int vce_v4_0_suspend(struct
>>>>>>> amdgpu_ip_block *ip_block)
>>>>>>> static int vce_v4_0_resume(struct amdgpu_ip_block
>>>>>>> *ip_block)
>>>>>>> {
>>>>>>> struct amdgpu_device *adev = ip_block->adev;
>>>>>>> - int r, idx;
>>>>>>> -
>>>>>>> - if (adev->vce.vcpu_bo == NULL)
>>>>>>> - return -EINVAL;
>>>>>>> -
>>>>>>> - if (adev->firmware.load_type ==
>>>>>>> AMDGPU_FW_LOAD_PSP) {
>>>>>>> -
>>>>>>> - if (drm_dev_enter(adev_to_drm(adev),
>>>>>>> &idx)) {
>>>>>>> - unsigned size =
>>>>>>> amdgpu_bo_size(adev-
>>>>>>>> vce.vcpu_bo);
>>>>>>> - void *ptr = adev->vce.cpu_addr;
>>>>>>> + int r;
>>>>>>>
>>>>>>> - memcpy_toio(ptr, adev-
>>>>>>>> vce.saved_bo,
>>>>>>> size);
>>>>>>> - drm_dev_exit(idx);
>>>>>>> - }
>>>>>>> - } else {
>>>>>>> - r = amdgpu_vce_resume(adev);
>>>>>>> - if (r)
>>>>>>> - return r;
>>>>>>> - }
>>>>>>> + r = amdgpu_vce_resume(adev);
>>>>>>> + if (r)
>>>>>>> + return r;
>>>>>>>
>>>>>>> return vce_v4_0_hw_init(ip_block);
>>>>>>> }
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 13:33 ` Christian König
@ 2025-11-07 13:39 ` Timur Kristóf
2025-11-07 13:45 ` Christian König
0 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-07 13:39 UTC (permalink / raw)
To: Christian König, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On Fri, 2025-11-07 at 14:33 +0100, Christian König wrote:
> On 11/7/25 14:31, Timur Kristóf wrote:
> > On Fri, 2025-11-07 at 14:14 +0100, Christian König wrote:
> > > On 11/7/25 11:47, Timur Kristóf wrote:
> > > > On Fri, 2025-11-07 at 11:01 +0100, Christian König wrote:
> > > > > On 11/7/25 10:53, Timur Kristóf wrote:
> > > > > > On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
> > > > > > > On 11/6/25 19:44, Timur Kristóf wrote:
> > > > > > > > VCE uses the VCPU BO to keep track of currently active
> > > > > > > > encoding sessions. To make sure the VCE functions
> > > > > > > > correctly
> > > > > > > > after suspend/resume, save the VCPU BO to system RAM on
> > > > > > > > suspend and restore it on resume.
> > > > > > > >
> > > > > > > > Additionally, make sure to keep the VCPU BO pinned.
> > > > > > > > The VCPU BO needs to stay at the same location before
> > > > > > > > and
> > > > > > > > after
> > > > > > > > sleep/resume because the FW code is not relocatable
> > > > > > > > once
> > > > > > > > it's
> > > > > > > > started.
> > > > > > > >
> > > > > > > > Unfortunately this is not enough to make VCE suspend
> > > > > > > > work
> > > > > > > > when
> > > > > > > > there are encoding sessions active, so don't allow that
> > > > > > > > yet.
> > > > > > >
> > > > > > > The question if this is the right approach or not can
> > > > > > > only
> > > > > > > Leo
> > > > > > > and
> > > > > > > Ruijing answer.
> > > > > > >
> > > > > > > If we can get VCE1-3 to keep session working after
> > > > > > > suspend/resume
> > > > > > > we
> > > > > > > should make this change otherwise we should rather make
> > > > > > > all
> > > > > > > old
> > > > > > > sessions invalid after suspend/resume and only re-load
> > > > > > > the
> > > > > > > FW.
> > > > > > >
> > > > > > > Anyway I think you should make the removal of
> > > > > > > "amdgpu_bo_kmap(adev-
> > > > > > > > vce.vcpu_bo, &cpu_addr);" a separate patch, cause that
> > > > > > > > seems to
> > > > > > > > be a
> > > > > > > good cleanup no matter what.
> > > > > > >
> > > > > >
> > > > > > This change is necessary for the VCE1 code when it loads
> > > > > > the
> > > > > > firmware
> > > > > > signature. Without this patch, we would need to call
> > > > > > reserve(),
> > > > > > kmap(),
> > > > > > kunmap(), kunreserve() in vce_v1_0_load_fw_signature().
> > > > > >
> > > > > > Let me know which approach you prefer.
> > > > >
> > > > > In this case definitely make removal of amdgpu_bo_kmap(adev-
> > > > > > vce.vcpu_bo, &cpu_addr) a separate patch.
> > > >
> > > > Sorry, can you clarify what you mean?
> > > > Should I just go back to the way things were on the first
> > > > version
> > > > of
> > > > the series, and then clean it up later?
> > >
> > > Just add a patch (early in the series, probably as first patch)
> > > to
> > > remove amdgpu_bo_kmap(adev-> vce.vcpu_bo, &cpu_addr).
> >
> > Is it allowed to keep a BO mapped when it is unreserved?
>
> Yes, as long as it is also pinned.
>
> IIRC the VCE BO is allocated by amdgpu_bo_create_kernel() and that
> both pins and maps the BO.
I am asking because amdgpu_vce_resume calls amdgpu_bo_unreserve on the
BO, but then vce_v1_0_load_fw_signature needs to access it. So I wonder
if the CPU address of the BO will be still correct when it's
unreserved. Also wonder, do I have to call amdgpu_bo_reserve before
accessing the BO? Or is it enough that it's mapped?
>
> Regards,
> Christian.
>
> >
> > >
> > > Then put the memset_io() into amdgpu_vce_resume() like you had in
> > > the
> > > first series of the patch so that VCE1 behaves the same as VCE2-
> > > 3.
> >
> > Ok
> >
> > >
> > > And when the series has landed we can clean everything up
> > > depending
> > > on what Leo/Ruijing has decided to do with suspend/resume on
> > > VCE1-3.
> >
> > Sounds good.
> >
> >
> >
> > >
> > >
> > > >
> > > > >
> > > > > I want to get initial VCE1 working and landed independent of
> > > > > what
> > > > > Leo/Ruijing say to suspend/resume.
> > > >
> > > > Agreed.
> > > >
> > > > >
> > > > > Can be that suspend/resume is then still broken, but that is
> > > > > also
> > > > > the
> > > > > case for VCE2-3 then.
> > > >
> > > > Yes, some extra work is going to be needed on top of this patch
> > > > to
> > > > make
> > > > suspend/resume work (if it's possible at all).
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > > > > > > > ---
> > > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 44 ++++++++-
> > > > > > > > ----
> > > > > > > > ----
> > > > > > > > ----
> > > > > > > > drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 52 ++++-----
> > > > > > > > ----
> > > > > > > > ----
> > > > > > > > ----
> > > > > > > > ----
> > > > > > > > 2 files changed, 24 insertions(+), 72 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > > > index 2297608c5191..4beec5b56c4f 100644
> > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > > > > > > > @@ -206,6 +206,10 @@ int amdgpu_vce_sw_init(struct
> > > > > > > > amdgpu_device
> > > > > > > > *adev, unsigned long size)
> > > > > > > > if (!adev->vce.fw)
> > > > > > > > return -ENOENT;
> > > > > > > >
> > > > > > > > + adev->vce.saved_bo = kvmalloc(size,
> > > > > > > > GFP_KERNEL);
> > > > > > > > + if (!adev->vce.saved_bo)
> > > > > > > > + return -ENOMEM;
> > > > > > > > +
> > > > > > > > r = amdgpu_bo_create_kernel(adev, size,
> > > > > > > > PAGE_SIZE,
> > > > > > > >
> > > > > > > > AMDGPU_GEM_DOMAIN_VRAM
> > > > > > > > >
> > > > > > > >
> > > > > > > > AMDGPU_GEM_DOMAIN_GTT,
> > > > > > > > @@ -254,6 +258,9 @@ int amdgpu_vce_sw_fini(struct
> > > > > > > > amdgpu_device
> > > > > > > > *adev)
> > > > > > > > amdgpu_bo_free_kernel(&adev->vce.vcpu_bo,
> > > > > > > > &adev-
> > > > > > > > > vce.gpu_addr,
> > > > > > > > (void **)&adev->vce.cpu_addr);
> > > > > > > >
> > > > > > > > + kvfree(adev->vce.saved_bo);
> > > > > > > > + adev->vce.saved_bo = NULL;
> > > > > > > > +
> > > > > > > > return 0;
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -290,13 +297,19 @@ int amdgpu_vce_entity_init(struct
> > > > > > > > amdgpu_device *adev, struct amdgpu_ring *ring)
> > > > > > > > */
> > > > > > > > int amdgpu_vce_suspend(struct amdgpu_device *adev)
> > > > > > > > {
> > > > > > > > - int i;
> > > > > > > > + int i, idx;
> > > > > > > >
> > > > > > > > cancel_delayed_work_sync(&adev-
> > > > > > > > >vce.idle_work);
> > > > > > > >
> > > > > > > > if (adev->vce.vcpu_bo == NULL)
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > + if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > > > > > + memcpy_fromio(adev->vce.saved_bo,
> > > > > > > > adev-
> > > > > > > > > vce.cpu_addr,
> > > > > > > > + amdgpu_bo_size(adev-
> > > > > > > > > vce.vcpu_bo));
> > > > > > > > + drm_dev_exit(idx);
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i)
> > > > > > > > if (atomic_read(&adev-
> > > > > > > > >vce.handles[i]))
> > > > > > > > break;
> > > > > > > > @@ -316,40 +329,17 @@ int amdgpu_vce_suspend(struct
> > > > > > > > amdgpu_device
> > > > > > > > *adev)
> > > > > > > > */
> > > > > > > > int amdgpu_vce_resume(struct amdgpu_device *adev)
> > > > > > > > {
> > > > > > > > - void *cpu_addr;
> > > > > > > > - const struct common_firmware_header *hdr;
> > > > > > > > - unsigned int offset;
> > > > > > > > - int r, idx;
> > > > > > > > + int idx;
> > > > > > > >
> > > > > > > > if (adev->vce.vcpu_bo == NULL)
> > > > > > > > return -EINVAL;
> > > > > > > >
> > > > > > > > - r = amdgpu_bo_reserve(adev->vce.vcpu_bo,
> > > > > > > > false);
> > > > > > > > - if (r) {
> > > > > > > > - dev_err(adev->dev, "(%d) failed to
> > > > > > > > reserve
> > > > > > > > VCE
> > > > > > > > bo\n", r);
> > > > > > > > - return r;
> > > > > > > > - }
> > > > > > > > -
> > > > > > > > - r = amdgpu_bo_kmap(adev->vce.vcpu_bo,
> > > > > > > > &cpu_addr);
> > > > > > > > - if (r) {
> > > > > > > > - amdgpu_bo_unreserve(adev-
> > > > > > > > >vce.vcpu_bo);
> > > > > > > > - dev_err(adev->dev, "(%d) VCE map
> > > > > > > > failed\n",
> > > > > > > > r);
> > > > > > > > - return r;
> > > > > > > > - }
> > > > > > > > -
> > > > > > > > - hdr = (const struct common_firmware_header
> > > > > > > > *)adev-
> > > > > > > > > vce.fw-
> > > > > > > > > data;
> > > > > > > > - offset = le32_to_cpu(hdr-
> > > > > > > > > ucode_array_offset_bytes);
> > > > > > > > -
> > > > > > > > if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > > > > > - memcpy_toio(cpu_addr, adev->vce.fw-
> > > > > > > > >data +
> > > > > > > > offset,
> > > > > > > > - adev->vce.fw->size -
> > > > > > > > offset);
> > > > > > > > + memcpy_toio(adev->vce.cpu_addr, adev-
> > > > > > > > > vce.saved_bo,
> > > > > > > > + amdgpu_bo_size(adev-
> > > > > > > > > vce.vcpu_bo));
> > > > > > > > drm_dev_exit(idx);
> > > > > > > > }
> > > > > > > >
> > > > > > > > - amdgpu_bo_kunmap(adev->vce.vcpu_bo);
> > > > > > > > -
> > > > > > > > - amdgpu_bo_unreserve(adev->vce.vcpu_bo);
> > > > > > > > -
> > > > > > > > return 0;
> > > > > > > > }
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > > > b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > > > index 2d64002bed61..21b6656b2f41 100644
> > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
> > > > > > > > @@ -448,14 +448,8 @@ static int vce_v4_0_sw_init(struct
> > > > > > > > amdgpu_ip_block *ip_block)
> > > > > > > > return r;
> > > > > > > >
> > > > > > > > if (adev->firmware.load_type ==
> > > > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > > > - const struct common_firmware_header
> > > > > > > > *hdr;
> > > > > > > > - unsigned size = amdgpu_bo_size(adev-
> > > > > > > > > vce.vcpu_bo);
> > > > > > > > -
> > > > > > > > - adev->vce.saved_bo = kvmalloc(size,
> > > > > > > > GFP_KERNEL);
> > > > > > > > - if (!adev->vce.saved_bo)
> > > > > > > > - return -ENOMEM;
> > > > > > > > -
> > > > > > > > - hdr = (const struct
> > > > > > > > common_firmware_header
> > > > > > > > *)adev-
> > > > > > > > > vce.fw->data;
> > > > > > > > + const struct common_firmware_header
> > > > > > > > *hdr =
> > > > > > > > + (const struct
> > > > > > > > common_firmware_header
> > > > > > > > *)adev->vce.fw->data;
> > > > > > > > adev-
> > > > > > > > > firmware.ucode[AMDGPU_UCODE_ID_VCE].ucode_id
> > > > > > > > = AMDGPU_UCODE_ID_VCE;
> > > > > > > > adev-
> > > > > > > > > firmware.ucode[AMDGPU_UCODE_ID_VCE].fw =
> > > > > > > > adev->vce.fw;
> > > > > > > > adev->firmware.fw_size +=
> > > > > > > > @@ -506,11 +500,6 @@ static int vce_v4_0_sw_fini(struct
> > > > > > > > amdgpu_ip_block *ip_block)
> > > > > > > > /* free MM table */
> > > > > > > > amdgpu_virt_free_mm_table(adev);
> > > > > > > >
> > > > > > > > - if (adev->firmware.load_type ==
> > > > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > > > - kvfree(adev->vce.saved_bo);
> > > > > > > > - adev->vce.saved_bo = NULL;
> > > > > > > > - }
> > > > > > > > -
> > > > > > > > r = amdgpu_vce_suspend(adev);
> > > > > > > > if (r)
> > > > > > > > return r;
> > > > > > > > @@ -561,20 +550,7 @@ static int vce_v4_0_hw_fini(struct
> > > > > > > > amdgpu_ip_block *ip_block)
> > > > > > > > static int vce_v4_0_suspend(struct amdgpu_ip_block
> > > > > > > > *ip_block)
> > > > > > > > {
> > > > > > > > struct amdgpu_device *adev = ip_block->adev;
> > > > > > > > - int r, idx;
> > > > > > > > -
> > > > > > > > - if (adev->vce.vcpu_bo == NULL)
> > > > > > > > - return 0;
> > > > > > > > -
> > > > > > > > - if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> > > > > > > > - if (adev->firmware.load_type ==
> > > > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > > > - unsigned size =
> > > > > > > > amdgpu_bo_size(adev-
> > > > > > > > > vce.vcpu_bo);
> > > > > > > > - void *ptr = adev-
> > > > > > > > >vce.cpu_addr;
> > > > > > > > -
> > > > > > > > - memcpy_fromio(adev-
> > > > > > > > >vce.saved_bo,
> > > > > > > > ptr,
> > > > > > > > size);
> > > > > > > > - }
> > > > > > > > - drm_dev_exit(idx);
> > > > > > > > - }
> > > > > > > > + int r;
> > > > > > > >
> > > > > > > > /*
> > > > > > > > * Proper cleanups before halting the HW
> > > > > > > > engine:
> > > > > > > > @@ -609,25 +585,11 @@ static int
> > > > > > > > vce_v4_0_suspend(struct
> > > > > > > > amdgpu_ip_block *ip_block)
> > > > > > > > static int vce_v4_0_resume(struct amdgpu_ip_block
> > > > > > > > *ip_block)
> > > > > > > > {
> > > > > > > > struct amdgpu_device *adev = ip_block->adev;
> > > > > > > > - int r, idx;
> > > > > > > > -
> > > > > > > > - if (adev->vce.vcpu_bo == NULL)
> > > > > > > > - return -EINVAL;
> > > > > > > > -
> > > > > > > > - if (adev->firmware.load_type ==
> > > > > > > > AMDGPU_FW_LOAD_PSP) {
> > > > > > > > -
> > > > > > > > - if (drm_dev_enter(adev_to_drm(adev),
> > > > > > > > &idx)) {
> > > > > > > > - unsigned size =
> > > > > > > > amdgpu_bo_size(adev-
> > > > > > > > > vce.vcpu_bo);
> > > > > > > > - void *ptr = adev-
> > > > > > > > >vce.cpu_addr;
> > > > > > > > + int r;
> > > > > > > >
> > > > > > > > - memcpy_toio(ptr, adev-
> > > > > > > > > vce.saved_bo,
> > > > > > > > size);
> > > > > > > > - drm_dev_exit(idx);
> > > > > > > > - }
> > > > > > > > - } else {
> > > > > > > > - r = amdgpu_vce_resume(adev);
> > > > > > > > - if (r)
> > > > > > > > - return r;
> > > > > > > > - }
> > > > > > > > + r = amdgpu_vce_resume(adev);
> > > > > > > > + if (r)
> > > > > > > > + return r;
> > > > > > > >
> > > > > > > > return vce_v4_0_hw_init(ip_block);
> > > > > > > > }
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2)
2025-11-07 13:39 ` Timur Kristóf
@ 2025-11-07 13:45 ` Christian König
0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2025-11-07 13:45 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu, Dong, Ruijing
On 11/7/25 14:39, Timur Kristóf wrote:
> On Fri, 2025-11-07 at 14:33 +0100, Christian König wrote:
>> On 11/7/25 14:31, Timur Kristóf wrote:
>>> On Fri, 2025-11-07 at 14:14 +0100, Christian König wrote:
>>>> On 11/7/25 11:47, Timur Kristóf wrote:
>>>>> On Fri, 2025-11-07 at 11:01 +0100, Christian König wrote:
>>>>>> On 11/7/25 10:53, Timur Kristóf wrote:
>>>>>>> On Fri, 2025-11-07 at 10:49 +0100, Christian König wrote:
>>>>>>>> On 11/6/25 19:44, Timur Kristóf wrote:
>>>>>>>>> VCE uses the VCPU BO to keep track of currently active
>>>>>>>>> encoding sessions. To make sure the VCE functions
>>>>>>>>> correctly
>>>>>>>>> after suspend/resume, save the VCPU BO to system RAM on
>>>>>>>>> suspend and restore it on resume.
>>>>>>>>>
>>>>>>>>> Additionally, make sure to keep the VCPU BO pinned.
>>>>>>>>> The VCPU BO needs to stay at the same location before
>>>>>>>>> and
>>>>>>>>> after
>>>>>>>>> sleep/resume because the FW code is not relocatable
>>>>>>>>> once
>>>>>>>>> it's
>>>>>>>>> started.
>>>>>>>>>
>>>>>>>>> Unfortunately this is not enough to make VCE suspend
>>>>>>>>> work
>>>>>>>>> when
>>>>>>>>> there are encoding sessions active, so don't allow that
>>>>>>>>> yet.
>>>>>>>>
>>>>>>>> The question if this is the right approach or not can
>>>>>>>> only
>>>>>>>> Leo
>>>>>>>> and
>>>>>>>> Ruijing answer.
>>>>>>>>
>>>>>>>> If we can get VCE1-3 to keep session working after
>>>>>>>> suspend/resume
>>>>>>>> we
>>>>>>>> should make this change otherwise we should rather make
>>>>>>>> all
>>>>>>>> old
>>>>>>>> sessions invalid after suspend/resume and only re-load
>>>>>>>> the
>>>>>>>> FW.
>>>>>>>>
>>>>>>>> Anyway I think you should make the removal of
>>>>>>>> "amdgpu_bo_kmap(adev-
>>>>>>>>> vce.vcpu_bo, &cpu_addr);" a separate patch, cause that
>>>>>>>>> seems to
>>>>>>>>> be a
>>>>>>>> good cleanup no matter what.
>>>>>>>>
>>>>>>>
>>>>>>> This change is necessary for the VCE1 code when it loads
>>>>>>> the
>>>>>>> firmware
>>>>>>> signature. Without this patch, we would need to call
>>>>>>> reserve(),
>>>>>>> kmap(),
>>>>>>> kunmap(), kunreserve() in vce_v1_0_load_fw_signature().
>>>>>>>
>>>>>>> Let me know which approach you prefer.
>>>>>>
>>>>>> In this case definitely make removal of amdgpu_bo_kmap(adev-
>>>>>>> vce.vcpu_bo, &cpu_addr) a separate patch.
>>>>>
>>>>> Sorry, can you clarify what you mean?
>>>>> Should I just go back to the way things were on the first
>>>>> version
>>>>> of
>>>>> the series, and then clean it up later?
>>>>
>>>> Just add a patch (early in the series, probably as first patch)
>>>> to
>>>> remove amdgpu_bo_kmap(adev-> vce.vcpu_bo, &cpu_addr).
>>>
>>> Is it allowed to keep a BO mapped when it is unreserved?
>>
>> Yes, as long as it is also pinned.
>>
>> IIRC the VCE BO is allocated by amdgpu_bo_create_kernel() and that
>> both pins and maps the BO.
>
> I am asking because amdgpu_vce_resume calls amdgpu_bo_unreserve on the
> BO, but then vce_v1_0_load_fw_signature needs to access it. So I wonder
> if the CPU address of the BO will be still correct when it's
> unreserved. Also wonder, do I have to call amdgpu_bo_reserve before
> accessing the BO? Or is it enough that it's mapped?
It should be enough when it is mapped.
That reserve/unreserve code is most likely just a leftover from radeon. On radeon we pinned/unpinned the BO on suspend/resume because we didn't know that the FW is not relocatable once started. Well, that was a long long time ago :)
So that reserve/unreserve can be removed as well.
Regards,
Christian.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 07/13] drm/amdgpu/vce1: Clean up register definitions
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (5 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 06/13] drm/amdgpu/vce: Save/restore and pin VCPU BO for all VCE (v2) Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 08/13] drm/amdgpu/vce1: Load VCE1 firmware Timur Kristóf
` (5 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
The sid.h header contained some VCE1 register definitions, but
they were using byte offsets (probably copied from the old radeon
driver). Move all of these to the proper VCE1 headers and ensure
they are in dword offsets.
Also add the register definitions that we need for the
firmware validation mechanism in VCE1.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sid.h | 40 -------------------
.../drm/amd/include/asic_reg/vce/vce_1_0_d.h | 5 +++
.../include/asic_reg/vce/vce_1_0_sh_mask.h | 10 +++++
3 files changed, 15 insertions(+), 40 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sid.h b/drivers/gpu/drm/amd/amdgpu/sid.h
index cbd4f8951cfa..561462a8332e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sid.h
+++ b/drivers/gpu/drm/amd/amdgpu/sid.h
@@ -582,45 +582,6 @@
#define DMA_PACKET_NOP 0xf
/* VCE */
-#define VCE_STATUS 0x20004
-#define VCE_VCPU_CNTL 0x20014
-#define VCE_CLK_EN (1 << 0)
-#define VCE_VCPU_CACHE_OFFSET0 0x20024
-#define VCE_VCPU_CACHE_SIZE0 0x20028
-#define VCE_VCPU_CACHE_OFFSET1 0x2002c
-#define VCE_VCPU_CACHE_SIZE1 0x20030
-#define VCE_VCPU_CACHE_OFFSET2 0x20034
-#define VCE_VCPU_CACHE_SIZE2 0x20038
-#define VCE_SOFT_RESET 0x20120
-#define VCE_ECPU_SOFT_RESET (1 << 0)
-#define VCE_FME_SOFT_RESET (1 << 2)
-#define VCE_RB_BASE_LO2 0x2016c
-#define VCE_RB_BASE_HI2 0x20170
-#define VCE_RB_SIZE2 0x20174
-#define VCE_RB_RPTR2 0x20178
-#define VCE_RB_WPTR2 0x2017c
-#define VCE_RB_BASE_LO 0x20180
-#define VCE_RB_BASE_HI 0x20184
-#define VCE_RB_SIZE 0x20188
-#define VCE_RB_RPTR 0x2018c
-#define VCE_RB_WPTR 0x20190
-#define VCE_CLOCK_GATING_A 0x202f8
-#define VCE_CLOCK_GATING_B 0x202fc
-#define VCE_UENC_CLOCK_GATING 0x205bc
-#define VCE_UENC_REG_CLOCK_GATING 0x205c0
-#define VCE_FW_REG_STATUS 0x20e10
-# define VCE_FW_REG_STATUS_BUSY (1 << 0)
-# define VCE_FW_REG_STATUS_PASS (1 << 3)
-# define VCE_FW_REG_STATUS_DONE (1 << 11)
-#define VCE_LMI_FW_START_KEYSEL 0x20e18
-#define VCE_LMI_FW_PERIODIC_CTRL 0x20e20
-#define VCE_LMI_CTRL2 0x20e74
-#define VCE_LMI_CTRL 0x20e98
-#define VCE_LMI_VM_CTRL 0x20ea0
-#define VCE_LMI_SWAP_CNTL 0x20eb4
-#define VCE_LMI_SWAP_CNTL1 0x20eb8
-#define VCE_LMI_CACHE_CTRL 0x20ef4
-
#define VCE_CMD_NO_OP 0x00000000
#define VCE_CMD_END 0x00000001
#define VCE_CMD_IB 0x00000002
@@ -629,7 +590,6 @@
#define VCE_CMD_IB_AUTO 0x00000005
#define VCE_CMD_SEMAPHORE 0x00000006
-
//#dce stupp
/* display controller offsets used for crtc/cur/lut/grph/viewport/etc. */
#define CRTC0_REGISTER_OFFSET (0x1b7c - 0x1b7c) //(0x6df0 - 0x6df0)/4
diff --git a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
index 2176548e9203..9778822dd2a0 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_d.h
@@ -60,5 +60,10 @@
#define mmVCE_VCPU_CACHE_SIZE1 0x800C
#define mmVCE_VCPU_CACHE_SIZE2 0x800E
#define mmVCE_VCPU_CNTL 0x8005
+#define mmVCE_VCPU_SCRATCH7 0x8037
+#define mmVCE_FW_REG_STATUS 0x8384
+#define mmVCE_LMI_FW_PERIODIC_CTRL 0x8388
+#define mmVCE_LMI_FW_START_KEYSEL 0x8386
+
#endif
diff --git a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
index ea5b26b11cb1..1f82d6f5abde 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/vce/vce_1_0_sh_mask.h
@@ -61,6 +61,8 @@
#define VCE_RB_WPTR__RB_WPTR__SHIFT 0x00000004
#define VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK 0x00000001L
#define VCE_SOFT_RESET__ECPU_SOFT_RESET__SHIFT 0x00000000
+#define VCE_SOFT_RESET__FME_SOFT_RESET_MASK 0x00000004L
+#define VCE_SOFT_RESET__FME_SOFT_RESET__SHIFT 0x00000002
#define VCE_STATUS__JOB_BUSY_MASK 0x00000001L
#define VCE_STATUS__JOB_BUSY__SHIFT 0x00000000
#define VCE_STATUS__UENC_BUSY_MASK 0x00000100L
@@ -95,5 +97,13 @@
#define VCE_VCPU_CNTL__CLK_EN__SHIFT 0x00000000
#define VCE_VCPU_CNTL__RBBM_SOFT_RESET_MASK 0x00040000L
#define VCE_VCPU_CNTL__RBBM_SOFT_RESET__SHIFT 0x00000012
+#define VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK 0x00010000L
+#define VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE__SHIFT 0x00000010
+#define VCE_FW_REG_STATUS__BUSY_MASK 0x00000001L
+#define VCE_FW_REG_STATUS__BUSY__SHIFT 0x00000000
+#define VCE_FW_REG_STATUS__PASS_MASK 0x00000008L
+#define VCE_FW_REG_STATUS__PASS__SHIFT 0x00000003
+#define VCE_FW_REG_STATUS__DONE_MASK 0x00000800L
+#define VCE_FW_REG_STATUS__DONE__SHIFT 0x0000000b
#endif
--
2.51.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 08/13] drm/amdgpu/vce1: Load VCE1 firmware
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (6 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 07/13] drm/amdgpu/vce1: Clean up register definitions Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 09/13] drm/amdgpu/vce1: Implement VCE1 IP block (v2) Timur Kristóf
` (4 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Load VCE1 firmware using amdgpu_ucode_request, just like
it is done for other VCE versions.
All SI chips share the same VCE1 firmware file: vce_1_0_0.bin
which will be sent to linux-firmware soon.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 4beec5b56c4f..2761c096c4cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -41,6 +41,9 @@
#define VCE_IDLE_TIMEOUT msecs_to_jiffies(1000)
/* Firmware Names */
+#ifdef CONFIG_DRM_AMDGPU_SI
+#define FIRMWARE_VCE_V1_0 "amdgpu/vce_1_0_0.bin"
+#endif
#ifdef CONFIG_DRM_AMDGPU_CIK
#define FIRMWARE_BONAIRE "amdgpu/bonaire_vce.bin"
#define FIRMWARE_KABINI "amdgpu/kabini_vce.bin"
@@ -61,6 +64,9 @@
#define FIRMWARE_VEGA12 "amdgpu/vega12_vce.bin"
#define FIRMWARE_VEGA20 "amdgpu/vega20_vce.bin"
+#ifdef CONFIG_DRM_AMDGPU_SI
+MODULE_FIRMWARE(FIRMWARE_VCE_V1_0);
+#endif
#ifdef CONFIG_DRM_AMDGPU_CIK
MODULE_FIRMWARE(FIRMWARE_BONAIRE);
MODULE_FIRMWARE(FIRMWARE_KABINI);
@@ -99,6 +105,12 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
static const char *amdgpu_vce_firmware_name(struct amdgpu_device *adev)
{
switch (adev->asic_type) {
+#ifdef CONFIG_DRM_AMDGPU_SI
+ case CHIP_PITCAIRN:
+ case CHIP_TAHITI:
+ case CHIP_VERDE:
+ return FIRMWARE_VCE_V1_0;
+#endif
#ifdef CONFIG_DRM_AMDGPU_CIK
case CHIP_BONAIRE:
return FIRMWARE_BONAIRE;
--
2.51.0
* [PATCH 09/13] drm/amdgpu/vce1: Implement VCE1 IP block (v2)
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (7 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 08/13] drm/amdgpu/vce1: Load VCE1 firmware Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 10/13] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space (v3) Timur Kristóf
` (3 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Implement the necessary functionality to support the VCE1.
This implementation is based on:
- VCE2 code from amdgpu
- VCE1 code from radeon (the old driver)
- Some trial and error
A subsequent commit will ensure correct mapping for
the VCPU BO, which will make this actually work.
v2:
- Use memset_io more.
- Use memcpy_toio more.
- Remove __func__ from warnings.
- Don't reserve and map the VCPU BO anymore.
- Add empty line to multi-line comments.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/Makefile | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h | 1 +
drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 784 ++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/vce_v1_0.h | 32 +
4 files changed, 818 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
create mode 100644 drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index ebe08947c5a3..c88760fb52ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -78,7 +78,7 @@ amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o \
dce_v8_0.o gfx_v7_0.o cik_sdma.o uvd_v4_2.o vce_v2_0.o
amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o gmc_v6_0.o gfx_v6_0.o si_ih.o si_dma.o dce_v6_0.o \
- uvd_v3_1.o
+ uvd_v3_1.o vce_v1_0.o
amdgpu-y += \
vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o vega10_reg_init.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
index 22acd7b35945..050783802623 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
@@ -51,6 +51,7 @@ struct amdgpu_vce {
struct drm_sched_entity entity;
uint32_t srbm_soft_reset;
unsigned num_rings;
+ uint32_t keyselect;
};
int amdgpu_vce_early_init(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
new file mode 100644
index 000000000000..bf9f943852cb
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -0,0 +1,784 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2013 Advanced Micro Devices, Inc.
+ * Copyright 2025 Valve Corporation
+ * Copyright 2025 Alexandre Demers
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * Authors: Christian König <christian.koenig@amd.com>
+ * Timur Kristóf <timur.kristof@gmail.com>
+ * Alexandre Demers <alexandre.f.demers@gmail.com>
+ */
+
+#include <linux/firmware.h>
+
+#include "amdgpu.h"
+#include "amdgpu_vce.h"
+#include "sid.h"
+#include "vce_v1_0.h"
+#include "vce/vce_1_0_d.h"
+#include "vce/vce_1_0_sh_mask.h"
+#include "oss/oss_1_0_d.h"
+#include "oss/oss_1_0_sh_mask.h"
+
+#define VCE_V1_0_FW_SIZE (256 * 1024)
+#define VCE_V1_0_STACK_SIZE (64 * 1024)
+#define VCE_V1_0_DATA_SIZE (7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
+#define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK 0x02
+
+static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
+static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
+
+struct vce_v1_0_fw_signature {
+ int32_t offset;
+ uint32_t length;
+ int32_t number;
+ struct {
+ uint32_t chip_id;
+ uint32_t keyselect;
+ uint32_t nonce[4];
+ uint32_t sigval[4];
+ } val[8];
+};
+
+/**
+ * vce_v1_0_ring_get_rptr - get read pointer
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Returns the current hardware read pointer
+ */
+static uint64_t vce_v1_0_ring_get_rptr(struct amdgpu_ring *ring)
+{
+ struct amdgpu_device *adev = ring->adev;
+
+ if (ring->me == 0)
+ return RREG32(mmVCE_RB_RPTR);
+ else
+ return RREG32(mmVCE_RB_RPTR2);
+}
+
+/**
+ * vce_v1_0_ring_get_wptr - get write pointer
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Returns the current hardware write pointer
+ */
+static uint64_t vce_v1_0_ring_get_wptr(struct amdgpu_ring *ring)
+{
+ struct amdgpu_device *adev = ring->adev;
+
+ if (ring->me == 0)
+ return RREG32(mmVCE_RB_WPTR);
+ else
+ return RREG32(mmVCE_RB_WPTR2);
+}
+
+/**
+ * vce_v1_0_ring_set_wptr - set write pointer
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Commits the write pointer to the hardware
+ */
+static void vce_v1_0_ring_set_wptr(struct amdgpu_ring *ring)
+{
+ struct amdgpu_device *adev = ring->adev;
+
+ if (ring->me == 0)
+ WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
+ else
+ WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
+}
+
+static int vce_v1_0_lmi_clean(struct amdgpu_device *adev)
+{
+ int i, j;
+
+ for (i = 0; i < 10; ++i) {
+ for (j = 0; j < 100; ++j) {
+ if (RREG32(mmVCE_LMI_STATUS) & 0x337f)
+ return 0;
+
+ mdelay(10);
+ }
+ }
+
+ return -ETIMEDOUT;
+}
+
+static int vce_v1_0_firmware_loaded(struct amdgpu_device *adev)
+{
+ int i, j;
+
+ for (i = 0; i < 10; ++i) {
+ for (j = 0; j < 100; ++j) {
+ if (RREG32(mmVCE_STATUS) & VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK)
+ return 0;
+ mdelay(10);
+ }
+
+ dev_err(adev->dev, "VCE not responding, trying to reset the ECPU\n");
+
+ WREG32_P(mmVCE_SOFT_RESET,
+ VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK,
+ ~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
+ mdelay(10);
+ WREG32_P(mmVCE_SOFT_RESET, 0,
+ ~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK);
+ mdelay(10);
+ }
+
+ return -ETIMEDOUT;
+}
+
+static void vce_v1_0_init_cg(struct amdgpu_device *adev)
+{
+ u32 tmp;
+
+ tmp = RREG32(mmVCE_CLOCK_GATING_A);
+ tmp |= VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
+ WREG32(mmVCE_CLOCK_GATING_A, tmp);
+
+ tmp = RREG32(mmVCE_CLOCK_GATING_B);
+ tmp |= 0x1e;
+ tmp &= ~0xe100e1;
+ WREG32(mmVCE_CLOCK_GATING_B, tmp);
+
+ tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
+ tmp &= ~0xff9ff000;
+ WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
+
+ tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
+ tmp &= ~0x3ff;
+ WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
+}
+
+/**
+ * vce_v1_0_load_fw_signature - load firmware signature into VCPU BO
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * The VCE1 firmware validation mechanism needs a firmware signature.
+ * This function finds the signature appropriate for the current
+ * ASIC and writes that into the VCPU BO.
+ */
+static int vce_v1_0_load_fw_signature(struct amdgpu_device *adev)
+{
+ const struct common_firmware_header *hdr;
+ struct vce_v1_0_fw_signature *sign;
+ unsigned int ucode_offset;
+ uint32_t chip_id;
+ u32 *cpu_addr;
+ int i;
+
+ hdr = (const struct common_firmware_header *)adev->vce.fw->data;
+ ucode_offset = le32_to_cpu(hdr->ucode_array_offset_bytes);
+ cpu_addr = adev->vce.cpu_addr;
+
+ sign = (void *)adev->vce.fw->data + ucode_offset;
+
+ switch (adev->asic_type) {
+ case CHIP_TAHITI:
+ chip_id = 0x01000014;
+ break;
+ case CHIP_VERDE:
+ chip_id = 0x01000015;
+ break;
+ case CHIP_PITCAIRN:
+ chip_id = 0x01000016;
+ break;
+ default:
+ dev_err(adev->dev, "asic_type %#010x was not found!\n", adev->asic_type);
+ return -EINVAL;
+ }
+
+ for (i = 0; i < le32_to_cpu(sign->number); ++i) {
+ if (le32_to_cpu(sign->val[i].chip_id) == chip_id)
+ break;
+ }
+
+ if (i == le32_to_cpu(sign->number)) {
+ dev_err(adev->dev, "chip_id 0x%x for %s was not found in VCE firmware\n",
+ chip_id, amdgpu_asic_name[adev->asic_type]);
+ return -EINVAL;
+ }
+
+ cpu_addr += (256 - 64) / 4;
+ memcpy_toio(&cpu_addr[0], &sign->val[i].nonce[0], 16);
+ cpu_addr[4] = cpu_to_le32(le32_to_cpu(sign->length) + 64);
+
+ memset_io(&cpu_addr[5], 0, 44);
+ memcpy_toio(&cpu_addr[16], &sign[1], hdr->ucode_size_bytes - sizeof(*sign));
+
+ cpu_addr += (le32_to_cpu(sign->length) + 64) / 4;
+ memcpy_toio(&cpu_addr[0], &sign->val[i].sigval[0], 16);
+
+ adev->vce.keyselect = le32_to_cpu(sign->val[i].keyselect);
+
+ return 0;
+}
+
+static int vce_v1_0_wait_for_fw_validation(struct amdgpu_device *adev)
+{
+ int i;
+
+ dev_dbg(adev->dev, "VCE keyselect: %d\n", adev->vce.keyselect);
+ WREG32(mmVCE_LMI_FW_START_KEYSEL, adev->vce.keyselect);
+
+ for (i = 0; i < 10; ++i) {
+ mdelay(10);
+ if (RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__DONE_MASK)
+ break;
+ }
+
+ if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__DONE_MASK)) {
+ dev_err(adev->dev, "VCE FW validation timeout\n");
+ return -ETIMEDOUT;
+ }
+
+ if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__PASS_MASK)) {
+ dev_err(adev->dev, "VCE FW validation failed\n");
+ return -EINVAL;
+ }
+
+ for (i = 0; i < 10; ++i) {
+ mdelay(10);
+ if (!(RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__BUSY_MASK))
+ break;
+ }
+
+ if (RREG32(mmVCE_FW_REG_STATUS) & VCE_FW_REG_STATUS__BUSY_MASK) {
+ dev_err(adev->dev, "VCE FW busy timeout\n");
+ return -ETIMEDOUT;
+ }
+
+ return 0;
+}
+
+static int vce_v1_0_mc_resume(struct amdgpu_device *adev)
+{
+ uint32_t offset;
+ uint32_t size;
+
+ /*
+ * When the keyselect is already set, don't perturb VCE FW.
+ * Validation seems to always fail the second time.
+ */
+ if (RREG32(mmVCE_LMI_FW_START_KEYSEL)) {
+ dev_dbg(adev->dev, "keyselect already set: 0x%x (on CPU: 0x%x)\n",
+ RREG32(mmVCE_LMI_FW_START_KEYSEL), adev->vce.keyselect);
+
+ WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
+ return 0;
+ }
+
+ WREG32_P(mmVCE_CLOCK_GATING_A, 0, ~(1 << 16));
+ WREG32_P(mmVCE_UENC_CLOCK_GATING, 0x1FF000, ~0xFF9FF000);
+ WREG32_P(mmVCE_UENC_REG_CLOCK_GATING, 0x3F, ~0x3F);
+ WREG32(mmVCE_CLOCK_GATING_B, 0);
+
+ WREG32_P(mmVCE_LMI_FW_PERIODIC_CTRL, 0x4, ~0x4);
+
+ WREG32(mmVCE_LMI_CTRL, 0x00398000);
+
+ WREG32_P(mmVCE_LMI_CACHE_CTRL, 0x0, ~0x1);
+ WREG32(mmVCE_LMI_SWAP_CNTL, 0);
+ WREG32(mmVCE_LMI_SWAP_CNTL1, 0);
+ WREG32(mmVCE_LMI_VM_CTRL, 0);
+
+ WREG32(mmVCE_VCPU_SCRATCH7, AMDGPU_MAX_VCE_HANDLES);
+
+ offset = adev->vce.gpu_addr + AMDGPU_VCE_FIRMWARE_OFFSET;
+ size = VCE_V1_0_FW_SIZE;
+ WREG32(mmVCE_VCPU_CACHE_OFFSET0, offset & 0x7fffffff);
+ WREG32(mmVCE_VCPU_CACHE_SIZE0, size);
+
+ offset += size;
+ size = VCE_V1_0_STACK_SIZE;
+ WREG32(mmVCE_VCPU_CACHE_OFFSET1, offset & 0x7fffffff);
+ WREG32(mmVCE_VCPU_CACHE_SIZE1, size);
+
+ offset += size;
+ size = VCE_V1_0_DATA_SIZE;
+ WREG32(mmVCE_VCPU_CACHE_OFFSET2, offset & 0x7fffffff);
+ WREG32(mmVCE_VCPU_CACHE_SIZE2, size);
+
+ WREG32_P(mmVCE_LMI_CTRL2, 0x0, ~0x100);
+
+ return vce_v1_0_wait_for_fw_validation(adev);
+}
+
+/**
+ * vce_v1_0_is_idle() - Check idle status of VCE1 IP block
+ *
+ * @ip_block: amdgpu_ip_block pointer
+ *
+ * Check whether VCE is busy according to VCE_STATUS.
+ * Also check whether the SRBM thinks VCE is busy, although
+ * SRBM_STATUS.VCE_BUSY seems to be bogus because it
+ * appears to mirror the VCE_STATUS.VCPU_REPORT_FW_LOADED bit.
+ */
+static bool vce_v1_0_is_idle(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ bool busy =
+ (RREG32(mmVCE_STATUS) & (VCE_STATUS__JOB_BUSY_MASK | VCE_STATUS__UENC_BUSY_MASK)) ||
+ (RREG32(mmSRBM_STATUS2) & SRBM_STATUS2__VCE_BUSY_MASK);
+
+ return !busy;
+}
+
+static int vce_v1_0_wait_for_idle(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ unsigned int i;
+
+ for (i = 0; i < adev->usec_timeout; i++) {
+ udelay(1);
+ if (vce_v1_0_is_idle(ip_block))
+ return 0;
+ }
+ return -ETIMEDOUT;
+}
+
+/**
+ * vce_v1_0_start - start VCE block
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Setup and start the VCE block
+ */
+static int vce_v1_0_start(struct amdgpu_device *adev)
+{
+ struct amdgpu_ring *ring;
+ int r;
+
+ WREG32_P(mmVCE_STATUS, 1, ~1);
+
+ r = vce_v1_0_mc_resume(adev);
+ if (r)
+ return r;
+
+ ring = &adev->vce.ring[0];
+ WREG32(mmVCE_RB_RPTR, lower_32_bits(ring->wptr));
+ WREG32(mmVCE_RB_WPTR, lower_32_bits(ring->wptr));
+ WREG32(mmVCE_RB_BASE_LO, lower_32_bits(ring->gpu_addr));
+ WREG32(mmVCE_RB_BASE_HI, upper_32_bits(ring->gpu_addr));
+ WREG32(mmVCE_RB_SIZE, ring->ring_size / 4);
+
+ ring = &adev->vce.ring[1];
+ WREG32(mmVCE_RB_RPTR2, lower_32_bits(ring->wptr));
+ WREG32(mmVCE_RB_WPTR2, lower_32_bits(ring->wptr));
+ WREG32(mmVCE_RB_BASE_LO2, lower_32_bits(ring->gpu_addr));
+ WREG32(mmVCE_RB_BASE_HI2, upper_32_bits(ring->gpu_addr));
+ WREG32(mmVCE_RB_SIZE2, ring->ring_size / 4);
+
+ WREG32_P(mmVCE_VCPU_CNTL, VCE_VCPU_CNTL__CLK_EN_MASK,
+ ~VCE_VCPU_CNTL__CLK_EN_MASK);
+
+ WREG32_P(mmVCE_SOFT_RESET,
+ VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+ VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
+ ~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+ VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
+
+ mdelay(100);
+
+ WREG32_P(mmVCE_SOFT_RESET, 0,
+ ~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+ VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
+
+ r = vce_v1_0_firmware_loaded(adev);
+
+ /* Clear VCE_STATUS, otherwise SRBM thinks VCE1 is busy. */
+ WREG32(mmVCE_STATUS, 0);
+
+ if (r) {
+ dev_err(adev->dev, "VCE not responding, giving up\n");
+ return r;
+ }
+
+ return 0;
+}
+
+static int vce_v1_0_stop(struct amdgpu_device *adev)
+{
+ struct amdgpu_ip_block *ip_block;
+ int status;
+ int i;
+
+ ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_VCE);
+ if (!ip_block)
+ return -EINVAL;
+
+ if (vce_v1_0_lmi_clean(adev))
+ dev_warn(adev->dev, "VCE not idle\n");
+
+ if (vce_v1_0_wait_for_idle(ip_block))
+ dev_warn(adev->dev, "VCE busy: VCE_STATUS=0x%x, SRBM_STATUS2=0x%x\n",
+ RREG32(mmVCE_STATUS), RREG32(mmSRBM_STATUS2));
+
+ /* Stall UMC and register bus before resetting VCPU */
+ WREG32_P(mmVCE_LMI_CTRL2, 1 << 8, ~(1 << 8));
+
+ for (i = 0; i < 100; ++i) {
+ status = RREG32(mmVCE_LMI_STATUS);
+ if (status & 0x240)
+ break;
+ mdelay(1);
+ }
+
+ WREG32_P(mmVCE_VCPU_CNTL, 0, ~VCE_VCPU_CNTL__CLK_EN_MASK);
+
+ WREG32_P(mmVCE_SOFT_RESET,
+ VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+ VCE_SOFT_RESET__FME_SOFT_RESET_MASK,
+ ~(VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK |
+ VCE_SOFT_RESET__FME_SOFT_RESET_MASK));
+
+ WREG32(mmVCE_STATUS, 0);
+
+ return 0;
+}
+
+static void vce_v1_0_enable_mgcg(struct amdgpu_device *adev, bool enable)
+{
+ u32 tmp;
+
+ if (enable && (adev->cg_flags & AMD_CG_SUPPORT_VCE_MGCG)) {
+ tmp = RREG32(mmVCE_CLOCK_GATING_A);
+ tmp |= VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
+ WREG32(mmVCE_CLOCK_GATING_A, tmp);
+
+ tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
+ tmp &= ~0x1ff000;
+ tmp |= 0xff800000;
+ WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
+
+ tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
+ tmp &= ~0x3ff;
+ WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
+ } else {
+ tmp = RREG32(mmVCE_CLOCK_GATING_A);
+ tmp &= ~VCE_CLOCK_GATING_A__CGC_DYN_CLOCK_MODE_MASK;
+ WREG32(mmVCE_CLOCK_GATING_A, tmp);
+
+ tmp = RREG32(mmVCE_UENC_CLOCK_GATING);
+ tmp |= 0x1ff000;
+ tmp &= ~0xff800000;
+ WREG32(mmVCE_UENC_CLOCK_GATING, tmp);
+
+ tmp = RREG32(mmVCE_UENC_REG_CLOCK_GATING);
+ tmp |= 0x3ff;
+ WREG32(mmVCE_UENC_REG_CLOCK_GATING, tmp);
+ }
+}
+
+static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ int r;
+
+ r = amdgpu_vce_early_init(adev);
+ if (r)
+ return r;
+
+ adev->vce.num_rings = 2;
+
+ vce_v1_0_set_ring_funcs(adev);
+ vce_v1_0_set_irq_funcs(adev);
+
+ return 0;
+}
+
+static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ struct amdgpu_ring *ring;
+ int r, i;
+
+ r = amdgpu_irq_add_id(adev, AMDGPU_IRQ_CLIENTID_LEGACY, 167, &adev->vce.irq);
+ if (r)
+ return r;
+
+ r = amdgpu_vce_sw_init(adev, VCE_V1_0_FW_SIZE +
+ VCE_V1_0_STACK_SIZE + VCE_V1_0_DATA_SIZE);
+ if (r)
+ return r;
+
+ r = amdgpu_vce_resume(adev);
+ if (r)
+ return r;
+ r = vce_v1_0_load_fw_signature(adev);
+ if (r)
+ return r;
+
+ for (i = 0; i < adev->vce.num_rings; i++) {
+ enum amdgpu_ring_priority_level hw_prio = amdgpu_vce_get_ring_prio(i);
+
+ ring = &adev->vce.ring[i];
+ sprintf(ring->name, "vce%d", i);
+ r = amdgpu_ring_init(adev, ring, 512, &adev->vce.irq, 0,
+ hw_prio, NULL);
+ if (r)
+ return r;
+ }
+
+ return r;
+}
+
+static int vce_v1_0_sw_fini(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ int r;
+
+ r = amdgpu_vce_suspend(adev);
+ if (r)
+ return r;
+
+ return amdgpu_vce_sw_fini(adev);
+}
+
+/**
+ * vce_v1_0_hw_init - start and test VCE block
+ *
+ * @ip_block: Pointer to the amdgpu_ip_block for this hw instance.
+ *
+ * Initialize the hardware, boot up the VCPU and do some testing
+ */
+static int vce_v1_0_hw_init(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ int i, r;
+
+ if (adev->pm.dpm_enabled)
+ amdgpu_dpm_enable_vce(adev, true);
+ else
+ amdgpu_asic_set_vce_clocks(adev, 10000, 10000);
+
+ for (i = 0; i < adev->vce.num_rings; i++) {
+ r = amdgpu_ring_test_helper(&adev->vce.ring[i]);
+ if (r)
+ return r;
+ }
+
+ dev_info(adev->dev, "VCE initialized successfully.\n");
+
+ return 0;
+}
+
+static int vce_v1_0_hw_fini(struct amdgpu_ip_block *ip_block)
+{
+ int r;
+
+ r = vce_v1_0_stop(ip_block->adev);
+ if (r)
+ return r;
+
+ cancel_delayed_work_sync(&ip_block->adev->vce.idle_work);
+ return 0;
+}
+
+static int vce_v1_0_suspend(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ int r;
+
+ /*
+ * Proper cleanups before halting the HW engine:
+ * - cancel the delayed idle work
+ * - enable powergating
+ * - enable clockgating
+ * - disable dpm
+ *
+ * TODO: to align with the VCN implementation, move the
+ * jobs for clockgating/powergating/dpm setting to
+ * ->set_powergating_state().
+ */
+ cancel_delayed_work_sync(&adev->vce.idle_work);
+
+ if (adev->pm.dpm_enabled) {
+ amdgpu_dpm_enable_vce(adev, false);
+ } else {
+ amdgpu_asic_set_vce_clocks(adev, 0, 0);
+ amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCE,
+ AMD_PG_STATE_GATE);
+ amdgpu_device_ip_set_clockgating_state(adev, AMD_IP_BLOCK_TYPE_VCE,
+ AMD_CG_STATE_GATE);
+ }
+
+ r = vce_v1_0_hw_fini(ip_block);
+ if (r) {
+ dev_err(adev->dev, "vce_v1_0_hw_fini() failed with error %i\n", r);
+ return r;
+ }
+
+ return amdgpu_vce_suspend(adev);
+}
+
+static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+ int r;
+
+ r = amdgpu_vce_resume(adev);
+ if (r)
+ return r;
+ r = vce_v1_0_load_fw_signature(adev);
+ if (r)
+ return r;
+
+ return vce_v1_0_hw_init(ip_block);
+}
+
+static int vce_v1_0_set_interrupt_state(struct amdgpu_device *adev,
+ struct amdgpu_irq_src *source,
+ unsigned int type,
+ enum amdgpu_interrupt_state state)
+{
+ uint32_t val = 0;
+
+ if (state == AMDGPU_IRQ_STATE_ENABLE)
+ val |= VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK;
+
+ WREG32_P(mmVCE_SYS_INT_EN, val,
+ ~VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK);
+ return 0;
+}
+
+static int vce_v1_0_process_interrupt(struct amdgpu_device *adev,
+ struct amdgpu_irq_src *source,
+ struct amdgpu_iv_entry *entry)
+{
+ dev_dbg(adev->dev, "IH: VCE\n");
+ switch (entry->src_data[0]) {
+ case 0:
+ case 1:
+ amdgpu_fence_process(&adev->vce.ring[entry->src_data[0]]);
+ break;
+ default:
+ dev_err(adev->dev, "Unhandled interrupt: %d %d\n",
+ entry->src_id, entry->src_data[0]);
+ break;
+ }
+
+ return 0;
+}
+
+static int vce_v1_0_set_clockgating_state(struct amdgpu_ip_block *ip_block,
+ enum amd_clockgating_state state)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+
+ vce_v1_0_init_cg(adev);
+ vce_v1_0_enable_mgcg(adev, state == AMD_CG_STATE_GATE);
+
+ return 0;
+}
+
+static int vce_v1_0_set_powergating_state(struct amdgpu_ip_block *ip_block,
+ enum amd_powergating_state state)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+
+ /*
+ * This doesn't actually powergate the VCE block.
+ * That's done in the dpm code via the SMC. This
+ * just re-inits the block as necessary. The actual
+ * gating still happens in the dpm code. We should
+ * revisit this when there is a cleaner line between
+ * the smc and the hw blocks
+ */
+ if (state == AMD_PG_STATE_GATE)
+ return vce_v1_0_stop(adev);
+ else
+ return vce_v1_0_start(adev);
+}
+
+static const struct amd_ip_funcs vce_v1_0_ip_funcs = {
+ .name = "vce_v1_0",
+ .early_init = vce_v1_0_early_init,
+ .sw_init = vce_v1_0_sw_init,
+ .sw_fini = vce_v1_0_sw_fini,
+ .hw_init = vce_v1_0_hw_init,
+ .hw_fini = vce_v1_0_hw_fini,
+ .suspend = vce_v1_0_suspend,
+ .resume = vce_v1_0_resume,
+ .is_idle = vce_v1_0_is_idle,
+ .wait_for_idle = vce_v1_0_wait_for_idle,
+ .set_clockgating_state = vce_v1_0_set_clockgating_state,
+ .set_powergating_state = vce_v1_0_set_powergating_state,
+};
+
+static const struct amdgpu_ring_funcs vce_v1_0_ring_funcs = {
+ .type = AMDGPU_RING_TYPE_VCE,
+ .align_mask = 0xf,
+ .nop = VCE_CMD_NO_OP,
+ .support_64bit_ptrs = false,
+ .no_user_fence = true,
+ .get_rptr = vce_v1_0_ring_get_rptr,
+ .get_wptr = vce_v1_0_ring_get_wptr,
+ .set_wptr = vce_v1_0_ring_set_wptr,
+ .parse_cs = amdgpu_vce_ring_parse_cs,
+ .emit_frame_size = 6, /* amdgpu_vce_ring_emit_fence x1 no user fence */
+ .emit_ib_size = 4, /* amdgpu_vce_ring_emit_ib */
+ .emit_ib = amdgpu_vce_ring_emit_ib,
+ .emit_fence = amdgpu_vce_ring_emit_fence,
+ .test_ring = amdgpu_vce_ring_test_ring,
+ .test_ib = amdgpu_vce_ring_test_ib,
+ .insert_nop = amdgpu_ring_insert_nop,
+ .pad_ib = amdgpu_ring_generic_pad_ib,
+ .begin_use = amdgpu_vce_ring_begin_use,
+ .end_use = amdgpu_vce_ring_end_use,
+};
+
+static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev)
+{
+ int i;
+
+ for (i = 0; i < adev->vce.num_rings; i++) {
+ adev->vce.ring[i].funcs = &vce_v1_0_ring_funcs;
+ adev->vce.ring[i].me = i;
+ }
+}
+
+static const struct amdgpu_irq_src_funcs vce_v1_0_irq_funcs = {
+ .set = vce_v1_0_set_interrupt_state,
+ .process = vce_v1_0_process_interrupt,
+};
+
+static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev)
+{
+ adev->vce.irq.num_types = 1;
+ adev->vce.irq.funcs = &vce_v1_0_irq_funcs;
+}
+
+const struct amdgpu_ip_block_version vce_v1_0_ip_block = {
+ .type = AMD_IP_BLOCK_TYPE_VCE,
+ .major = 1,
+ .minor = 0,
+ .rev = 0,
+ .funcs = &vce_v1_0_ip_funcs,
+};
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
new file mode 100644
index 000000000000..206e7bec897f
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2025 Advanced Micro Devices, Inc.
+ * Copyright 2025 Valve Corporation
+ * Copyright 2025 Alexandre Demers
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __VCE_V1_0_H__
+#define __VCE_V1_0_H__
+
+extern const struct amdgpu_ip_block_version vce_v1_0_ip_block;
+
+#endif
--
2.51.0
* [PATCH 10/13] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space (v3)
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (8 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 09/13] drm/amdgpu/vce1: Implement VCE1 IP block (v2) Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-07 9:40 ` Christian König
2025-11-06 18:44 ` [PATCH 11/13] drm/amd/pm/si: Hook up VCE1 to SI DPM Timur Kristóf
` (2 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Based on research and ideas by Alexandre and Christian.
VCE1 actually executes its code from the VCPU BO.
Due to various hardware limitations, the VCE1 requires
the VCPU BO to be in the low 32 bit address range.
However, VRAM is typically mapped at the high address range,
which means the VCPU can't access VRAM through the FB aperture.
To solve this, we write a few page table entries to
map the VCPU BO in the GART address range, and we
make sure that the GART is located in the low address
range. That way, VCE1 can access the VCPU BO.
v2:
- Adjust to v2 of the GART helper commit.
- Add empty line to multi-line comment.
v3:
- Instead of relying on gmc_v6 to set the GART space before GTT,
add a new function amdgpu_vce_required_gart_pages() which is
called from amdgpu_gtt_mgr_init() directly.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 18 +++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h | 1 +
drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 55 +++++++++++++++++++++
4 files changed, 75 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 0760e70402ec..895c1e4c6747 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -284,6 +284,7 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
+ start += amdgpu_vce_required_gart_pages(adev);
size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
drm_mm_init(&mgr->mm, start, size);
spin_lock_init(&mgr->lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 2761c096c4cd..e825184f244c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -459,6 +459,24 @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
}
}
+/**
+ * amdgpu_vce_required_gart_pages() - gets number of GART pages required by VCE
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Returns how many GART pages we need before GTT for the VCE IP block.
+ * For VCE1, see vce_v1_0_ensure_vcpu_bo_32bit_addr() for details.
+ * For VCE2+, this is not needed so return zero.
+ */
+u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev)
+{
+ /* VCE IP block not added yet, so can't use amdgpu_ip_version */
+ if (adev->family == AMDGPU_FAMILY_SI)
+ return 512;
+
+ return 0;
+}
+
/**
* amdgpu_vce_get_create_msg - generate a VCE create msg
*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
index 050783802623..1c3464ce5037 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
@@ -61,6 +61,7 @@ int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring)
int amdgpu_vce_suspend(struct amdgpu_device *adev);
int amdgpu_vce_resume(struct amdgpu_device *adev);
void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp);
+u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev);
int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
struct amdgpu_ib *ib);
int amdgpu_vce_ring_parse_cs_vm(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
index bf9f943852cb..9ae424618556 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
@@ -34,6 +34,7 @@
#include "amdgpu.h"
#include "amdgpu_vce.h"
+#include "amdgpu_gart.h"
#include "sid.h"
#include "vce_v1_0.h"
#include "vce/vce_1_0_d.h"
@@ -46,6 +47,11 @@
#define VCE_V1_0_DATA_SIZE (7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
#define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK 0x02
+#define VCE_V1_0_GART_PAGE_START \
+ (AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS)
+#define VCE_V1_0_GART_ADDR_START \
+ (VCE_V1_0_GART_PAGE_START * AMDGPU_GPU_PAGE_SIZE)
+
static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
@@ -513,6 +519,49 @@ static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
return 0;
}
+/**
+ * vce_v1_0_ensure_vcpu_bo_32bit_addr() - ensure the VCPU BO has a 32-bit address
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Due to various hardware limitations, the VCE1 requires
+ * the VCPU BO to be in the low 32 bit address range.
+ * Ensure that the VCPU BO has a 32-bit GPU address,
+ * or return an error code when that isn't possible.
+ *
+ * To accommodate that, we place the GART in the low address range
+ * and reserve some GART pages where we map the VCPU BO,
+ * so that it gets a 32-bit address.
+ */
+static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
+{
+ u64 gpu_addr = amdgpu_bo_gpu_offset(adev->vce.vcpu_bo);
+ u64 bo_size = amdgpu_bo_size(adev->vce.vcpu_bo);
+ u64 max_vcpu_bo_addr = 0xffffffff - bo_size;
+ u64 num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
+ u64 pa = amdgpu_gmc_vram_pa(adev, adev->vce.vcpu_bo);
+ u64 flags = AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID;
+
+ /*
+ * Check if the VCPU BO already has a 32-bit address.
+ * E.g. if the MC is configured to put VRAM in the low address range.
+ */
+ if (gpu_addr <= max_vcpu_bo_addr)
+ return 0;
+
+ /* Check if we can map the VCPU BO in GART to a 32-bit address. */
+ if (adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START > max_vcpu_bo_addr)
+ return -EINVAL;
+
+ amdgpu_gart_map_vram_range(adev, pa, VCE_V1_0_GART_PAGE_START,
+ num_pages, flags, adev->gart.ptr);
+ adev->vce.gpu_addr = adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START;
+ if (adev->vce.gpu_addr > max_vcpu_bo_addr)
+ return -EINVAL;
+
+ return 0;
+}
+
static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -532,6 +581,9 @@ static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
if (r)
return r;
r = vce_v1_0_load_fw_signature(adev);
+ if (r)
+ return r;
+ r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
if (r)
return r;
@@ -647,6 +699,9 @@ static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
if (r)
return r;
r = vce_v1_0_load_fw_signature(adev);
+ if (r)
+ return r;
+ r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
if (r)
return r;
--
2.51.0
* Re: [PATCH 10/13] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space (v3)
2025-11-06 18:44 ` [PATCH 10/13] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space (v3) Timur Kristóf
@ 2025-11-07 9:40 ` Christian König
0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2025-11-07 9:40 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu
On 11/6/25 19:44, Timur Kristóf wrote:
> Based on research and ideas by Alexandre and Christian.
>
> VCE1 actually executes its code from the VCPU BO.
> Due to various hardware limitations, the VCE1 requires
> the VCPU BO to be in the low 32 bit address range.
> However, VRAM is typically mapped at the high address range,
> which means the VCPU can't access VRAM through the FB aperture.
>
> To solve this, we write a few page table entries to
> map the VCPU BO in the GART address range. And we make sure
> that the GART is located at the low address range.
> That way the VCE1 can access the VCPU BO.
>
> v2:
> - Adjust to v2 of the GART helper commit.
> - Add empty line to multi-line comment.
>
> v3:
> - Instead of relying on gmc_v6 to set the GART space before GTT,
> add a new function amdgpu_vce_required_gart_pages() which is
> called from amdgpu_gtt_mgr_init() directly.
>
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 18 +++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h | 1 +
> drivers/gpu/drm/amd/amdgpu/vce_v1_0.c | 55 +++++++++++++++++++++
> 4 files changed, 75 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index 0760e70402ec..895c1e4c6747 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -284,6 +284,7 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
> ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
>
> start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
> + start += amdgpu_vce_required_gart_pages(adev);
> size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
> drm_mm_init(&mgr->mm, start, size);
> spin_lock_init(&mgr->lock);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index 2761c096c4cd..e825184f244c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -459,6 +459,24 @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
> }
> }
>
> +/**
> + * amdgpu_vce_required_gart_pages() - gets number of GART pages required by VCE
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Returns how many GART pages we need before GTT for the VCE IP block.
> + * For VCE1, see vce_v1_0_ensure_vcpu_bo_32bit_addr for details.
> + * For VCE2+, this is not needed so return zero.
> + */
> +u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev)
> +{
> + /* VCE IP block not added yet, so can't use amdgpu_ip_version */
> + if (adev->family == AMDGPU_FAMILY_SI)
> + return 512;
> +
> + return 0;
> +}
> +
> /**
> * amdgpu_vce_get_create_msg - generate a VCE create msg
> *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> index 050783802623..1c3464ce5037 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
> @@ -61,6 +61,7 @@ int amdgpu_vce_entity_init(struct amdgpu_device *adev, struct amdgpu_ring *ring)
> int amdgpu_vce_suspend(struct amdgpu_device *adev);
> int amdgpu_vce_resume(struct amdgpu_device *adev);
> void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file *filp);
> +u32 amdgpu_vce_required_gart_pages(struct amdgpu_device *adev);
> int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
> struct amdgpu_ib *ib);
> int amdgpu_vce_ring_parse_cs_vm(struct amdgpu_cs_parser *p,
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> index bf9f943852cb..9ae424618556 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v1_0.c
> @@ -34,6 +34,7 @@
>
> #include "amdgpu.h"
> #include "amdgpu_vce.h"
> +#include "amdgpu_gart.h"
> #include "sid.h"
> #include "vce_v1_0.h"
> #include "vce/vce_1_0_d.h"
> @@ -46,6 +47,11 @@
> #define VCE_V1_0_DATA_SIZE (7808 * (AMDGPU_MAX_VCE_HANDLES + 1))
> #define VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK 0x02
>
> +#define VCE_V1_0_GART_PAGE_START \
> + (AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS)
> +#define VCE_V1_0_GART_ADDR_START \
> + (VCE_V1_0_GART_PAGE_START * AMDGPU_GPU_PAGE_SIZE)
> +
> static void vce_v1_0_set_ring_funcs(struct amdgpu_device *adev);
> static void vce_v1_0_set_irq_funcs(struct amdgpu_device *adev);
>
> @@ -513,6 +519,49 @@ static int vce_v1_0_early_init(struct amdgpu_ip_block *ip_block)
> return 0;
> }
>
> +/**
> + * vce_v1_0_ensure_vcpu_bo_32bit_addr() - ensure the VCPU BO has a 32-bit address
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Due to various hardware limitations, the VCE1 requires
> + * the VCPU BO to be in the low 32-bit address range.
> + * Ensure that the VCPU BO has a 32-bit GPU address,
> + * or return an error code when that isn't possible.
> + *
> + * To accommodate that, we put the GART in the low address range
> + * and reserve some GART pages where we map the VCPU BO,
> + * so that it gets a 32-bit address.
> + */
> +static int vce_v1_0_ensure_vcpu_bo_32bit_addr(struct amdgpu_device *adev)
> +{
> + u64 gpu_addr = amdgpu_bo_gpu_offset(adev->vce.vcpu_bo);
> + u64 bo_size = amdgpu_bo_size(adev->vce.vcpu_bo);
> + u64 max_vcpu_bo_addr = 0xffffffff - bo_size;
> + u64 num_pages = ALIGN(bo_size, AMDGPU_GPU_PAGE_SIZE) / AMDGPU_GPU_PAGE_SIZE;
> + u64 pa = amdgpu_gmc_vram_pa(adev, adev->vce.vcpu_bo);
> + u64 flags = AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE | AMDGPU_PTE_VALID;
> +
> + /*
> + * Check if the VCPU BO already has a 32-bit address.
> + * E.g. if the MC is configured to put VRAM in the low address range.
> + */
> + if (gpu_addr <= max_vcpu_bo_addr)
> + return 0;
> +
> + /* Check if we can map the VCPU BO in GART to a 32-bit address. */
> + if (adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START > max_vcpu_bo_addr)
> + return -EINVAL;
> +
> + amdgpu_gart_map_vram_range(adev, pa, VCE_V1_0_GART_PAGE_START,
> + num_pages, flags, adev->gart.ptr);
> + adev->vce.gpu_addr = adev->gmc.gart_start + VCE_V1_0_GART_ADDR_START;
> + if (adev->vce.gpu_addr > max_vcpu_bo_addr)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
> @@ -532,6 +581,9 @@ static int vce_v1_0_sw_init(struct amdgpu_ip_block *ip_block)
> if (r)
> return r;
> r = vce_v1_0_load_fw_signature(adev);
> + if (r)
> + return r;
> + r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
> if (r)
> return r;
>
> @@ -647,6 +699,9 @@ static int vce_v1_0_resume(struct amdgpu_ip_block *ip_block)
> if (r)
> return r;
> r = vce_v1_0_load_fw_signature(adev);
> + if (r)
> + return r;
> + r = vce_v1_0_ensure_vcpu_bo_32bit_addr(adev);
> if (r)
> return r;
>
* [PATCH 11/13] drm/amd/pm/si: Hook up VCE1 to SI DPM
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (9 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 10/13] drm/amdgpu/vce1: Ensure VCPU BO is in lower 32-bit address space (v3) Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 12/13] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs Timur Kristóf
2025-11-06 18:44 ` [PATCH 13/13] drm/amdgpu/vce1: Workaround PLL timeout on FirePro W9000 Timur Kristóf
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
On SI GPUs, the SMC needs to be aware of whether or not the VCE1
is used. The VCE1 is enabled/disabled through the DPM code.
Also print VCE clocks in amdgpu_pm_info.
Users can inspect the current power state using:
cat /sys/kernel/debug/dri/<card>/amdgpu_pm_info
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index 020e05c137e4..1f539cc65f41 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -7046,13 +7046,20 @@ static void si_set_vce_clock(struct amdgpu_device *adev,
if ((old_rps->evclk != new_rps->evclk) ||
(old_rps->ecclk != new_rps->ecclk)) {
/* Turn the clocks on when encoding, off otherwise */
+ dev_dbg(adev->dev, "set VCE clocks: %u, %u\n", new_rps->evclk, new_rps->ecclk);
+
if (new_rps->evclk || new_rps->ecclk) {
- /* Place holder for future VCE1.0 porting to amdgpu
- vce_v1_0_enable_mgcg(adev, false, false);*/
+ amdgpu_asic_set_vce_clocks(adev, new_rps->evclk, new_rps->ecclk);
+ amdgpu_device_ip_set_clockgating_state(
+ adev, AMD_IP_BLOCK_TYPE_VCE, AMD_CG_STATE_UNGATE);
+ amdgpu_device_ip_set_powergating_state(
+ adev, AMD_IP_BLOCK_TYPE_VCE, AMD_PG_STATE_UNGATE);
} else {
- /* Place holder for future VCE1.0 porting to amdgpu
- vce_v1_0_enable_mgcg(adev, true, false);
- amdgpu_asic_set_vce_clocks(adev, new_rps->evclk, new_rps->ecclk);*/
+ amdgpu_device_ip_set_powergating_state(
+ adev, AMD_IP_BLOCK_TYPE_VCE, AMD_PG_STATE_GATE);
+ amdgpu_device_ip_set_clockgating_state(
+ adev, AMD_IP_BLOCK_TYPE_VCE, AMD_CG_STATE_GATE);
+ amdgpu_asic_set_vce_clocks(adev, 0, 0);
}
}
}
@@ -7574,6 +7581,7 @@ static void si_dpm_debugfs_print_current_performance_level(void *handle,
} else {
pl = &ps->performance_levels[current_index];
seq_printf(m, "uvd vclk: %d dclk: %d\n", rps->vclk, rps->dclk);
+ seq_printf(m, "vce evclk: %d ecclk: %d\n", rps->evclk, rps->ecclk);
seq_printf(m, "power level %d sclk: %u mclk: %u vddc: %u vddci: %u pcie gen: %u\n",
current_index, pl->sclk, pl->mclk, pl->vddc, pl->vddci, pl->pcie_gen + 1);
}
--
2.51.0
* [PATCH 12/13] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (10 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 11/13] drm/amd/pm/si: Hook up VCE1 to SI DPM Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-06 18:44 ` [PATCH 13/13] drm/amdgpu/vce1: Workaround PLL timeout on FirePro W9000 Timur Kristóf
12 siblings, 0 replies; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Add the VCE1 IP block to the SI GPUs that have it.
Advertise the encoder capabilities corresponding to VCE1,
so that userspace applications can detect and use it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Co-developed-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/si.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index e0f139de7991..9d769222784c 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -45,6 +45,7 @@
#include "dce_v6_0.h"
#include "si.h"
#include "uvd_v3_1.h"
+#include "vce_v1_0.h"
#include "uvd/uvd_4_0_d.h"
@@ -921,8 +922,6 @@ static const u32 hainan_mgcg_cgcg_init[] =
0x3630, 0xfffffff0, 0x00000100,
};
-/* XXX: update when we support VCE */
-#if 0
/* tahiti, pitcairn, verde */
static const struct amdgpu_video_codec_info tahiti_video_codecs_encode_array[] =
{
@@ -940,13 +939,7 @@ static const struct amdgpu_video_codecs tahiti_video_codecs_encode =
.codec_count = ARRAY_SIZE(tahiti_video_codecs_encode_array),
.codec_array = tahiti_video_codecs_encode_array,
};
-#else
-static const struct amdgpu_video_codecs tahiti_video_codecs_encode =
-{
- .codec_count = 0,
- .codec_array = NULL,
-};
-#endif
+
/* oland and hainan don't support encode */
static const struct amdgpu_video_codecs hainan_video_codecs_encode =
{
@@ -2717,7 +2710,7 @@ int si_set_ip_blocks(struct amdgpu_device *adev)
else
amdgpu_device_ip_block_add(adev, &dce_v6_0_ip_block);
amdgpu_device_ip_block_add(adev, &uvd_v3_1_ip_block);
- /* amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block); */
+ amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block);
break;
case CHIP_OLAND:
amdgpu_device_ip_block_add(adev, &si_common_ip_block);
@@ -2735,7 +2728,6 @@ int si_set_ip_blocks(struct amdgpu_device *adev)
else
amdgpu_device_ip_block_add(adev, &dce_v6_4_ip_block);
amdgpu_device_ip_block_add(adev, &uvd_v3_1_ip_block);
- /* amdgpu_device_ip_block_add(adev, &vce_v1_0_ip_block); */
break;
case CHIP_HAINAN:
amdgpu_device_ip_block_add(adev, &si_common_ip_block);
--
2.51.0
* [PATCH 13/13] drm/amdgpu/vce1: Workaround PLL timeout on FirePro W9000
2025-11-06 18:44 [PATCH 00/13] drm/amdgpu: Support VCE1 IP block (v3) Timur Kristóf
` (11 preceding siblings ...)
2025-11-06 18:44 ` [PATCH 12/13] drm/amdgpu/vce1: Enable VCE1 on Tahiti, Pitcairn, Cape Verde GPUs Timur Kristóf
@ 2025-11-06 18:44 ` Timur Kristóf
2025-11-07 9:41 ` Christian König
12 siblings, 1 reply; 27+ messages in thread
From: Timur Kristóf @ 2025-11-06 18:44 UTC (permalink / raw)
To: amd-gfx, Alex Deucher, Christian König, Alexandre Demers,
Timur Kristóf, Rodrigo Siqueira, Leo Liu
Sometimes the VCE PLL times out waiting for CTLACK/CTLACK2.
When it happens, the VCE still works, but much slower.
Observed on one Tahiti GPU, but not on all of them:
- FirePro W9000 has the issue
- Radeon R9 280X not affected
- Radeon HD 7990 not affected
As a workaround, on the affected chip just don't put the
VCE PLL in sleep mode. Leaving the VCE PLL in bypass mode
or reset mode both work. Using bypass mode is simpler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/si.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 9d769222784c..f7288372ee61 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -1918,6 +1918,14 @@ static int si_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
~VCEPLL_BYPASS_EN_MASK);
if (!evclk || !ecclk) {
+ /*
+ * On some chips, the PLL takes way too long to get out of
+ * sleep mode, causing a timeout waiting on CTLACK/CTLACK2.
+ * Leave the PLL running in bypass mode.
+ */
+ if (adev->pdev->device == 0x6780)
+ return 0;
+
/* Keep the Bypass mode, put PLL to sleep */
WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, VCEPLL_SLEEP_MASK,
~VCEPLL_SLEEP_MASK);
--
2.51.0
* Re: [PATCH 13/13] drm/amdgpu/vce1: Workaround PLL timeout on FirePro W9000
2025-11-06 18:44 ` [PATCH 13/13] drm/amdgpu/vce1: Workaround PLL timeout on FirePro W9000 Timur Kristóf
@ 2025-11-07 9:41 ` Christian König
0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2025-11-07 9:41 UTC (permalink / raw)
To: Timur Kristóf, amd-gfx, Alex Deucher, Alexandre Demers,
Rodrigo Siqueira, Leo Liu
On 11/6/25 19:44, Timur Kristóf wrote:
> Sometimes the VCE PLL times out waiting for CTLACK/CTLACK2.
> When it happens, the VCE still works, but much slower.
> Observed on one Tahiti GPU, but not on all of them:
> - FirePro W9000 has the issue
> - Radeon R9 280X not affected
> - Radeon HD 7990 not affected
>
> As a workaround, on the affected chip just don't put the
> VCE PLL in sleep mode. Leaving the VCE PLL in bypass mode
> or reset mode both work. Using bypass mode is simpler.
>
> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/si.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
> index 9d769222784c..f7288372ee61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si.c
> @@ -1918,6 +1918,14 @@ static int si_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
> ~VCEPLL_BYPASS_EN_MASK);
>
> if (!evclk || !ecclk) {
> + /*
> + * On some chips, the PLL takes way too long to get out of
> + * sleep mode, causing a timeout waiting on CTLACK/CTLACK2.
> + * Leave the PLL running in bypass mode.
> + */
> + if (adev->pdev->device == 0x6780)
> + return 0;
> +
> /* Keep the Bypass mode, put PLL to sleep */
> WREG32_SMC_P(CG_VCEPLL_FUNC_CNTL, VCEPLL_SLEEP_MASK,
> ~VCEPLL_SLEEP_MASK);