All of lore.kernel.org
 help / color / mirror / Atom feed
* [v1 0/4] Bugfixes and minor improvements to drm/amdgpu/psp
@ 2025-02-07  6:28 Jiang Liu
  2025-02-07  6:28 ` [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer Jiang Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Jiang Liu @ 2025-02-07  6:28 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
	sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
  Cc: Jiang Liu

Fix some bugs in error handling path in psp subsystem:
1) fix possible bugs in error handling path in psp_sw_init()
2) fix a bug in error handling path in psp_init_cap_microcode()
3) reduce duplicated code related to psp_ta_init_shared_buf()

Jiang Liu (4):
  drm/amdgpu: reset psp->cmd to NULL after releasing the buffer
  drm/amdgpu: enhance error handling of psp_sw_init()
  drm/amdgpu: bail out when failed to load fw in
    psp_init_cap_microcode()
  drm/amdgpu: simplify invoke of psp_ta_init_shared_buf()

 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 147 +++++++++++++-----------
 1 file changed, 81 insertions(+), 66 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer
  2025-02-07  6:28 [v1 0/4] Bugfixes and minor improvements to drm/amdgpu/psp Jiang Liu
@ 2025-02-07  6:28 ` Jiang Liu
  2025-02-07 10:38   ` Lazar, Lijo
  2025-02-07  6:28 ` [v1 2/4] drm/amdgpu: enhance error handling of psp_sw_init() Jiang Liu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Jiang Liu @ 2025-02-07  6:28 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
	sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
  Cc: Jiang Liu

Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index babe94ade247..4e9766a1d421 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -533,7 +533,6 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
 {
 	struct amdgpu_device *adev = ip_block->adev;
 	struct psp_context *psp = &adev->psp;
-	struct psp_gfx_cmd_resp *cmd = psp->cmd;
 
 	psp_memory_training_fini(psp);
 
@@ -543,8 +542,8 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
 	amdgpu_ucode_release(&psp->cap_fw);
 	amdgpu_ucode_release(&psp->toc_fw);
 
-	kfree(cmd);
-	cmd = NULL;
+	kfree(psp->cmd);
+	psp->cmd = NULL;
 
 	psp_free_shared_bufs(psp);
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [v1 2/4] drm/amdgpu: enhance error handling of psp_sw_init()
  2025-02-07  6:28 [v1 0/4] Bugfixes and minor improvements to drm/amdgpu/psp Jiang Liu
  2025-02-07  6:28 ` [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer Jiang Liu
@ 2025-02-07  6:28 ` Jiang Liu
  2025-02-07 10:50   ` Lazar, Lijo
  2025-02-07  6:28 ` [v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode() Jiang Liu
  2025-02-07  6:28 ` [v1 4/4] drm/amdgpu: simplify invoke of psp_ta_init_shared_buf() Jiang Liu
  3 siblings, 1 reply; 10+ messages in thread
From: Jiang Liu @ 2025-02-07  6:28 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
	sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
  Cc: Jiang Liu

Enhance error handling in function psp_sw_init() by:
1) bail out when failed to allocate memory
2) release allocated resource on error
3) introduce helper function psp_bo_init()

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 84 ++++++++++++++++---------
 1 file changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 4e9766a1d421..0d1eb7b8e59b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -423,6 +423,50 @@ static bool psp_get_runtime_db_entry(struct amdgpu_device *adev,
 	return ret;
 }
 
+static int psp_bo_init(struct amdgpu_device *adev, struct psp_context *psp)
+{
+	int ret;
+
+	ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
+				      (amdgpu_sriov_vf(adev) || adev->debug_use_vram_fw_buf) ?
+				      AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
+				      &psp->fw_pri_bo,
+				      &psp->fw_pri_mc_addr,
+				      &psp->fw_pri_buf);
+	if (ret)
+		goto failed1;
+
+	ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
+				      AMDGPU_GEM_DOMAIN_VRAM |
+				      AMDGPU_GEM_DOMAIN_GTT,
+				      &psp->fence_buf_bo,
+				      &psp->fence_buf_mc_addr,
+				      &psp->fence_buf);
+	if (ret)
+		goto failed2;
+
+	ret = amdgpu_bo_create_kernel(adev, PSP_CMD_BUFFER_SIZE, PAGE_SIZE,
+				      AMDGPU_GEM_DOMAIN_VRAM |
+				      AMDGPU_GEM_DOMAIN_GTT,
+				      &psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
+				      (void **)&psp->cmd_buf_mem);
+	if (ret)
+		goto failed3;
+
+	return 0;
+
+failed3:
+	amdgpu_bo_free_kernel(&psp->fence_buf_bo,
+			      &psp->fence_buf_mc_addr, &psp->fence_buf);
+	psp->fence_buf_bo = NULL;
+failed2:
+	amdgpu_bo_free_kernel(&psp->fw_pri_bo,
+			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
+	psp->fw_pri_bo = NULL;
+failed1:
+	return ret;
+}
+
 static int psp_sw_init(struct amdgpu_ip_block *ip_block)
 {
 	struct amdgpu_device *adev = ip_block->adev;
@@ -435,7 +479,7 @@ static int psp_sw_init(struct amdgpu_ip_block *ip_block)
 	psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
 	if (!psp->cmd) {
 		dev_err(adev->dev, "Failed to allocate memory to command buffer!\n");
-		ret = -ENOMEM;
+		return -ENOMEM;
 	}
 
 	adev->psp.xgmi_context.supports_extended_data =
@@ -482,50 +526,27 @@ static int psp_sw_init(struct amdgpu_ip_block *ip_block)
 		ret = psp_memory_training_init(psp);
 		if (ret) {
 			dev_err(adev->dev, "Failed to initialize memory training!\n");
-			return ret;
+			goto failed1;
 		}
 
 		ret = psp_mem_training(psp, PSP_MEM_TRAIN_COLD_BOOT);
 		if (ret) {
 			dev_err(adev->dev, "Failed to process memory training!\n");
-			return ret;
+			goto failed2;
 		}
 	}
 
-	ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
-				      (amdgpu_sriov_vf(adev) || adev->debug_use_vram_fw_buf) ?
-				      AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-				      &psp->fw_pri_bo,
-				      &psp->fw_pri_mc_addr,
-				      &psp->fw_pri_buf);
-	if (ret)
-		return ret;
-
-	ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
-				      AMDGPU_GEM_DOMAIN_VRAM |
-				      AMDGPU_GEM_DOMAIN_GTT,
-				      &psp->fence_buf_bo,
-				      &psp->fence_buf_mc_addr,
-				      &psp->fence_buf);
-	if (ret)
-		goto failed1;
-
-	ret = amdgpu_bo_create_kernel(adev, PSP_CMD_BUFFER_SIZE, PAGE_SIZE,
-				      AMDGPU_GEM_DOMAIN_VRAM |
-				      AMDGPU_GEM_DOMAIN_GTT,
-				      &psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
-				      (void **)&psp->cmd_buf_mem);
+	ret = psp_bo_init(adev, psp);
 	if (ret)
 		goto failed2;
 
 	return 0;
 
 failed2:
-	amdgpu_bo_free_kernel(&psp->fence_buf_bo,
-			      &psp->fence_buf_mc_addr, &psp->fence_buf);
+	psp_memory_training_fini(psp);
 failed1:
-	amdgpu_bo_free_kernel(&psp->fw_pri_bo,
-			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
+	kfree(psp->cmd);
+	psp->cmd = NULL;
 	return ret;
 }
 
@@ -554,10 +575,13 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
 
 	amdgpu_bo_free_kernel(&psp->fw_pri_bo,
 			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
+	psp->fw_pri_bo = NULL;
 	amdgpu_bo_free_kernel(&psp->fence_buf_bo,
 			      &psp->fence_buf_mc_addr, &psp->fence_buf);
+	psp->fence_buf_bo = NULL;
 	amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
 			      (void **)&psp->cmd_buf_mem);
+	psp->cmd_buf_bo = NULL;
 
 	return 0;
 }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode()
  2025-02-07  6:28 [v1 0/4] Bugfixes and minor improvements to drm/amdgpu/psp Jiang Liu
  2025-02-07  6:28 ` [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer Jiang Liu
  2025-02-07  6:28 ` [v1 2/4] drm/amdgpu: enhance error handling of psp_sw_init() Jiang Liu
@ 2025-02-07  6:28 ` Jiang Liu
  2025-02-07 11:05   ` Lazar, Lijo
  2025-02-07  6:28 ` [v1 4/4] drm/amdgpu: simplify invoke of psp_ta_init_shared_buf() Jiang Liu
  3 siblings, 1 reply; 10+ messages in thread
From: Jiang Liu @ 2025-02-07  6:28 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
	sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
  Cc: Jiang Liu

In function psp_init_cap_microcode(), it should bail out when failed to
load firmware, otherwise it may cause invalid memory access.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 0d1eb7b8e59b..952da6c7943d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -3838,9 +3838,10 @@ int psp_init_cap_microcode(struct psp_context *psp, const char *chip_name)
 		if (err == -ENODEV) {
 			dev_warn(adev->dev, "cap microcode does not exist, skip\n");
 			err = 0;
-			goto out;
+		} else {
+			dev_err(adev->dev, "fail to initialize cap microcode\n");
 		}
-		dev_err(adev->dev, "fail to initialize cap microcode\n");
+		goto out;
 	}
 
 	info = &adev->firmware.ucode[AMDGPU_UCODE_ID_CAP];
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [v1 4/4] drm/amdgpu: simplify invoke of psp_ta_init_shared_buf()
  2025-02-07  6:28 [v1 0/4] Bugfixes and minor improvements to drm/amdgpu/psp Jiang Liu
                   ` (2 preceding siblings ...)
  2025-02-07  6:28 ` [v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode() Jiang Liu
@ 2025-02-07  6:28 ` Jiang Liu
  3 siblings, 0 replies; 10+ messages in thread
From: Jiang Liu @ 2025-02-07  6:28 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
	sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
  Cc: Jiang Liu

Enhance psp_ta_init_shared_buf() to check whether the shared buffer has
already been allocated, and return success if it's allocated. So caller
doesn't need to check the initialized flag.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 53 ++++++++++---------------
 1 file changed, 22 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 952da6c7943d..407f4a3bb3e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1232,6 +1232,10 @@ static void psp_prep_ta_load_cmd_buf(struct psp_gfx_cmd_resp *cmd,
 int psp_ta_init_shared_buf(struct psp_context *psp,
 				  struct ta_mem_context *mem_ctx)
 {
+	if (mem_ctx->shared_bo) {
+		return 0;
+	}
+
 	/*
 	 * Allocate 16k memory aligned to 4k from Frame Buffer (local
 	 * physical) for ta to host memory
@@ -1339,11 +1343,9 @@ int psp_xgmi_initialize(struct psp_context *psp, bool set_extended_data, bool lo
 	psp->xgmi_context.context.mem_context.shared_mem_size = PSP_XGMI_SHARED_MEM_SIZE;
 	psp->xgmi_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-	if (!psp->xgmi_context.context.mem_context.shared_buf) {
-		ret = psp_ta_init_shared_buf(psp, &psp->xgmi_context.context.mem_context);
-		if (ret)
-			return ret;
-	}
+	ret = psp_ta_init_shared_buf(psp, &psp->xgmi_context.context.mem_context);
+	if (ret)
+		return ret;
 
 	/* Load XGMI TA */
 	ret = psp_ta_load(psp, &psp->xgmi_context.context);
@@ -1844,11 +1846,9 @@ int psp_ras_initialize(struct psp_context *psp)
 	psp->ras_context.context.mem_context.shared_mem_size = PSP_RAS_SHARED_MEM_SIZE;
 	psp->ras_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-	if (!psp->ras_context.context.mem_context.shared_buf) {
-		ret = psp_ta_init_shared_buf(psp, &psp->ras_context.context.mem_context);
-		if (ret)
-			return ret;
-	}
+	ret = psp_ta_init_shared_buf(psp, &psp->ras_context.context.mem_context);
+	if (ret)
+		return ret;
 
 	ras_cmd = (struct ta_ras_shared_memory *)psp->ras_context.context.mem_context.shared_buf;
 	memset(ras_cmd, 0, sizeof(struct ta_ras_shared_memory));
@@ -1972,11 +1972,9 @@ static int psp_hdcp_initialize(struct psp_context *psp)
 	psp->hdcp_context.context.mem_context.shared_mem_size = PSP_HDCP_SHARED_MEM_SIZE;
 	psp->hdcp_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-	if (!psp->hdcp_context.context.mem_context.shared_buf) {
-		ret = psp_ta_init_shared_buf(psp, &psp->hdcp_context.context.mem_context);
-		if (ret)
-			return ret;
-	}
+	ret = psp_ta_init_shared_buf(psp, &psp->hdcp_context.context.mem_context);
+	if (ret)
+		return ret;
 
 	ret = psp_ta_load(psp, &psp->hdcp_context.context);
 	if (!ret) {
@@ -2046,11 +2044,9 @@ static int psp_dtm_initialize(struct psp_context *psp)
 	psp->dtm_context.context.mem_context.shared_mem_size = PSP_DTM_SHARED_MEM_SIZE;
 	psp->dtm_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-	if (!psp->dtm_context.context.mem_context.shared_buf) {
-		ret = psp_ta_init_shared_buf(psp, &psp->dtm_context.context.mem_context);
-		if (ret)
-			return ret;
-	}
+	ret = psp_ta_init_shared_buf(psp, &psp->dtm_context.context.mem_context);
+	if (ret)
+		return ret;
 
 	ret = psp_ta_load(psp, &psp->dtm_context.context);
 	if (!ret) {
@@ -2117,11 +2113,9 @@ static int psp_rap_initialize(struct psp_context *psp)
 	psp->rap_context.context.mem_context.shared_mem_size = PSP_RAP_SHARED_MEM_SIZE;
 	psp->rap_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-	if (!psp->rap_context.context.mem_context.shared_buf) {
-		ret = psp_ta_init_shared_buf(psp, &psp->rap_context.context.mem_context);
-		if (ret)
-			return ret;
-	}
+	ret = psp_ta_init_shared_buf(psp, &psp->rap_context.context.mem_context);
+	if (ret)
+		return ret;
 
 	ret = psp_ta_load(psp, &psp->rap_context.context);
 	if (!ret) {
@@ -2220,12 +2214,9 @@ static int psp_securedisplay_initialize(struct psp_context *psp)
 		PSP_SECUREDISPLAY_SHARED_MEM_SIZE;
 	psp->securedisplay_context.context.ta_load_type = GFX_CMD_ID_LOAD_TA;
 
-	if (!psp->securedisplay_context.context.initialized) {
-		ret = psp_ta_init_shared_buf(psp,
-					     &psp->securedisplay_context.context.mem_context);
-		if (ret)
-			return ret;
-	}
+	ret = psp_ta_init_shared_buf(psp, &psp->securedisplay_context.context.mem_context);
+	if (ret)
+		return ret;
 
 	ret = psp_ta_load(psp, &psp->securedisplay_context.context);
 	if (!ret) {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer
  2025-02-07  6:28 ` [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer Jiang Liu
@ 2025-02-07 10:38   ` Lazar, Lijo
  2025-02-07 18:22     ` Alex Deucher
  0 siblings, 1 reply; 10+ messages in thread
From: Lazar, Lijo @ 2025-02-07 10:38 UTC (permalink / raw)
  To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
	airlied, simona, sunil.khatri, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx



On 2/7/2025 11:58 AM, Jiang Liu wrote:
> Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini().
> 
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>

	Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

Thanks,
Lijo

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index babe94ade247..4e9766a1d421 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -533,7 +533,6 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
>  {
>  	struct amdgpu_device *adev = ip_block->adev;
>  	struct psp_context *psp = &adev->psp;
> -	struct psp_gfx_cmd_resp *cmd = psp->cmd;
>  
>  	psp_memory_training_fini(psp);
>  
> @@ -543,8 +542,8 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
>  	amdgpu_ucode_release(&psp->cap_fw);
>  	amdgpu_ucode_release(&psp->toc_fw);
>  
> -	kfree(cmd);
> -	cmd = NULL;
> +	kfree(psp->cmd);
> +	psp->cmd = NULL;
>  
>  	psp_free_shared_bufs(psp);
>  


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [v1 2/4] drm/amdgpu: enhance error handling of psp_sw_init()
  2025-02-07  6:28 ` [v1 2/4] drm/amdgpu: enhance error handling of psp_sw_init() Jiang Liu
@ 2025-02-07 10:50   ` Lazar, Lijo
  0 siblings, 0 replies; 10+ messages in thread
From: Lazar, Lijo @ 2025-02-07 10:50 UTC (permalink / raw)
  To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
	airlied, simona, sunil.khatri, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx



On 2/7/2025 11:58 AM, Jiang Liu wrote:
> Enhance error handling in function psp_sw_init() by:
> 1) bail out when failed to allocate memory
> 2) release allocated resource on error
> 3) introduce helper function psp_bo_init()
> 
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 84 ++++++++++++++++---------
>  1 file changed, 54 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 4e9766a1d421..0d1eb7b8e59b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -423,6 +423,50 @@ static bool psp_get_runtime_db_entry(struct amdgpu_device *adev,
>  	return ret;
>  }
>  
> +static int psp_bo_init(struct amdgpu_device *adev, struct psp_context *psp)
> +{
> +	int ret;
> +
> +	ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> +				      (amdgpu_sriov_vf(adev) || adev->debug_use_vram_fw_buf) ?
> +				      AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> +				      &psp->fw_pri_bo,
> +				      &psp->fw_pri_mc_addr,
> +				      &psp->fw_pri_buf);
> +	if (ret)
> +		goto failed1;

Better keep it as return ret, will avoid another label.

> +
> +	ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
> +				      AMDGPU_GEM_DOMAIN_VRAM |
> +				      AMDGPU_GEM_DOMAIN_GTT,
> +				      &psp->fence_buf_bo,
> +				      &psp->fence_buf_mc_addr,
> +				      &psp->fence_buf);
> +	if (ret)
> +		goto failed2;
> +
> +	ret = amdgpu_bo_create_kernel(adev, PSP_CMD_BUFFER_SIZE, PAGE_SIZE,
> +				      AMDGPU_GEM_DOMAIN_VRAM |
> +				      AMDGPU_GEM_DOMAIN_GTT,
> +				      &psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> +				      (void **)&psp->cmd_buf_mem);
> +	if (ret)
> +		goto failed3;
> +
> +	return 0;
> +
> +failed3:
> +	amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> +			      &psp->fence_buf_mc_addr, &psp->fence_buf);
> +	psp->fence_buf_bo = NULL;

This NULL assignment is not required, same below as well.

> +failed2:
> +	amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> +			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> +	psp->fw_pri_bo = NULL;
> +failed1:
> +	return ret;
> +}
> +
>  static int psp_sw_init(struct amdgpu_ip_block *ip_block)
>  {
>  	struct amdgpu_device *adev = ip_block->adev;
> @@ -435,7 +479,7 @@ static int psp_sw_init(struct amdgpu_ip_block *ip_block)
>  	psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
>  	if (!psp->cmd) {
>  		dev_err(adev->dev, "Failed to allocate memory to command buffer!\n");
> -		ret = -ENOMEM;
> +		return -ENOMEM;
>  	}
>  
>  	adev->psp.xgmi_context.supports_extended_data =
> @@ -482,50 +526,27 @@ static int psp_sw_init(struct amdgpu_ip_block *ip_block)
>  		ret = psp_memory_training_init(psp);
>  		if (ret) {
>  			dev_err(adev->dev, "Failed to initialize memory training!\n");
> -			return ret;
> +			goto failed1;
>  		}
>  
>  		ret = psp_mem_training(psp, PSP_MEM_TRAIN_COLD_BOOT);
>  		if (ret) {
>  			dev_err(adev->dev, "Failed to process memory training!\n");
> -			return ret;
> +			goto failed2;
>  		}
>  	}
>  
> -	ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> -				      (amdgpu_sriov_vf(adev) || adev->debug_use_vram_fw_buf) ?
> -				      AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> -				      &psp->fw_pri_bo,
> -				      &psp->fw_pri_mc_addr,
> -				      &psp->fw_pri_buf);
> -	if (ret)
> -		return ret;
> -
> -	ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
> -				      AMDGPU_GEM_DOMAIN_VRAM |
> -				      AMDGPU_GEM_DOMAIN_GTT,
> -				      &psp->fence_buf_bo,
> -				      &psp->fence_buf_mc_addr,
> -				      &psp->fence_buf);
> -	if (ret)
> -		goto failed1;
> -
> -	ret = amdgpu_bo_create_kernel(adev, PSP_CMD_BUFFER_SIZE, PAGE_SIZE,
> -				      AMDGPU_GEM_DOMAIN_VRAM |
> -				      AMDGPU_GEM_DOMAIN_GTT,
> -				      &psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> -				      (void **)&psp->cmd_buf_mem);
> +	ret = psp_bo_init(adev, psp);
>  	if (ret)
>  		goto failed2;
>  
>  	return 0;
>  
>  failed2:
> -	amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> -			      &psp->fence_buf_mc_addr, &psp->fence_buf);
> +	psp_memory_training_fini(psp);
>  failed1:
> -	amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> -			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> +	kfree(psp->cmd);
> +	psp->cmd = NULL;
>  	return ret;
>  }
>  
> @@ -554,10 +575,13 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
>  
>  	amdgpu_bo_free_kernel(&psp->fw_pri_bo,
>  			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> +	psp->fw_pri_bo = NULL;
>  	amdgpu_bo_free_kernel(&psp->fence_buf_bo,
>  			      &psp->fence_buf_mc_addr, &psp->fence_buf);
> +	psp->fence_buf_bo = NULL;
>  	amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
>  			      (void **)&psp->cmd_buf_mem);
> +	psp->cmd_buf_bo = NULL;

This is already taken care by amdgpu_bo_free_kernel

Thanks,
Lijo

>  
>  	return 0;
>  }


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode()
  2025-02-07  6:28 ` [v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode() Jiang Liu
@ 2025-02-07 11:05   ` Lazar, Lijo
  2025-02-07 18:27     ` Alex Deucher
  0 siblings, 1 reply; 10+ messages in thread
From: Lazar, Lijo @ 2025-02-07 11:05 UTC (permalink / raw)
  To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
	airlied, simona, sunil.khatri, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx



On 2/7/2025 11:58 AM, Jiang Liu wrote:
> In function psp_init_cap_microcode(), it should bail out when failed to
> load firmware, otherwise it may cause invalid memory access.
> 
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>

You may also add

	Fixes: 07dbfc6b102e ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP")

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

Thanks,
Lijo

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 0d1eb7b8e59b..952da6c7943d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -3838,9 +3838,10 @@ int psp_init_cap_microcode(struct psp_context *psp, const char *chip_name)
>  		if (err == -ENODEV) {
>  			dev_warn(adev->dev, "cap microcode does not exist, skip\n");
>  			err = 0;
> -			goto out;
> +		} else {
> +			dev_err(adev->dev, "fail to initialize cap microcode\n");
>  		}
> -		dev_err(adev->dev, "fail to initialize cap microcode\n");
> +		goto out;
>  	}
>  
>  	info = &adev->firmware.ucode[AMDGPU_UCODE_ID_CAP];


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer
  2025-02-07 10:38   ` Lazar, Lijo
@ 2025-02-07 18:22     ` Alex Deucher
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Deucher @ 2025-02-07 18:22 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
	airlied, simona, sunil.khatri, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx

Applied.  Thanks!

On Fri, Feb 7, 2025 at 6:17 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 2/7/2025 11:58 AM, Jiang Liu wrote:
> > Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini().
> >
> > Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
>
>         Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
>
> Thanks,
> Lijo
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index babe94ade247..4e9766a1d421 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -533,7 +533,6 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
> >  {
> >       struct amdgpu_device *adev = ip_block->adev;
> >       struct psp_context *psp = &adev->psp;
> > -     struct psp_gfx_cmd_resp *cmd = psp->cmd;
> >
> >       psp_memory_training_fini(psp);
> >
> > @@ -543,8 +542,8 @@ static int psp_sw_fini(struct amdgpu_ip_block *ip_block)
> >       amdgpu_ucode_release(&psp->cap_fw);
> >       amdgpu_ucode_release(&psp->toc_fw);
> >
> > -     kfree(cmd);
> > -     cmd = NULL;
> > +     kfree(psp->cmd);
> > +     psp->cmd = NULL;
> >
> >       psp_free_shared_bufs(psp);
> >
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode()
  2025-02-07 11:05   ` Lazar, Lijo
@ 2025-02-07 18:27     ` Alex Deucher
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Deucher @ 2025-02-07 18:27 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
	airlied, simona, sunil.khatri, Hawking.Zhang, mario.limonciello,
	xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx

Added the fixes tag and applied.  Thanks!

On Fri, Feb 7, 2025 at 6:07 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 2/7/2025 11:58 AM, Jiang Liu wrote:
> > In function psp_init_cap_microcode(), it should bail out when failed to
> > load firmware, otherwise it may cause invalid memory access.
> >
> > Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
>
> You may also add
>
>         Fixes: 07dbfc6b102e ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP")
>
> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
>
> Thanks,
> Lijo
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 0d1eb7b8e59b..952da6c7943d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -3838,9 +3838,10 @@ int psp_init_cap_microcode(struct psp_context *psp, const char *chip_name)
> >               if (err == -ENODEV) {
> >                       dev_warn(adev->dev, "cap microcode does not exist, skip\n");
> >                       err = 0;
> > -                     goto out;
> > +             } else {
> > +                     dev_err(adev->dev, "fail to initialize cap microcode\n");
> >               }
> > -             dev_err(adev->dev, "fail to initialize cap microcode\n");
> > +             goto out;
> >       }
> >
> >       info = &adev->firmware.ucode[AMDGPU_UCODE_ID_CAP];
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-02-07 18:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07  6:28 [v1 0/4] Bugfixes and minor improvements to drm/amdgpu/psp Jiang Liu
2025-02-07  6:28 ` [v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer Jiang Liu
2025-02-07 10:38   ` Lazar, Lijo
2025-02-07 18:22     ` Alex Deucher
2025-02-07  6:28 ` [v1 2/4] drm/amdgpu: enhance error handling of psp_sw_init() Jiang Liu
2025-02-07 10:50   ` Lazar, Lijo
2025-02-07  6:28 ` [v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode() Jiang Liu
2025-02-07 11:05   ` Lazar, Lijo
2025-02-07 18:27     ` Alex Deucher
2025-02-07  6:28 ` [v1 4/4] drm/amdgpu: simplify invoke of psp_ta_init_shared_buf() Jiang Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.