* [PATCH] drm/amdgpu: initialize ret in UMC error record fill
@ 2026-06-17 18:37 Ruoyu Wang
  2026-06-17 18:40 ` Christian König
  2026-06-17 20:50 ` sashiko-bot
  0 siblings, 2 replies; 3+ messages in thread
From: Ruoyu Wang @ 2026-06-17 18:37 UTC (permalink / raw)
  To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
	amd-gfx, dri-devel, linux-kernel

umc_v12_0_fill_error_record() returns ret after walking the pages
reported by amdgpu_umc_lookup_bad_pages_in_a_row(). That helper can
return zero, including when its temporary allocation fails, leaving the
loop skipped and ret uninitialized.

Initialize ret to 0 so the zero-page path reports a deterministic status
instead of returning stack data.

Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 14092150336a5..77a9f6a47d428 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
@@ -681,7 +681,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev,
 {
 	struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
 	uint64_t page_pfn[UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL];
-	int ret, i, count;
+	int ret = 0, i, count;
 
 	if (!err_data || !ecc_err)
 		return -EINVAL;
-- 
2.51.0


* Re: [PATCH] drm/amdgpu: initialize ret in UMC error record fill
  2026-06-17 18:37 [PATCH] drm/amdgpu: initialize ret in UMC error record fill Ruoyu Wang
@ 2026-06-17 18:40 ` Christian König
  2026-06-17 20:50 ` sashiko-bot
  1 sibling, 0 replies; 3+ messages in thread
From: Christian König @ 2026-06-17 18:40 UTC (permalink / raw)
  To: Ruoyu Wang, Alex Deucher, David Airlie, Simona Vetter, amd-gfx,
	dri-devel, linux-kernel

On 6/17/26 20:37, Ruoyu Wang wrote:
> umc_v12_0_fill_error_record() returns ret after walking the pages
> reported by amdgpu_umc_lookup_bad_pages_in_a_row(). That helper can
> return zero, including when its temporary allocation fails, leaving the
> loop skipped and ret uninitialized.
> 
> Initialize ret to 0 so the zero-page path reports a deterministic status
> instead of returning stack data.
> 
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> index 14092150336a5..77a9f6a47d428 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> @@ -681,7 +681,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev,
>  {
>         struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
>         uint64_t page_pfn[UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL];
> -       int ret, i, count;
> +       int ret = 0, i, count;

That is usually considered bad coding style. It is better to initialize the variable directly before the loop.
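
A minimal user-space sketch of that placement, for illustration only. The names sketch_fill_error_record() and sketch_record_page() are hypothetical stand-ins and not the driver's functions, and the count parameter models the value returned by amdgpu_umc_lookup_bad_pages_in_a_row(). It only shows where the assignment would sit; it is not the actual patch.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-page work done inside the loop. */
static int sketch_record_page(uint64_t pfn)
{
	/* Treat an all-ones PFN as invalid, purely for demonstration. */
	return pfn == UINT64_MAX ? -EINVAL : 0;
}

/* count models the number of pages reported by the lookup helper. */
static int sketch_fill_error_record(const uint64_t *page_pfn, int count)
{
	int ret, i;

	/* Assigned next to the loop that may be skipped, not at declaration. */
	ret = 0;
	for (i = 0; i < count; i++) {
		ret = sketch_record_page(page_pfn[i]);
		if (ret)
			break;
	}

	return ret;
}
```

With the assignment next to the loop, a reader sees in one place which status the skipped-loop path returns. Whether 0 is the right status for that path is a separate question.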

Regards,
Christian.

> 
>         if (!err_data || !ecc_err)
>                 return -EINVAL;
> --
> 2.51.0



* Re: [PATCH] drm/amdgpu: initialize ret in UMC error record fill
  2026-06-17 18:37 [PATCH] drm/amdgpu: initialize ret in UMC error record fill Ruoyu Wang
  2026-06-17 18:40 ` Christian König
@ 2026-06-17 20:50 ` sashiko-bot
  1 sibling, 0 replies; 3+ messages in thread
From: sashiko-bot @ 2026-06-17 20:50 UTC (permalink / raw)
  To: Ruoyu Wang; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Initializing `ret` to 0 masks bad page lookup failures, causing hardware memory errors to be permanently discarded.
--

commit 70c0c3121204f18673da4770e06143179e0b0b6c
Author: Ruoyu Wang <ruoyuw560@gmail.com>

drm/amdgpu: initialize ret in UMC error record fill

This commit initializes the ret variable to 0 in umc_v12_0_fill_error_record()
to avoid returning uninitialized stack data when the bad page lookup loop
is skipped.

> diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> index a6df6a778f50b..3d84223acfa20 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> @@ -681,7 +681,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev,
>  {
>  	struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
>  	uint64_t page_pfn[UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL];
> -	int ret, i, count;
> +	int ret = 0, i, count;

[Severity: High]
Does initializing this to 0 introduce a regression that masks bad page
lookup failures?

If amdgpu_umc_lookup_bad_pages_in_a_row() returns 0 (e.g. kcalloc failure)
or a negative error code (e.g. address translation failure), count will be
<= 0. The for loop will be skipped, and this function will return 0 (success).

If this returns success on a failure, won't the caller incorrectly clear
the error tag?

drivers/gpu/drm/amd/amdgpu/umc_v12_0.c:umc_v12_0_query_ras_ecc_err_addr() {
    ...
		ret = umc_v12_0_fill_error_record(adev, entries[i], ras_error_status);
		if (ret) {
			dev_err(adev->dev, "Fail to fill umc error record, ret:%d\n", ret);
			break;
		}
		radix_tree_tag_clear(ecc_tree,
				entries[i]->pa_pfn, UMC_ECC_NEW_DETECTED_TAG);
    ...
}

This appears to permanently drop the hardware ECC error event since it clears
the UMC_ECC_NEW_DETECTED_TAG without actually recording the bad pages for
retirement.
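
One possible way to avoid that, shown as a user-space sketch under stated assumptions rather than a proposed fix: treat a non-positive count as a failure before entering the loop, so the caller keeps the tag set. All sketch_* names and struct sketch_entry are hypothetical models, and the choice of -ENOENT for the zero-page case is an assumption; the driver may prefer a different code or may consider zero pages legitimate in some cases.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of one tree entry; only the tag is represented. */
struct sketch_entry {
	int new_detected;	/* models UMC_ECC_NEW_DETECTED_TAG being set */
};

/* Hypothetical stand-in for the per-page work done inside the loop. */
static int sketch_record_page(uint64_t pfn)
{
	return pfn == UINT64_MAX ? -EINVAL : 0;
}

/* count models the return value of the bad page lookup helper. */
static int sketch_fill_error_record(const uint64_t *page_pfn, int count)
{
	int ret, i;

	if (count < 0)
		return count;		/* propagate the lookup's error code */
	if (count == 0)
		return -ENOENT;		/* assumption: no pages means failure */

	ret = 0;
	for (i = 0; i < count; i++) {
		ret = sketch_record_page(page_pfn[i]);
		if (ret)
			break;
	}

	return ret;
}

/* Models the caller: the tag is cleared only after a successful fill. */
static int sketch_query_one(struct sketch_entry *entry,
			    const uint64_t *page_pfn, int count)
{
	int ret = sketch_fill_error_record(page_pfn, count);

	if (ret)
		return ret;		/* tag stays set on failure */

	entry->new_detected = 0;
	return 0;
}
```

In this model a failed or empty lookup returns non-zero, so the entry keeps its tag instead of being treated as recorded.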

>  
>  	if (!err_data || !ecc_err)
>  		return -EINVAL;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617183737.1101458-1-ruoyuw560@gmail.com?part=1

