* [PATCH] drm/amdgpu: initialize ret in UMC error record fill
@ 2026-06-17 18:37 Ruoyu Wang
2026-06-17 18:40 ` Christian König
2026-06-17 20:50 ` sashiko-bot
0 siblings, 2 replies; 3+ messages in thread
From: Ruoyu Wang @ 2026-06-17 18:37 UTC
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
amd-gfx, dri-devel, linux-kernel
umc_v12_0_fill_error_record() returns ret after walking the pages
reported by amdgpu_umc_lookup_bad_pages_in_a_row(). That helper can
return zero, including when its temporary allocation fails, leaving the
loop skipped and ret uninitialized.
Initialize ret to 0 so the zero-page path reports a deterministic status
instead of returning stack data.
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 14092150336a5..77a9f6a47d428 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
@@ -681,7 +681,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev,
 {
 	struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
 	uint64_t page_pfn[UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL];
-	int ret, i, count;
+	int ret = 0, i, count;
 
 	if (!err_data || !ecc_err)
 		return -EINVAL;
--
2.51.0
* Re: [PATCH] drm/amdgpu: initialize ret in UMC error record fill
2026-06-17 18:37 [PATCH] drm/amdgpu: initialize ret in UMC error record fill Ruoyu Wang
@ 2026-06-17 18:40 ` Christian König
2026-06-17 20:50 ` sashiko-bot
1 sibling, 0 replies; 3+ messages in thread
From: Christian König @ 2026-06-17 18:40 UTC
To: Ruoyu Wang, Alex Deucher, David Airlie, Simona Vetter, amd-gfx,
dri-devel, linux-kernel
On 6/17/26 20:37, Ruoyu Wang wrote:
> umc_v12_0_fill_error_record() returns ret after walking the pages
> reported by amdgpu_umc_lookup_bad_pages_in_a_row(). That helper can
> return zero, including when its temporary allocation fails, leaving the
> loop skipped and ret uninitialized.
>
> Initialize ret to 0 so the zero-page path reports a deterministic status
> instead of returning stack data.
>
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
> ---
> drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> index 14092150336a5..77a9f6a47d428 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> @@ -681,7 +681,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev,
> {
> struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
> uint64_t page_pfn[UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL];
> - int ret, i, count;
> + int ret = 0, i, count;
That is usually considered bad coding style. It is better to initialize the variable directly before the loop.
Regards,
Christian.
>
> if (!err_data || !ecc_err)
> return -EINVAL;
> --
> 2.51.0
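The placement suggested in this reply can be illustrated with a minimal user-space sketch. It is not the driver code: `fill_records()` and `record_page()` are hypothetical names invented here, and the body of the real loop is not shown in this thread. The only point is that `ret` is assigned next to the loop that may be skipped, rather than in the declaration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-page recorder: rejects an all-ones PFN, accepts the rest. */
static int record_page(uint64_t pfn)
{
	return pfn == UINT64_MAX ? -EINVAL : 0;
}

/*
 * Hypothetical analogue of the fill function. 'count' plays the role of
 * the value returned by the bad page lookup helper.
 */
static int fill_records(const uint64_t *page_pfn, int count)
{
	int ret, i;

	if (!page_pfn)
		return -EINVAL;

	/* Assign ret directly before the loop, not at its declaration. */
	ret = 0;
	for (i = 0; i < count; i++) {
		ret = record_page(page_pfn[i]);
		if (ret)
			break;
	}

	return ret;
}
```

With this placement the zero-iteration case still returns 0, so it changes where the initialization lives but not what the function reports when no pages are found.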
* Re: [PATCH] drm/amdgpu: initialize ret in UMC error record fill
2026-06-17 18:37 [PATCH] drm/amdgpu: initialize ret in UMC error record fill Ruoyu Wang
2026-06-17 18:40 ` Christian König
@ 2026-06-17 20:50 ` sashiko-bot
1 sibling, 0 replies; 3+ messages in thread
From: sashiko-bot @ 2026-06-17 20:50 UTC
To: Ruoyu Wang; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] Initializing `ret` to 0 masks bad page lookup failures, causing hardware memory errors to be permanently discarded.
--
commit 70c0c3121204f18673da4770e06143179e0b0b6c
Author: Ruoyu Wang <ruoyuw560@gmail.com>
drm/amdgpu: initialize ret in UMC error record fill
This commit initializes the ret variable to 0 in umc_v12_0_fill_error_record()
to avoid returning uninitialized stack data when the bad page lookup loop
is skipped.
> diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> index a6df6a778f50b..3d84223acfa20 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
> @@ -681,7 +681,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev,
> {
> struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
> uint64_t page_pfn[UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL];
> - int ret, i, count;
> + int ret = 0, i, count;
[Severity: High]
Does initializing this to 0 introduce a regression that masks bad page
lookup failures?
If amdgpu_umc_lookup_bad_pages_in_a_row() returns 0 (e.g. kcalloc failure)
or a negative error code (e.g. address translation failure), count will be
<= 0. The for loop will be skipped, and this function will return 0 (success).
If this returns success on a failure, won't the caller incorrectly clear
the error tag?
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c:umc_v12_0_query_ras_ecc_err_addr() {
	...
	ret = umc_v12_0_fill_error_record(adev, entries[i], ras_error_status);
	if (ret) {
		dev_err(adev->dev, "Fail to fill umc error record, ret:%d\n", ret);
		break;
	}
	radix_tree_tag_clear(ecc_tree,
			entries[i]->pa_pfn, UMC_ECC_NEW_DETECTED_TAG);
	...
}
This appears to permanently drop the hardware ECC error event since it clears
the UMC_ECC_NEW_DETECTED_TAG without actually recording the bad pages for
retirement.
>
> if (!err_data || !ecc_err)
> return -EINVAL;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260617183737.1101458-1-ruoyuw560@gmail.com?part=1
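One way to address the concern raised in this review is to treat a non-positive lookup result as a failure, so the caller keeps the "newly detected" marker set. The following is a user-space sketch only, under stated assumptions: `struct ecc_entry`, `fill_records()`, `process_entry()` and `MAX_PAGES` are hypothetical names, the boolean flag stands in for `UMC_ECC_NEW_DETECTED_TAG`, and the choice of `-ENOENT` for the zero-page case is illustrative. Nobody in this thread proposed or endorsed this exact change.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAGES 4 /* hypothetical stand-in for the per-channel page limit */

/* Hypothetical ECC entry: looked-up pages plus a "newly detected" flag. */
struct ecc_entry {
	uint64_t pfn[MAX_PAGES];
	int count;         /* lookup result: > 0 pages, 0 none, < 0 errno */
	bool new_detected; /* stands in for UMC_ECC_NEW_DETECTED_TAG */
	int recorded;      /* number of pages recorded so far */
};

/* Hypothetical fill function: fails unless at least one page was found. */
static int fill_records(struct ecc_entry *e)
{
	int i;

	if (!e)
		return -EINVAL;
	if (e->count < 0)
		return e->count; /* propagate the lookup error */
	if (e->count == 0)
		return -ENOENT;  /* assumption: "no pages" is reported as an error */
	if (e->count > MAX_PAGES)
		return -EINVAL;

	for (i = 0; i < e->count; i++)
		e->recorded++;

	return 0;
}

/* Hypothetical caller: the flag is cleared only after a successful fill. */
static int process_entry(struct ecc_entry *e)
{
	int ret = fill_records(e);

	if (ret)
		return ret; /* flag stays set so the event is not lost */

	e->new_detected = false;
	return 0;
}
```

In this sketch a failed or empty lookup leaves `new_detected` set, which mirrors the behavior the review argues the caller should preserve.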