* [PATCH] drm/amdgpu/soc24: reset dGPU if suspend got aborted
From: Jakob Linke @ 2026-06-17  6:24 UTC
  To: Alex Deucher, Christian König, amd-gfx
  Cc: Lijo Lazar, dri-devel, linux-kernel

For SOC24 ASICs (RDNA4 / Navi 4x dGPUs), re-enabling PM features fails if an
S3 suspend was aborted. This is the same issue already handled for SOC21 and
SOC15 by:

  commit df3c7dc5c58b ("drm/amdgpu: Reset dGPU if suspend got aborted")
  commit 38e8ca3e4b6d ("amdgpu/soc15: enable asic reset for dGPU in case of suspend abort")

The resume that follows an aborted suspend fails with:

  amdgpu: SMU: No response msg_reg: 6 resp_reg: 0
  amdgpu: Failed to enable requested dpm features!
  amdgpu: resume of IP block <smu> failed -62

Apply the same workaround for soc24: detect the aborted-suspend state at
resume via the sign-of-life register and reset the device before re-init.

This is a workaround until a proper solution is finalized.

Fixes: 98b912c50e44 ("drm/amdgpu: Add soc24 common ip block (v2)")
Cc: stable@vger.kernel.org
Signed-off-by: Jakob Linke <jakob@linke.cx>
---
Tested on Navi 44 (RX 9060 XT): recovers the deep->s2idle fallback and pure
s2idle resumes that otherwise fail with "resume of IP block <smu> failed -62".
It did not recover every case: one resume still failed under sustained rapid
s2idle cycling, so like the SOC21/SOC15 versions this is a mitigation, not a
complete fix. Single suspends in normal use recover.

 drivers/gpu/drm/amd/amdgpu/soc24.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c
index ecb6c3fcfbd1..a970d8a76302 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc24.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
@@ -521,8 +521,36 @@ static int soc24_common_suspend(struct amdgpu_ip_block *ip_block)
 	return soc24_common_hw_fini(ip_block);
 }
 
+static bool soc24_need_reset_on_resume(struct amdgpu_device *adev)
+{
+	u32 sol_reg1, sol_reg2;
+
+	/* Will reset for the following suspend abort cases.
+	 * 1) Only reset dGPU side.
+	 * 2) S3 suspend got aborted and TOS is active.
+	 *    As for dGPU suspend abort cases the SOL value
+	 *    will be kept as zero at this resume point.
+	 */
+	if (!(adev->flags & AMD_IS_APU) && adev->in_s3) {
+		sol_reg1 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
+		msleep(100);
+		sol_reg2 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
+
+		return (sol_reg1 != sol_reg2);
+	}
+
+	return false;
+}
+
 static int soc24_common_resume(struct amdgpu_ip_block *ip_block)
 {
+	struct amdgpu_device *adev = ip_block->adev;
+
+	if (soc24_need_reset_on_resume(adev)) {
+		dev_info(adev->dev, "S3 suspend aborted, resetting...\n");
+		soc24_asic_reset(adev);
+	}
+
 	return soc24_common_hw_init(ip_block);
 }
 
-- 
2.54.0



* Re: [PATCH] drm/amdgpu/soc24: reset dGPU if suspend got aborted
From: sashiko-bot @ 2026-06-17  6:45 UTC
  To: Jakob Linke; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [Medium] Unconditional 100ms delay added to all normal S3 resumes due to a missing suspend-abort check.
- [Medium] Flawed condition or contradictory documentation for detecting suspend abort.
--

commit f65d16d9bd092a9093fda22e85afb7762ebbe6a5
Author: Jakob Linke <jakob@linke.cx>

drm/amdgpu/soc24: reset dGPU if suspend got aborted

For SOC24 ASICs (RDNA4 / Navi 4x dGPUs) re-enabling PM features fails if an
S3 suspend got aborted. Apply the same workaround as soc15 and soc21 to
detect the aborted-suspend state and reset the device before re-init.

> diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c
> index 265db9331d0bb..9dce30d2bb8d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc24.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
[ ... ]
> @@ -496,8 +496,36 @@ static int soc24_common_suspend(struct amdgpu_ip_block *ip_block)
>  	return soc24_common_hw_fini(ip_block);
>  }
>  
> +static bool soc24_need_reset_on_resume(struct amdgpu_device *adev)
> +{
> +	u32 sol_reg1, sol_reg2;
> +
> +	/* Will reset for the following suspend abort cases.
> +	 * 1) Only reset dGPU side.
> +	 * 2) S3 suspend got aborted and TOS is active.
> +	 *    As for dGPU suspend abort cases the SOL value
> +	 *    will be kept as zero at this resume point.
> +	 */
> +	if (!(adev->flags & AMD_IS_APU) && adev->in_s3) {
> +		sol_reg1 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +		msleep(100);

[Severity: Medium]
Does this add an unconditional 100ms delay to all S3 resumes?

Since adev->in_s3 is true during all S3 resumes, not just aborted ones, this
path seems to execute and sleep every time the system normally wakes up.

Could this use a fast-path check, like skipping the sleep if sol_reg1 is
non-zero, or perhaps checking !pm_resume_via_firmware() similar to what the
soc15 workaround does?
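
A minimal userspace sketch of what such a fast path might look like. It is not
kernel code: struct fake_dev, fake_read_sol(), fake_msleep() and
need_reset_on_resume() are hypothetical stand-ins, and the
resumed_via_firmware field only models the result of
pm_resume_via_firmware(). Whether that condition is the right abort indicator
for soc24 is an assumption that would need confirming on hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device; not the real struct amdgpu_device. */
struct fake_dev {
	bool is_apu;               /* models adev->flags & AMD_IS_APU */
	bool in_s3;                /* models adev->in_s3 */
	bool resumed_via_firmware; /* models pm_resume_via_firmware() */
	uint32_t sol;              /* models regMPASP_SMN_C2PMSG_81 */
	uint32_t sol_step;         /* how far sol advances during one sleep */
	unsigned int slept_ms;     /* total time spent sleeping, for inspection */
};

/* Models a register read of the sign-of-life value. */
static uint32_t fake_read_sol(const struct fake_dev *dev)
{
	return dev->sol;
}

/* Models msleep(): records the delay and lets the SOL value advance. */
static void fake_msleep(struct fake_dev *dev, unsigned int ms)
{
	dev->slept_ms += ms;
	dev->sol += dev->sol_step;
}

static bool need_reset_on_resume(struct fake_dev *dev)
{
	uint32_t sol1, sol2;

	assert(dev != NULL);

	/* Only dGPUs resuming from S3 are candidates. */
	if (dev->is_apu || !dev->in_s3)
		return false;

	/*
	 * Fast path (assumption): if the platform firmware completed the
	 * suspend, it was not aborted, so skip the 100ms sampling window.
	 */
	if (dev->resumed_via_firmware)
		return false;

	/* Slow path: same two-sample comparison as in the patch. */
	sol1 = fake_read_sol(dev);
	fake_msleep(dev, 100);
	sol2 = fake_read_sol(dev);

	return sol1 != sol2;
}
```

In this model a normal S3 resume returns before sampling and adds no delay,
while only the suspected-abort path pays the 100ms.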

> +		sol_reg2 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +
> +		return (sol_reg1 != sol_reg2);

[Severity: Medium]
Is the block comment above accurate about the SOL value behavior?

The comment states that for dGPU suspend abort cases, the SOL value will be
kept as zero at this resume point. If both sol_reg1 and sol_reg2 read as zero,
the condition (0 != 0) will evaluate to false, and the reset will not be
triggered.

If the SOL registers stay zero during an abort, does this logic actually fail
to detect the abort case entirely?
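
To make the concern concrete, here is the comparison in isolation as a small
userspace sketch. patch_condition() is a hypothetical name for the expression
the patch returns; the sample values are illustrative and not taken from
hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* The expression the patch returns: reset only if the two samples differ. */
static bool patch_condition(uint32_t sol_reg1, uint32_t sol_reg2)
{
	return sol_reg1 != sol_reg2;
}
```

With this expression, patch_condition(0, 0) is false, so a SOL value that stays
at zero across the sampling window (as the block comment describes) would not
trigger a reset. Only a value that changes between the two reads, for example
patch_condition(0x10, 0x11), evaluates to true. Which of the two behaviors the
hardware actually shows after an aborted suspend is not something this sketch
can settle.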

> +	}
> +
> +	return false;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617062415.19898-1-jakob@linke.cx?part=1
