From: Mario Limonciello <mario.limonciello@amd.com>
To: Jiang Liu <gerry@linux.alibaba.com>,
alexander.deucher@amd.com, christian.koenig@amd.com,
Xinhui.Pan@amd.com, airlied@gmail.com, simona@ffwll.ch,
sunil.khatri@amd.com, lijo.lazar@amd.com, Hawking.Zhang@amd.com,
Jun.Ma2@amd.com, xiaogang.chen@amd.com, Kent.Russell@amd.com,
shuox.liu@linux.alibaba.com, amd-gfx@lists.freedesktop.org
Subject: Re: [RFC PATCH 10/13] drm/admgpu: make device state machine work in stack like way
Date: Wed, 8 Jan 2025 11:13:30 -0600
Message-ID: <71bbc63e-55bf-496b-8574-2fbfcfe40e17@amd.com>
In-Reply-To: <3d3920095879123b7261c7529ad4a61ee5e56259.1736344725.git.gerry@linux.alibaba.com>
On 1/8/2025 08:00, Jiang Liu wrote:
> Make the device state machine work in stack like way to better support
> suspend/resume by following changes:
>
> 1. amdgpu_driver_load_kms()
> amdgpu_device_init()
> amdgpu_device_ip_early_init()
> ip_blocks[i].early_init()
> ip_blocks[i].status.valid = true
> amdgpu_device_ip_init()
> amdgpu_ras_init()
> ip_blocks[i].sw_init()
> ip_blocks[i].status.sw = true
> ip_blocks[i].hw_init()
> ip_blocks[i].status.hw = true
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
>
> 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
> amdgpu_device_suspend()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> amdgpu_ras_feature_disable()
> amdgpu_ras_suspend()
> amdgpu_ras_disable_all_features()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].suspend()
>
> 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
> amdgpu_device_resume()
> amdgpu_device_ip_resume()
> ip_blocks[i].resume()
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
> amdgpu_ras_resume()
> amdgpu_ras_enable_all_features()
>
> 4. amdgpu_driver_unload_kms()
> amdgpu_device_fini_hw()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].hw_fini()
> ip_blocks[i].status.hw = false
>
> 5. amdgpu_driver_release_kms()
> amdgpu_device_fini_sw()
> amdgpu_device_ip_fini()
> ip_blocks[i].sw_fini()
> ip_blocks[i].status.sw = false
> --- ip_blocks[i].status.valid = false
> +++ amdgpu_ras_fini()
> ip_blocks[i].late_fini()
> +++ ip_blocks[i].status.valid = false
> --- ip_blocks[i].status.late_initialized = false
> --- amdgpu_ras_fini()
>
> The main changes include:
> 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
> Currently there's only one ip block which provides `early_fini`
> callback. We have add a check of `in_s3` to keep current behavior in
> function amdgpu_dm_early_fini(). So there should be no functional
> changes.
FWIW, you added more than just the in_s3 check (which is correct), so
please update the commit message to match.
> 2) set ip_blocks[i].status.late_initialized to false after calling
> callback `early_fini`. We have auditted all usages of the
> late_initialized flag and no functional changes found.
> 3) only set ip_blocks[i].status.valid = false after calling the
> `late_fini` callback.
> 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini.
>
> There's one more task left to analyze GPU reset related state machine
> transitions.
>
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 +++++++++++++++++--
> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
> 2 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 36a33a391411..5c6b39e5cdaa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3411,6 +3411,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> adev->ip_blocks[i].status.sw = false;
> }
>
> + amdgpu_ras_fini(adev);
> +
> for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> if (!adev->ip_blocks[i].status.valid)
> continue;
> @@ -3419,8 +3421,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> adev->ip_blocks[i].status.valid = false;
> }
>
> - amdgpu_ras_fini(adev);
> -
> return 0;
> }
>
> @@ -3478,6 +3478,24 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
> if (amdgpu_dpm_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> dev_warn(adev->dev, "Failed to disallow df cstate");
>
> + for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
This is the 37th time we have a for loop that walks the IP blocks.
I'm thinking it would be good to have a for_each_ip_block macro. What do
you think?
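Something along these lines is what I have in mind. This is only a rough userspace sketch: the macro names (for_each_ip_block, for_each_ip_block_reverse), the cut-down structs and the helper function are all made up for illustration and don't exist in the driver today.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-ins for the real amdgpu structures. */
struct fake_ip_block_status {
	bool valid;
	bool late_initialized;
};

struct fake_ip_block {
	struct fake_ip_block_status status;
};

#define FAKE_MAX_IP_BLOCKS 16

struct fake_adev {
	struct fake_ip_block ip_blocks[FAKE_MAX_IP_BLOCKS];
	int num_ip_blocks;
};

/* Hypothetical iterators: forward for init paths, reverse for teardown. */
#define for_each_ip_block(adev, i) \
	for ((i) = 0; (i) < (adev)->num_ip_blocks; (i)++)

#define for_each_ip_block_reverse(adev, i) \
	for ((i) = (adev)->num_ip_blocks - 1; (i) >= 0; (i)--)

/*
 * Count the blocks that would have early_fini() called on them,
 * walking in teardown (reverse) order like the loop in this patch.
 */
static int count_early_fini_candidates(struct fake_adev *adev)
{
	int i, n = 0;

	for_each_ip_block_reverse(adev, i) {
		if (!adev->ip_blocks[i].status.valid)
			continue;
		if (!adev->ip_blocks[i].status.late_initialized)
			continue;
		n++;
	}

	return n;
}
```

A reverse variant is probably worth having from the start, since most of the teardown paths walk the array backwards.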
> + if (!adev->ip_blocks[i].status.valid)
> + continue;
> + if (!adev->ip_blocks[i].status.late_initialized)
> + continue;
If you take my idea from the cover letter of moving the state machine into
a single variable, I think some of these cases can be a little bit
cleaner. I.e., if a block was never valid, it wouldn't have progressed to
the 'sw' or 'hw' states.
This check (and other similar ones) could turn into something like this:
	if (adev->ip_blocks[i].status != AMDGPU_STATE_LATE_INIT)
		continue;
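To make that a bit more concrete, here is a minimal userspace sketch of what a single state variable might look like. Only AMDGPU_STATE_LATE_INIT comes from my example above; the other enum values and the push/pop helpers are hypothetical names, not existing driver code.

```c
#include <stdbool.h>

/*
 * Hypothetical ordered states replacing the separate valid/sw/hw/
 * late_initialized booleans. Each state implies all the ones below it.
 */
enum amdgpu_ip_state {
	AMDGPU_STATE_INVALID = 0,
	AMDGPU_STATE_EARLY_INIT,	/* roughly today's status.valid */
	AMDGPU_STATE_SW_INIT,		/* roughly today's status.sw */
	AMDGPU_STATE_HW_INIT,		/* roughly today's status.hw */
	AMDGPU_STATE_LATE_INIT,		/* roughly today's status.late_initialized */
};

/* Move up exactly one step; refuse to skip states. */
static bool ip_state_push(enum amdgpu_ip_state *state,
			  enum amdgpu_ip_state next)
{
	if ((int)next != (int)*state + 1 || next > AMDGPU_STATE_LATE_INIT)
		return false;

	*state = next;
	return true;
}

/* Move down exactly one step, mirroring the init order in reverse. */
static bool ip_state_pop(enum amdgpu_ip_state *state)
{
	if (*state == AMDGPU_STATE_INVALID)
		return false;

	*state = (enum amdgpu_ip_state)((int)*state - 1);
	return true;
}
```

With something like this, the "stack like" ordering is enforced in one place: early_fini() would pop LATE_INIT back to HW_INIT, hw_fini() would pop to SW_INIT, and so on.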
> +
> + if (adev->ip_blocks[i].version->funcs->early_fini) {
> + r = adev->ip_blocks[i].version->funcs->early_fini(&adev->ip_blocks[i]);
> + if (r) {
> + DRM_ERROR(" of IP block <%s> failed %d\n",
> + adev->ip_blocks[i].version->funcs->name, r);
> + return r;
> + }
> + }
> +
> + adev->ip_blocks[i].status.late_initialized = false;
> + }
> +
> for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> if (!adev->ip_blocks[i].status.valid)
> continue;
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index f622eb1551df..33a1a795c761 100755
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -2175,6 +2175,9 @@ static int amdgpu_dm_early_fini(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
>
> + if (adev->in_s0ix || adev->in_s3 || adev->in_s4 || adev->in_suspend)
> + return 0;
> +
I think this set of changes to the display code (amdgpu_dm) should be
split into its own patch and stand on its own.
> amdgpu_dm_audio_fini(adev);
>
> return 0;