From: Mario Limonciello <superm1@kernel.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"open list:RADEON and AMDGPU DRM DRIVERS"
<amd-gfx@lists.freedesktop.org>,
"open list:DRM DRIVERS" <dri-devel@lists.freedesktop.org>,
"open list:HIBERNATION (aka Software Suspend,
aka swsusp)" <linux-pm@vger.kernel.org>,
"Mario Limonciello" <mario.limonciello@amd.com>
Subject: Re: [RFC 2/2] drm/amd: Use suspend and hibernate post freeze notifications
Date: Wed, 7 May 2025 14:45:53 -0500
Message-ID: <74428a0f-754b-4f85-bca3-48216613c208@kernel.org>
In-Reply-To: <CAJZ5v0i=9fpg2YxJhd+2rAx1gkqaquoExHvgMiFefn6YqVieOA@mail.gmail.com>

On 5/7/2025 2:39 PM, Rafael J. Wysocki wrote:
> On Wed, May 7, 2025 at 9:17 PM Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote:
>>> On Thu, May 1, 2025 at 11:17 PM Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>
>>>> commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification
>>>> callback support") introduced a VRAM eviction earlier in the PM
>>>> sequences when swap was still available for evicting to. This helped
>>>> to fix a number of memory pressure related bugs but also exposed a
>>>> new one.
>>>>
>>>> If a userspace process is actively using the GPU when suspend starts
>>>> then a deadlock could occur.
>>>>
>>>> Instead of going off the prepare notifier, use the PM notifiers that
>>>> occur after processes have been frozen to do evictions.
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
>>>> Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 7f354cd532dc1..cad311b9fd834 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo
>>>> int r;
>>>>
>>>> switch (mode) {
>>>> - case PM_HIBERNATION_PREPARE:
>>>> + case PM_HIBERNATION_POST_FREEZE:
>>>> adev->in_s4 = true;
>>>> fallthrough;
>>>> - case PM_SUSPEND_PREPARE:
>>>> + case PM_SUSPEND_POST_FREEZE:
>>>> r = amdgpu_device_evict_resources(adev);
>>>> /*
>>>> * This is considered non-fatal at this time because
>>>> --
>>>
>>> Why do you need a notifier for this?
>>>
>>> It looks like this could be done from amdgpu_device_prepare(), but if
>>> there is a reason why it cannot be done from there, it should be
>>> mentioned in the changelog.
>>
>> It's actually done in amdgpu_device_prepare() "as well" already, but the
>> reason that it's being done earlier is because swap still needs to be
>> available, especially with heavy memory fragmentation.
>
> Swap should be still available when amdgpu_device_prepare() runs.

No; it's not. The basic call trace (for suspend) looks like this:

enter_state(state) {
	suspend_prepare(state);
	...
	pm_restrict_gfp_mask(); // disable swap
	suspend_devices_and_enter(state) → dpm_suspend_start() {
		dpm_prepare() {
			amdgpu_pmops_prepare();
		}
		dpm_suspend() {
			amdgpu_pmops_suspend();
		}
	}
}

If the intention is for swap to still be available at that point, it
would be better to move the pm_restrict_gfp_mask() call "into"
suspend_devices_and_enter(), between the dpm_prepare() and
dpm_suspend() calls.
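
To make the ordering argument concrete, here is a minimal user-space
sketch. It is a toy model, not kernel code: every model_* name is
hypothetical, and a single boolean stands in for whether the allowed
GFP mask still permits swap-out. It only shows what the prepare step
would observe under the ordering in the trace above versus the
reordering suggested here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for "the GFP mask still allows swap-out" (assumption). */
static bool swap_allowed;

/* What the modelled driver prepare callback observed when it ran. */
static bool prepare_saw_swap;

/* Models the effect attributed to pm_restrict_gfp_mask() in the trace. */
static void model_restrict_gfp_mask(void)
{
	swap_allowed = false;
}

/* Models dpm_prepare() -> amdgpu_pmops_prepare(): record swap state. */
static void model_dpm_prepare(void)
{
	prepare_saw_swap = swap_allowed;
}

/* Models dpm_suspend(); nothing to record for this argument. */
static void model_dpm_suspend(void)
{
}

/* Ordering from the trace: restrict first, then prepare and suspend. */
static bool model_enter_state_current(void)
{
	swap_allowed = true;
	model_restrict_gfp_mask();
	model_dpm_prepare();
	model_dpm_suspend();
	return prepare_saw_swap;
}

/* Suggested ordering: restrict between prepare and suspend. */
static bool model_enter_state_proposed(void)
{
	swap_allowed = true;
	model_dpm_prepare();
	model_restrict_gfp_mask();
	model_dpm_suspend();
	return prepare_saw_swap;
}

/* Self-check of the two orderings. */
static void model_self_check(void)
{
	assert(!model_enter_state_current());
	assert(model_enter_state_proposed());
}
```

In this model, model_enter_state_current() returns false (prepare runs
after the restriction) and model_enter_state_proposed() returns true.
Whether the real restriction can safely move is a separate question
that the model does not address.
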
>
>> I'll add more detail about this to the commit for the next spin if
>> you're relatively happy with the new notifier from the first patch.
>
> I need to have a look at it, but adding it for just one user seems a
> bit over the top. I'd prefer to avoid doing this.