From: Mario Limonciello <superm1@kernel.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"open list:RADEON and AMDGPU DRM DRIVERS"
	<amd-gfx@lists.freedesktop.org>,
	"open list:DRM DRIVERS" <dri-devel@lists.freedesktop.org>,
	"open list:HIBERNATION (aka Software Suspend,
	aka swsusp)" <linux-pm@vger.kernel.org>,
	"Mario Limonciello" <mario.limonciello@amd.com>
Subject: Re: [RFC 2/2] drm/amd: Use suspend and hibernate post freeze notifications
Date: Wed, 7 May 2025 14:45:53 -0500
Message-ID: <74428a0f-754b-4f85-bca3-48216613c208@kernel.org>
In-Reply-To: <CAJZ5v0i=9fpg2YxJhd+2rAx1gkqaquoExHvgMiFefn6YqVieOA@mail.gmail.com>

On 5/7/2025 2:39 PM, Rafael J. Wysocki wrote:
> On Wed, May 7, 2025 at 9:17 PM Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote:
>>> On Thu, May 1, 2025 at 11:17 PM Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>
>>>> commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification
>>>> callback support") introduced a VRAM eviction earlier in the PM
>>>> sequences when swap was still available for evicting to. This helped
>>>> to fix a number of memory pressure related bugs but also exposed a
>>>> new one.
>>>>
>>>> If a userspace process is actively using the GPU when suspend starts
>>>> then a deadlock could occur.
>>>>
>>>> Instead of going off the prepare notifier, use the PM notifiers that
>>>> occur after processes have been frozen to do evictions.
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
>>>> Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 7f354cd532dc1..cad311b9fd834 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo
>>>>           int r;
>>>>
>>>>           switch (mode) {
>>>> -       case PM_HIBERNATION_PREPARE:
>>>> +       case PM_HIBERNATION_POST_FREEZE:
>>>>                   adev->in_s4 = true;
>>>>                   fallthrough;
>>>> -       case PM_SUSPEND_PREPARE:
>>>> +       case PM_SUSPEND_POST_FREEZE:
>>>>                   r = amdgpu_device_evict_resources(adev);
>>>>                   /*
>>>>                    * This is considered non-fatal at this time because
>>>> --
>>>
>>> Why do you need a notifier for this?
>>>
>>> It looks like this could be done from amdgpu_device_prepare(), but if
>>> there is a reason why it cannot be done from there, it should be
>>> mentioned in the changelog.
>>
>> It's actually done in amdgpu_device_prepare() "as well" already, but the
>> reason that it's being done earlier is because swap still needs to be
>> available, especially with heavy memory fragmentation.
> 
> Swap should be still available when amdgpu_device_prepare() runs.

No; it's not.  The basic call trace (for suspend) looks like this:

enter_state(state) {
     suspend_prepare(state);
     ...
     pm_restrict_gfp_mask();  // disable swap
     suspend_devices_and_enter(state) → dpm_suspend_start() {
         dpm_prepare() {
             amdgpu_pmops_prepare();
         }
         dpm_suspend() {
             amdgpu_pmops_suspend();
         }
     }
}

If the intention was for swap to be available there, it would be better
to move the pm_restrict_gfp_mask() call into suspend_devices_and_enter(),
between the dpm_prepare() and dpm_suspend() calls.
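
To make the ordering argument concrete, here is a toy user-space model of
the two sequences. It is only a sketch and not kernel code: every model_*
function and the swap_io_allowed flag are hypothetical stand-ins, and the
real pm_restrict_gfp_mask() masks GFP flags rather than flipping a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "allocations may still trigger swap I/O". */
static bool swap_io_allowed = true;

/* What each modelled device phase observed when it ran. */
static bool swap_seen_in_prepare;
static bool swap_seen_in_suspend;

/* Stand-ins for pm_restrict_gfp_mask() / pm_restore_gfp_mask(). */
static void model_pm_restrict_gfp_mask(void) { swap_io_allowed = false; }
static void model_pm_restore_gfp_mask(void)  { swap_io_allowed = true; }

/* Stand-ins for the phases that call amdgpu_pmops_prepare()/_suspend(). */
static void model_dpm_prepare(void) { swap_seen_in_prepare = swap_io_allowed; }
static void model_dpm_suspend(void) { swap_seen_in_suspend = swap_io_allowed; }

/* Ordering from the call trace above: restrict first, then both phases. */
static void model_enter_state_current(void)
{
	model_pm_restrict_gfp_mask();
	model_dpm_prepare();
	model_dpm_suspend();
	model_pm_restore_gfp_mask();
}

/* Suggested ordering: restrict between the prepare and suspend phases. */
static void model_enter_state_proposed(void)
{
	model_dpm_prepare();
	model_pm_restrict_gfp_mask();
	model_dpm_suspend();
	model_pm_restore_gfp_mask();
}

/* Runs both orderings and checks what each phase observed. */
static void model_demo(void)
{
	model_enter_state_current();
	assert(!swap_seen_in_prepare && !swap_seen_in_suspend);

	model_enter_state_proposed();
	assert(swap_seen_in_prepare && !swap_seen_in_suspend);
}
```

Calling model_demo() from a main() exercises both orderings. In the model,
only the second one leaves swap usable during the prepare phase, which is
the property the eviction needs.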

> 
>> I'll add more detail about this to the commit for the next spin if
>> you're relatively happy with the new notifier from the first patch.
> 
> I need to have a look at it, but adding it for just one user seems a
> bit over the top.  I'd prefer to avoid doing this.

