From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 886BFC3ABBF for ; Wed, 7 May 2025 19:46:00 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B172C10E888; Wed, 7 May 2025 19:45:57 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.b="Fakkwfbq"; dkim-atps=neutral Received: from tor.source.kernel.org (tor.source.kernel.org [172.105.4.254]) by gabe.freedesktop.org (Postfix) with ESMTPS id AC8D910E884; Wed, 7 May 2025 19:45:56 +0000 (UTC) Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by tor.source.kernel.org (Postfix) with ESMTP id 964AE629D8; Wed, 7 May 2025 19:45:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BA117C4CEE2; Wed, 7 May 2025 19:45:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746647155; bh=WgSebmhN43frxJW5nnVmkO3qM8pnX9hlJyAGZ4H6jUk=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=FakkwfbqNa7criGhemy8k/vdD6lIcFMzAMsICjzWwYUU6EgvgheUeQNY0TLnoO0QM gyQu1n8Pjvn0KW6morbQo0P9txgQrZIUNrh9vzdaulF7voDI4kg9g2qqPXXregpcAh rljU1g9GeUz8Ieu/Av+Ds4VCAZmyfmEHLTC4E/bX5rO+iAwHuyuhLFNhkcD7SxyC2m 6op/IrrFwzyzNym9yhTF/dtAEq9q/ip2cBvBik96QyLTZO7MDqW1I2RXcBmAka0S6y j3+BrEPyEgfAPJmYH23fLQT+EAP0WyvlA9vZ8hoTKAjZRs5417eVdm3zbd7KBUZnHi ggZl1zv7B0YtA== Message-ID: <74428a0f-754b-4f85-bca3-48216613c208@kernel.org> Date: Wed, 7 May 2025 14:45:53 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC 2/2] drm/amd: Use suspend and hibernate post freeze notifications To: "Rafael J. Wysocki" Cc: Alex Deucher , =?UTF-8?Q?Christian_K=C3=B6nig?= , "open list:RADEON and AMDGPU DRM DRIVERS" , "open list:DRM DRIVERS" , "open list:HIBERNATION (aka Software Suspend, aka swsusp)" , Mario Limonciello References: <20250501211734.2434369-1-superm1@kernel.org> <20250501211734.2434369-3-superm1@kernel.org> <98d527c6-a185-40f9-8ce3-46f5d7a67e81@kernel.org> Content-Language: en-US From: Mario Limonciello In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: amd-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussion list for AMD gfx List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: amd-gfx-bounces@lists.freedesktop.org Sender: "amd-gfx" On 5/7/2025 2:39 PM, Rafael J. Wysocki wrote: > On Wed, May 7, 2025 at 9:17 PM Mario Limonciello wrote: >> >> On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote: >>> On Thu, May 1, 2025 at 11:17 PM Mario Limonciello wrote: >>>> >>>> From: Mario Limonciello >>>> >>>> commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification >>>> callback support") introduced a VRAM eviction earlier in the PM >>>> sequences when swap was still available for evicting to. This helped >>>> to fix a number of memory pressure related bugs but also exposed a >>>> new one. >>>> >>>> If a userspace process is actively using the GPU when suspend starts >>>> then a deadlock could occur. >>>> >>>> Instead of going off the prepare notifier, use the PM notifiers that >>>> occur after processes have been frozen to do evictions. >>>> >>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 >>>> Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support") >>>> Signed-off-by: Mario Limonciello >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index 7f354cd532dc1..cad311b9fd834 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo >>>> int r; >>>> >>>> switch (mode) { >>>> - case PM_HIBERNATION_PREPARE: >>>> + case PM_HIBERNATION_POST_FREEZE: >>>> adev->in_s4 = true; >>>> fallthrough; >>>> - case PM_SUSPEND_PREPARE: >>>> + case PM_SUSPEND_POST_FREEZE: >>>> r = amdgpu_device_evict_resources(adev); >>>> /* >>>> * This is considered non-fatal at this time because >>>> -- >>> >>> Why do you need a notifier for this? >>> >>> It looks like this could be done from amdgpu_device_prepare(), but if >>> there is a reason why it cannot be done from there, it should be >>> mentioned in the changelog. >> >> It's actually done in amdgpu_device_prepare() "as well" already, but the >> reason that it's being done earlier is because swap still needs to be >> available, especially with heavy memory fragmentation. > > Swap should be still available when amdgpu_device_prepare() runs. No; it's not. The basic call trace (for suspend) looks like this: enter_state(state) { suspend_prepare(state); ... pm_restrict_gfp_mask(); // disable swap suspend_devices_and_enter(state) → dpm_suspend_start() { dpm_prepare() { amdgpu_pmops_prepare(); } dpm_suspend() { amdgpu_pmops_suspend(); } } } If the intention was for it to be available, it would be better to move the pm_restrict_gfp_mask() call "into" suspend_devices_and_enter() between dpm_prepare() and dpm_suspend() calls. > >> I'll add more detail about this to the commit for the next spin if >> you're relatively happy with the new notifier from the first patch. > > I need to have a look at it, but adding it for just one user seems a > bit over the top. I'd prefer to avoid doing this.