Re: [PATCH v3 0/5] reduce system memory requirement for hibernation

From: Mario Limonciello <mario.limonciello@amd.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: lijo.lazar@amd.com, victor.zhao@amd.com, haijun.chang@amd.com,
	Qing.Ma@amd.com, Owen.Zhang2@amd.com, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org,
	Samuel Zhang <guoqing.zhang@amd.com>,
	alexander.deucher@amd.com, christian.koenig@amd.com,
	len.brown@intel.com, pavel@kernel.org,
	gregkh@linuxfoundation.org, dakr@kernel.org, airlied@gmail.com,
	simona@ffwll.ch, ray.huang@amd.com, matthew.auld@intel.com,
	matthew.brost@intel.com, maarten.lankhorst@linux.intel.com,
	mripard@kernel.org, tzimmermann@suse.de
Subject: Re: [PATCH v3 0/5] reduce system memory requirement for hibernation
Date: Tue, 8 Jul 2025 10:31:01 -0400
Message-ID: <6758cecc-d324-4ed5-b38e-2a4384a34d60@amd.com>
In-Reply-To: <20250708074248.1674924-1-guoqing.zhang@amd.com>

On 7/8/2025 3:42 AM, Samuel Zhang wrote:
> Modern data center dGPUs are usually equipped with very large VRAM. On a
> server with such dGPUs (192GB VRAM * 8) and 2TB of system memory,
> hibernation fails because there is not enough free memory.
> 
> The root cause is that during hibernation all VRAM is evicted to GTT or
> shmem. In both cases it resides in system memory, and the kernel will try
> to copy the pages to the hibernation image. In the worst case, this leaves
> 2 copies of the VRAM contents in system memory, and 2TB is not enough for
> the hibernation image: 192GB * 8 * 2 = 3TB > 2TB.
> 
> The fix includes the following changes. With these changes, far fewer
> pages need to be copied to the hibernation image and hibernation can
> succeed.
> * patches 1 and 2: move GTT to shmem after evicting VRAM, so that the GTT
>    pages can be freed.
> * patch 3: force shmem pages to be written to the swap disk and free them.
> 
> After GTT is swapped out to shmem in the hibernation prepare stage, the
> GPU is resumed again in the thaw stage. Swapping in and restoring the BOs
> during that resume takes a long time (50 minutes observed for 8 dGPUs),
> and it is unnecessary, since writing the hibernation image does not need
> the GPU when hibernation succeeds.
> * patches 4 and 5: skip resuming the device in the thaw stage when
>    hibernation succeeds, to reduce the hibernation time.
> 
> v2:
> * split first patch to 2 patches, 1 for ttm, 1 for amdgpu
> * refined the new ttm api
> * add more comments for shrink_shmem_memory() and its callsite
> * export variable pm_transition in kernel
> * skip resume in thaw() for successful hibernation case
> v3:
> * refined ttm_device_prepare_hibernation() to accept device argument
> * use guard(mutex) to replace mutex_lock and mutex_unlock
> * move ttm_device_prepare_hibernation call to amdgpu_device_evict_resources()
> * add pm_transition_event(), instead of exporting pm_transition variable
> * refined amdgpu_pmops_thaw(), use switch-case for more clarity
> 
> Samuel Zhang (5):
> 1. drm/ttm: add ttm_device_prepare_hibernation() api
> 2. drm/amdgpu: move GTT to shmem after eviction for hibernation
> 3. PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
> 4. PM: hibernate: add new api pm_transition_event()
> 5. drm/amdgpu: do not resume device in thaw for normal hibernation
> 
>   drivers/base/power/main.c                  |  5 +++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 15 ++++++++++++-
>   drivers/gpu/drm/ttm/ttm_device.c           | 23 +++++++++++++++++++
>   include/drm/ttm/ttm_device.h               |  1 +
>   include/linux/pm.h                         | 16 +++++++++++++
>   kernel/power/hibernate.c                   | 26 ++++++++++++++++++++++
>   7 files changed, 94 insertions(+), 2 deletions(-)
> 
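
For reference, the worst-case arithmetic in the cover letter above
(192GB * 8 * 2 = 3TB > 2TB) can be checked with a small sketch. This is
only an illustration: the function names are hypothetical, sizes are
treated as GiB, and it deliberately ignores every other consumer of
system memory, so it is a lower bound on the problem rather than a model
of what the kernel actually computes.

```python
# Illustrative only: names are hypothetical and not part of the patch series.

def worst_case_copies_gib(vram_per_gpu_gib: int, gpu_count: int, copies: int = 2) -> int:
    """System memory needed to hold `copies` copies of all evicted VRAM.

    copies=2 models the worst case from the cover letter: one copy of the
    evicted VRAM in GTT/shmem plus one copy in the hibernation image.
    """
    return vram_per_gpu_gib * gpu_count * copies


def fits_in_system_memory(vram_per_gpu_gib: int, gpu_count: int,
                          system_ram_gib: int, copies: int = 2) -> bool:
    """True if the VRAM copies alone would fit in system memory."""
    return worst_case_copies_gib(vram_per_gpu_gib, gpu_count, copies) <= system_ram_gib


if __name__ == "__main__":
    system_ram_gib = 2 * 1024                    # 2TB of system memory
    needed_gib = worst_case_copies_gib(192, 8)   # 8 dGPUs with 192GB VRAM each
    fits = fits_in_system_memory(192, 8, system_ram_gib)
    print(f"worst case: {needed_gib} GiB, system RAM: {system_ram_gib} GiB, fits: {fits}")
```

The `copies` parameter is there only to make the doubling explicit; it
is an assumption of this sketch, not a knob that exists in the series.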

Since the cover letter doesn't say how this series should be merged once
it is fully reviewed, I wanted to ask Rafael what he thinks.

The options are:
* the linux-pm changes go through linux-pm, with an immutable branch for
  drm to merge
* everything goes through amd-staging-drm-next (with an amdgpu PR to drm
  later)
* everything goes through drm-misc-next

I think taking everything through drm-misc-next makes the most sense, if
everyone is amenable.

