From: Samuel Zhang <guoqing.zhang@amd.com>
To: <alexander.deucher@amd.com>, <christian.koenig@amd.com>,
	<rafael@kernel.org>, <len.brown@intel.com>, <pavel@kernel.org>,
	<gregkh@linuxfoundation.org>, <dakr@kernel.org>,
	<airlied@gmail.com>, <simona@ffwll.ch>, <ray.huang@amd.com>,
	<matthew.auld@intel.com>, <matthew.brost@intel.com>,
	<maarten.lankhorst@linux.intel.com>, <mripard@kernel.org>,
	<tzimmermann@suse.de>
Cc: <mario.limonciello@amd.com>, <lijo.lazar@amd.com>,
	<victor.zhao@amd.com>, <haijun.chang@amd.com>, <Qing.Ma@amd.com>,
	<Owen.Zhang2@amd.com>, <linux-pm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <amd-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>,
	"Samuel Zhang" <guoqing.zhang@amd.com>
Subject: [PATCH v3 0/5] reduce system memory requirement for hibernation
Date: Tue, 8 Jul 2025 15:42:43 +0800
Message-ID: <20250708074248.1674924-1-guoqing.zhang@amd.com>

Modern data center dGPUs are usually equipped with very large VRAM. On a
server with such dGPUs (192GB VRAM * 8) and 2TB of system memory,
hibernation fails because there is not enough free memory.

The root cause is that during hibernation all VRAM is evicted to GTT or
shmem. In both cases the data ends up in system memory, and the kernel
tries to copy those pages into the hibernation image. In the worst case
this leaves 2 copies of the VRAM contents in system memory, and 2TB is not
enough to hold them: 192GB * 8 * 2 = 3TB > 2TB.
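
As a quick check of the worst-case arithmetic, here is a small stand-alone
C sketch. The helper names and parameters are purely illustrative and are
not part of this series; 1TB is taken as 1024GB.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of this series: worst-case system memory
 * (in GB) needed during hibernation, assuming every VRAM page ends up in
 * system memory once after eviction and once more in the hibernation image.
 */
static uint64_t worst_case_sysmem_gb(uint64_t vram_per_gpu_gb, unsigned int gpus)
{
	const unsigned int copies = 2; /* evicted copy + image copy */

	return vram_per_gpu_gb * gpus * copies;
}

/* True if the worst-case requirement fits into the given system memory. */
static bool hibernation_fits(uint64_t need_gb, uint64_t sysmem_gb)
{
	return need_gb <= sysmem_gb;
}
```

For the server described above, worst_case_sysmem_gb(192, 8) gives 3072GB,
which does not fit into 2048GB of system memory.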

The fix consists of the following changes. With them, far fewer pages need
to be copied to the hibernation image, and hibernation can succeed.
* patch 1 and 2: move GTT to shmem after evicting VRAM, so that the GTT
  pages can be freed.
* patch 3: force shmem pages to be written to the swap disk and free them.
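
The effect of these three steps can be illustrated with a toy accounting
model in user-space C. This is only a sketch under simplifying assumptions:
the struct and function names are hypothetical, sizes are whole GB, and it
does not reflect how TTM or the PM core actually track pages.

```c
#include <stdint.h>

/* Toy model; every name here is hypothetical and for illustration only. */
struct mem_state {
	uint64_t vram_gb;  /* buffer data still resident in VRAM */
	uint64_t gtt_gb;   /* system memory pages backing GTT */
	uint64_t shmem_gb; /* shmem pages, which can be swapped out */
	uint64_t swap_gb;  /* data already written to the swap disk */
};

/* Eviction: VRAM contents land in system memory behind GTT. */
static void evict_vram_to_gtt(struct mem_state *s)
{
	s->gtt_gb += s->vram_gb;
	s->vram_gb = 0;
}

/* Patch 1 and 2 in the model: GTT contents move to shmem, GTT pages are freed. */
static void move_gtt_to_shmem(struct mem_state *s)
{
	s->shmem_gb += s->gtt_gb;
	s->gtt_gb = 0;
}

/* Patch 3 in the model: shmem pages are written to swap and then freed. */
static void swap_out_shmem(struct mem_state *s)
{
	s->swap_gb += s->shmem_gb;
	s->shmem_gb = 0;
}

/* GPU data still in system memory that the image would have to copy. */
static uint64_t image_input_gb(const struct mem_state *s)
{
	return s->gtt_gb + s->shmem_gb;
}
```

Starting from 1536GB in VRAM, image_input_gb() is 1536 after eviction alone
and drops to 0 once the data has been moved to shmem and swapped out.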

After GTT is swapped out to shmem in the hibernation prepare stage, the GPU
is resumed again in the thaw stage. The swap-in and BO restore done by that
resume take a lot of time (50 minutes observed for 8 dGPUs), and they are
unnecessary because writing the hibernation image does not need the GPU
when hibernation succeeds.
* patch 4 and 5: skip device resume in the thaw stage when hibernation
  succeeds, to reduce the hibernation time.
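
The thaw-stage decision can be sketched in user-space C as follows. The
enum and function names are hypothetical stand-ins, loosely modeled on the
idea of distinguishing a normal thaw from error recovery; the real logic
lives in amdgpu_pmops_thaw() and uses pm_transition_event(), and may differ
in detail.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PM transition event seen during thaw. */
enum sketch_pm_event {
	SKETCH_EVENT_THAW,    /* assumed: image was created, hibernation proceeds */
	SKETCH_EVENT_RECOVER, /* assumed: image creation failed, system keeps running */
};

/* Decide whether the thaw callback has to resume the device. */
static bool sketch_thaw_needs_resume(enum sketch_pm_event ev)
{
	switch (ev) {
	case SKETCH_EVENT_THAW:
		/* Writing the image does not need the GPU, so skip the resume. */
		return false;
	case SKETCH_EVENT_RECOVER:
		/* Hibernation was aborted; the GPU must be usable again. */
		return true;
	}

	/* Unknown event: resume to stay on the safe side. */
	return true;
}
```

In this model only the error path pays the cost of swapping in and
restoring the BOs.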

v2:
* split first patch into 2 patches, 1 for ttm, 1 for amdgpu
* refined the new ttm api
* add more comments for shrink_shmem_memory() and its callsite
* export variable pm_transition in kernel
* skip resume in thaw() for successful hibernation case
v3:
* refined ttm_device_prepare_hibernation() to accept device argument
* use guard(mutex) to replace mutex_lock and mutex_unlock
* move ttm_device_prepare_hibernation call to amdgpu_device_evict_resources()
* add pm_transition_event(), instead of exporting pm_transition variable
* refined amdgpu_pmops_thaw(), use switch-case for more clarity

Samuel Zhang (5):
1. drm/ttm: add ttm_device_prepare_hibernation() api
2. drm/amdgpu: move GTT to shmem after eviction for hibernation
3. PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
4. PM: hibernate: add new api pm_transition_event()
5. drm/amdgpu: do not resume device in thaw for normal hibernation

 drivers/base/power/main.c                  |  5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 15 ++++++++++++-
 drivers/gpu/drm/ttm/ttm_device.c           | 23 +++++++++++++++++++
 include/drm/ttm/ttm_device.h               |  1 +
 include/linux/pm.h                         | 16 +++++++++++++
 kernel/power/hibernate.c                   | 26 ++++++++++++++++++++++
 7 files changed, 94 insertions(+), 2 deletions(-)

-- 
2.43.5


