* [PATCH v6 0/5] reduce system memory requirement for hibernation
@ 2025-07-10 6:23 Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Samuel Zhang @ 2025-07-10 6:23 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
matthew.brost, maarten.lankhorst, mripard, tzimmermann
Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
Samuel Zhang
Modern data center dGPUs are usually equipped with very large VRAM. On a
server with such dGPUs (192GB VRAM * 8) and 2TB of system memory, hibernation
fails because there is not enough free memory.
The root cause is that during hibernation all VRAM contents get evicted to
GTT or shmem. In both cases the data ends up in system memory, and the kernel
will try to copy those pages into the hibernation image. In the worst case,
this results in 2 copies of the VRAM contents in system memory, so 2TB is not
enough for the hibernation image: 192GB * 8 * 2 = 3TB > 2TB.
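The worst-case arithmetic above can be checked with a small userspace sketch.
This is only an illustration: the helper names are hypothetical and are not
part of this series, and it assumes 1TB = 1024GB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative userspace helpers, not part of this series.
 *
 * Worst case described in the cover letter: every VRAM page lands in
 * system memory once (GTT or shmem) and is then copied once more into
 * the hibernation image, i.e. 2 copies of VRAM in system memory.
 */
static uint64_t worst_case_sysmem_gb(uint64_t vram_gb_per_gpu,
				     unsigned int num_gpus)
{
	assert(num_gpus > 0);
	return vram_gb_per_gpu * num_gpus * 2;
}

/* Returns true if the worst-case demand fits into system memory. */
static bool hibernation_fits(uint64_t vram_gb_per_gpu, unsigned int num_gpus,
			     uint64_t sysmem_gb)
{
	return worst_case_sysmem_gb(vram_gb_per_gpu, num_gpus) <= sysmem_gb;
}
```

For the configuration above, worst_case_sysmem_gb(192, 8) is 3072 (3TB), which
exceeds 2048 (2TB), so hibernation_fits(192, 8, 2048) is false.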
The fix includes the following changes. With these changes, far fewer pages
need to be copied to the hibernation image and hibernation can succeed.
* patch 1 and 2: move GTT to shmem after evicting VRAM, so that the GTT
pages can be freed.
* patch 3: force-write shmem pages to the swap disk and free the shmem pages.
After GTT is swapped out to shmem in the hibernation prepare stage, the GPU
is resumed again in the thaw stage. The swap-in and BO restore done by that
resume take a long time (50 minutes observed for 8 dGPUs), and they are
unnecessary, since writing the hibernation image does not need the GPU when
hibernation succeeds.
* patch 4 and 5: skip resuming the device in the thaw stage for the
successful hibernation case to reduce the hibernation time.
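The thaw-stage decision made by patches 4 and 5 can be modelled in userspace
as below. This is a sketch under stated assumptions, not the kernel code: the
enum and the model_* functions are hypothetical names, and the mapping of
cases to "recovering" follows the kdoc of pm_hibernate_is_recovering() in
patch 4 (true for the error cases, false for the normal case).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the situations in which dev_pm_ops.thaw() runs. */
enum thaw_reason {
	THAW_IMAGE_CREATED,	/* normal case: image has been created */
	THAW_CREATE_FAILED,	/* error case 1: image creation failed */
	THAW_RESTORE_FAILED,	/* error case 2: restore from image failed */
};

/* Models what pm_hibernate_is_recovering() is documented to report. */
static bool model_is_recovering(enum thaw_reason reason)
{
	switch (reason) {
	case THAW_IMAGE_CREATED:
		return false;
	case THAW_CREATE_FAILED:
	case THAW_RESTORE_FAILED:
		return true;
	}
	assert(!"unknown thaw reason");
	return true;
}

/* Models the thaw callback in patch 5: resume the GPU only on recovery. */
static bool model_thaw_resumes_gpu(enum thaw_reason reason)
{
	return model_is_recovering(reason);
}
```

In this model only the two error cases resume the GPU. The normal case
returns early, which is where the time saving comes from.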
v2:
* split first patch to 2 patches, 1 for ttm, 1 for amdgpu
* refined the new ttm api
* add more comments for shrink_shmem_memory() and its callsite
* export variable pm_transition in kernel
* skip resume in thaw() for successful hibernation case
v3:
* refined ttm_device_prepare_hibernation() to accept device argument
* use guard(mutex) to replace mutex_lock and mutex_unlock
* move ttm_device_prepare_hibernation call to amdgpu_device_evict_resources()
* add pm_transition_event(), instead of exporting pm_transition variable
* refined amdgpu_pmops_thaw(), use switch-case for more clarity
v4:
* remove guard(mutex) and fix kdoc for ttm_device_prepare_hibernation
* refined kdoc for pm_transition_event() and PM_EVENT_ messages
* use dev_err in amdgpu_pmops_thaw()
* add Reviewed-by and Acked-by for patches 2, 3 and 5
v5:
* add Reviewed-by for patch 1
* use pm_hibernate_is_recovering() to replace pm_transition_event()
* check in_suspend in amdgpu_pmops_prepare() and amdgpu_pmops_poweroff()
v6:
* move pm_hibernate_is_recovering() from pm.h to suspend.h
* rebase to next-20250709 tag of linux-next
* add Tested-by for patch 5
The merge options are either:
* the linux-pm changes go to linux-pm and an immutable branch for drm to
merge
* everything goes through amd-staging-drm-next (and an amdgpu PR to drm
later)
* everything goes through drm-misc-next
Mario Limonciello thinks sending everything through drm-misc-next makes the
most sense if everyone is amenable.
Samuel Zhang (5):
1. drm/ttm: add new api ttm_device_prepare_hibernation()
2. drm/amdgpu: move GTT to shmem after eviction for hibernation
3. PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
4. PM: hibernate: add new api pm_hibernate_is_recovering()
5. drm/amdgpu: do not resume device in thaw for normal hibernation
drivers/base/power/main.c | 14 ++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 17 ++++++++++++++
drivers/gpu/drm/ttm/ttm_device.c | 23 +++++++++++++++++++
include/drm/ttm/ttm_device.h | 1 +
include/linux/suspend.h | 2 ++
kernel/power/hibernate.c | 26 ++++++++++++++++++++++
7 files changed, 92 insertions(+), 1 deletion(-)
--
2.43.5
* [PATCH v6 1/5] drm/ttm: add new api ttm_device_prepare_hibernation()
2025-07-10 6:23 [PATCH v6 0/5] reduce system memory requirement for hibernation Samuel Zhang
@ 2025-07-10 6:23 ` Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Samuel Zhang @ 2025-07-10 6:23 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
matthew.brost, maarten.lankhorst, mripard, tzimmermann
Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
Samuel Zhang
This new API is used during hibernation to move GTT BOs to shmem after
VRAM eviction. The shmem pages will be flushed to the swap disk later to
reduce the system memory usage for hibernation.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/ttm/ttm_device.c | 22 ++++++++++++++++++++++
include/drm/ttm/ttm_device.h | 1 +
2 files changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 816e2cba6016..c3e2fcbdd2cc 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -125,6 +125,28 @@ static int ttm_global_init(void)
return ret;
}
+/**
+ * ttm_device_prepare_hibernation - move GTT BOs to shmem for hibernation.
+ *
+ * @bdev: A pointer to a struct ttm_device to prepare hibernation for.
+ *
+ * Return: 0 on success, negative number on failure.
+ */
+int ttm_device_prepare_hibernation(struct ttm_device *bdev)
+{
+ struct ttm_operation_ctx ctx = {
+ .interruptible = false,
+ .no_wait_gpu = false,
+ };
+ int ret;
+
+ do {
+ ret = ttm_device_swapout(bdev, &ctx, GFP_KERNEL);
+ } while (ret > 0);
+ return ret;
+}
+EXPORT_SYMBOL(ttm_device_prepare_hibernation);
+
/*
* A buffer object shrink method that tries to swap out the first
* buffer object on the global::swap_lru list.
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 39b8636b1845..592b5f802859 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -272,6 +272,7 @@ struct ttm_device {
int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
gfp_t gfp_flags);
+int ttm_device_prepare_hibernation(struct ttm_device *bdev);
static inline struct ttm_resource_manager *
ttm_manager_type(struct ttm_device *bdev, int mem_type)
--
2.43.5
* [PATCH v6 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation
2025-07-10 6:23 [PATCH v6 0/5] reduce system memory requirement for hibernation Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
@ 2025-07-10 6:23 ` Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Samuel Zhang @ 2025-07-10 6:23 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
matthew.brost, maarten.lankhorst, mripard, tzimmermann
Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
Samuel Zhang
When hibernating with data center dGPUs, a huge number of VRAM BOs are
evicted to GTT and take up too much system memory. This causes hibernation
to fail due to insufficient memory for creating the hibernation image.
Move GTT BOs to shmem in the KMD, then move shmem to the swap disk in the
kernel hibernation code, to make room for the hibernation image.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 45b44dec0d7f..f72c353bdbac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5021,8 +5021,16 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
return 0;
ret = amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM);
- if (ret)
+ if (ret) {
dev_warn(adev->dev, "evicting device resources failed\n");
+ return ret;
+ }
+
+ if (adev->in_s4) {
+ ret = ttm_device_prepare_hibernation(&adev->mman.bdev);
+ if (ret)
+ dev_err(adev->dev, "prepare hibernation failed, %d\n", ret);
+ }
return ret;
}
--
2.43.5
* [PATCH v6 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
2025-07-10 6:23 [PATCH v6 0/5] reduce system memory requirement for hibernation Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
@ 2025-07-10 6:23 ` Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 4/5] PM: hibernate: add new api pm_hibernate_is_recovering() Samuel Zhang
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Samuel Zhang @ 2025-07-10 6:23 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
matthew.brost, maarten.lankhorst, mripard, tzimmermann
Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
Samuel Zhang
When hibernating with data center dGPUs, a huge amount of VRAM data is
moved to shmem during dev_pm_ops.prepare(). These shmem pages take up so much
system memory that there is not enough free memory for creating the
hibernation image, which causes hibernation to fail and abort.
After dev_pm_ops.prepare(), call shrink_all_memory() to force shmem pages
out to the swap disk and reclaim them, so that there is enough system
memory for the hibernation image and fewer pages need to be copied to it.
This patch can only flush and free about half of the shmem pages. It would
be better to flush and free more pages, even all of the shmem pages, so that
fewer pages need to be copied to the hibernation image and the overall
hibernation time is reduced.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
---
kernel/power/hibernate.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 9216e3b91d3b..1f1f30cca573 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -381,6 +381,23 @@ static int create_image(int platform_mode)
return error;
}
+static void shrink_shmem_memory(void)
+{
+ struct sysinfo info;
+ unsigned long nr_shmem_pages, nr_freed_pages;
+
+ si_meminfo(&info);
+ nr_shmem_pages = info.sharedram; /* current page count used for shmem */
+ /*
+ * The intent is to reclaim all shmem pages. Though shrink_all_memory() can
+ * only reclaim about half of them, it's enough for creating the hibernation
+ * image.
+ */
+ nr_freed_pages = shrink_all_memory(nr_shmem_pages);
+ pr_debug("requested to reclaim %lu shmem pages, actually freed %lu pages\n",
+ nr_shmem_pages, nr_freed_pages);
+}
+
/**
* hibernation_snapshot - Quiesce devices and create a hibernation image.
* @platform_mode: If set, use platform driver to prepare for the transition.
@@ -422,6 +439,15 @@ int hibernation_snapshot(int platform_mode)
goto Thaw;
}
+ /*
+ * Device drivers may move lots of data to shmem in dpm_prepare(). The shmem
+ * pages will use lots of system memory, causing hibernation image creation
+ * fail due to insufficient free memory.
+ * This call is to force flush the shmem pages to swap disk and reclaim
+ * the system memory so that image creation can succeed.
+ */
+ shrink_shmem_memory();
+
console_suspend_all();
error = dpm_suspend(PMSG_FREEZE);
--
2.43.5
* [PATCH v6 4/5] PM: hibernate: add new api pm_hibernate_is_recovering()
2025-07-10 6:23 [PATCH v6 0/5] reduce system memory requirement for hibernation Samuel Zhang
` (2 preceding siblings ...)
2025-07-10 6:23 ` [PATCH v6 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
@ 2025-07-10 6:23 ` Samuel Zhang
2025-07-10 8:21 ` Rafael J. Wysocki
2025-07-10 6:23 ` [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
2025-07-10 15:57 ` [PATCH v6 0/5] reduce system memory requirement for hibernation Mario Limonciello
5 siblings, 1 reply; 12+ messages in thread
From: Samuel Zhang @ 2025-07-10 6:23 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
matthew.brost, maarten.lankhorst, mripard, tzimmermann
Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
Samuel Zhang
dev_pm_ops.thaw() is called in the following cases:
* normal case: after hibernation image has been created.
* error case 1: creation of a hibernation image has failed.
* error case 2: restoration from a hibernation image has failed.
For the normal case, it is called mainly to resume the storage devices
needed for saving the hibernation image. Other devices that are not involved
in saving the image do not need to be resumed. But since there is no API to
tell in which case thaw() is being called, device drivers can't
conditionally resume the device in thaw().
The new pm_hibernate_is_recovering() is such an API: it lets a driver query
whether thaw() is called in the normal case or in an error case.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
drivers/base/power/main.c | 14 ++++++++++++++
include/linux/suspend.h | 2 ++
2 files changed, 16 insertions(+)
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 24ebe7a635a7..c4817b379230 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -66,6 +66,20 @@ static pm_message_t pm_transition;
static DEFINE_MUTEX(async_wip_mtx);
static int async_error;
+/**
+ * pm_hibernate_is_recovering - if recovering from hibernate due to error.
+ *
+ * Used to query if dev_pm_ops.thaw() is called for normal hibernation case or
+ * recovering from some error.
+ *
+ * Return: true for error case, false for normal case.
+ */
+bool pm_hibernate_is_recovering(void)
+{
+ return pm_transition.event == PM_EVENT_RECOVER;
+}
+EXPORT_SYMBOL_GPL(pm_hibernate_is_recovering);
+
static const char *pm_verb(int event)
{
switch (event) {
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 6a3f92098872..d11a124b7a91 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -426,6 +426,8 @@ int is_hibernate_resume_dev(dev_t dev);
static inline int is_hibernate_resume_dev(dev_t dev) { return 0; }
#endif
+bool pm_hibernate_is_recovering(void);
+
/* Hibernation and suspend events */
#define PM_HIBERNATION_PREPARE 0x0001 /* Going to hibernate */
#define PM_POST_HIBERNATION 0x0002 /* Hibernation finished */
--
2.43.5
* [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
2025-07-10 6:23 [PATCH v6 0/5] reduce system memory requirement for hibernation Samuel Zhang
` (3 preceding siblings ...)
2025-07-10 6:23 ` [PATCH v6 4/5] PM: hibernate: add new api pm_hibernate_is_recovering() Samuel Zhang
@ 2025-07-10 6:23 ` Samuel Zhang
2025-07-10 12:13 ` Mario Limonciello
2025-07-10 13:22 ` Lazar, Lijo
2025-07-10 15:57 ` [PATCH v6 0/5] reduce system memory requirement for hibernation Mario Limonciello
5 siblings, 2 replies; 12+ messages in thread
From: Samuel Zhang @ 2025-07-10 6:23 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
matthew.brost, maarten.lankhorst, mripard, tzimmermann
Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
Samuel Zhang
For normal hibernation, the GPU does not need to be resumed in thaw since it
is not involved in writing the hibernation image. Skipping the resume in this
case reduces the hibernation time.
On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
this can save 50 minutes.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1c54b2e5a225..021defca9b61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
if (amdgpu_ras_intr_triggered())
return;
+ /* device maybe not resumed here, return immediately in this case */
+ if (adev->in_s4 && adev->in_suspend)
+ return;
+
/* if we are running in a VM, make sure the device
* torn down properly on reboot/shutdown.
* unfortunately we can't detect certain
@@ -2557,6 +2561,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
struct drm_device *drm_dev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(drm_dev);
+ /* device maybe not resumed here, return immediately in this case */
+ if (adev->in_s4 && adev->in_suspend)
+ return 0;
+
/* Return a positive number here so
* DPM_FLAG_SMART_SUSPEND works properly
*/
@@ -2655,12 +2663,21 @@ static int amdgpu_pmops_thaw(struct device *dev)
{
struct drm_device *drm_dev = dev_get_drvdata(dev);
+ /* do not resume device if it's normal hibernation */
+ if (!pm_hibernate_is_recovering())
+ return 0;
+
return amdgpu_device_resume(drm_dev, true);
}
static int amdgpu_pmops_poweroff(struct device *dev)
{
struct drm_device *drm_dev = dev_get_drvdata(dev);
+ struct amdgpu_device *adev = drm_to_adev(drm_dev);
+
+ /* device maybe not resumed here, return immediately in this case */
+ if (adev->in_s4 && adev->in_suspend)
+ return 0;
return amdgpu_device_suspend(drm_dev, true);
}
--
2.43.5
* Re: [PATCH v6 4/5] PM: hibernate: add new api pm_hibernate_is_recovering()
2025-07-10 6:23 ` [PATCH v6 4/5] PM: hibernate: add new api pm_hibernate_is_recovering() Samuel Zhang
@ 2025-07-10 8:21 ` Rafael J. Wysocki
0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2025-07-10 8:21 UTC (permalink / raw)
To: Samuel Zhang
Cc: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
matthew.brost, maarten.lankhorst, mripard, tzimmermann,
mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel
On Thu, Jul 10, 2025 at 8:23 AM Samuel Zhang <guoqing.zhang@amd.com> wrote:
>
> dev_pm_ops.thaw() is called in the following cases:
> * normal case: after hibernation image has been created.
> * error case 1: creation of a hibernation image has failed.
> * error case 2: restoration from a hibernation image has failed.
>
> For the normal case, it is called mainly to resume the storage devices
> needed for saving the hibernation image. Other devices that are not involved
> in saving the image do not need to be resumed. But since there is no API to
> tell in which case thaw() is being called, device drivers can't
> conditionally resume the device in thaw().
>
> The new pm_hibernate_is_recovering() is such an API: it lets a driver query
> whether thaw() is called in the normal case or in an error case.
>
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
LGTM now, so
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> drivers/base/power/main.c | 14 ++++++++++++++
> include/linux/suspend.h | 2 ++
> 2 files changed, 16 insertions(+)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 24ebe7a635a7..c4817b379230 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -66,6 +66,20 @@ static pm_message_t pm_transition;
> static DEFINE_MUTEX(async_wip_mtx);
> static int async_error;
>
> +/**
> + * pm_hibernate_is_recovering - if recovering from hibernate due to error.
> + *
> + * Used to query if dev_pm_ops.thaw() is called for normal hibernation case or
> + * recovering from some error.
> + *
> + * Return: true for error case, false for normal case.
> + */
> +bool pm_hibernate_is_recovering(void)
> +{
> + return pm_transition.event == PM_EVENT_RECOVER;
> +}
> +EXPORT_SYMBOL_GPL(pm_hibernate_is_recovering);
> +
> static const char *pm_verb(int event)
> {
> switch (event) {
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index 6a3f92098872..d11a124b7a91 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -426,6 +426,8 @@ int is_hibernate_resume_dev(dev_t dev);
> static inline int is_hibernate_resume_dev(dev_t dev) { return 0; }
> #endif
>
> +bool pm_hibernate_is_recovering(void);
> +
> /* Hibernation and suspend events */
> #define PM_HIBERNATION_PREPARE 0x0001 /* Going to hibernate */
> #define PM_POST_HIBERNATION 0x0002 /* Hibernation finished */
> --
> 2.43.5
>
* Re: [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
2025-07-10 6:23 ` [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
@ 2025-07-10 12:13 ` Mario Limonciello
2025-07-10 12:20 ` Christian König
2025-07-10 13:22 ` Lazar, Lijo
1 sibling, 1 reply; 12+ messages in thread
From: Mario Limonciello @ 2025-07-10 12:13 UTC (permalink / raw)
To: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
matthew.auld, matthew.brost, maarten.lankhorst, mripard,
tzimmermann
Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
linux-pm, linux-kernel, amd-gfx, dri-devel
On 7/10/2025 2:23 AM, Samuel Zhang wrote:
> For normal hibernation, the GPU does not need to be resumed in thaw since it
> is not involved in writing the hibernation image. Skipping the resume in this
> case reduces the hibernation time.
>
> On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
> this can save 50 minutes.
>
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 1c54b2e5a225..021defca9b61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
> if (amdgpu_ras_intr_triggered())
> return;
>
> + /* device maybe not resumed here, return immediately in this case */
> + if (adev->in_s4 && adev->in_suspend)
> + return;
> +
> /* if we are running in a VM, make sure the device
> * torn down properly on reboot/shutdown.
> * unfortunately we can't detect certain
> @@ -2557,6 +2561,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
> struct drm_device *drm_dev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> + /* device maybe not resumed here, return immediately in this case */
> + if (adev->in_s4 && adev->in_suspend)
> + return 0;
> +
> /* Return a positive number here so
> * DPM_FLAG_SMART_SUSPEND works properly
> */
> @@ -2655,12 +2663,21 @@ static int amdgpu_pmops_thaw(struct device *dev)
> {
> struct drm_device *drm_dev = dev_get_drvdata(dev);
>
> + /* do not resume device if it's normal hibernation */
> + if (!pm_hibernate_is_recovering())
> + return 0;
> +
> return amdgpu_device_resume(drm_dev, true);
> }
>
> static int amdgpu_pmops_poweroff(struct device *dev)
> {
> struct drm_device *drm_dev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
> +
> + /* device maybe not resumed here, return immediately in this case */
> + if (adev->in_s4 && adev->in_suspend)
> + return 0;
>
> return amdgpu_device_suspend(drm_dev, true);
> }
* Re: [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
2025-07-10 12:13 ` Mario Limonciello
@ 2025-07-10 12:20 ` Christian König
2025-07-10 12:22 ` Mario Limonciello
0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2025-07-10 12:20 UTC (permalink / raw)
To: Mario Limonciello, Samuel Zhang, alexander.deucher, rafael,
len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
matthew.auld, matthew.brost, maarten.lankhorst, mripard,
tzimmermann
Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
linux-pm, linux-kernel, amd-gfx, dri-devel
On 10.07.25 14:13, Mario Limonciello wrote:
> On 7/10/2025 2:23 AM, Samuel Zhang wrote:
>> For normal hibernation, the GPU does not need to be resumed in thaw since it
>> is not involved in writing the hibernation image. Skipping the resume in this
>> case reduces the hibernation time.
>>
>> On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
>> this can save 50 minutes.
>>
>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
I think we now have reviews and acks for all patches, don't we?
What was the conclusion on how this should go upstream? Through drm-misc-next?
I've seen that you asked Mario, but I think I missed the response.
Regards,
Christian.
>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 1c54b2e5a225..021defca9b61 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>> if (amdgpu_ras_intr_triggered())
>> return;
>> + /* device maybe not resumed here, return immediately in this case */
>> + if (adev->in_s4 && adev->in_suspend)
>> + return;
>> +
>> /* if we are running in a VM, make sure the device
>> * torn down properly on reboot/shutdown.
>> * unfortunately we can't detect certain
>> @@ -2557,6 +2561,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
>> struct drm_device *drm_dev = dev_get_drvdata(dev);
>> struct amdgpu_device *adev = drm_to_adev(drm_dev);
>> + /* device maybe not resumed here, return immediately in this case */
>> + if (adev->in_s4 && adev->in_suspend)
>> + return 0;
>> +
>> /* Return a positive number here so
>> * DPM_FLAG_SMART_SUSPEND works properly
>> */
>> @@ -2655,12 +2663,21 @@ static int amdgpu_pmops_thaw(struct device *dev)
>> {
>> struct drm_device *drm_dev = dev_get_drvdata(dev);
>> + /* do not resume device if it's normal hibernation */
>> + if (!pm_hibernate_is_recovering())
>> + return 0;
>> +
>> return amdgpu_device_resume(drm_dev, true);
>> }
>> static int amdgpu_pmops_poweroff(struct device *dev)
>> {
>> struct drm_device *drm_dev = dev_get_drvdata(dev);
>> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
>> +
>> + /* device maybe not resumed here, return immediately in this case */
>> + if (adev->in_s4 && adev->in_suspend)
>> + return 0;
>> return amdgpu_device_suspend(drm_dev, true);
>> }
>
* Re: [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
2025-07-10 12:20 ` Christian König
@ 2025-07-10 12:22 ` Mario Limonciello
0 siblings, 0 replies; 12+ messages in thread
From: Mario Limonciello @ 2025-07-10 12:22 UTC (permalink / raw)
To: Christian König, Samuel Zhang, alexander.deucher, rafael,
len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
matthew.auld, matthew.brost, maarten.lankhorst, mripard,
tzimmermann
Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
linux-pm, linux-kernel, amd-gfx, dri-devel
On 7/10/2025 8:20 AM, Christian König wrote:
> On 10.07.25 14:13, Mario Limonciello wrote:
>> On 7/10/2025 2:23 AM, Samuel Zhang wrote:
>>> For normal hibernation, the GPU does not need to be resumed in thaw since it
>>> is not involved in writing the hibernation image. Skipping the resume in this
>>> case reduces the hibernation time.
>>>
>>> On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
>>> this can save 50 minutes.
>>>
>>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
>> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
>
> I think we now have reviews and acks for all patches, don't we?
Yeah we do.
>
> What was the conclusion on how this should go upstream? Through drm-misc-next?
>
> I've seen that you asked Mario, but I think I missed the response.
>
It hasn't been discussed, but we have Acks on all of the non-drm patches
so it seems most reasonable to me.
I'll give it a few hours today and if there is no opposition I'll commit
it there later today.
> Regards,
> Christian.
>
>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 17 +++++++++++++++++
>>> 1 file changed, 17 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index 1c54b2e5a225..021defca9b61 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>>> if (amdgpu_ras_intr_triggered())
>>> return;
>>> + /* device maybe not resumed here, return immediately in this case */
>>> + if (adev->in_s4 && adev->in_suspend)
>>> + return;
>>> +
>>> /* if we are running in a VM, make sure the device
>>> * torn down properly on reboot/shutdown.
>>> * unfortunately we can't detect certain
>>> @@ -2557,6 +2561,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
>>> struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>> + /* device maybe not resumed here, return immediately in this case */
>>> + if (adev->in_s4 && adev->in_suspend)
>>> + return 0;
>>> +
>>> /* Return a positive number here so
>>> * DPM_FLAG_SMART_SUSPEND works properly
>>> */
>>> @@ -2655,12 +2663,21 @@ static int amdgpu_pmops_thaw(struct device *dev)
>>> {
>>> struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> + /* do not resume device if it's normal hibernation */
>>> + if (!pm_hibernate_is_recovering())
>>> + return 0;
>>> +
>>> return amdgpu_device_resume(drm_dev, true);
>>> }
>>> static int amdgpu_pmops_poweroff(struct device *dev)
>>> {
>>> struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>> +
>>> + /* device maybe not resumed here, return immediately in this case */
>>> + if (adev->in_s4 && adev->in_suspend)
>>> + return 0;
>>> return amdgpu_device_suspend(drm_dev, true);
>>> }
>>
>
* Re: [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
2025-07-10 6:23 ` [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
2025-07-10 12:13 ` Mario Limonciello
@ 2025-07-10 13:22 ` Lazar, Lijo
1 sibling, 0 replies; 12+ messages in thread
From: Lazar, Lijo @ 2025-07-10 13:22 UTC (permalink / raw)
To: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
matthew.auld, matthew.brost, maarten.lankhorst, mripard,
tzimmermann
Cc: mario.limonciello, victor.zhao, haijun.chang, Qing.Ma,
Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel
On 7/10/2025 11:53 AM, Samuel Zhang wrote:
> For normal hibernation, the GPU does not need to be resumed in thaw since
> it is not involved in writing the hibernation image. Skipping resume in
> this case reduces the hibernation time.
>
> On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
> this saves 50 minutes.
>
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Thanks,
Lijo
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 1c54b2e5a225..021defca9b61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
> if (amdgpu_ras_intr_triggered())
> return;
>
> + /* device maybe not resumed here, return immediately in this case */
> + if (adev->in_s4 && adev->in_suspend)
> + return;
> +
> /* if we are running in a VM, make sure the device
> * torn down properly on reboot/shutdown.
> * unfortunately we can't detect certain
> @@ -2557,6 +2561,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
> struct drm_device *drm_dev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> + /* device maybe not resumed here, return immediately in this case */
> + if (adev->in_s4 && adev->in_suspend)
> + return 0;
> +
> /* Return a positive number here so
> * DPM_FLAG_SMART_SUSPEND works properly
> */
> @@ -2655,12 +2663,21 @@ static int amdgpu_pmops_thaw(struct device *dev)
> {
> struct drm_device *drm_dev = dev_get_drvdata(dev);
>
> + /* do not resume device if it's normal hibernation */
> + if (!pm_hibernate_is_recovering())
> + return 0;
> +
> return amdgpu_device_resume(drm_dev, true);
> }
>
> static int amdgpu_pmops_poweroff(struct device *dev)
> {
> struct drm_device *drm_dev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
> +
> + /* device maybe not resumed here, return immediately in this case */
> + if (adev->in_s4 && adev->in_suspend)
> + return 0;
>
> return amdgpu_device_suspend(drm_dev, true);
> }
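The guards in the patch above can be modeled in userspace to make the intended control flow easier to follow. This is a minimal sketch, not the driver code: `struct model_dev`, the `model_*` functions, the call counters, and `simulate_hibernation()` are all hypothetical names, and the model assumes that freeze sets both `in_s4` and `in_suspend` and that the `recovering` argument stands in for the result of `pm_hibernate_is_recovering()`.

```c
#include <stdbool.h>

/* Hypothetical model of the amdgpu_device state the patch inspects. */
struct model_dev {
	bool in_s4;        /* hibernation (S4) transition in progress */
	bool in_suspend;   /* device suspended and not yet resumed */
	int suspend_calls; /* model-only counter */
	int resume_calls;  /* model-only counter */
};

/* Stand-in for amdgpu_device_suspend(). */
int model_device_suspend(struct model_dev *d)
{
	d->in_suspend = true;
	d->suspend_calls++;
	return 0;
}

/* Stand-in for amdgpu_device_resume(). */
int model_device_resume(struct model_dev *d)
{
	d->in_suspend = false;
	d->resume_calls++;
	return 0;
}

/* freeze: assumed to mark the S4 transition and suspend the device. */
int model_freeze(struct model_dev *d)
{
	d->in_s4 = true;
	return model_device_suspend(d);
}

/* thaw: as in the patch, resume only when recovering from a failure. */
int model_thaw(struct model_dev *d, bool recovering)
{
	if (!recovering)
		return 0;
	return model_device_resume(d);
}

/* poweroff: as in the patch, skip if the device was never resumed. */
int model_poweroff(struct model_dev *d)
{
	if (d->in_s4 && d->in_suspend)
		return 0;
	return model_device_suspend(d);
}

/*
 * Run freeze, then thaw, then (only when the image was created
 * successfully) poweroff, and return the final state.
 */
struct model_dev simulate_hibernation(bool image_ok)
{
	struct model_dev d = { 0 };

	model_freeze(&d);
	model_thaw(&d, !image_ok);
	if (image_ok)
		model_poweroff(&d);
	return d;
}
```

In this model a successful hibernation suspends the device once and never resumes it, which is the saving the commit message describes. When image creation fails, thaw resumes the device so the system can keep running.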
* Re: [PATCH v6 0/5] reduce system memory requirement for hibernation
2025-07-10 6:23 [PATCH v6 0/5] reduce system memory requirement for hibernation Samuel Zhang
` (4 preceding siblings ...)
2025-07-10 6:23 ` [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
@ 2025-07-10 15:57 ` Mario Limonciello
5 siblings, 0 replies; 12+ messages in thread
From: Mario Limonciello @ 2025-07-10 15:57 UTC (permalink / raw)
To: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
matthew.auld, matthew.brost, maarten.lankhorst, mripard,
tzimmermann
Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
linux-pm, linux-kernel, amd-gfx, dri-devel
On 7/10/2025 2:23 AM, Samuel Zhang wrote:
>
> Modern data center dGPUs are usually equipped with very large VRAM. On a
> server with eight such dGPUs (192GB VRAM each) and 2TB of system memory,
> hibernation fails because there is not enough free memory.
>
> The root cause is that during hibernation all VRAM is evicted to GTT or
> shmem. In both cases the data lives in system memory, and the kernel tries
> to copy those pages into the hibernation image. In the worst case this
> puts two copies of VRAM in system memory, so 2TB is not enough for the
> hibernation image: 192GB * 8 * 2 = 3TB > 2TB.
>
> The fix includes the following changes. With them, far fewer pages need
> to be copied to the hibernation image, and hibernation can succeed.
> * patch 1 and 2: move GTT to shmem after evicting VRAM, so that the GTT
> pages can be freed.
> * patch 3: force-write shmem pages to the swap disk and free them.
>
>
> After GTT is swapped out to shmem in the hibernation prepare stage, the
> GPU is resumed again in the thaw stage. The swap-in and BO restore done by
> that resume take a long time (50 minutes observed for 8 dGPUs), and they
> are unnecessary, because writing the hibernation image does not need the
> GPU when hibernation succeeds.
> * patch 4 and 5: skip device resume in the thaw stage when hibernation
> succeeds, to reduce the hibernation time.
>
>
> v2:
> * split first patch to 2 patches, 1 for ttm, 1 for amdgpu
> * refined the new ttm api
> * add more comments for shrink_shmem_memory() and its callsite
> * export variable pm_transition in kernel
> * skip resume in thaw() for successful hibernation case
> v3:
> * refined ttm_device_prepare_hibernation() to accept device argument
> * use guard(mutex) to replace mutex_lock and mutex_unlock
> * move ttm_device_prepare_hibernation call to amdgpu_device_evict_resources()
> * add pm_transition_event(), instead of exporting pm_transition variable
> * refined amdgpu_pmops_thaw(), use switch-case for more clarity
> v4:
> * remove guard(mutex) and fix kdoc for ttm_device_prepare_hibernation
> * refined kdoc for pm_transition_event() and PM_EVENT_ messages
> * use dev_err in amdgpu_pmops_thaw()
> * add Reviewed-by and Acked-by for patch 2 3 and 5
> v5:
> * add Reviewed-by for patch 1
> * use pm_hibernate_is_recovering() to replace pm_transition_event()
> * check in_suspend in amdgpu_pmops_prepare() and amdgpu_pmops_poweroff()
> v6:
> * move pm_hibernate_is_recovering() from pm.h to suspend.h
> * rebase to next-20250709 tag of linux-next
> * add Tested-by for patch 5
>
>
> The merge options are:
> * the linux-pm changes go to linux-pm and an immutable branch for drm to
> merge
> * everything goes through amd-staging-drm-next (and an amdgpu PR to drm
> later)
> * everything goes through drm-misc-next
>
> Mario Limonciello thinks routing everything through drm-misc-next makes
> the most sense if everyone is amenable.
>
Applied, thanks.
530694f54dd5e (HEAD -> drm-misc-next, drm-misc/for-linux-next, drm-misc/drm-misc-next) drm/amdgpu: do not resume device in thaw for normal hibernation
c2aaddbd2dede PM: hibernate: add new api pm_hibernate_is_recovering()
2640e819474f4 PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
924dda024f3be drm/amdgpu: move GTT to shmem after eviction for hibernation
40b6a946d21ee drm/ttm: add new api ttm_device_prepare_hibernation()
>
> Samuel Zhang (5):
> 1. drm/ttm: add new api ttm_device_prepare_hibernation()
> 2. drm/amdgpu: move GTT to shmem after eviction for hibernation
> 3. PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
> 4. PM: hibernate: add new api pm_hibernate_is_recovering()
> 5. drm/amdgpu: do not resume device in thaw for normal hibernation
>
> drivers/base/power/main.c | 14 ++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 17 ++++++++++++++
> drivers/gpu/drm/ttm/ttm_device.c | 23 +++++++++++++++++++
> include/drm/ttm/ttm_device.h | 1 +
> include/linux/suspend.h | 2 ++
> kernel/power/hibernate.c | 26 ++++++++++++++++++++++
> 7 files changed, 92 insertions(+), 1 deletion(-)
>
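The worst-case arithmetic from the cover letter (192GB * 8 * 2 = 3TB > 2TB) can be reproduced with a few lines of C. This is only an illustrative sketch: the function names are hypothetical, sizes are in GB with 1TB taken as 1024GB as the cover letter's rounding implies, and the model ignores every other consumer of system memory.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * System memory (in GB) needed if every byte of VRAM ends up in system
 * memory `copies` times: once as evicted GTT/shmem pages and, in the
 * worst case, once more as the hibernation image copy of those pages.
 */
uint64_t worst_case_gb(uint64_t vram_gb_per_gpu, unsigned int gpus,
		       unsigned int copies)
{
	return vram_gb_per_gpu * gpus * copies;
}

/* True if the requirement fits in the given amount of system memory. */
bool fits_in_sysmem(uint64_t needed_gb, uint64_t sysmem_gb)
{
	return needed_gb <= sysmem_gb;
}
```

For the configuration in the cover letter, `worst_case_gb(192, 8, 2)` gives 3072GB, which `fits_in_sysmem(3072, 2048)` rejects.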
Thread overview: 12+ messages
2025-07-10 6:23 [PATCH v6 0/5] reduce system memory requirement for hibernation Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
2025-07-10 6:23 ` [PATCH v6 4/5] PM: hibernate: add new api pm_hibernate_is_recovering() Samuel Zhang
2025-07-10 8:21 ` Rafael J. Wysocki
2025-07-10 6:23 ` [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
2025-07-10 12:13 ` Mario Limonciello
2025-07-10 12:20 ` Christian König
2025-07-10 12:22 ` Mario Limonciello
2025-07-10 13:22 ` Lazar, Lijo
2025-07-10 15:57 ` [PATCH v6 0/5] reduce system memory requirement for hibernation Mario Limonciello