[PATCH v4 0/5] reduce system memory requirement for hibernation

* [PATCH v4 0/5] reduce system memory requirement for hibernation
@ 2025-07-09  6:43 Samuel Zhang
  2025-07-09  6:44 ` [PATCH v4 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Samuel Zhang @ 2025-07-09  6:43 UTC
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

Modern data center dGPUs are usually equipped with very large VRAM. On
a server with such dGPUs (192GB VRAM * 8) and 2TB of system memory,
hibernation fails because there is not enough free memory.

The root cause is that during hibernation all VRAM contents get evicted to
GTT or shmem. In both cases the data ends up in system memory, and the
kernel tries to copy those pages into the hibernation image. In the worst
case, this results in 2 copies of the VRAM contents in system memory, and
2TB is not enough for the hibernation image: 192GB * 8 * 2 = 3TB > 2TB.
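
The worst-case arithmetic above can be checked with a minimal sketch. The
helper names are hypothetical and not part of the series, and the sizes are
treated as binary units (1TB = 1024GB), which is what makes 192 * 8 * 2 come
out as exactly 3TB.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: system memory (in GB) occupied by evicted VRAM
 * when `copies` copies of every GPU's VRAM end up in system memory.
 */
uint64_t vram_footprint_gb(uint64_t vram_per_gpu_gb, unsigned int nr_gpus,
			   unsigned int copies)
{
	return vram_per_gpu_gb * nr_gpus * copies;
}

/* Hypothetical helper: does that footprint fit in the given system memory? */
bool fits_in_ram(uint64_t footprint_gb, uint64_t ram_gb)
{
	return footprint_gb <= ram_gb;
}
```

With 8 GPUs of 192GB each, one copy (1536GB) fits in 2048GB of system
memory, while two copies (3072GB) do not, which matches the failure
described above.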

The fix includes the following changes. With these changes, far fewer
pages need to be copied to the hibernation image and hibernation can succeed.
* patch 1 and 2: move GTT to shmem after evicting VRAM, so that the GTT
  pages can be freed.
* patch 3: force-write shmem pages to the swap disk and free the shmem pages.

After GTT is swapped out to shmem in the hibernation prepare stage, the GPU
is resumed again in the thaw stage. The swap-in and BO restore done by that
resume take a lot of time (50 minutes observed for 8 dGPUs). This is
unnecessary, since writing the hibernation image does not need the GPU when
hibernation succeeds.
* patch 4 and 5: skip resuming the device in the thaw stage in the
  successful hibernation case to reduce the hibernation time.

v2:
* split first patch to 2 patches, 1 for ttm, 1 for amdgpu
* refined the new ttm api
* add more comments for shrink_shmem_memory() and its callsite
* export variable pm_transition in kernel
* skip resume in thaw() for successful hibernation case
v3:
* refined ttm_device_prepare_hibernation() to accept device argument
* use guard(mutex) to replace mutex_lock and mutex_unlock
* move ttm_device_prepare_hibernation call to amdgpu_device_evict_resources()
* add pm_transition_event(), instead of exporting pm_transition variable
* refined amdgpu_pmops_thaw(), use switch-case for more clarity
v4:
* remove guard(mutex) and fix kdoc for ttm_device_prepare_hibernation
* refined kdoc for pm_transition_event() and PM_EVENT_ messages
* use dev_err in amdgpu_pmops_thaw()
* add Reviewed-by and Acked-by for patches 2, 3 and 5

The merge options are either:
* the linux-pm changes go to linux-pm and an immutable branch for drm to 
merge
* everything goes through amd-staging-drm-next (and an amdgpu PR to drm 
later)
* everything goes through drm-misc-next

Mario Limonciello thinks taking everything through drm-misc-next makes the
most sense if everyone is amenable.

Samuel Zhang (5):
1. drm/ttm: add new api ttm_device_prepare_hibernation()
2. drm/amdgpu: move GTT to shmem after eviction for hibernation
3. PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
4. PM: hibernate: add new api pm_transition_event()
5. drm/amdgpu: do not resume device in thaw for normal hibernation

 drivers/base/power/main.c                  |  5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 15 +++-
 drivers/gpu/drm/ttm/ttm_device.c           | 23 ++++++
 include/drm/ttm/ttm_device.h               |  1 +
 include/linux/pm.h                         | 87 +++++++++++++---------
 kernel/power/hibernate.c                   | 26 +++++++
 7 files changed, 130 insertions(+), 37 deletions(-)

-- 
2.43.5


* [PATCH v4 1/5] drm/ttm: add new api ttm_device_prepare_hibernation()
  2025-07-09  6:43 [PATCH v4 0/5] reduce system memory requirement for hibernation Samuel Zhang
@ 2025-07-09  6:44 ` Samuel Zhang
  2025-07-09  9:13   ` Christian König
  2025-07-09  6:44 ` [PATCH v4 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Samuel Zhang @ 2025-07-09  6:44 UTC
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

This new API is used during hibernation to move GTT BOs to shmem after
VRAM eviction. The shmem pages are flushed to the swap disk later to
reduce system memory usage for hibernation.
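
The new function in the diff below drains the device by calling
ttm_device_swapout() until it stops reporting progress. A minimal userspace
sketch of that loop shape follows. Everything prefixed with mock_ is a
hypothetical stand-in, and the return convention (positive = pages moved,
0 = nothing left, negative = errno) is an assumption inferred from the
`while (ret > 0)` condition in the patch, not taken from TTM documentation.

```c
#include <errno.h>

/* Hypothetical stand-in for a TTM device: a count of resident GTT pages. */
struct mock_device {
	long resident_pages;	/* pages still held in GTT */
	long pages_per_call;	/* pages one swapout call can move */
	int fail_after_calls;	/* < 0: never fail; else fail after N calls */
	int calls;		/* successful swapout calls so far */
};

/* Assumed convention: > 0 pages moved, 0 nothing left, < 0 error code. */
long mock_swapout(struct mock_device *dev)
{
	long n;

	if (dev->fail_after_calls >= 0 && dev->calls >= dev->fail_after_calls)
		return -ENOMEM;

	dev->calls++;
	n = dev->resident_pages < dev->pages_per_call ?
		dev->resident_pages : dev->pages_per_call;
	dev->resident_pages -= n;
	return n;
}

/* Same loop shape as the patch: repeat while progress is being made. */
int mock_prepare_hibernation(struct mock_device *dev)
{
	long ret;

	do {
		ret = mock_swapout(dev);
	} while (ret > 0);

	return (int)ret;	/* 0 once drained, negative errno on failure */
}
```

The loop ends on the first non-positive return, so an error from any single
swapout call is passed straight back to the caller and the remaining pages
stay resident.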

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 23 +++++++++++++++++++++++
 include/drm/ttm/ttm_device.h     |  1 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 02e797fd1891..9c9ab1903919 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -123,6 +123,29 @@ static int ttm_global_init(void)
 	return ret;
 }
 
+/**
+ * ttm_device_prepare_hibernation - move GTT BOs to shmem for hibernation.
+ *
+ * @bdev: A pointer to a struct ttm_device to prepare hibernation for.
+ *
+ * Return: 0 on success, negative number on failure.
+ */
+int ttm_device_prepare_hibernation(struct ttm_device *bdev)
+{
+	struct ttm_operation_ctx ctx = {
+		.interruptible = false,
+		.no_wait_gpu = false,
+		.force_alloc = true
+	};
+	int ret;
+
+	do {
+		ret = ttm_device_swapout(bdev, &ctx, GFP_KERNEL);
+	} while (ret > 0);
+	return ret;
+}
+EXPORT_SYMBOL(ttm_device_prepare_hibernation);
+
 /*
  * A buffer object shrink method that tries to swap out the first
  * buffer object on the global::swap_lru list.
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 39b8636b1845..592b5f802859 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -272,6 +272,7 @@ struct ttm_device {
 int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
 int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 		       gfp_t gfp_flags);
+int ttm_device_prepare_hibernation(struct ttm_device *bdev);
 
 static inline struct ttm_resource_manager *
 ttm_manager_type(struct ttm_device *bdev, int mem_type)
-- 
2.43.5



* [PATCH v4 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation
  2025-07-09  6:43 [PATCH v4 0/5] reduce system memory requirement for hibernation Samuel Zhang
  2025-07-09  6:44 ` [PATCH v4 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
@ 2025-07-09  6:44 ` Samuel Zhang
  2025-07-09  6:44 ` [PATCH v4 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Samuel Zhang @ 2025-07-09  6:44 UTC
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

When hibernating with data center dGPUs, a huge number of VRAM BOs are
evicted to GTT and take up too much system memory. This causes hibernation
to fail due to insufficient memory for creating the hibernation image.

Move GTT BOs to shmem in the KMD, then move shmem to the swap disk in the
kernel hibernation code, to make room for the hibernation image.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 684d66bc0b5f..2f977fece08f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5021,8 +5021,16 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
 		return 0;
 
 	ret = amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM);
-	if (ret)
+	if (ret) {
 		dev_warn(adev->dev, "evicting device resources failed\n");
+		return ret;
+	}
+
+	if (adev->in_s4) {
+		ret = ttm_device_prepare_hibernation(&adev->mman.bdev);
+		if (ret)
+			dev_err(adev->dev, "prepare hibernation failed, %d\n", ret);
+	}
 	return ret;
 }
 
-- 
2.43.5



* [PATCH v4 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
  2025-07-09  6:43 [PATCH v4 0/5] reduce system memory requirement for hibernation Samuel Zhang
  2025-07-09  6:44 ` [PATCH v4 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
  2025-07-09  6:44 ` [PATCH v4 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
@ 2025-07-09  6:44 ` Samuel Zhang
  2025-07-09  6:44 ` [PATCH v4 4/5] PM: hibernate: add new api pm_transition_event() Samuel Zhang
  2025-07-09  6:44 ` [PATCH v4 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
  4 siblings, 0 replies; 9+ messages in thread
From: Samuel Zhang @ 2025-07-09  6:44 UTC
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

When hibernating with data center dGPUs, a huge amount of VRAM data is
moved to shmem during dev_pm_ops.prepare(). These shmem pages take up so
much system memory that there is not enough free memory for creating the
hibernation image. This causes hibernation to fail and abort.

After dev_pm_ops.prepare(), call shrink_all_memory() to force shmem pages
out to the swap disk and reclaim them, so that there is enough system
memory for the hibernation image and fewer pages need to be copied to the
image.

This patch can only flush and free about half of the shmem pages. It would
be better to flush and free more pages, or even all shmem pages, so that
fewer pages need to be copied to the hibernation image and the overall
hibernation time can be reduced.
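
For anyone testing this from userspace: the patch sizes its reclaim request
from info.sharedram, and the kernel reports that same counter as the
"Shmem:" field of /proc/meminfo. Below is a minimal sketch of a parser for
that field, so the value can be compared before and after a hibernation
attempt. parse_shmem_kb() is a hypothetical helper, not part of the series,
and it takes the file contents as a string so it does not depend on the
machine it runs on.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the "Shmem:" value in kB from
 * /proc/meminfo-style text, or -1 if the field is missing or malformed.
 * Only a match at the start of a line counts, so "ShmemHugePages:" and
 * similar fields are ignored.
 */
long parse_shmem_kb(const char *text)
{
	const char *line = text;

	while (line && *line) {
		long kb;

		if (strncmp(line, "Shmem:", 6) == 0 &&
		    sscanf(line + 6, "%ld", &kb) == 1)
			return kb;

		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Dividing the result by the page size in kB gives a page count comparable to
the figures the patch prints with pr_debug().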

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
---
 kernel/power/hibernate.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 10a01af63a80..7ae9d9a7aa1d 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -370,6 +370,23 @@ static int create_image(int platform_mode)
 	return error;
 }
 
+static void shrink_shmem_memory(void)
+{
+	struct sysinfo info;
+	unsigned long nr_shmem_pages, nr_freed_pages;
+
+	si_meminfo(&info);
+	nr_shmem_pages = info.sharedram; /* current page count used for shmem */
+	/*
+	 * The intent is to reclaim all shmem pages. Though shrink_all_memory() can
+	 * only reclaim about half of them, it's enough for creating the hibernation
+	 * image.
+	 */
+	nr_freed_pages = shrink_all_memory(nr_shmem_pages);
+	pr_debug("requested to reclaim %lu shmem pages, actually freed %lu pages\n",
+			nr_shmem_pages, nr_freed_pages);
+}
+
 /**
  * hibernation_snapshot - Quiesce devices and create a hibernation image.
  * @platform_mode: If set, use platform driver to prepare for the transition.
@@ -411,6 +428,15 @@ int hibernation_snapshot(int platform_mode)
 		goto Thaw;
 	}
 
+	/*
+	 * Device drivers may move lots of data to shmem in dpm_prepare(). The shmem
+	 * pages will use lots of system memory, causing hibernation image creation
+	 * fail due to insufficient free memory.
+	 * This call is to force flush the shmem pages to swap disk and reclaim
+	 * the system memory so that image creation can succeed.
+	 */
+	shrink_shmem_memory();
+
 	suspend_console();
 	pm_restrict_gfp_mask();
 
-- 
2.43.5



* [PATCH v4 4/5] PM: hibernate: add new api pm_transition_event()
  2025-07-09  6:43 [PATCH v4 0/5] reduce system memory requirement for hibernation Samuel Zhang
                   ` (2 preceding siblings ...)
  2025-07-09  6:44 ` [PATCH v4 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
@ 2025-07-09  6:44 ` Samuel Zhang
  2025-07-09  7:07   ` Rafael J. Wysocki
  2025-07-09  6:44 ` [PATCH v4 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
  4 siblings, 1 reply; 9+ messages in thread
From: Samuel Zhang @ 2025-07-09  6:44 UTC
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

dev_pm_ops.thaw() is called in the following cases:
* normal case: after the hibernation image has been created.
* error case 1: creation of a hibernation image has failed.
* error case 2: restoration from a hibernation image has failed.

In the normal case, it is called mainly to resume storage devices for
saving the hibernation image. Other devices that are not involved
in saving the image do not need to be resumed. But since there is
no API to tell in which case thaw() is called, device drivers can't
conditionally resume the device in thaw().

The new pm_transition_event() is such an API: it lets a driver query
whether thaw() is called in the normal case. The value returned in thaw() is:
* PM_EVENT_THAW: normal case, no need to resume non-storage devices.
* PM_EVENT_RECOVER: error case, need to resume devices.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 drivers/base/power/main.c |  5 +++
 include/linux/pm.h        | 85 +++++++++++++++++++++++----------------
 2 files changed, 56 insertions(+), 34 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 40e1d8d8a589..7e0982caa4d4 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -62,6 +62,11 @@ static LIST_HEAD(dpm_noirq_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
 static pm_message_t pm_transition;
+int pm_transition_event(void)
+{
+	return pm_transition.event;
+}
+EXPORT_SYMBOL_GPL(pm_transition_event);
 
 static int async_error;
 
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 78855d794342..7e7b843ba823 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -471,58 +471,59 @@ const struct dev_pm_ops name = { \
 #define pm_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM), (_ptr))
 #define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr))
 
-/*
- * PM_EVENT_ messages
+/**
+ * pm_transition_event() - Query the current pm transition event value.
+ *
+ * One example is used to query the reason why thaw() is called.
+ * It will return one of 2 values in this usage:
+ * * %PM_EVENT_THAW: normal case.
+ * * %PM_EVENT_RECOVER: error case.
+ *
+ * For other usage, it may return other values. See :ref:`PM_EVENT_ messages`
+ * for all possible values.
+ *
+ * Return: One of the %PM_EVENT_ messages
+ */
+int pm_transition_event(void);
+
+/**
+ * DOC: PM_EVENT_ messages
  *
- * The following PM_EVENT_ messages are defined for the internal use of the PM
+ * The possible return values of %pm_transition_event().
+ *
+ * The following PM_EVENT_ messages are defined for the use of drivers and PM
  * core, in order to provide a mechanism allowing the high level suspend and
  * hibernation code to convey the necessary information to the device PM core
  * code:
  *
- * ON		No transition.
+ * %PM_EVENT_ON:		No transition.
  *
- * FREEZE	System is going to hibernate, call ->prepare() and ->freeze()
- *		for all devices.
+ * %PM_EVENT_FREEZE:	System is going to hibernate, call ->prepare() and
+ *		->freeze() for all devices.
  *
- * SUSPEND	System is going to suspend, call ->prepare() and ->suspend()
- *		for all devices.
+ * %PM_EVENT_SUSPEND:	System is going to suspend, call ->prepare() and
+ *		->suspend() for all devices.
  *
- * HIBERNATE	Hibernation image has been saved, call ->prepare() and
+ * %PM_EVENT_HIBERNATE:	Hibernation image has been saved, call ->prepare() and
  *		->poweroff() for all devices.
  *
- * QUIESCE	Contents of main memory are going to be restored from a (loaded)
- *		hibernation image, call ->prepare() and ->freeze() for all
+ * %PM_EVENT_QUIESCE:	Contents of main memory are going to be restored from
+ *		a (loaded) hibernation image, call ->prepare() and ->freeze() for all
  *		devices.
  *
- * RESUME	System is resuming, call ->resume() and ->complete() for all
- *		devices.
+ * %PM_EVENT_RESUME:	System is resuming, call ->resume() and ->complete()
+ *		for all devices.
  *
- * THAW		Hibernation image has been created, call ->thaw() and
+ * %PM_EVENT_THAW:		Hibernation image has been created, call ->thaw() and
  *		->complete() for all devices.
  *
- * RESTORE	Contents of main memory have been restored from a hibernation
- *		image, call ->restore() and ->complete() for all devices.
+ * %PM_EVENT_RESTORE:	Contents of main memory have been restored from a
+ *		hibernation image, call ->restore() and ->complete() for all devices.
  *
- * RECOVER	Creation of a hibernation image or restoration of the main
- *		memory contents from a hibernation image has failed, call
+ * %PM_EVENT_RECOVER:	Creation of a hibernation image or restoration of the
+ *		main memory contents from a hibernation image has failed, call
  *		->thaw() and ->complete() for all devices.
- *
- * The following PM_EVENT_ messages are defined for internal use by
- * kernel subsystems.  They are never issued by the PM core.
- *
- * USER_SUSPEND		Manual selective suspend was issued by userspace.
- *
- * USER_RESUME		Manual selective resume was issued by userspace.
- *
- * REMOTE_WAKEUP	Remote-wakeup request was received from the device.
- *
- * AUTO_SUSPEND		Automatic (device idle) runtime suspend was
- *			initiated by the subsystem.
- *
- * AUTO_RESUME		Automatic (device needed) runtime resume was
- *			requested by a driver.
  */
-
 #define PM_EVENT_INVALID	(-1)
 #define PM_EVENT_ON		0x0000
 #define PM_EVENT_FREEZE		0x0001
@@ -537,6 +538,22 @@ const struct dev_pm_ops name = { \
 #define PM_EVENT_REMOTE		0x0200
 #define PM_EVENT_AUTO		0x0400
 
+/*
+ * The following PM_EVENT_ messages are defined for internal use by
+ * kernel subsystems.  They are never issued by the PM core.
+ *
+ * USER_SUSPEND	Manual selective suspend was issued by userspace.
+ *
+ * USER_RESUME	Manual selective resume was issued by userspace.
+ *
+ * REMOTE_WAKEUP	Remote-wakeup request was received from the device.
+ *
+ * AUTO_SUSPEND	Automatic (device idle) runtime suspend was
+ *			initiated by the subsystem.
+ *
+ * AUTO_RESUME	Automatic (device needed) runtime resume was
+ *			requested by a driver.
+ */
 #define PM_EVENT_SLEEP		(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
 #define PM_EVENT_USER_SUSPEND	(PM_EVENT_USER | PM_EVENT_SUSPEND)
 #define PM_EVENT_USER_RESUME	(PM_EVENT_USER | PM_EVENT_RESUME)
-- 
2.43.5



* [PATCH v4 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
  2025-07-09  6:43 [PATCH v4 0/5] reduce system memory requirement for hibernation Samuel Zhang
                   ` (3 preceding siblings ...)
  2025-07-09  6:44 ` [PATCH v4 4/5] PM: hibernate: add new api pm_transition_event() Samuel Zhang
@ 2025-07-09  6:44 ` Samuel Zhang
  2025-07-09  7:13   ` Rafael J. Wysocki
  4 siblings, 1 reply; 9+ messages in thread
From: Samuel Zhang @ 2025-07-09  6:44 UTC
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

For normal hibernation, the GPU does not need to be resumed in thaw since
it is not involved in writing the hibernation image. Skipping the resume in
this case reduces the hibernation time.

On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
this saves 50 minutes.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 4f8632737574..c37285a8b2c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
 	if (amdgpu_ras_intr_triggered())
 		return;
 
+	/* device maybe not resumed here, return immediately in this case */
+	if (adev->in_s4 && adev->in_suspend)
+		return;
+
 	/* if we are running in a VM, make sure the device
 	 * torn down properly on reboot/shutdown.
 	 * unfortunately we can't detect certain
@@ -2654,8 +2658,17 @@ static int amdgpu_pmops_freeze(struct device *dev)
 static int amdgpu_pmops_thaw(struct device *dev)
 {
 	struct drm_device *drm_dev = dev_get_drvdata(dev);
+	int event = pm_transition_event();
 
-	return amdgpu_device_resume(drm_dev, true);
+	switch (event) {
+	case PM_EVENT_THAW: /* normal case */
+		return 0;
+	case PM_EVENT_RECOVER: /* error case */
+		return amdgpu_device_resume(drm_dev, true);
+	default:
+		dev_err(dev, "unknown pm_transition_event %d\n", event);
+		return -EOPNOTSUPP;
+	}
 }
 
 static int amdgpu_pmops_poweroff(struct device *dev)
-- 
2.43.5



* Re: [PATCH v4 4/5] PM: hibernate: add new api pm_transition_event()
  2025-07-09  6:44 ` [PATCH v4 4/5] PM: hibernate: add new api pm_transition_event() Samuel Zhang
@ 2025-07-09  7:07   ` Rafael J. Wysocki
  0 siblings, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2025-07-09  7:07 UTC
  To: Samuel Zhang, mario.limonciello
  Cc: alexander.deucher, christian.koenig, len.brown, pavel, gregkh,
	dakr, airlied, simona, ray.huang, matthew.auld, matthew.brost,
	maarten.lankhorst, mripard, tzimmermann, lijo.lazar, victor.zhao,
	haijun.chang, Qing.Ma, Owen.Zhang2, linux-pm, linux-kernel,
	amd-gfx, dri-devel

On Wed, Jul 9, 2025 at 8:44 AM Samuel Zhang <guoqing.zhang@amd.com> wrote:
>
> dev_pm_ops.thaw() is called in the following cases:
> * normal case: after the hibernation image has been created.
> * error case 1: creation of a hibernation image has failed.
> * error case 2: restoration from a hibernation image has failed.
>
> In the normal case, it is called mainly to resume storage devices for
> saving the hibernation image. Other devices that are not involved
> in saving the image do not need to be resumed. But since there is
> no API to tell in which case thaw() is called, device drivers can't
> conditionally resume the device in thaw().
>
> The new pm_transition_event() is such an API: it lets a driver query
> whether thaw() is called in the normal case. The value returned in thaw() is:
> * PM_EVENT_THAW: normal case, no need to resume non-storage devices.
> * PM_EVENT_RECOVER: error case, need to resume devices.
>
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> ---
>  drivers/base/power/main.c |  5 +++
>  include/linux/pm.h        | 85 +++++++++++++++++++++++----------------
>  2 files changed, 56 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 40e1d8d8a589..7e0982caa4d4 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -62,6 +62,11 @@ static LIST_HEAD(dpm_noirq_list);
>
>  static DEFINE_MUTEX(dpm_list_mtx);
>  static pm_message_t pm_transition;
> +int pm_transition_event(void)
> +{
> +       return pm_transition.event;
> +}
> +EXPORT_SYMBOL_GPL(pm_transition_event);
>
>  static int async_error;
>
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 78855d794342..7e7b843ba823 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -471,58 +471,59 @@ const struct dev_pm_ops name = { \
>  #define pm_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM), (_ptr))
>  #define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr))
>
> -/*
> - * PM_EVENT_ messages
> +/**
> + * pm_transition_event() - Query the current pm transition event value.
> + *
> + * One example is used to query the reason why thaw() is called.
> + * It will return one of 2 values in this usage:
> + * * %PM_EVENT_THAW: normal case.
> + * * %PM_EVENT_RECOVER: error case.
> + *
> + * For other usage, it may return other values. See :ref:`PM_EVENT_ messages`
> + * for all possible values.
> + *
> + * Return: One of the %PM_EVENT_ messages
> + */
> +int pm_transition_event(void);

Please move the kerneldoc to where the function is defined (that is, main.c).

Now, I've changed my mind regarding this wrapper, sorry.

I'm thinking now that "thaw" is exceptional and no other callback will
ever need to check why it was called because it will always do the
same thing.

So this should be hibernation-specific and Mario was right.

Please make it return bool, and in particular return "true" if
pm_transition.event == PM_EVENT_RECOVER and "false" otherwise.

Specifically

bool pm_hibernate_is_recovering(void)
{
        return pm_transition.event == PM_EVENT_RECOVER;
}

And the changes below won't be necessary then.
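
A minimal userspace sketch of how a driver's thaw callback might use the
boolean helper suggested above. Only pm_hibernate_is_recovering() and its
body come from this reply. The mock_ names are hypothetical stand-ins for
the PM core's private pm_transition.event and for amdgpu_device_resume(),
and the two PM_EVENT_ values are assumed to match include/linux/pm.h, so
check them against your tree.

```c
#include <stdbool.h>

/* Assumed to mirror include/linux/pm.h; verify against your tree. */
#define PM_EVENT_THAW		0x0020
#define PM_EVENT_RECOVER	0x0080

/* Hypothetical stand-in for the PM core's private pm_transition.event. */
int mock_pm_transition_event = PM_EVENT_THAW;

/* Counts how often the (mock) expensive resume path ran. */
int mock_resume_calls;

/* Helper as proposed in this reply, reading the mock state instead. */
bool pm_hibernate_is_recovering(void)
{
	return mock_pm_transition_event == PM_EVENT_RECOVER;
}

/* Hypothetical stand-in for the driver's expensive resume routine. */
int mock_device_resume(void)
{
	mock_resume_calls++;
	return 0;
}

/* Sketch of a thaw callback: resume only when recovering from a failure. */
int mock_pmops_thaw(void)
{
	if (!pm_hibernate_is_recovering())
		return 0;	/* image was created; GPU is not needed */

	return mock_device_resume();
}
```

Compared with the switch on pm_transition_event() in patch 5, the boolean
form has no "unknown event" branch, since every value other than
PM_EVENT_RECOVER takes the skip-resume path.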

> +
> +/**
> + * DOC: PM_EVENT_ messages
>   *
> - * The following PM_EVENT_ messages are defined for the internal use of the PM
> + * The possible return values of %pm_transition_event().
> + *
> + * The following PM_EVENT_ messages are defined for the use of drivers and PM
>   * core, in order to provide a mechanism allowing the high level suspend and
>   * hibernation code to convey the necessary information to the device PM core
>   * code:
>   *
> - * ON          No transition.
> + * %PM_EVENT_ON:               No transition.
>   *
> - * FREEZE      System is going to hibernate, call ->prepare() and ->freeze()
> - *             for all devices.
> + * %PM_EVENT_FREEZE:   System is going to hibernate, call ->prepare() and
> + *             ->freeze() for all devices.
>   *
> - * SUSPEND     System is going to suspend, call ->prepare() and ->suspend()
> - *             for all devices.
> + * %PM_EVENT_SUSPEND:  System is going to suspend, call ->prepare() and
> + *             ->suspend() for all devices.
>   *
> - * HIBERNATE   Hibernation image has been saved, call ->prepare() and
> + * %PM_EVENT_HIBERNATE:        Hibernation image has been saved, call ->prepare() and
>   *             ->poweroff() for all devices.
>   *
> - * QUIESCE     Contents of main memory are going to be restored from a (loaded)
> - *             hibernation image, call ->prepare() and ->freeze() for all
> + * %PM_EVENT_QUIESCE:  Contents of main memory are going to be restored from
> + *             a (loaded) hibernation image, call ->prepare() and ->freeze() for all
>   *             devices.
>   *
> - * RESUME      System is resuming, call ->resume() and ->complete() for all
> - *             devices.
> + * %PM_EVENT_RESUME:   System is resuming, call ->resume() and ->complete()
> + *             for all devices.
>   *
> - * THAW                Hibernation image has been created, call ->thaw() and
> + * %PM_EVENT_THAW:             Hibernation image has been created, call ->thaw() and
>   *             ->complete() for all devices.
>   *
> - * RESTORE     Contents of main memory have been restored from a hibernation
> - *             image, call ->restore() and ->complete() for all devices.
> + * %PM_EVENT_RESTORE:  Contents of main memory have been restored from a
> + *             hibernation image, call ->restore() and ->complete() for all devices.
>   *
> - * RECOVER     Creation of a hibernation image or restoration of the main
> - *             memory contents from a hibernation image has failed, call
> + * %PM_EVENT_RECOVER:  Creation of a hibernation image or restoration of the
> + *             main memory contents from a hibernation image has failed, call
>   *             ->thaw() and ->complete() for all devices.
> - *
> - * The following PM_EVENT_ messages are defined for internal use by
> - * kernel subsystems.  They are never issued by the PM core.
> - *
> - * USER_SUSPEND                Manual selective suspend was issued by userspace.
> - *
> - * USER_RESUME         Manual selective resume was issued by userspace.
> - *
> - * REMOTE_WAKEUP       Remote-wakeup request was received from the device.
> - *
> - * AUTO_SUSPEND                Automatic (device idle) runtime suspend was
> - *                     initiated by the subsystem.
> - *
> - * AUTO_RESUME         Automatic (device needed) runtime resume was
> - *                     requested by a driver.
>   */
> -
>  #define PM_EVENT_INVALID       (-1)
>  #define PM_EVENT_ON            0x0000
>  #define PM_EVENT_FREEZE                0x0001
> @@ -537,6 +538,22 @@ const struct dev_pm_ops name = { \
>  #define PM_EVENT_REMOTE                0x0200
>  #define PM_EVENT_AUTO          0x0400
>
> +/*
> + * The following PM_EVENT_ messages are defined for internal use by
> + * kernel subsystems.  They are never issued by the PM core.
> + *
> + * USER_SUSPEND        Manual selective suspend was issued by userspace.
> + *
> + * USER_RESUME Manual selective resume was issued by userspace.
> + *
> + * REMOTE_WAKEUP       Remote-wakeup request was received from the device.
> + *
> + * AUTO_SUSPEND        Automatic (device idle) runtime suspend was
> + *                     initiated by the subsystem.
> + *
> + * AUTO_RESUME Automatic (device needed) runtime resume was
> + *                     requested by a driver.
> + */
>  #define PM_EVENT_SLEEP         (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
>  #define PM_EVENT_USER_SUSPEND  (PM_EVENT_USER | PM_EVENT_SUSPEND)
>  #define PM_EVENT_USER_RESUME   (PM_EVENT_USER | PM_EVENT_RESUME)
> --
> 2.43.5
>


* Re: [PATCH v4 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
  2025-07-09  6:44 ` [PATCH v4 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
@ 2025-07-09  7:13   ` Rafael J. Wysocki
  0 siblings, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2025-07-09  7:13 UTC
  To: Samuel Zhang
  Cc: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann,
	mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel

On Wed, Jul 9, 2025 at 8:44 AM Samuel Zhang <guoqing.zhang@amd.com> wrote:
>
> For normal hibernation, the GPU does not need to be resumed in thaw since
> it is not involved in writing the hibernation image. Skipping the resume in
> this case reduces the hibernation time.
>
> On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
> this saves 50 minutes.
>
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 4f8632737574..c37285a8b2c5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>         if (amdgpu_ras_intr_triggered())
>                 return;
>
> +       /* device may not have been resumed here; return immediately in that case */
> +       if (adev->in_s4 && adev->in_suspend)
> +               return;
> +

You need to do analogous things in amdgpu_pmops_prepare() and
amdgpu_pmops_poweroff() AFAICS in case hibernation is carried out in
the "platform" mode.

>         /* if we are running in a VM, make sure the device
>          * torn down properly on reboot/shutdown.
>          * unfortunately we can't detect certain
> @@ -2654,8 +2658,17 @@ static int amdgpu_pmops_freeze(struct device *dev)
>  static int amdgpu_pmops_thaw(struct device *dev)
>  {
>         struct drm_device *drm_dev = dev_get_drvdata(dev);
> +       int event = pm_transition_event();
>
> -       return amdgpu_device_resume(drm_dev, true);
> +       switch (event) {
> +       case PM_EVENT_THAW: /* normal case */
> +               return 0;
> +       case PM_EVENT_RECOVER: /* error case */
> +               return amdgpu_device_resume(drm_dev, true);
> +       default:
> +               dev_err(dev, "unknown pm_transition_event %d\n", event);
> +               return -EOPNOTSUPP;
> +       }
>  }
>
>  static int amdgpu_pmops_poweroff(struct device *dev)
> --
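
The control flow in the two hunks above can be modelled outside the kernel.
What follows is only a userspace sketch, not the driver code: struct
model_gpu, model_resume(), model_thaw() and model_shutdown_must_skip() are
hypothetical names, and the PM_EVENT_* values are copied from memory of
include/linux/pm.h, so treat them as illustrative. It shows why the shutdown
guard is needed once thaw stops resuming the device in the normal case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Event codes as recalled from include/linux/pm.h; illustrative only. */
#define PM_EVENT_THAW    0x0020
#define PM_EVENT_RECOVER 0x0080

/* Hypothetical stand-in for the driver state the patch looks at. */
struct model_gpu {
	bool in_s4;        /* a hibernation transition is in progress */
	bool in_suspend;   /* the device is currently suspended */
	int resume_calls;  /* how often the expensive resume path ran */
};

/* Stand-in for amdgpu_device_resume(): marks the device as running. */
static int model_resume(struct model_gpu *gpu)
{
	gpu->in_suspend = false;
	gpu->resume_calls++;
	return 0;
}

/* Mirrors the switch in the patched thaw callback. */
static int model_thaw(struct model_gpu *gpu, int event)
{
	assert(gpu != NULL);

	switch (event) {
	case PM_EVENT_THAW:     /* image created: leave the device suspended */
		return 0;
	case PM_EVENT_RECOVER:  /* image creation failed: bring the device back */
		return model_resume(gpu);
	default:
		return -EOPNOTSUPP;
	}
}

/* Mirrors the early-return condition added to the shutdown path. */
static bool model_shutdown_must_skip(const struct model_gpu *gpu)
{
	return gpu->in_s4 && gpu->in_suspend;
}
```

After model_thaw(&gpu, PM_EVENT_THAW) the model device is still suspended, so
model_shutdown_must_skip() reports true. Any later callback that assumes a
running device would need a similar check, which is the point of the review
comment about the "platform" mode.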


* Re: [PATCH v4 1/5] drm/ttm: add new api ttm_device_prepare_hibernation()
  2025-07-09  6:44 ` [PATCH v4 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
@ 2025-07-09  9:13   ` Christian König
  0 siblings, 0 replies; 9+ messages in thread
From: Christian König @ 2025-07-09  9:13 UTC (permalink / raw)
  To: Samuel Zhang, alexander.deucher, rafael, len.brown, pavel, gregkh,
	dakr, airlied, simona, ray.huang, matthew.auld, matthew.brost,
	maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel

On 09.07.25 08:44, Samuel Zhang wrote:
> This new API is used during hibernation to move GTT BOs to shmem after
> VRAM eviction. The shmem pages are flushed to the swap disk later to
> reduce the system memory usage for hibernation.
> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/ttm/ttm_device.c | 23 +++++++++++++++++++++++
>  include/drm/ttm/ttm_device.h     |  1 +
>  2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 02e797fd1891..9c9ab1903919 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -123,6 +123,29 @@ static int ttm_global_init(void)
>  	return ret;
>  }
>  
> +/**
> + * ttm_device_prepare_hibernation - move GTT BOs to shmem for hibernation.
> + *
> + * @bdev: A pointer to a struct ttm_device to prepare hibernation for.
> + *
> + * Return: 0 on success, negative number on failure.
> + */
> +int ttm_device_prepare_hibernation(struct ttm_device *bdev)
> +{
> +	struct ttm_operation_ctx ctx = {
> +		.interruptible = false,
> +		.no_wait_gpu = false,
> +		.force_alloc = true
> +	};
> +	int ret;
> +
> +	do {
> +		ret = ttm_device_swapout(bdev, &ctx, GFP_KERNEL);
> +	} while (ret > 0);
> +	return ret;
> +}
> +EXPORT_SYMBOL(ttm_device_prepare_hibernation);
> +
>  /*
>   * A buffer object shrink method that tries to swap out the first
>   * buffer object on the global::swap_lru list.
> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> index 39b8636b1845..592b5f802859 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -272,6 +272,7 @@ struct ttm_device {
>  int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
>  int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>  		       gfp_t gfp_flags);
> +int ttm_device_prepare_hibernation(struct ttm_device *bdev);
>  
>  static inline struct ttm_resource_manager *
>  ttm_manager_type(struct ttm_device *bdev, int mem_type)
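
The drain loop in ttm_device_prepare_hibernation() can be illustrated with a
small userspace model. This is a sketch under assumptions, not TTM code:
struct model_device, model_swapout() and model_prepare_hibernation() are
hypothetical, and the return convention (positive means progress, 0 means
nothing left, negative means error) is inferred from the "while (ret > 0)"
condition in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model: a count of buffer objects in system memory. */
struct model_device {
	int resident_bos;  /* BOs still resident in system memory */
	int pages_per_bo;  /* pages released per swapped-out BO */
	int fail_after;    /* inject -ENOMEM after this many swaps; < 0 disables */
	int swapped_bos;   /* BOs moved out so far */
};

/*
 * Stand-in for ttm_device_swapout(): swaps out one BO per call.
 * Returns the number of pages released, 0 when nothing is left,
 * or a negative errno on failure.
 */
static int model_swapout(struct model_device *dev)
{
	if (dev->fail_after >= 0 && dev->swapped_bos >= dev->fail_after)
		return -ENOMEM;
	if (dev->resident_bos == 0)
		return 0;

	dev->resident_bos--;
	dev->swapped_bos++;
	return dev->pages_per_bo;
}

/* Same loop shape as the patch: repeat while progress is being made. */
static int model_prepare_hibernation(struct model_device *dev)
{
	int ret;

	assert(dev != NULL && dev->pages_per_bo > 0);

	do {
		ret = model_swapout(dev);
	} while (ret > 0);

	return ret;  /* 0 once drained, negative errno on the first failure */
}
```

The loop stops on the first non-positive result, so a failure part way
through leaves the remaining BOs resident and hands the error to the caller.
In the model, a device with three BOs and fail_after = 2 returns -ENOMEM with
one BO still resident.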


