[PATCH v3 0/5] reduce system memory requirement for hibernation


* [PATCH v3 0/5] reduce system memory requirement for hibernation
@ 2025-07-08  7:42 Samuel Zhang
  2025-07-08  7:42 ` [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Samuel Zhang @ 2025-07-08  7:42 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

Modern data center dGPUs are usually equipped with very large VRAM. On a
server with eight such dGPUs (192GB VRAM each) and 2TB of system memory,
hibernation fails because there is not enough free memory.

The root cause is that during hibernation all VRAM contents are evicted to
GTT or shmem. In both cases the data ends up in system memory, and the
kernel will try to copy those pages into the hibernation image. In the
worst case this leaves 2 copies of the VRAM contents in system memory, and
2TB is not enough to hold them: 192GB * 8 * 2 = 3TB > 2TB.
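
The worst-case arithmetic above can be checked with a small userspace
sketch. This is not kernel code; worst_case_gb() and fits_in_ram() are
hypothetical helper names, and the factor of 2 encodes the assumption
stated above (one evicted copy in GTT/shmem plus one copy in the image).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: worst-case system memory (in GB)
 * needed when every byte of VRAM ends up in system memory twice, once as
 * the evicted GTT/shmem copy and once as its copy in the hibernation image.
 */
uint64_t worst_case_gb(uint64_t vram_gb_per_gpu, uint64_t n_gpus)
{
	assert(n_gpus > 0);
	return vram_gb_per_gpu * n_gpus * 2;
}

/* Returns 1 if the worst case fits into ram_gb of system memory, else 0. */
int fits_in_ram(uint64_t vram_gb_per_gpu, uint64_t n_gpus, uint64_t ram_gb)
{
	return worst_case_gb(vram_gb_per_gpu, n_gpus) <= ram_gb;
}
```

For the configuration described here, worst_case_gb(192, 8) gives 3072 GB
(3TB), so fits_in_ram(192, 8, 2048) reports that 2TB is not enough.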

The fix includes the following changes. With these changes, far fewer
pages need to be copied to the hibernation image and hibernation can
succeed.
* patch 1 and 2: move GTT to shmem after evicting VRAM, so that the GTT
  pages can be freed.
* patch 3: force-write shmem pages to the swap disk and free them.

After GTT is swapped out to shmem in the hibernation prepare stage, the GPU
is resumed again in the thaw stage. Swapping the BOs back in and restoring
them during resume takes a long time (50 minutes observed for 8 dGPUs), and
it is unnecessary because writing the hibernation image does not need the
GPU when hibernation succeeds.
* patch 4 and 5: skip resuming the device in the thaw stage when
  hibernation succeeds, to reduce the hibernation time.

v2:
* split first patch to 2 patches, 1 for ttm, 1 for amdgpu
* refined the new ttm api
* add more comments for shrink_shmem_memory() and its callsite
* export variable pm_transition in kernel
* skip resume in thaw() for successful hibernation case
v3:
* refined ttm_device_prepare_hibernation() to accept device argument
* use guard(mutex) to replace mutex_lock and mutex_unlock
* move ttm_device_prepare_hibernation call to amdgpu_device_evict_resources()
* add pm_transition_event(), instead of exporting pm_transition variable
* refined amdgpu_pmops_thaw(), use switch-case for more clarity

Samuel Zhang (5):
1. drm/ttm: add ttm_device_prepare_hibernation() api
2. drm/amdgpu: move GTT to shmem after eviction for hibernation
3. PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
4. PM: hibernate: add new api pm_transition_event()
5. drm/amdgpu: do not resume device in thaw for normal hibernation

 drivers/base/power/main.c                  |  5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 15 ++++++++++++-
 drivers/gpu/drm/ttm/ttm_device.c           | 23 +++++++++++++++++++
 include/drm/ttm/ttm_device.h               |  1 +
 include/linux/pm.h                         | 16 +++++++++++++
 kernel/power/hibernate.c                   | 26 ++++++++++++++++++++++
 7 files changed, 94 insertions(+), 2 deletions(-)

-- 
2.43.5


* [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation()
  2025-07-08  7:42 [PATCH v3 0/5] reduce system memory requirement for hibernation Samuel Zhang
@ 2025-07-08  7:42 ` Samuel Zhang
  2025-07-08  8:38   ` Christian König
  2025-07-08  7:42 ` [PATCH v3 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Samuel Zhang @ 2025-07-08  7:42 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

This new API is used during hibernation to move GTT BOs to shmem after
VRAM eviction. The shmem pages are flushed to the swap disk later to
reduce system memory usage for hibernation.
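
The control flow of the new helper is a simple drain loop. The userspace
sketch below only illustrates that loop shape. It assumes, as the
`while (ret > 0)` condition in the patch suggests, that the swapout step
returns a positive value while it makes progress, 0 when nothing is left,
and a negative value on error. struct mock_device, mock_swapout_one() and
mock_prepare_hibernation() are hypothetical stand-ins, not TTM functions.

```c
#include <assert.h>

/* Hypothetical stand-in for a device that still has swappable BOs. */
struct mock_device {
	int bos_left;	/* BOs still resident in GTT */
	int fail_at;	/* inject an error when bos_left reaches this; -1 = never */
};

/* Mock "swap out one BO" step: >0 = progress, 0 = nothing left, <0 = error. */
int mock_swapout_one(struct mock_device *dev)
{
	assert(dev != 0);
	if (dev->bos_left == dev->fail_at)
		return -12;	/* arbitrary negative error code for the mock */
	if (dev->bos_left == 0)
		return 0;
	dev->bos_left--;
	return 1;
}

/* Same loop shape as the patch: keep going while progress is being made. */
int mock_prepare_hibernation(struct mock_device *dev)
{
	int ret;

	do {
		ret = mock_swapout_one(dev);
	} while (ret > 0);
	return ret;
}
```

With this shape the caller sees 0 only after everything has been drained,
and the first negative return stops the loop and is passed through
unchanged.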

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 22 ++++++++++++++++++++++
 include/drm/ttm/ttm_device.h     |  1 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 02e797fd1891..f14437ea0cce 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -123,6 +123,28 @@ static int ttm_global_init(void)
 	return ret;
 }
 
+/**
+ * move GTT BOs to shmem for hibernation.
+ *
+ * returns 0 on success, negative on failure.
+ */
+int ttm_device_prepare_hibernation(struct ttm_device *bdev)
+{
+	struct ttm_operation_ctx ctx = {
+		.interruptible = false,
+		.no_wait_gpu = false,
+		.force_alloc = true
+	};
+	int ret;
+
+	guard(mutex)(&ttm_global_mutex);
+	do {
+		ret = ttm_device_swapout(bdev, &ctx, GFP_KERNEL);
+	} while (ret > 0);
+	return ret;
+}
+EXPORT_SYMBOL(ttm_device_prepare_hibernation);
+
 /*
  * A buffer object shrink method that tries to swap out the first
  * buffer object on the global::swap_lru list.
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 39b8636b1845..592b5f802859 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -272,6 +272,7 @@ struct ttm_device {
 int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
 int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 		       gfp_t gfp_flags);
+int ttm_device_prepare_hibernation(struct ttm_device *bdev);
 
 static inline struct ttm_resource_manager *
 ttm_manager_type(struct ttm_device *bdev, int mem_type)
-- 
2.43.5



* [PATCH v3 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation
  2025-07-08  7:42 [PATCH v3 0/5] reduce system memory requirement for hibernation Samuel Zhang
  2025-07-08  7:42 ` [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
@ 2025-07-08  7:42 ` Samuel Zhang
  2025-07-08  8:41   ` Christian König
  2025-07-08  7:42 ` [PATCH v3 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Samuel Zhang @ 2025-07-08  7:42 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

When hibernating with data center dGPUs, a huge number of VRAM BOs are
evicted to GTT and take up too much system memory. This causes hibernation
to fail due to insufficient memory for creating the hibernation image.

Move GTT BOs to shmem in the KMD, then move shmem to the swap disk in the
kernel hibernation code, to make room for the hibernation image.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 684d66bc0b5f..2f977fece08f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5021,8 +5021,16 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
 		return 0;
 
 	ret = amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM);
-	if (ret)
+	if (ret) {
 		dev_warn(adev->dev, "evicting device resources failed\n");
+		return ret;
+	}
+
+	if (adev->in_s4) {
+		ret = ttm_device_prepare_hibernation(&adev->mman.bdev);
+		if (ret)
+			dev_err(adev->dev, "prepare hibernation failed, %d\n", ret);
+	}
 	return ret;
 }
 
-- 
2.43.5



* [PATCH v3 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
  2025-07-08  7:42 [PATCH v3 0/5] reduce system memory requirement for hibernation Samuel Zhang
  2025-07-08  7:42 ` [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
  2025-07-08  7:42 ` [PATCH v3 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
@ 2025-07-08  7:42 ` Samuel Zhang
  2025-07-08 14:28   ` Mario Limonciello
  2025-07-08  7:42 ` [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event() Samuel Zhang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Samuel Zhang @ 2025-07-08  7:42 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

When hibernating with data center dGPUs, a huge amount of VRAM data is
moved to shmem during dev_pm_ops.prepare(). These shmem pages take up so
much system memory that there is not enough free memory left to create the
hibernation image, so hibernation fails and aborts.

After dev_pm_ops.prepare(), call shrink_all_memory() to force shmem pages
out to the swap disk and reclaim them, so that there is enough system
memory for the hibernation image and fewer pages need to be copied to the
image.

This patch can only flush and free about half of the shmem pages. It would
be better to flush and free more pages, even all of the shmem pages, so
that fewer pages need to be copied to the hibernation image and the overall
hibernation time is reduced.
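
To get a feel for the reclaim target before hibernating, the shared-memory
counter can also be read from userspace. The sketch below uses sysinfo(2),
where sharedram is expressed in units of mem_unit bytes. It is assumed
here, not verified against this series, that the value reflects the same
counter the patch reads via si_meminfo(). shmem_bytes() and
query_shmem_bytes() are hypothetical helper names.

```c
#include <assert.h>
#include <sys/sysinfo.h>

/*
 * Bytes of shared memory described by a sysinfo(2) result. In userspace
 * the sharedram field is counted in units of mem_unit bytes.
 */
unsigned long long shmem_bytes(const struct sysinfo *info)
{
	assert(info != 0);
	return (unsigned long long)info->sharedram * info->mem_unit;
}

/* Query the running system; returns 0 on success, -1 on failure. */
int query_shmem_bytes(unsigned long long *out)
{
	struct sysinfo info;

	if (sysinfo(&info) != 0)
		return -1;
	*out = shmem_bytes(&info);
	return 0;
}
```

Calling query_shmem_bytes() shortly before hibernating gives only a rough
lower bound, since the drivers move more data to shmem during prepare.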

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 kernel/power/hibernate.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 10a01af63a80..7ae9d9a7aa1d 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -370,6 +370,23 @@ static int create_image(int platform_mode)
 	return error;
 }
 
+static void shrink_shmem_memory(void)
+{
+	struct sysinfo info;
+	unsigned long nr_shmem_pages, nr_freed_pages;
+
+	si_meminfo(&info);
+	nr_shmem_pages = info.sharedram; /* current page count used for shmem */
+	/*
+	 * The intent is to reclaim all shmem pages. Though shrink_all_memory() can
+	 * only reclaim about half of them, it's enough for creating the hibernation
+	 * image.
+	 */
+	nr_freed_pages = shrink_all_memory(nr_shmem_pages);
+	pr_debug("requested to reclaim %lu shmem pages, actually freed %lu pages\n",
+			nr_shmem_pages, nr_freed_pages);
+}
+
 /**
  * hibernation_snapshot - Quiesce devices and create a hibernation image.
  * @platform_mode: If set, use platform driver to prepare for the transition.
@@ -411,6 +428,15 @@ int hibernation_snapshot(int platform_mode)
 		goto Thaw;
 	}
 
+	/*
+	 * Device drivers may move lots of data to shmem in dpm_prepare(). The shmem
+	 * pages will use lots of system memory, causing hibernation image creation
+	 * fail due to insufficient free memory.
+	 * This call is to force flush the shmem pages to swap disk and reclaim
+	 * the system memory so that image creation can succeed.
+	 */
+	shrink_shmem_memory();
+
 	suspend_console();
 	pm_restrict_gfp_mask();
 
-- 
2.43.5



* [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event()
  2025-07-08  7:42 [PATCH v3 0/5] reduce system memory requirement for hibernation Samuel Zhang
                   ` (2 preceding siblings ...)
  2025-07-08  7:42 ` [PATCH v3 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
@ 2025-07-08  7:42 ` Samuel Zhang
  2025-07-08 14:36   ` Mario Limonciello
  2025-07-08  7:42 ` [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
  2025-07-08 14:31 ` [PATCH v3 0/5] reduce system memory requirement for hibernation Mario Limonciello
  5 siblings, 1 reply; 19+ messages in thread
From: Samuel Zhang @ 2025-07-08  7:42 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

dev_pm_ops.thaw() is called in the following cases:
* normal case: after the hibernation image has been created.
* error case 1: creation of a hibernation image has failed.
* error case 2: restoration from a hibernation image has failed.

In the normal case, it is called mainly to resume the storage devices
needed for saving the hibernation image. Other devices that are not
involved in saving the image do not need to be resumed. But since there is
no API to tell in which case thaw() is being called, device drivers can't
conditionally resume the device in thaw().

The new pm_transition_event() is such an API: it lets a driver query
whether thaw() is being called in the normal case. The value returned in
thaw() is:
* PM_EVENT_THAW: normal case, no need to resume non-storage devices.
* PM_EVENT_RECOVER: error case, need to resume devices.
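
A driver-side use of this query could look like the userspace mock below.
It is only a sketch of the dispatch logic: the enum values are arbitrary
stand-ins (the real PM_EVENT_* constants live in include/linux/pm.h), the
event is passed in as a parameter instead of being read from
pm_transition_event(), and struct mock_driver, mock_device_resume() and
mock_thaw() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Arbitrary stand-in values, for illustration only. */
enum mock_pm_event {
	MOCK_PM_EVENT_THAW = 1,		/* normal case: image was created */
	MOCK_PM_EVENT_RECOVER = 2,	/* error case: create/restore failed */
};

/* Hypothetical driver state: counts how often a full resume ran. */
struct mock_driver {
	int resume_calls;
};

/* Stand-in for the expensive full device resume. */
int mock_device_resume(struct mock_driver *drv)
{
	drv->resume_calls++;
	return 0;
}

/* Skip the resume in the normal case, run it only on the error path. */
int mock_thaw(struct mock_driver *drv, int event)
{
	assert(drv != 0);
	switch (event) {
	case MOCK_PM_EVENT_THAW:
		return 0;
	case MOCK_PM_EVENT_RECOVER:
		return mock_device_resume(drv);
	default:
		return -EOPNOTSUPP;
	}
}
```

Only the recover path pays for the resume; any other event value is
rejected rather than silently treated as one of the two known cases.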

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 drivers/base/power/main.c |  5 +++++
 include/linux/pm.h        | 16 ++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 40e1d8d8a589..7e0982caa4d4 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -62,6 +62,11 @@ static LIST_HEAD(dpm_noirq_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
 static pm_message_t pm_transition;
+int pm_transition_event(void)
+{
+	return pm_transition.event;
+}
+EXPORT_SYMBOL_GPL(pm_transition_event);
 
 static int async_error;
 
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 78855d794342..d1cb77ede1a2 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -657,6 +657,22 @@ struct pm_subsys_data {
 #define DPM_FLAG_SMART_SUSPEND		BIT(2)
 #define DPM_FLAG_MAY_SKIP_RESUME	BIT(3)
 
+/**
+ * pm_transition_event() - Query the current pm transition event value.
+ *
+ * Used to query the reason why thaw() is called. It will be one of 2 values:
+ *
+ * PM_EVENT_THAW: normal case.
+ *		hibernation image has been created.
+ *
+ * PM_EVENT_RECOVER: error case.
+ *		creation of a hibernation image or restoration of the main memory
+ *		contents from a hibernation image has failed.
+ *
+ * Return: PM_EVENT_ messages
+ */
+int pm_transition_event(void);
+
 struct dev_pm_info {
 	pm_message_t		power_state;
 	bool			can_wakeup:1;
-- 
2.43.5



* [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
  2025-07-08  7:42 [PATCH v3 0/5] reduce system memory requirement for hibernation Samuel Zhang
                   ` (3 preceding siblings ...)
  2025-07-08  7:42 ` [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event() Samuel Zhang
@ 2025-07-08  7:42 ` Samuel Zhang
  2025-07-08 10:28   ` Lazar, Lijo
  2025-07-08 14:40   ` Mario Limonciello
  2025-07-08 14:31 ` [PATCH v3 0/5] reduce system memory requirement for hibernation Mario Limonciello
  5 siblings, 2 replies; 19+ messages in thread
From: Samuel Zhang @ 2025-07-08  7:42 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, rafael, len.brown, pavel,
	gregkh, dakr, airlied, simona, ray.huang, matthew.auld,
	matthew.brost, maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel,
	Samuel Zhang

For normal hibernation, the GPU does not need to be resumed in thaw since
it is not involved in writing the hibernation image. Skipping the resume in
this case reduces the hibernation time.

On a VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
this can save 50 minutes.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 4f8632737574..10827becf855 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
 	if (amdgpu_ras_intr_triggered())
 		return;
 
+	/* device maybe not resumed here, return immediately in this case */
+	if (adev->in_s4 && adev->in_suspend)
+		return;
+
 	/* if we are running in a VM, make sure the device
 	 * torn down properly on reboot/shutdown.
 	 * unfortunately we can't detect certain
@@ -2654,8 +2658,17 @@ static int amdgpu_pmops_freeze(struct device *dev)
 static int amdgpu_pmops_thaw(struct device *dev)
 {
 	struct drm_device *drm_dev = dev_get_drvdata(dev);
+	int event = pm_transition_event();
 
-	return amdgpu_device_resume(drm_dev, true);
+	switch (event) {
+	case PM_EVENT_THAW: /* normal case */
+		return 0;
+	case PM_EVENT_RECOVER: /* error case */
+		return amdgpu_device_resume(drm_dev, true);
+	default:
+		pr_err("unknown pm_transition_event %d\n", event);
+		return -EOPNOTSUPP;
+	}
 }
 
 static int amdgpu_pmops_poweroff(struct device *dev)
-- 
2.43.5



* Re: [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation()
  2025-07-08  7:42 ` [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
@ 2025-07-08  8:38   ` Christian König
  0 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2025-07-08  8:38 UTC (permalink / raw)
  To: Samuel Zhang, alexander.deucher, rafael, len.brown, pavel, gregkh,
	dakr, airlied, simona, ray.huang, matthew.auld, matthew.brost,
	maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel



On 08.07.25 09:42, Samuel Zhang wrote:
> This new api is used for hibernation to move GTT BOs to shmem after
> VRAM eviction. shmem will be flushed to swap disk later to reduce
> the system memory usage for hibernation.
> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_device.c | 22 ++++++++++++++++++++++
>  include/drm/ttm/ttm_device.h     |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 02e797fd1891..f14437ea0cce 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -123,6 +123,28 @@ static int ttm_global_init(void)
>  	return ret;
>  }
>  
> +/**
> + * move GTT BOs to shmem for hibernation.

That is not the correct kerneldoc style.

Please make sure that, for example, make htmldocs doesn't throw any new warnings.

> + *
> + * returns 0 on success, negative on failure.
> + */
> +int ttm_device_prepare_hibernation(struct ttm_device *bdev)
> +{
> +	struct ttm_operation_ctx ctx = {
> +		.interruptible = false,
> +		.no_wait_gpu = false,
> +		.force_alloc = true
> +	};
> +	int ret;
> +

> +	guard(mutex)(&ttm_global_mutex);

That is unnecessary now, the function doesn't touch the device list any more.

Regards,
Christian.

> +	do {
> +		ret = ttm_device_swapout(bdev, &ctx, GFP_KERNEL);
> +	} while (ret > 0);
> +	return ret;
> +}
> +EXPORT_SYMBOL(ttm_device_prepare_hibernation);
> +
>  /*
>   * A buffer object shrink method that tries to swap out the first
>   * buffer object on the global::swap_lru list.
> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> index 39b8636b1845..592b5f802859 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -272,6 +272,7 @@ struct ttm_device {
>  int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
>  int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>  		       gfp_t gfp_flags);
> +int ttm_device_prepare_hibernation(struct ttm_device *bdev);
>  
>  static inline struct ttm_resource_manager *
>  ttm_manager_type(struct ttm_device *bdev, int mem_type)



* Re: [PATCH v3 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation
  2025-07-08  7:42 ` [PATCH v3 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
@ 2025-07-08  8:41   ` Christian König
  0 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2025-07-08  8:41 UTC (permalink / raw)
  To: Samuel Zhang, alexander.deucher, rafael, len.brown, pavel, gregkh,
	dakr, airlied, simona, ray.huang, matthew.auld, matthew.brost,
	maarten.lankhorst, mripard, tzimmermann
  Cc: mario.limonciello, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel

On 08.07.25 09:42, Samuel Zhang wrote:
> When hibernate with data center dGPUs, huge number of VRAM BOs evicted
> to GTT and takes too much system memory. This will cause hibernation
> fail due to insufficient memory for creating the hibernation image.
> 
> Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel
> hibernation code to make room for hibernation image.
> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 684d66bc0b5f..2f977fece08f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5021,8 +5021,16 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
>  		return 0;
>  
>  	ret = amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM);
> -	if (ret)
> +	if (ret) {
>  		dev_warn(adev->dev, "evicting device resources failed\n");
> +		return ret;
> +	}
> +
> +	if (adev->in_s4) {
> +		ret = ttm_device_prepare_hibernation(&adev->mman.bdev);
> +		if (ret)
> +			dev_err(adev->dev, "prepare hibernation failed, %d\n", ret);
> +	}
>  	return ret;
>  }
>  



* Re: [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
  2025-07-08  7:42 ` [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
@ 2025-07-08 10:28   ` Lazar, Lijo
  2025-07-08 14:40   ` Mario Limonciello
  1 sibling, 0 replies; 19+ messages in thread
From: Lazar, Lijo @ 2025-07-08 10:28 UTC (permalink / raw)
  To: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
	len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
	matthew.auld, matthew.brost, maarten.lankhorst, mripard,
	tzimmermann
  Cc: mario.limonciello, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel



On 7/8/2025 1:12 PM, Samuel Zhang wrote:
> For normal hibernation, GPU do not need to be resumed in thaw since it is
> not involved in writing the hibernation image. Skip resume in this case
> can reduce the hibernation time.
> 
> On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
> this can save 50 minutes.
> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 4f8632737574..10827becf855 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>  	if (amdgpu_ras_intr_triggered())
>  		return;
>  
> +	/* device maybe not resumed here, return immediately in this case */
> +	if (adev->in_s4 && adev->in_suspend)
> +		return;
> +
>  	/* if we are running in a VM, make sure the device
>  	 * torn down properly on reboot/shutdown.
>  	 * unfortunately we can't detect certain
> @@ -2654,8 +2658,17 @@ static int amdgpu_pmops_freeze(struct device *dev)
>  static int amdgpu_pmops_thaw(struct device *dev)
>  {
>  	struct drm_device *drm_dev = dev_get_drvdata(dev);
> +	int event = pm_transition_event();
>  
> -	return amdgpu_device_resume(drm_dev, true);
> +	switch (event) {
> +	case PM_EVENT_THAW: /* normal case */
> +		return 0;
> +	case PM_EVENT_RECOVER: /* error case */
> +		return amdgpu_device_resume(drm_dev, true);
> +	default:
> +		pr_err("unknown pm_transition_event %d\n", event);

If it ever happens, keeping a bit more context with 'dev_err' helps -
"unknown pm transition event during thaw %d"

With that -

	Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

Thanks,
Lijo

> +		return -EOPNOTSUPP;
> +	}
>  }
>  
>  static int amdgpu_pmops_poweroff(struct device *dev)



* Re: [PATCH v3 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
  2025-07-08  7:42 ` [PATCH v3 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
@ 2025-07-08 14:28   ` Mario Limonciello
  2025-07-08 14:33     ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Mario Limonciello @ 2025-07-08 14:28 UTC (permalink / raw)
  To: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
	len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
	matthew.auld, matthew.brost, maarten.lankhorst, mripard,
	tzimmermann
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel

On 7/8/2025 3:42 AM, Samuel Zhang wrote:
> When hibernate with data center dGPUs, huge number of VRAM data will be
> moved to shmem during dev_pm_ops.prepare(). These shmem pages take a lot
> of system memory so that there's no enough free memory for creating the
> hibernation image. This will cause hibernation fail and abort.
> 
> After dev_pm_ops.prepare(), call shrink_all_memory() to force move shmem
> pages to swap disk and reclaim the pages, so that there's enough system
> memory for hibernation image and less pages needed to copy to the image.
> 
> This patch can only flush and free about half shmem pages. It will be
> better to flush and free more pages, even all of shmem pages, so that
> there're less pages to be copied to the hibernation image and the overall
> hibernation time can be reduced.
> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>

AFAICT this didn't tangibly change and was just reordered in the series, 
I think you should carry Rafael's A-b tag forward.

> ---
>   kernel/power/hibernate.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
> 
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 10a01af63a80..7ae9d9a7aa1d 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -370,6 +370,23 @@ static int create_image(int platform_mode)
>   	return error;
>   }
>   
> +static void shrink_shmem_memory(void)
> +{
> +	struct sysinfo info;
> +	unsigned long nr_shmem_pages, nr_freed_pages;
> +
> +	si_meminfo(&info);
> +	nr_shmem_pages = info.sharedram; /* current page count used for shmem */
> +	/*
> +	 * The intent is to reclaim all shmem pages. Though shrink_all_memory() can
> +	 * only reclaim about half of them, it's enough for creating the hibernation
> +	 * image.
> +	 */
> +	nr_freed_pages = shrink_all_memory(nr_shmem_pages);
> +	pr_debug("requested to reclaim %lu shmem pages, actually freed %lu pages\n",
> +			nr_shmem_pages, nr_freed_pages);
> +}
> +
>   /**
>    * hibernation_snapshot - Quiesce devices and create a hibernation image.
>    * @platform_mode: If set, use platform driver to prepare for the transition.
> @@ -411,6 +428,15 @@ int hibernation_snapshot(int platform_mode)
>   		goto Thaw;
>   	}
>   
> +	/*
> +	 * Device drivers may move lots of data to shmem in dpm_prepare(). The shmem
> +	 * pages will use lots of system memory, causing hibernation image creation
> +	 * fail due to insufficient free memory.
> +	 * This call is to force flush the shmem pages to swap disk and reclaim
> +	 * the system memory so that image creation can succeed.
> +	 */
> +	shrink_shmem_memory();
> +
>   	suspend_console();
>   	pm_restrict_gfp_mask();
>   



* Re: [PATCH v3 0/5] reduce system memory requirement for hibernation
  2025-07-08  7:42 [PATCH v3 0/5] reduce system memory requirement for hibernation Samuel Zhang
                   ` (4 preceding siblings ...)
  2025-07-08  7:42 ` [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
@ 2025-07-08 14:31 ` Mario Limonciello
  5 siblings, 0 replies; 19+ messages in thread
From: Mario Limonciello @ 2025-07-08 14:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel, Samuel Zhang,
	alexander.deucher, christian.koenig, len.brown, pavel, gregkh,
	dakr, airlied, simona, ray.huang, matthew.auld, matthew.brost,
	maarten.lankhorst, mripard, tzimmermann

On 7/8/2025 3:42 AM, Samuel Zhang wrote:
> Modern data center dGPUs are usually equipped with very large VRAM. On
> server with such dGPUs(192GB VRAM * 8) and 2TB system memory, hibernate
> will fail due to no enough free memory.
> 
> The root cause is that during hibernation all VRAM memory get evicted to
> GTT or shmem. In both case, it is in system memory and kernel will try to
> copy the pages to hibernation image. In the worst case, this causes 2
> copies of VRAM memory in system memory, 2TB is not enough for the
> hibernation image. 192GB * 8 * 2 = 3TB > 2TB.
> 
> The fix includes following changes. With these changes, there's much less
> pages needed to be copied to hibernate image and hibernation can succeed.
> * patch 1 and 2: move GTT to shmem after evicting VRAM. so that the GTT
>    pages can be freed.
> * patch 3: force write shmem pages to swap disk and free shmem pages.
> 
> After swapout GTT to shmem in hibernation prepare stage, the GPU will be
> resumed again in thaw stage. The swapin and restore BOs of resume takes
> lots of time (50 mintues observed for 8 dGPUs). And it's unnecessary since
> writing hibernation image do not need GPU for hibernate successful case.
> * patch 4 and 5: skip resume of device in thaw stage for successful
>    hibernation case to reduce the hibernation time.
> 
> v2:
> * split first patch to 2 patches, 1 for ttm, 1 for amdgpu
> * refined the new ttm api
> * add more comments for shrink_shmem_memory() and its callsite
> * export variable pm_transition in kernel
> * skip resume in thaw() for successful hibernation case
> v3:
> * refined ttm_device_prepare_hibernation() to accept device argument
> * use guard(mutex) to replace mutex_lock and mutex_unlock
> * move ttm_device_prepare_hibernation call to amdgpu_device_evict_resources()
> * add pm_transition_event(), instead of exporting pm_transition variable
> * refined amdgpu_pmops_thaw(), use switch-case for more clarity
> 
> Samuel Zhang (5):
> 1. drm/ttm: add ttm_device_prepare_hibernation() api
> 2. drm/amdgpu: move GTT to shmem after eviction for hibernation
> 3. PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
> 4. PM: hibernate: add new api pm_transition_event()
> 5. drm/amdgpu: do not resume device in thaw for normal hibernation
> 
>   drivers/base/power/main.c                  |  5 +++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 15 ++++++++++++-
>   drivers/gpu/drm/ttm/ttm_device.c           | 23 +++++++++++++++++++
>   include/drm/ttm/ttm_device.h               |  1 +
>   include/linux/pm.h                         | 16 +++++++++++++
>   kernel/power/hibernate.c                   | 26 ++++++++++++++++++++++
>   7 files changed, 94 insertions(+), 2 deletions(-)
> 
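
For reference, the worst-case arithmetic in the cover letter can be written
out as a small userspace C sketch. The struct and helper names below are
hypothetical and are not part of the series. The sketch treats 2TB as
2048 GiB and ignores every other consumer of system memory, so it is only a
rough lower bound on the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All sizes are in GiB; the input figures come from the cover letter. */
struct hib_mem_estimate {
	uint64_t vram_total_gib;	/* VRAM summed over all GPUs */
	uint64_t worst_case_gib;	/* evicted copy plus image copy */
	bool fits;			/* worst case fits in system memory */
};

/*
 * Hypothetical helper: worst case where all VRAM is evicted to system
 * memory (GTT or shmem) and then copied once more into the image.
 */
struct hib_mem_estimate
estimate_hibernation_memory(uint64_t vram_per_gpu_gib, unsigned int nr_gpus,
			    uint64_t system_ram_gib)
{
	struct hib_mem_estimate est;

	assert(nr_gpus > 0);

	est.vram_total_gib = vram_per_gpu_gib * nr_gpus;
	est.worst_case_gib = est.vram_total_gib * 2;
	est.fits = est.worst_case_gib <= system_ram_gib;
	return est;
}
```

With 192 GiB per GPU, 8 GPUs and 2048 GiB of system memory, the helper
reports 1536 GiB of VRAM and a 3072 GiB worst case, which does not fit.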

The cover letter doesn't say how this series should be merged once it is
fully reviewed, so I wanted to ask Rafael what he thinks.

The options are either:
* the linux-pm changes go through linux-pm, with an immutable branch for
  drm to merge
* everything goes through amd-staging-drm-next (and an amdgpu PR to drm
  later)
* everything goes through drm-misc-next

I think everything through drm-misc-next makes most sense if everyone is 
amenable.



* Re: [PATCH v3 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
  2025-07-08 14:28   ` Mario Limonciello
@ 2025-07-08 14:33     ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2025-07-08 14:33 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
	len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
	matthew.auld, matthew.brost, maarten.lankhorst, mripard,
	tzimmermann, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel

On Tue, Jul 8, 2025 at 4:28 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On 7/8/2025 3:42 AM, Samuel Zhang wrote:
> > When hibernate with data center dGPUs, huge number of VRAM data will be
> > moved to shmem during dev_pm_ops.prepare(). These shmem pages take a lot
> > of system memory so that there's no enough free memory for creating the
> > hibernation image. This will cause hibernation fail and abort.
> >
> > After dev_pm_ops.prepare(), call shrink_all_memory() to force move shmem
> > pages to swap disk and reclaim the pages, so that there's enough system
> > memory for hibernation image and less pages needed to copy to the image.
> >
> > This patch can only flush and free about half shmem pages. It will be
> > better to flush and free more pages, even all of shmem pages, so that
> > there're less pages to be copied to the hibernation image and the overall
> > hibernation time can be reduced.
> >
> > Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>
> AFAICT this didn't tangibly change and was just reordered in the series,
> I think you should carry Rafael's A-b tag forward.

Yes, please.

> > ---
> >   kernel/power/hibernate.c | 26 ++++++++++++++++++++++++++
> >   1 file changed, 26 insertions(+)
> >
> > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> > index 10a01af63a80..7ae9d9a7aa1d 100644
> > --- a/kernel/power/hibernate.c
> > +++ b/kernel/power/hibernate.c
> > @@ -370,6 +370,23 @@ static int create_image(int platform_mode)
> >       return error;
> >   }
> >
> > +static void shrink_shmem_memory(void)
> > +{
> > +     struct sysinfo info;
> > +     unsigned long nr_shmem_pages, nr_freed_pages;
> > +
> > +     si_meminfo(&info);
> > +     nr_shmem_pages = info.sharedram; /* current page count used for shmem */
> > +     /*
> > +      * The intent is to reclaim all shmem pages. Though shrink_all_memory() can
> > +      * only reclaim about half of them, it's enough for creating the hibernation
> > +      * image.
> > +      */
> > +     nr_freed_pages = shrink_all_memory(nr_shmem_pages);
> > +     pr_debug("requested to reclaim %lu shmem pages, actually freed %lu pages\n",
> > +                     nr_shmem_pages, nr_freed_pages);
> > +}
> > +
> >   /**
> >    * hibernation_snapshot - Quiesce devices and create a hibernation image.
> >    * @platform_mode: If set, use platform driver to prepare for the transition.
> > @@ -411,6 +428,15 @@ int hibernation_snapshot(int platform_mode)
> >               goto Thaw;
> >       }
> >
> > +     /*
> > +      * Device drivers may move lots of data to shmem in dpm_prepare(). The shmem
> > +      * pages will use lots of system memory, causing hibernation image creation
> > +      * fail due to insufficient free memory.
> > +      * This call is to force flush the shmem pages to swap disk and reclaim
> > +      * the system memory so that image creation can succeed.
> > +      */
> > +     shrink_shmem_memory();
> > +
> >       suspend_console();
> >       pm_restrict_gfp_mask();
> >
>


* Re: [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event()
  2025-07-08  7:42 ` [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event() Samuel Zhang
@ 2025-07-08 14:36   ` Mario Limonciello
  2025-07-08 14:39     ` Rafael J. Wysocki
  2025-07-08 16:07     ` Zhang, GuoQing (Sam)
  0 siblings, 2 replies; 19+ messages in thread
From: Mario Limonciello @ 2025-07-08 14:36 UTC (permalink / raw)
  To: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
	len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
	matthew.auld, matthew.brost, maarten.lankhorst, mripard,
	tzimmermann
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel

On 7/8/2025 3:42 AM, Samuel Zhang wrote:
> dev_pm_ops.thaw() is called in following cases:
> * normal case: after hibernation image has been created.
> * error case 1: creation of a hibernation image has failed.
> * error case 2: restoration from a hibernation image has failed.
> 
> For normal case, it is called mainly for resume storage devices for
> saving the hibernation image. Other devices that are not involved
> in the image saving do not need to resume the device. But since there's
> no api to know which case thaw() is called, device drivers can't
> conditionally resume device in thaw().
> 
> The new pm_transition_event() is such a api to query if thaw() is called
> in normal case. The returned value in thaw() is:
> * PM_EVENT_THAW: normal case, no need to resume non-storage devices.
> * PM_EVENT_RECOVER: error case, need to resume devices.
> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> ---
>   drivers/base/power/main.c |  5 +++++
>   include/linux/pm.h        | 16 ++++++++++++++++
>   2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 40e1d8d8a589..7e0982caa4d4 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -62,6 +62,11 @@ static LIST_HEAD(dpm_noirq_list);
>   
>   static DEFINE_MUTEX(dpm_list_mtx);
>   static pm_message_t pm_transition;
> +int pm_transition_event(void)
> +{
> +	return pm_transition.event;
> +}
> +EXPORT_SYMBOL_GPL(pm_transition_event);
>   
>   static int async_error;
>   
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 78855d794342..d1cb77ede1a2 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -657,6 +657,22 @@ struct pm_subsys_data {
>   #define DPM_FLAG_SMART_SUSPEND		BIT(2)
>   #define DPM_FLAG_MAY_SKIP_RESUME	BIT(3)
>   
> +/**
> + * pm_transition_event() - Query the current pm transition event value.
> + *
> + * Used to query the reason why thaw() is called. It will be one of 2 values:
> + *
> + * PM_EVENT_THAW: normal case.
> + *		hibernation image has been created.
> + *
> + * PM_EVENT_RECOVER: error case.
> + *		creation of a hibernation image or restoration of the main memory
> + *		contents from a hibernation image has failed.

I don't believe this documentation is complete.  These are the two events
used in this series, but now that the function is exported, other callers
might later use it for other PM_EVENT_* values.

Because of this, I think it's best to convert the comment in
include/linux/pm.h to kdoc and then reference that from this kdoc.

> + *
> + * Return: PM_EVENT_ messages
> + */
> +int pm_transition_event(void);
> +
>   struct dev_pm_info {
>   	pm_message_t		power_state;
>   	bool			can_wakeup:1;



* Re: [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event()
  2025-07-08 14:36   ` Mario Limonciello
@ 2025-07-08 14:39     ` Rafael J. Wysocki
  2025-07-08 16:07     ` Zhang, GuoQing (Sam)
  1 sibling, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2025-07-08 14:39 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
	len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
	matthew.auld, matthew.brost, maarten.lankhorst, mripard,
	tzimmermann, lijo.lazar, victor.zhao, haijun.chang, Qing.Ma,
	Owen.Zhang2, linux-pm, linux-kernel, amd-gfx, dri-devel

On Tue, Jul 8, 2025 at 4:37 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On 7/8/2025 3:42 AM, Samuel Zhang wrote:
> > dev_pm_ops.thaw() is called in following cases:
> > * normal case: after hibernation image has been created.
> > * error case 1: creation of a hibernation image has failed.
> > * error case 2: restoration from a hibernation image has failed.
> >
> > For normal case, it is called mainly for resume storage devices for
> > saving the hibernation image. Other devices that are not involved
> > in the image saving do not need to resume the device. But since there's
> > no api to know which case thaw() is called, device drivers can't
> > conditionally resume device in thaw().
> >
> > The new pm_transition_event() is such a api to query if thaw() is called
> > in normal case. The returned value in thaw() is:
> > * PM_EVENT_THAW: normal case, no need to resume non-storage devices.
> > * PM_EVENT_RECOVER: error case, need to resume devices.
> >
> > Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> > ---
> >   drivers/base/power/main.c |  5 +++++
> >   include/linux/pm.h        | 16 ++++++++++++++++
> >   2 files changed, 21 insertions(+)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 40e1d8d8a589..7e0982caa4d4 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -62,6 +62,11 @@ static LIST_HEAD(dpm_noirq_list);
> >
> >   static DEFINE_MUTEX(dpm_list_mtx);
> >   static pm_message_t pm_transition;
> > +int pm_transition_event(void)
> > +{
> > +     return pm_transition.event;
> > +}
> > +EXPORT_SYMBOL_GPL(pm_transition_event);
> >
> >   static int async_error;
> >
> > diff --git a/include/linux/pm.h b/include/linux/pm.h
> > index 78855d794342..d1cb77ede1a2 100644
> > --- a/include/linux/pm.h
> > +++ b/include/linux/pm.h
> > @@ -657,6 +657,22 @@ struct pm_subsys_data {
> >   #define DPM_FLAG_SMART_SUSPEND              BIT(2)
> >   #define DPM_FLAG_MAY_SKIP_RESUME    BIT(3)
> >
> > +/**
> > + * pm_transition_event() - Query the current pm transition event value.
> > + *
> > + * Used to query the reason why thaw() is called. It will be one of 2 values:
> > + *
> > + * PM_EVENT_THAW: normal case.
> > + *           hibernation image has been created.
> > + *
> > + * PM_EVENT_RECOVER: error case.
> > + *           creation of a hibernation image or restoration of the main memory
> > + *           contents from a hibernation image has failed.
>
> I don't believe this documentation is complete.  These are the two
> events used in this series, but now that the function is exported,
> other callers might later use it for other PM_EVENT_* values.
>
> Because of this, I think it's best to convert the comment in
> include/linux/pm.h to kdoc and then reference that from this kdoc.

+1

> > + *
> > + * Return: PM_EVENT_ messages
> > + */
> > +int pm_transition_event(void);
> > +
> >   struct dev_pm_info {
> >       pm_message_t            power_state;
> >       bool                    can_wakeup:1;
>


* Re: [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
  2025-07-08  7:42 ` [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
  2025-07-08 10:28   ` Lazar, Lijo
@ 2025-07-08 14:40   ` Mario Limonciello
  2025-07-08 16:08     ` Zhang, GuoQing (Sam)
  1 sibling, 1 reply; 19+ messages in thread
From: Mario Limonciello @ 2025-07-08 14:40 UTC (permalink / raw)
  To: Samuel Zhang, alexander.deucher, christian.koenig, rafael,
	len.brown, pavel, gregkh, dakr, airlied, simona, ray.huang,
	matthew.auld, matthew.brost, maarten.lankhorst, mripard,
	tzimmermann
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel

On 7/8/2025 3:42 AM, Samuel Zhang wrote:
> For normal hibernation, GPU do not need to be resumed in thaw since it is
> not involved in writing the hibernation image. Skip resume in this case
> can reduce the hibernation time.
> 
> On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
> this can save 50 minutes.

If I'm not mistaken this will also have the side effect that display is 
not resumed in the "normal case" too, right?

I know the GPU you used doesn't have a display, but I'm just thinking 
about the callpaths and implications.

Would you be able to test this series specifically on an APU with a 
display connected to eDP and no compositor running (so no DRM master) to 
make sure it works as intended?
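
To make the call path concern concrete, here is a minimal userspace model of
the new thaw dispatch. Everything prefixed with sketch_ or SKETCH_ is
hypothetical, the event values are stand-ins rather than the kernel's
PM_EVENT_* constants, and the model simply assumes that the display comes up
as part of the full device resume, which is exactly what needs to be checked
on real hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in event values for illustration only. */
enum sketch_pm_event {
	SKETCH_EVENT_THAW = 1,		/* image was created successfully */
	SKETCH_EVENT_RECOVER = 2,	/* image creation or restore failed */
};

/* Minimal device state tracked by this model. */
struct sketch_gpu {
	bool resumed;
	bool display_on;
};

/* Assumption: a full resume is what brings the display back. */
int sketch_device_resume(struct sketch_gpu *gpu)
{
	assert(gpu != NULL);
	gpu->resumed = true;
	gpu->display_on = true;
	return 0;
}

/* Mirrors the shape of the switch in the patch, not its real code. */
int sketch_pmops_thaw(struct sketch_gpu *gpu, int event)
{
	switch (event) {
	case SKETCH_EVENT_THAW:
		/* normal case: skip resume, so the display stays off */
		return 0;
	case SKETCH_EVENT_RECOVER:
		/* error case: bring the whole device back */
		return sketch_device_resume(gpu);
	default:
		return -EOPNOTSUPP;
	}
}
```

In this model the normal case returns success while leaving display_on
false, and only the error case turns it back on.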

> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 4f8632737574..10827becf855 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>   	if (amdgpu_ras_intr_triggered())
>   		return;
>   
> +	/* device maybe not resumed here, return immediately in this case */
> +	if (adev->in_s4 && adev->in_suspend)
> +		return;
> +
>   	/* if we are running in a VM, make sure the device
>   	 * torn down properly on reboot/shutdown.
>   	 * unfortunately we can't detect certain
> @@ -2654,8 +2658,17 @@ static int amdgpu_pmops_freeze(struct device *dev)
>   static int amdgpu_pmops_thaw(struct device *dev)
>   {
>   	struct drm_device *drm_dev = dev_get_drvdata(dev);
> +	int event = pm_transition_event();
>   
> -	return amdgpu_device_resume(drm_dev, true);
> +	switch (event) {
> +	case PM_EVENT_THAW: /* normal case */
> +		return 0;
> +	case PM_EVENT_RECOVER: /* error case */
> +		return amdgpu_device_resume(drm_dev, true);
> +	default:
> +		pr_err("unknown pm_transition_event %d\n", event);
> +		return -EOPNOTSUPP;
> +	}
>   }
>   
>   static int amdgpu_pmops_poweroff(struct device *dev)



* Re: [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event()
  2025-07-08 14:36   ` Mario Limonciello
  2025-07-08 14:39     ` Rafael J. Wysocki
@ 2025-07-08 16:07     ` Zhang, GuoQing (Sam)
  2025-07-08 16:11       ` Mario Limonciello
  1 sibling, 1 reply; 19+ messages in thread
From: Zhang, GuoQing (Sam) @ 2025-07-08 16:07 UTC (permalink / raw)
  To: Mario Limonciello, Samuel Zhang, alexander.deucher,
	christian.koenig, rafael, len.brown, pavel, gregkh, dakr, airlied,
	simona, ray.huang, matthew.auld, matthew.brost, maarten.lankhorst,
	mripard, tzimmermann
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel


On 2025/7/8 22:36, Mario Limonciello wrote:
> On 7/8/2025 3:42 AM, Samuel Zhang wrote:
>> dev_pm_ops.thaw() is called in following cases:
>> * normal case: after hibernation image has been created.
>> * error case 1: creation of a hibernation image has failed.
>> * error case 2: restoration from a hibernation image has failed.
>>
>> For normal case, it is called mainly for resume storage devices for
>> saving the hibernation image. Other devices that are not involved
>> in the image saving do not need to resume the device. But since there's
>> no api to know which case thaw() is called, device drivers can't
>> conditionally resume device in thaw().
>>
>> The new pm_transition_event() is such a api to query if thaw() is called
>> in normal case. The returned value in thaw() is:
>> * PM_EVENT_THAW: normal case, no need to resume non-storage devices.
>> * PM_EVENT_RECOVER: error case, need to resume devices.
>>
>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>> ---
>>   drivers/base/power/main.c |  5 +++++
>>   include/linux/pm.h        | 16 ++++++++++++++++
>>   2 files changed, 21 insertions(+)
>>
>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> index 40e1d8d8a589..7e0982caa4d4 100644
>> --- a/drivers/base/power/main.c
>> +++ b/drivers/base/power/main.c
>> @@ -62,6 +62,11 @@ static LIST_HEAD(dpm_noirq_list);
>>     static DEFINE_MUTEX(dpm_list_mtx);
>>   static pm_message_t pm_transition;
>> +int pm_transition_event(void)
>> +{
>> +    return pm_transition.event;
>> +}
>> +EXPORT_SYMBOL_GPL(pm_transition_event);
>>     static int async_error;
>>   diff --git a/include/linux/pm.h b/include/linux/pm.h
>> index 78855d794342..d1cb77ede1a2 100644
>> --- a/include/linux/pm.h
>> +++ b/include/linux/pm.h
>> @@ -657,6 +657,22 @@ struct pm_subsys_data {
>>   #define DPM_FLAG_SMART_SUSPEND        BIT(2)
>>   #define DPM_FLAG_MAY_SKIP_RESUME    BIT(3)
>>   +/**
>> + * pm_transition_event() - Query the current pm transition event value.
>> + *
>> + * Used to query the reason why thaw() is called. It will be one of 
>> 2 values:
>> + *
>> + * PM_EVENT_THAW: normal case.
>> + *        hibernation image has been created.
>> + *
>> + * PM_EVENT_RECOVER: error case.
>> + *        creation of a hibernation image or restoration of the main 
>> memory
>> + *        contents from a hibernation image has failed.
>
> I don't believe this documentation is complete.  These are the two
> events used in this series, but now that the function is exported,
> other callers might later use it for other PM_EVENT_* values.
>
> Because of this, I think it's best to convert the comment in
> include/linux/pm.h to kdoc and then reference that from this kdoc.


Hi Mario, thank you for the feedback. I don't have experience with kdoc.
Do you mean generating a new `Documentation/power/pm.rst` from
`include/linux/pm.h` using the `scripts/kernel-doc` tool? Could you give
some guidance on this? Thank you!

Regards
Sam


>
>> + *
>> + * Return: PM_EVENT_ messages
>> + */
>> +int pm_transition_event(void);
>> +
>>   struct dev_pm_info {
>>       pm_message_t        power_state;
>>       bool            can_wakeup:1;
>


* Re: [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
  2025-07-08 14:40   ` Mario Limonciello
@ 2025-07-08 16:08     ` Zhang, GuoQing (Sam)
  2025-07-08 16:11       ` Mario Limonciello
  0 siblings, 1 reply; 19+ messages in thread
From: Zhang, GuoQing (Sam) @ 2025-07-08 16:08 UTC (permalink / raw)
  To: Mario Limonciello, Samuel Zhang, alexander.deucher,
	christian.koenig, rafael, len.brown, pavel, gregkh, dakr, airlied,
	simona, ray.huang, matthew.auld, matthew.brost, maarten.lankhorst,
	mripard, tzimmermann
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel


On 2025/7/8 22:40, Mario Limonciello wrote:
> On 7/8/2025 3:42 AM, Samuel Zhang wrote:
>> For normal hibernation, GPU do not need to be resumed in thaw since 
>> it is
>> not involved in writing the hibernation image. Skip resume in this case
>> can reduce the hibernation time.
>>
>> On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
>> this can save 50 minutes.
>
> If I'm not mistaken this will also have the side effect that display 
> is not resumed in the "normal case" too, right?


Yes, I believe so.


>
> I know the GPU you used doesn't have a display, but I'm just thinking 
> about the callpaths and implications.
>
> Would you be able to test this series specifically on an APU with a 
> display connected to eDP and no compositor running (so no DRM master) 
> to make sure it works as intended?


Sorry, Mario. I don't have such an APU environment to test this behavior.

Regards
Sam


>
>>
>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 4f8632737574..10827becf855 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>>       if (amdgpu_ras_intr_triggered())
>>           return;
>>   +    /* device maybe not resumed here, return immediately in this 
>> case */
>> +    if (adev->in_s4 && adev->in_suspend)
>> +        return;
>> +
>>       /* if we are running in a VM, make sure the device
>>        * torn down properly on reboot/shutdown.
>>        * unfortunately we can't detect certain
>> @@ -2654,8 +2658,17 @@ static int amdgpu_pmops_freeze(struct device 
>> *dev)
>>   static int amdgpu_pmops_thaw(struct device *dev)
>>   {
>>       struct drm_device *drm_dev = dev_get_drvdata(dev);
>> +    int event = pm_transition_event();
>>   -    return amdgpu_device_resume(drm_dev, true);
>> +    switch (event) {
>> +    case PM_EVENT_THAW: /* normal case */
>> +        return 0;
>> +    case PM_EVENT_RECOVER: /* error case */
>> +        return amdgpu_device_resume(drm_dev, true);
>> +    default:
>> +        pr_err("unknown pm_transition_event %d\n", event);
>> +        return -EOPNOTSUPP;
>> +    }
>>   }
>>     static int amdgpu_pmops_poweroff(struct device *dev)
>


* Re: [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event()
  2025-07-08 16:07     ` Zhang, GuoQing (Sam)
@ 2025-07-08 16:11       ` Mario Limonciello
  0 siblings, 0 replies; 19+ messages in thread
From: Mario Limonciello @ 2025-07-08 16:11 UTC (permalink / raw)
  To: Zhang, GuoQing (Sam), Samuel Zhang, alexander.deucher,
	christian.koenig, rafael, len.brown, pavel, gregkh, dakr, airlied,
	simona, ray.huang, matthew.auld, matthew.brost, maarten.lankhorst,
	mripard, tzimmermann
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel

On 7/8/2025 12:07 PM, Zhang, GuoQing (Sam) wrote:
> 
> On 2025/7/8 22:36, Mario Limonciello wrote:
>> On 7/8/2025 3:42 AM, Samuel Zhang wrote:
>>> dev_pm_ops.thaw() is called in following cases:
>>> * normal case: after hibernation image has been created.
>>> * error case 1: creation of a hibernation image has failed.
>>> * error case 2: restoration from a hibernation image has failed.
>>>
>>> For normal case, it is called mainly for resume storage devices for
>>> saving the hibernation image. Other devices that are not involved
>>> in the image saving do not need to resume the device. But since there's
>>> no api to know which case thaw() is called, device drivers can't
>>> conditionally resume device in thaw().
>>>
>>> The new pm_transition_event() is such a api to query if thaw() is called
>>> in normal case. The returned value in thaw() is:
>>> * PM_EVENT_THAW: normal case, no need to resume non-storage devices.
>>> * PM_EVENT_RECOVER: error case, need to resume devices.
>>>
>>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>>> ---
>>>   drivers/base/power/main.c |  5 +++++
>>>   include/linux/pm.h        | 16 ++++++++++++++++
>>>   2 files changed, 21 insertions(+)
>>>
>>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>>> index 40e1d8d8a589..7e0982caa4d4 100644
>>> --- a/drivers/base/power/main.c
>>> +++ b/drivers/base/power/main.c
>>> @@ -62,6 +62,11 @@ static LIST_HEAD(dpm_noirq_list);
>>>     static DEFINE_MUTEX(dpm_list_mtx);
>>>   static pm_message_t pm_transition;
>>> +int pm_transition_event(void)
>>> +{
>>> +    return pm_transition.event;
>>> +}
>>> +EXPORT_SYMBOL_GPL(pm_transition_event);
>>>     static int async_error;
>>>   diff --git a/include/linux/pm.h b/include/linux/pm.h
>>> index 78855d794342..d1cb77ede1a2 100644
>>> --- a/include/linux/pm.h
>>> +++ b/include/linux/pm.h
>>> @@ -657,6 +657,22 @@ struct pm_subsys_data {
>>>   #define DPM_FLAG_SMART_SUSPEND        BIT(2)
>>>   #define DPM_FLAG_MAY_SKIP_RESUME    BIT(3)
>>>   +/**
>>> + * pm_transition_event() - Query the current pm transition event value.
>>> + *
>>> + * Used to query the reason why thaw() is called. It will be one of 
>>> 2 values:
>>> + *
>>> + * PM_EVENT_THAW: normal case.
>>> + *        hibernation image has been created.
>>> + *
>>> + * PM_EVENT_RECOVER: error case.
>>> + *        creation of a hibernation image or restoration of the main 
>>> memory
>>> + *        contents from a hibernation image has failed.
>>
>> I don't believe this documentation is complete.  These are the two
>> events used in this series, but now that the function is exported,
>> other callers might later use it for other PM_EVENT_* values.
>>
>> Because of this, I think it's best to convert the comment in
>> include/linux/pm.h to kdoc and then reference that from this kdoc.
> 
> 
> Hi Mario, thank you for the feedback. I don't have experience with kdoc.
> Do you mean generating a new `Documentation/power/pm.rst` from
> `include/linux/pm.h` using the `scripts/kernel-doc` tool? Could you give
> some guidance on this? Thank you!

If the comment starts with /* it's a regular comment.  If it starts with
/** it's a kernel-doc comment.

You should just need to convert it to kdoc by changing the opening /* to /**.

Then you can reference it using a link (IIRC)

https://sublime-and-sphinx-guide.readthedocs.io/en/latest/references.html
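
As a rough sketch of what I mean (all names prefixed with sketch_ or SKETCH_
are made up so that this compiles in userspace, and the DOC: section title
and the plain-text reference are my assumption of how it could look, not a
checked recipe):

```c
#include <assert.h>

/**
 * DOC: PM_EVENT_ messages (illustrative excerpt)
 *
 * THAW		The hibernation image has been created.
 *
 * RECOVER	Creation of a hibernation image, or restoration of the main
 *		memory contents from one, has failed.
 */
#define SKETCH_PM_EVENT_THAW	0x0020
#define SKETCH_PM_EVENT_RECOVER	0x0080

/* Stand-in for the PM core's private pm_transition.event. */
static int sketch_current_event = SKETCH_PM_EVENT_THAW;

/**
 * sketch_pm_transition_event() - Query the current PM transition event.
 *
 * The possible values are described in the "PM_EVENT_ messages" DOC
 * section above instead of being repeated here.
 *
 * Return: One of the SKETCH_PM_EVENT_* values.
 */
int sketch_pm_transition_event(void)
{
	return sketch_current_event;
}

/* Setter for this sketch only; nothing like it exists in the series. */
void sketch_set_transition_event(int event)
{
	assert(event == SKETCH_PM_EVENT_THAW ||
	       event == SKETCH_PM_EVENT_RECOVER);
	sketch_current_event = event;
}
```

The point is that the event list lives in one kdoc block and the function's
kdoc points at it, so new PM_EVENT_* users don't make the function comment
stale.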

> 
> Regards
> Sam
> 
> 
>>
>>> + *
>>> + * Return: PM_EVENT_ messages
>>> + */
>>> +int pm_transition_event(void);
>>> +
>>>   struct dev_pm_info {
>>>       pm_message_t        power_state;
>>>       bool            can_wakeup:1;
>>



* Re: [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation
  2025-07-08 16:08     ` Zhang, GuoQing (Sam)
@ 2025-07-08 16:11       ` Mario Limonciello
  0 siblings, 0 replies; 19+ messages in thread
From: Mario Limonciello @ 2025-07-08 16:11 UTC (permalink / raw)
  To: Zhang, GuoQing (Sam), Samuel Zhang, alexander.deucher,
	christian.koenig, rafael, len.brown, pavel, gregkh, dakr, airlied,
	simona, ray.huang, matthew.auld, matthew.brost, maarten.lankhorst,
	mripard, tzimmermann
  Cc: lijo.lazar, victor.zhao, haijun.chang, Qing.Ma, Owen.Zhang2,
	linux-pm, linux-kernel, amd-gfx, dri-devel

On 7/8/2025 12:08 PM, Zhang, GuoQing (Sam) wrote:
> 
> On 2025/7/8 22:40, Mario Limonciello wrote:
>> On 7/8/2025 3:42 AM, Samuel Zhang wrote:
>>> For normal hibernation, GPU do not need to be resumed in thaw since 
>>> it is
>>> not involved in writing the hibernation image. Skip resume in this case
>>> can reduce the hibernation time.
>>>
>>> On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
>>> this can save 50 minutes.
>>
>> If I'm not mistaken this will also have the side effect that display 
>> is not resumed in the "normal case" too, right?
> 
> 
> Yes, I believe so.
> 
> 
>>
>> I know the GPU you used doesn't have a display, but I'm just thinking 
>> about the callpaths and implications.
>>
>> Would you be able to test this series specifically on an APU with a 
>> display connected to eDP and no compositor running (so no DRM master) 
>> to make sure it works as intended?
> 
> 
> Sorry, Mario. I don't have such an APU environment to test this behavior.
> 

OK, let me see if I can get someone to test this for you.  I'll let you
know about any problems.

> Regards
> Sam
> 
> 
>>
>>>
>>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 ++++++++++++++-
>>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index 4f8632737574..10827becf855 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>>>       if (amdgpu_ras_intr_triggered())
>>>           return;
>>>   +    /* device maybe not resumed here, return immediately in this 
>>> case */
>>> +    if (adev->in_s4 && adev->in_suspend)
>>> +        return;
>>> +
>>>       /* if we are running in a VM, make sure the device
>>>        * torn down properly on reboot/shutdown.
>>>        * unfortunately we can't detect certain
>>> @@ -2654,8 +2658,17 @@ static int amdgpu_pmops_freeze(struct device 
>>> *dev)
>>>   static int amdgpu_pmops_thaw(struct device *dev)
>>>   {
>>>       struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> +    int event = pm_transition_event();
>>>   -    return amdgpu_device_resume(drm_dev, true);
>>> +    switch (event) {
>>> +    case PM_EVENT_THAW: /* normal case */
>>> +        return 0;
>>> +    case PM_EVENT_RECOVER: /* error case */
>>> +        return amdgpu_device_resume(drm_dev, true);
>>> +    default:
>>> +        pr_err("unknown pm_transition_event %d\n", event);
>>> +        return -EOPNOTSUPP;
>>> +    }
>>>   }
>>>     static int amdgpu_pmops_poweroff(struct device *dev)
>>



Thread overview: 19+ messages
2025-07-08  7:42 [PATCH v3 0/5] reduce system memory requirement for hibernation Samuel Zhang
2025-07-08  7:42 ` [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation() Samuel Zhang
2025-07-08  8:38   ` Christian König
2025-07-08  7:42 ` [PATCH v3 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation Samuel Zhang
2025-07-08  8:41   ` Christian König
2025-07-08  7:42 ` [PATCH v3 3/5] PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() Samuel Zhang
2025-07-08 14:28   ` Mario Limonciello
2025-07-08 14:33     ` Rafael J. Wysocki
2025-07-08  7:42 ` [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event() Samuel Zhang
2025-07-08 14:36   ` Mario Limonciello
2025-07-08 14:39     ` Rafael J. Wysocki
2025-07-08 16:07     ` Zhang, GuoQing (Sam)
2025-07-08 16:11       ` Mario Limonciello
2025-07-08  7:42 ` [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation Samuel Zhang
2025-07-08 10:28   ` Lazar, Lijo
2025-07-08 14:40   ` Mario Limonciello
2025-07-08 16:08     ` Zhang, GuoQing (Sam)
2025-07-08 16:11       ` Mario Limonciello
2025-07-08 14:31 ` [PATCH v3 0/5] reduce system memory requirement for hibernation Mario Limonciello
