public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Matthew Auld <matthew.auld@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: maarten.lankhorst@linux.intel.com
Subject: Re: [Intel-gfx] [PATCH v3 3/6] drm/i915 Implement LMEM backup and restore for suspend / resume
Date: Fri, 17 Sep 2021 13:03:54 +0100	[thread overview]
Message-ID: <b717eec5-7fe4-88d4-8345-e28679ab45f5@intel.com> (raw)
In-Reply-To: <20210914193112.497379-4-thomas.hellstrom@linux.intel.com>

On 14/09/2021 20:31, Thomas Hellström wrote:
> Just evict unpinned objects to system. For pinned LMEM objects,
> make a backup system object and blit the contents to that.
> 
> Backup is performed in three steps,
> 1: Opportunistically evict evictable objects using the gpu blitter.
> 2: After gt idle, evict evictable objects using the gpu blitter. This will
> be modified in an upcoming patch to backup pinned objects that are not used
> by the blitter itself.
> 3: Backup remaining pinned objects using memcpy.
> 
> Also move uC suspend to after 2) to make sure we have a functional GuC
> during 2) if using GuC submission.
> 
> v2:
> - Major refactor to make sure gem_exec_suspend@hang-SX subtests work, and
>    suspend / resume works with a slightly modified GuC submission enabling
>    patch series.
> 
> v3:
> - Fix a potential use-after-free (Matthew Auld)
> - Use i915_gem_object_create_shmem() instead of
>    i915_gem_object_create_region (Matthew Auld)
> - Minor simplifications (Matthew Auld)
> - Fix up kerneldoc for i195_ttm_restore_region().
> - Final lmem_suspend() call moved to i915_gem_backup_suspend from
>    i915_gem_suspend_late, since the latter gets called at driver unload
>    and we don't unnecessarily want to run it at that time.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
>   drivers/gpu/drm/i915/gem/i915_gem_pm.c        |  92 +++++++-
>   drivers/gpu/drm/i915/gem/i915_gem_pm.h        |   3 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  29 ++-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |  10 +
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c    | 203 ++++++++++++++++++
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.h    |  24 +++
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   4 +-
>   drivers/gpu/drm/i915/i915_drv.c               |  10 +-
>   drivers/gpu/drm/i915/i915_drv.h               |   2 +-
>   11 files changed, 362 insertions(+), 17 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
>   create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 9d371be7dc5c..f9b69492a56c 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -154,6 +154,7 @@ gem-y += \
>   	gem/i915_gem_throttle.o \
>   	gem/i915_gem_tiling.o \
>   	gem/i915_gem_ttm.o \
> +	gem/i915_gem_ttm_pm.o \
>   	gem/i915_gem_userptr.o \
>   	gem/i915_gem_wait.o \
>   	gem/i915_gemfs.o
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 2471f36aaff3..734cc8e16481 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -534,6 +534,7 @@ struct drm_i915_gem_object {
>   	struct {
>   		struct sg_table *cached_io_st;
>   		struct i915_gem_object_page_iter get_io_page;
> +		struct drm_i915_gem_object *backup;
>   		bool created:1;
>   	} ttm;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> index 8b9d7d14c4bd..8736ae1dfbb2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> @@ -5,6 +5,7 @@
>    */
>   
>   #include "gem/i915_gem_pm.h"
> +#include "gem/i915_gem_ttm_pm.h"
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_pm.h"
>   #include "gt/intel_gt_requests.h"
> @@ -39,7 +40,86 @@ void i915_gem_suspend(struct drm_i915_private *i915)
>   	i915_gem_drain_freed_objects(i915);
>   }
>   
> -void i915_gem_suspend_late(struct drm_i915_private *i915)
> +static int lmem_restore(struct drm_i915_private *i915, bool allow_gpu)
> +{
> +	struct intel_memory_region *mr;
> +	int ret = 0, id;
> +
> +	for_each_memory_region(mr, i915, id) {
> +		if (mr->type == INTEL_MEMORY_LOCAL) {
> +			ret = i915_ttm_restore_region(mr, allow_gpu);
> +			if (ret)
> +				break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static int lmem_suspend(struct drm_i915_private *i915, bool allow_gpu,
> +			bool backup_pinned)
> +{
> +	struct intel_memory_region *mr;
> +	int ret = 0, id;
> +
> +	for_each_memory_region(mr, i915, id) {
> +		if (mr->type == INTEL_MEMORY_LOCAL) {
> +			ret = i915_ttm_backup_region(mr, allow_gpu, backup_pinned);
> +			if (ret)
> +				break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static void lmem_recover(struct drm_i915_private *i915)
> +{
> +	struct intel_memory_region *mr;
> +	int id;
> +
> +	for_each_memory_region(mr, i915, id)
> +		if (mr->type == INTEL_MEMORY_LOCAL)
> +			i915_ttm_recover_region(mr);
> +}
> +
> +int i915_gem_backup_suspend(struct drm_i915_private *i915)
> +{
> +	int ret;
> +
> +	/* Opportunistically try to evict unpinned objects */
> +	ret = lmem_suspend(i915, true, false);
> +	if (ret)
> +		goto out_recover;
> +
> +	i915_gem_suspend(i915);
> +
> +	/*
> +	 * More objects may have become unpinned as requests were
> +	 * retired. Now try to evict again. The gt may be wedged here
> +	 * in which case we automatically fall back to memcpy.
> +	 */
> +	ret = lmem_suspend(i915, true, false);
> +	if (ret)
> +		goto out_recover;
> +
> +	/*
> +	 * Remaining objects are backed up using memcpy once we've stopped
> +	 * using the migrate context.
> +	 */
> +	ret = lmem_suspend(i915, false, true);
> +	if (ret)
> +		goto out_recover;
> +
> +	return 0;
> +
> +out_recover:
> +	lmem_recover(i915);
> +
> +	return ret;
> +}
> +
> +int i915_gem_suspend_late(struct drm_i915_private *i915)
>   {
>   	struct drm_i915_gem_object *obj;
>   	struct list_head *phases[] = {
> @@ -83,6 +163,8 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
>   	spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
>   	if (flush)
>   		wbinvd_on_all_cpus();
> +
> +	return 0;


We can drop this change now?

I guess only slight concern is all the GEM_WARN_ON() instead of proper 
error handling in some places, but hopefully these should never be hit 
in practice,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

  reply	other threads:[~2021-09-17 12:04 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-14 19:31 [Intel-gfx] [PATCH v3 0/6] drm/i915: Suspend / resume backup- and restore of LMEM Thomas Hellström
2021-09-14 19:31 ` [Intel-gfx] [PATCH v3 1/6] drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects Thomas Hellström
2021-09-16 10:17   ` Matthew Auld
2021-09-14 19:31 ` [Intel-gfx] [PATCH v3 2/6] drm/i915/gem: Implement a function to process all gem objects of a region Thomas Hellström
2021-09-16 10:23   ` Matthew Auld
2021-09-14 19:31 ` [Intel-gfx] [PATCH v3 3/6] drm/i915 Implement LMEM backup and restore for suspend / resume Thomas Hellström
2021-09-17 12:03   ` Matthew Auld [this message]
2021-09-20 10:49   ` Matthew Auld
2021-09-20 11:05     ` Thomas Hellström
2021-09-14 19:31 ` [Intel-gfx] [PATCH v3 4/6] drm/i915/gt: Register the migrate contexts with their engines Thomas Hellström
2021-09-20  9:53   ` Matthew Auld
2021-09-14 19:31 ` [Intel-gfx] [PATCH v3 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend Thomas Hellström
2021-09-20  9:57   ` Matthew Auld
2021-09-14 19:31 ` [Intel-gfx] [PATCH v3 6/6] drm/i915: Reduce the number of objects subject to memcpy recover Thomas Hellström
2021-09-20 11:05   ` Matthew Auld
2021-09-20 11:09     ` Thomas Hellström
2021-09-14 19:40 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Suspend / resume backup- and restore of LMEM. (rev4) Patchwork
2021-09-14 20:06 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-09-14 21:14 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-09-15 12:22 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Suspend / resume backup- and restore of LMEM. (rev5) Patchwork
2021-09-15 13:09 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b717eec5-7fe4-88d4-8345-e28679ab45f5@intel.com \
    --to=matthew.auld@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox