From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org,
	Matthew Brost <matthew.brost@intel.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] drm/xe: improve hibernation on igpu
Date: Tue, 5 Nov 2024 16:54:06 +0200
Message-ID: <ZyoxjhuG_unc-V8Z@intel.com>
In-Reply-To: <20241101170156.213490-2-matthew.auld@intel.com>

On Fri, Nov 01, 2024 at 05:01:57PM +0000, Matthew Auld wrote:
> The GGTT looks to be stored inside stolen memory on igpu which is not
> treated as normal RAM.

The GGTT lives in GSM, not DSM (which is what people normally
mean when they talk about "stolen").

> The core kernel skips this memory range when
> creating the hibernation image, therefore when coming back from
> hibernation the GGTT programming is lost. This seems to cause issues
> with broken resume where GuC FW fails to load:
> 
> [drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1
> [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
> [drm] *ERROR* GT0: firmware signature verification failed
> [drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged.
> 
> Current GGTT users are kernel internal and tracked as pinned, so it
> should be possible to hook into the existing save/restore logic that we
> use for dgpu, where the actual evict is skipped but on restore we
> importantly restore the GGTT programming.  This has been confirmed to
> fix hibernation on at least ADL and MTL, though likely all igpu
> platforms are affected.
> 
> This also means we have a hole in our testing, where the existing s4
> tests only really test the driver hooks, and don't go as far as actually
> rebooting and restoring from the hibernation image and in turn powering
> down RAM (and therefore losing the contents of stolen).
> 
> v2 (Brost)
>  - Remove extra newline and drop unnecessary parentheses.
> 
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c       | 37 ++++++++++++++------------------
>  drivers/gpu/drm/xe/xe_bo_evict.c |  6 ------
>  2 files changed, 16 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 8286cbc23721..549866da5cd1 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -952,7 +952,10 @@ int xe_bo_restore_pinned(struct xe_bo *bo)
>  	if (WARN_ON(!xe_bo_is_pinned(bo)))
>  		return -EINVAL;
>  
> -	if (WARN_ON(xe_bo_is_vram(bo) || !bo->ttm.ttm))
> +	if (WARN_ON(xe_bo_is_vram(bo)))
> +		return -EINVAL;
> +
> +	if (WARN_ON(!bo->ttm.ttm && !xe_bo_is_stolen(bo)))
>  		return -EINVAL;
>  
>  	if (!mem_type_is_vram(place->mem_type))
> @@ -1774,6 +1777,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
>  
>  int xe_bo_pin(struct xe_bo *bo)
>  {
> +	struct ttm_place *place = &bo->placements[0];
>  	struct xe_device *xe = xe_bo_device(bo);
>  	int err;
>  
> @@ -1804,8 +1808,6 @@ int xe_bo_pin(struct xe_bo *bo)
>  	 */
>  	if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
>  	    bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
> -		struct ttm_place *place = &(bo->placements[0]);
> -
>  		if (mem_type_is_vram(place->mem_type)) {
>  			xe_assert(xe, place->flags & TTM_PL_FLAG_CONTIGUOUS);
>  
> @@ -1813,13 +1815,12 @@ int xe_bo_pin(struct xe_bo *bo)
>  				       vram_region_gpu_offset(bo->ttm.resource)) >> PAGE_SHIFT;
>  			place->lpfn = place->fpfn + (bo->size >> PAGE_SHIFT);
>  		}
> +	}
>  
> -		if (mem_type_is_vram(place->mem_type) ||
> -		    bo->flags & XE_BO_FLAG_GGTT) {
> -			spin_lock(&xe->pinned.lock);
> -			list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
> -			spin_unlock(&xe->pinned.lock);
> -		}
> +	if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
> +		spin_lock(&xe->pinned.lock);
> +		list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
> +		spin_unlock(&xe->pinned.lock);
>  	}
>  
>  	ttm_bo_pin(&bo->ttm);
> @@ -1867,24 +1868,18 @@ void xe_bo_unpin_external(struct xe_bo *bo)
>  
>  void xe_bo_unpin(struct xe_bo *bo)
>  {
> +	struct ttm_place *place = &bo->placements[0];
>  	struct xe_device *xe = xe_bo_device(bo);
>  
>  	xe_assert(xe, !bo->ttm.base.import_attach);
>  	xe_assert(xe, xe_bo_is_pinned(bo));
>  
> -	if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
> -	    bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
> -		struct ttm_place *place = &(bo->placements[0]);
> -
> -		if (mem_type_is_vram(place->mem_type) ||
> -		    bo->flags & XE_BO_FLAG_GGTT) {
> -			spin_lock(&xe->pinned.lock);
> -			xe_assert(xe, !list_empty(&bo->pinned_link));
> -			list_del_init(&bo->pinned_link);
> -			spin_unlock(&xe->pinned.lock);
> -		}
> +	if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
> +		spin_lock(&xe->pinned.lock);
> +		xe_assert(xe, !list_empty(&bo->pinned_link));
> +		list_del_init(&bo->pinned_link);
> +		spin_unlock(&xe->pinned.lock);
>  	}
> -
>  	ttm_bo_unpin(&bo->ttm);
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
> index 32043e1e5a86..b01bc20eb90b 100644
> --- a/drivers/gpu/drm/xe/xe_bo_evict.c
> +++ b/drivers/gpu/drm/xe/xe_bo_evict.c
> @@ -34,9 +34,6 @@ int xe_bo_evict_all(struct xe_device *xe)
>  	u8 id;
>  	int ret;
>  
> -	if (!IS_DGFX(xe))
> -		return 0;
> -
>  	/* User memory */
>  	for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) {
>  		struct ttm_resource_manager *man =
> @@ -125,9 +122,6 @@ int xe_bo_restore_kernel(struct xe_device *xe)
>  	struct xe_bo *bo;
>  	int ret;
>  
> -	if (!IS_DGFX(xe))
> -		return 0;
> -
>  	spin_lock(&xe->pinned.lock);
>  	for (;;) {
>  		bo = list_first_entry_or_null(&xe->pinned.evicted,
> -- 
> 2.47.0
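
The net effect of the `xe_bo.c` hunks can be summarised as two predicates. This is a simplified, hypothetical model: `enum mem_type`, `BO_FLAG_GGTT`, `struct bo_model`, `tracked_on_pinned_list()` and `restore_pinned_allowed()` are illustrative stand-ins, not the real xe types or helpers, and a single `mem` field stands in for both `placements[0]` and the current resource. With the `IS_DGFX()` gate removed, pinning tracks a BO on `kernel_bo_present` whenever it is VRAM-placed or GGTT-mapped, and `xe_bo_restore_pinned()` now accepts a BO without a `ttm_tt` as long as it is stolen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver's memory types and
 * BO flags; these are not the real xe definitions. */
enum mem_type { MEM_SYSTEM, MEM_STOLEN, MEM_VRAM };

#define BO_FLAG_GGTT (1u << 0)

struct bo_model {
	enum mem_type mem; /* stands in for placement and current resource */
	unsigned int flags;
	bool has_ttm_tt;   /* system-memory backing pages exist */
};

/* After the patch: tracked on dgfx and igpu alike (no IS_DGFX gate)
 * whenever the BO is VRAM-placed or mapped through the GGTT. */
static bool tracked_on_pinned_list(const struct bo_model *bo)
{
	return bo->mem == MEM_VRAM || (bo->flags & BO_FLAG_GGTT);
}

/* Mirrors the reworked sanity checks in xe_bo_restore_pinned():
 * VRAM BOs are rejected, and a missing ttm_tt is only tolerated
 * for stolen BOs. */
static bool restore_pinned_allowed(const struct bo_model *bo)
{
	if (bo->mem == MEM_VRAM)
		return false;
	if (!bo->has_ttm_tt && bo->mem != MEM_STOLEN)
		return false;
	return true;
}
```

The stolen case is the one the old `!bo->ttm.ttm` check would have tripped over once the igpu path started walking the pinned list.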

-- 
Ville Syrjälä
Intel


Thread overview: 19+ messages
2024-11-01 17:01 [PATCH v2] drm/xe: improve hibernation on igpu Matthew Auld
2024-11-01 17:07 ` ✓ CI.Patch_applied: success for " Patchwork
2024-11-01 17:07 ` ✗ CI.checkpatch: warning " Patchwork
2024-11-01 17:09 ` ✓ CI.KUnit: success " Patchwork
2024-11-01 17:20 ` ✓ CI.Build: " Patchwork
2024-11-01 17:22 ` ✓ CI.Hooks: " Patchwork
2024-11-01 17:24 ` ✓ CI.checksparse: " Patchwork
2024-11-01 17:38 ` [PATCH v2] " Lucas De Marchi
2024-11-01 19:16   ` Matthew Brost
2024-11-05 17:32     ` Lucas De Marchi
2024-11-05 18:12       ` Matthew Brost
2024-11-05 19:18         ` Lucas De Marchi
2024-11-05 19:26           ` Matthew Brost
2024-11-08 19:42             ` Lucas De Marchi
2024-11-08 23:30               ` Matthew Brost
2024-11-11 10:52                 ` Matthew Auld
2024-11-01 17:47 ` ✓ CI.BAT: success for " Patchwork
2024-11-01 18:51 ` ✗ CI.FULL: failure " Patchwork
2024-11-05 14:54 ` Ville Syrjälä [this message]
