From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org,
Matthew Brost <matthew.brost@intel.com>,
stable@vger.kernel.org
Subject: Re: [PATCH v2] drm/xe: improve hibernation on igpu
Date: Tue, 5 Nov 2024 16:54:06 +0200
Message-ID: <ZyoxjhuG_unc-V8Z@intel.com>
In-Reply-To: <20241101170156.213490-2-matthew.auld@intel.com>
On Fri, Nov 01, 2024 at 05:01:57PM +0000, Matthew Auld wrote:
> The GGTT looks to be stored inside stolen memory on igpu which is not
> treated as normal RAM.
The GGTT lives in GSM, not DSM (which is what people normally
mean when they talk about "stolen").
> The core kernel skips this memory range when
> creating the hibernation image, therefore when coming back from
> hibernation the GGTT programming is lost. This seems to cause issues
> with broken resume where GuC FW fails to load:
>
> [drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1
> [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
> [drm] *ERROR* GT0: firmware signature verification failed
> [drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged.
>
> Current GGTT users are kernel internal and tracked as pinned, so it
> should be possible to hook into the existing save/restore logic that we
> use for dgpu, where the actual evict is skipped but on restore we
> importantly restore the GGTT programming. This has been confirmed to
> fix hibernation on at least ADL and MTL, though likely all igpu
> platforms are affected.
>
> This also means we have a hole in our testing, where the existing s4
> tests only really test the driver hooks, and don't go as far as actually
> rebooting and restoring from the hibernation image and in turn powering
> down RAM (and therefore losing the contents of stolen).
>
> v2 (Brost)
> - Remove extra newline and drop unnecessary parentheses.
>
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 37 ++++++++++++++------------------
> drivers/gpu/drm/xe/xe_bo_evict.c | 6 ------
> 2 files changed, 16 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 8286cbc23721..549866da5cd1 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -952,7 +952,10 @@ int xe_bo_restore_pinned(struct xe_bo *bo)
> if (WARN_ON(!xe_bo_is_pinned(bo)))
> return -EINVAL;
>
> - if (WARN_ON(xe_bo_is_vram(bo) || !bo->ttm.ttm))
> + if (WARN_ON(xe_bo_is_vram(bo)))
> + return -EINVAL;
> +
> + if (WARN_ON(!bo->ttm.ttm && !xe_bo_is_stolen(bo)))
> return -EINVAL;
>
> if (!mem_type_is_vram(place->mem_type))
> @@ -1774,6 +1777,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
>
> int xe_bo_pin(struct xe_bo *bo)
> {
> + struct ttm_place *place = &bo->placements[0];
> struct xe_device *xe = xe_bo_device(bo);
> int err;
>
> @@ -1804,8 +1808,6 @@ int xe_bo_pin(struct xe_bo *bo)
> */
> if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
> bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
> - struct ttm_place *place = &(bo->placements[0]);
> -
> if (mem_type_is_vram(place->mem_type)) {
> xe_assert(xe, place->flags & TTM_PL_FLAG_CONTIGUOUS);
>
> @@ -1813,13 +1815,12 @@ int xe_bo_pin(struct xe_bo *bo)
> vram_region_gpu_offset(bo->ttm.resource)) >> PAGE_SHIFT;
> place->lpfn = place->fpfn + (bo->size >> PAGE_SHIFT);
> }
> + }
>
> - if (mem_type_is_vram(place->mem_type) ||
> - bo->flags & XE_BO_FLAG_GGTT) {
> - spin_lock(&xe->pinned.lock);
> - list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
> - spin_unlock(&xe->pinned.lock);
> - }
> + if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
> + spin_lock(&xe->pinned.lock);
> + list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
> + spin_unlock(&xe->pinned.lock);
> }
>
> ttm_bo_pin(&bo->ttm);
> @@ -1867,24 +1868,18 @@ void xe_bo_unpin_external(struct xe_bo *bo)
>
> void xe_bo_unpin(struct xe_bo *bo)
> {
> + struct ttm_place *place = &bo->placements[0];
> struct xe_device *xe = xe_bo_device(bo);
>
> xe_assert(xe, !bo->ttm.base.import_attach);
> xe_assert(xe, xe_bo_is_pinned(bo));
>
> - if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
> - bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
> - struct ttm_place *place = &(bo->placements[0]);
> -
> - if (mem_type_is_vram(place->mem_type) ||
> - bo->flags & XE_BO_FLAG_GGTT) {
> - spin_lock(&xe->pinned.lock);
> - xe_assert(xe, !list_empty(&bo->pinned_link));
> - list_del_init(&bo->pinned_link);
> - spin_unlock(&xe->pinned.lock);
> - }
> + if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
> + spin_lock(&xe->pinned.lock);
> + xe_assert(xe, !list_empty(&bo->pinned_link));
> + list_del_init(&bo->pinned_link);
> + spin_unlock(&xe->pinned.lock);
> }
> -
> ttm_bo_unpin(&bo->ttm);
> }
>
> diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
> index 32043e1e5a86..b01bc20eb90b 100644
> --- a/drivers/gpu/drm/xe/xe_bo_evict.c
> +++ b/drivers/gpu/drm/xe/xe_bo_evict.c
> @@ -34,9 +34,6 @@ int xe_bo_evict_all(struct xe_device *xe)
> u8 id;
> int ret;
>
> - if (!IS_DGFX(xe))
> - return 0;
> -
> /* User memory */
> for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) {
> struct ttm_resource_manager *man =
> @@ -125,9 +122,6 @@ int xe_bo_restore_kernel(struct xe_device *xe)
> struct xe_bo *bo;
> int ret;
>
> - if (!IS_DGFX(xe))
> - return 0;
> -
> spin_lock(&xe->pinned.lock);
> for (;;) {
> bo = list_first_entry_or_null(&xe->pinned.evicted,
> --
> 2.47.0
--
Ville Syrjälä
Intel
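
For readers following the quoted diff, the bookkeeping change in
xe_bo_pin()/xe_bo_unpin() can be modelled in a few lines of standalone C.
This is only a rough sketch under stated assumptions: every type and
function name below (model_bo, model_device, model_pin_old, and so on) is
hypothetical and stands in for the real xe structures, the pinned list is
reduced to a counter, and the CONFIG_DRM_XE_DEBUG / XE_BO_FLAG_INTERNAL_TEST
exception is left out. It shows only that, before the patch, a GGTT-mapped
BO on igpu never reached the tracked list, while after the patch it does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; these are NOT the real xe types. */
enum model_mem_type { MODEL_PL_SYSTEM, MODEL_PL_VRAM, MODEL_PL_STOLEN };

struct model_bo {
	enum model_mem_type mem_type;
	bool ggtt;	/* stands in for bo->flags & XE_BO_FLAG_GGTT */
	bool tracked;	/* stands in for being on xe->pinned.kernel_bo_present */
};

struct model_device {
	bool is_dgfx;	/* stands in for IS_DGFX(xe) */
	int nr_tracked;	/* stands in for the length of the pinned list */
};

/* Same condition the patch uses to decide whether a BO is tracked. */
static bool needs_tracking(const struct model_bo *bo)
{
	return bo->mem_type == MODEL_PL_VRAM || bo->ggtt;
}

/* Before the patch: tracking sat inside the IS_DGFX() branch. */
static void model_pin_old(struct model_device *dev, struct model_bo *bo)
{
	if (dev->is_dgfx && needs_tracking(bo)) {
		bo->tracked = true;
		dev->nr_tracked++;
	}
}

/* After the patch: tracking no longer depends on IS_DGFX(). */
static void model_pin_new(struct model_device *dev, struct model_bo *bo)
{
	if (needs_tracking(bo)) {
		bo->tracked = true;
		dev->nr_tracked++;
	}
}

/* After the patch: unpin mirrors pin, again without the IS_DGFX() check. */
static void model_unpin_new(struct model_device *dev, struct model_bo *bo)
{
	if (needs_tracking(bo)) {
		assert(bo->tracked);	/* mirrors xe_assert(!list_empty(...)) */
		bo->tracked = false;
		dev->nr_tracked--;
	}
}
```

With an igpu device (is_dgfx = false) and a system-memory BO that has the
GGTT flag set, model_pin_old() leaves nr_tracked at 0, whereas
model_pin_new() raises it to 1, which is what lets the restore path
find the BO and reprogram its GGTT entries.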