* FAILED: patch "[PATCH] drm/xe: improve hibernation on igpu" failed to apply to 6.11-stable tree
From: gregkh @ 2024-11-17 20:36 UTC
To: matthew.auld, lucas.demarchi, matthew.brost, stable; +Cc: stable
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 46f1f4b0f3c2a2dff9887de7c66ccc7ef482bd83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024111758-jumbo-neon-1b3c@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 46f1f4b0f3c2a2dff9887de7c66ccc7ef482bd83 Mon Sep 17 00:00:00 2001
From: Matthew Auld <matthew.auld@intel.com>
Date: Fri, 1 Nov 2024 17:01:57 +0000
Subject: [PATCH] drm/xe: improve hibernation on igpu

The GGTT looks to be stored inside stolen memory on igpu which is not
treated as normal RAM. The core kernel skips this memory range when
creating the hibernation image, therefore when coming back from
hibernation the GGTT programming is lost. This seems to cause issues
with broken resume where GuC FW fails to load:

[drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1
[drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
[drm] *ERROR* GT0: firmware signature verification failed
[drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged.

Current GGTT users are kernel internal and tracked as pinned, so it
should be possible to hook into the existing save/restore logic that we
use for dgpu, where the actual evict is skipped but on restore we
importantly restore the GGTT programming. This has been confirmed to
fix hibernation on at least ADL and MTL, though likely all igpu
platforms are affected.

This also means we have a hole in our testing, where the existing s4
tests only really test the driver hooks, and don't go as far as actually
rebooting and restoring from the hibernation image and in turn powering
down RAM (and therefore losing the contents of stolen).

v2 (Brost)
- Remove extra newline and drop unnecessary parentheses.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241101170156.213490-2-matthew.auld@intel.com
(cherry picked from commit f2a6b8e396666d97ada8e8759dfb6a69d8df6380)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 74f68289f74c..2a093540354e 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -948,7 +948,10 @@ int xe_bo_restore_pinned(struct xe_bo *bo)
 	if (WARN_ON(!xe_bo_is_pinned(bo)))
 		return -EINVAL;

-	if (WARN_ON(xe_bo_is_vram(bo) || !bo->ttm.ttm))
+	if (WARN_ON(xe_bo_is_vram(bo)))
+		return -EINVAL;
+
+	if (WARN_ON(!bo->ttm.ttm && !xe_bo_is_stolen(bo)))
 		return -EINVAL;

 	if (!mem_type_is_vram(place->mem_type))
@@ -1723,6 +1726,7 @@ int xe_bo_pin_external(struct xe_bo *bo)

 int xe_bo_pin(struct xe_bo *bo)
 {
+	struct ttm_place *place = &bo->placements[0];
 	struct xe_device *xe = xe_bo_device(bo);
 	int err;

@@ -1753,8 +1757,6 @@ int xe_bo_pin(struct xe_bo *bo)
 	 */
 	if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
 	    bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
-		struct ttm_place *place = &(bo->placements[0]);
-
 		if (mem_type_is_vram(place->mem_type)) {
 			xe_assert(xe, place->flags & TTM_PL_FLAG_CONTIGUOUS);

@@ -1762,13 +1764,12 @@ int xe_bo_pin(struct xe_bo *bo)
 				       vram_region_gpu_offset(bo->ttm.resource)) >> PAGE_SHIFT;
 			place->lpfn = place->fpfn + (bo->size >> PAGE_SHIFT);
 		}
+	}

-		if (mem_type_is_vram(place->mem_type) ||
-		    bo->flags & XE_BO_FLAG_GGTT) {
-			spin_lock(&xe->pinned.lock);
-			list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
-			spin_unlock(&xe->pinned.lock);
-		}
+	if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
+		spin_lock(&xe->pinned.lock);
+		list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
+		spin_unlock(&xe->pinned.lock);
 	}

 	ttm_bo_pin(&bo->ttm);
@@ -1816,24 +1817,18 @@ void xe_bo_unpin_external(struct xe_bo *bo)

 void xe_bo_unpin(struct xe_bo *bo)
 {
+	struct ttm_place *place = &bo->placements[0];
 	struct xe_device *xe = xe_bo_device(bo);

 	xe_assert(xe, !bo->ttm.base.import_attach);
 	xe_assert(xe, xe_bo_is_pinned(bo));

-	if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
-	    bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
-		struct ttm_place *place = &(bo->placements[0]);
-
-		if (mem_type_is_vram(place->mem_type) ||
-		    bo->flags & XE_BO_FLAG_GGTT) {
-			spin_lock(&xe->pinned.lock);
-			xe_assert(xe, !list_empty(&bo->pinned_link));
-			list_del_init(&bo->pinned_link);
-			spin_unlock(&xe->pinned.lock);
-		}
+	if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
+		spin_lock(&xe->pinned.lock);
+		xe_assert(xe, !list_empty(&bo->pinned_link));
+		list_del_init(&bo->pinned_link);
+		spin_unlock(&xe->pinned.lock);
 	}
-
 	ttm_bo_unpin(&bo->ttm);
 }

diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
index 32043e1e5a86..b01bc20eb90b 100644
--- a/drivers/gpu/drm/xe/xe_bo_evict.c
+++ b/drivers/gpu/drm/xe/xe_bo_evict.c
@@ -34,9 +34,6 @@ int xe_bo_evict_all(struct xe_device *xe)
 	u8 id;
 	int ret;

-	if (!IS_DGFX(xe))
-		return 0;
-
 	/* User memory */
 	for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) {
 		struct ttm_resource_manager *man =
@@ -125,9 +122,6 @@ int xe_bo_restore_kernel(struct xe_device *xe)
 	struct xe_bo *bo;
 	int ret;

-	if (!IS_DGFX(xe))
-		return 0;
-
 	spin_lock(&xe->pinned.lock);
 	for (;;) {
 		bo = list_first_entry_or_null(&xe->pinned.evicted,
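
For readers following the diff above, here is a minimal userspace sketch of
the two conditions the patch changes: which pinned BOs get put on the
save/restore list, and which BOs xe_bo_restore_pinned() accepts. This is an
illustration only, not driver code. The struct model_bo type, its boolean
fields and the four function names are hypothetical stand-ins for the real
helpers (IS_DGFX(), mem_type_is_vram(), xe_bo_is_stolen(), the
XE_BO_FLAG_GGTT test and so on), and the pinned check that comes first in the
real function is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the properties the patch inspects; not struct xe_bo. */
struct model_bo {
	bool on_dgfx;             /* IS_DGFX(xe) */
	bool debug_internal_test; /* CONFIG_DRM_XE_DEBUG && XE_BO_FLAG_INTERNAL_TEST */
	bool in_vram;             /* mem_type_is_vram() / xe_bo_is_vram() */
	bool in_stolen;           /* xe_bo_is_stolen() */
	bool has_ggtt;            /* bo->flags & XE_BO_FLAG_GGTT */
	bool has_ttm_tt;          /* bo->ttm.ttm != NULL */
};

/* Before: the pinned-list bookkeeping lived inside the dgfx-only branch. */
static bool tracked_before(const struct model_bo *bo)
{
	if (bo->on_dgfx && !bo->debug_internal_test)
		return bo->in_vram || bo->has_ggtt;
	return false;
}

/* After: VRAM or GGTT BOs are tracked regardless of device type. */
static bool tracked_after(const struct model_bo *bo)
{
	return bo->in_vram || bo->has_ggtt;
}

/* Before: a BO without a ttm_tt was always rejected on restore. */
static int restore_check_before(const struct model_bo *bo)
{
	if (bo->in_vram || !bo->has_ttm_tt)
		return -EINVAL;
	return 0;
}

/* After: a missing ttm_tt is tolerated when the BO lives in stolen memory. */
static int restore_check_after(const struct model_bo *bo)
{
	if (bo->in_vram)
		return -EINVAL;
	if (!bo->has_ttm_tt && !bo->in_stolen)
		return -EINVAL;
	return 0;
}
```

With this model, a GGTT-mapped BO in stolen memory on an igpu is not tracked
and is rejected by the old checks, while the new checks track it and let the
restore path proceed so the GGTT programming can be rewritten.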
* Re: FAILED: patch "[PATCH] drm/xe: improve hibernation on igpu" failed to apply to 6.11-stable tree
From: Matthew Auld @ 2024-11-18 11:34 UTC
To: gregkh, lucas.demarchi, matthew.brost, stable
On 17/11/2024 20:36, gregkh@linuxfoundation.org wrote:
>
> The patch below does not apply to the 6.11-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
>
> To reproduce the conflict and resubmit, you may use the following commands:
>
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
> git checkout FETCH_HEAD
> git cherry-pick -x 46f1f4b0f3c2a2dff9887de7c66ccc7ef482bd83
> # <resolve conflicts, build, test, etc.>
> git commit -s
> git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024111758-jumbo-neon-1b3c@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
>
> Possible dependencies:
There is a dependency on:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dd886a63d6e2ce5c16e662c07547c067ad7d91f5
I guess we need to get that patch into the linux-6.11.y branch as well? I
think we also need both patches for "drm/xe: handle flat ccs during
hibernation on igpu" to work, even if it applies without them.
* Re: FAILED: patch "[PATCH] drm/xe: improve hibernation on igpu" failed to apply to 6.11-stable tree
From: Lucas De Marchi @ 2024-11-18 15:57 UTC
To: Matthew Auld; +Cc: gregkh, matthew.brost, stable
On Mon, Nov 18, 2024 at 11:34:35AM +0000, Matthew Auld wrote:
>On 17/11/2024 20:36, gregkh@linuxfoundation.org wrote:
>>
>>The patch below does not apply to the 6.11-stable tree.
>>If someone wants it applied there, or to any other stable or longterm
>>tree, then please email the backport, including the original git commit
>>id to <stable@vger.kernel.org>.
>>
>>To reproduce the conflict and resubmit, you may use the following commands:
>>
>>git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
>>git checkout FETCH_HEAD
>>git cherry-pick -x 46f1f4b0f3c2a2dff9887de7c66ccc7ef482bd83
>># <resolve conflicts, build, test, etc.>
>>git commit -s
>>git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024111758-jumbo-neon-1b3c@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
>>
>>Possible dependencies:
>
>There is a dependency with:
>https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dd886a63d6e2ce5c16e662c07547c067ad7d91f5
>
>I guess we need to get that patch into linux-6.11.y branch also? I
>think we also need both patches for "drm/xe: handle flat ccs during
>hibernation on igpu" to work, even if it applies without.
There are more dependencies. In 6.11, for example, we can't simply "use
what we have for dgfx" like we did here, because those patches are not
there. I have a few extra patches for stable that include it and will
submit them soon.
thanks
Lucas De Marchi