From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@gmail.com>
To: "Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Mario Limonciello" <superm1@kernel.org>,
	"Ionut Nechita" <ionut_n2001@yahoo.com>
Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 0/1] drm/amdgpu: Fix TLB flush failures after hibernation resume
Date: Tue,  6 Jan 2026 14:59:30 +0200
Message-ID: <20260106125929.25214-3-sunlightlinux@gmail.com>

From: Ionut Nechita <ionut_n2001@yahoo.com>

Hi,

This patch addresses critical TLB flush failures that occur during
hibernation resume on AMD GPUs, particularly affecting ROCm workloads.

Problem:
--------
After resuming from hibernation (S4), the amdgpu driver consistently
fails TLB invalidation operations with these errors:

  amdgpu: TLB flush failed for PASID xxxxx
  amdgpu: failed to write reg 28b4 wait reg 28c6
  amdgpu: failed to write reg 1a6f4 wait reg 1a706

These failures cause compute workloads to malfunction or crash, making
hibernation unreliable for systems running ROCm/OpenCL applications.

Root Cause:
-----------
During resume, the KIQ (Kernel Interface Queue) ring is marked as ready
(ring.sched.ready = true) before the GPU hardware has fully initialized.
When TLB invalidation attempts to use KIQ for register access during
this window, the commands fail because the GPU is not yet stable.

Solution:
---------
This patch introduces a resume_gpu_stable flag that:
- Starts as false during resume
- Forces TLB invalidation to use the reliable MMIO path initially
- Gets set to true after ring tests pass in gfx_v9_0_cp_resume()
- Allows switching to the faster KIQ path once GPU is confirmed stable

This ensures TLB flushes work correctly during early resume while still
benefiting from KIQ-based invalidation after the GPU is fully operational.
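
To illustrate the gating described above, here is a minimal user-space
sketch in C. It is not the patch itself: struct mock_device,
pick_flush_path(), mock_resume_begin() and mock_ring_tests_passed() are
hypothetical names chosen for illustration, and only resume_gpu_stable
and the ring-ready state (ring.sched.ready) come from this series.

```c
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real amdgpu structures. */
struct mock_device {
	bool kiq_ring_ready;    /* models ring.sched.ready on the KIQ ring */
	bool resume_gpu_stable; /* models the flag this series introduces */
};

enum flush_path { FLUSH_MMIO, FLUSH_KIQ };

/* Use KIQ only when the ring is ready AND the GPU is confirmed stable. */
enum flush_path pick_flush_path(const struct mock_device *dev)
{
	if (dev->kiq_ring_ready && dev->resume_gpu_stable)
		return FLUSH_KIQ;
	return FLUSH_MMIO; /* reliable fallback during early resume */
}

/* Early resume: the ring is already marked ready, the GPU is not yet stable. */
void mock_resume_begin(struct mock_device *dev)
{
	dev->resume_gpu_stable = false;
	dev->kiq_ring_ready = true;
}

/* Models the point in CP resume where the ring tests have passed. */
void mock_ring_tests_passed(struct mock_device *dev)
{
	dev->resume_gpu_stable = true;
}
```

In this model, pick_flush_path() returns FLUSH_MMIO between
mock_resume_begin() and mock_ring_tests_passed(), and FLUSH_KIQ
afterwards. Checking both conditions keeps the ring-ready test unchanged
and only adds the stability gate on top of it.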

Testing:
--------
Tested on AMD Cezanne (Renoir) with ROCm workloads across multiple
hibernation cycles. In these tests the patch eliminated the TLB flush
failures and restored reliable hibernation for compute workloads.

Impact:
-------
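The failure mode can affect any AMD GPU that uses KIQ for TLB
invalidation; it is most visible on systems with active compute
workloads (ROCm, OpenCL). The code changes in this patch touch the
common amdgpu files and the GFX9/GMC9 paths listed below.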

Ionut Nechita (1):
  drm/amdgpu: Fix TLB flush failures after hibernation resume

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    |  9 +++++++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      |  6 +++++-
 5 files changed, 29 insertions(+), 3 deletions(-)

-- 
2.52.0

