From: "Ionut Nechita (Sunlight Linux)"
To: Alex Deucher, Christian König, Mario Limonciello, Ionut Nechita
Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/1] drm/amdgpu: Fix TLB flush failures after hibernation resume
Date: Tue, 6 Jan 2026 14:59:30 +0200
Message-ID: <20260106125929.25214-3-sunlightlinux@gmail.com>

From: Ionut Nechita

Hi,

This patch addresses critical TLB flush failures that occur during
hibernation resume on AMD GPUs, particularly affecting ROCm workloads.

Problem:
--------
After resuming from hibernation (S4), the amdgpu driver consistently
fails TLB invalidation operations with these errors:

  amdgpu: TLB flush failed for PASID xxxxx
  amdgpu: failed to write reg 28b4 wait reg 28c6
  amdgpu: failed to write reg 1a6f4 wait reg 1a706

These failures cause compute workloads to malfunction or crash, making
hibernation unreliable for systems running ROCm/OpenCL applications.

Root Cause:
-----------
During resume, the KIQ (Kernel Interface Queue) ring is marked as ready
(ring.sched.ready = true) before the GPU hardware has fully
initialized. When TLB invalidation uses KIQ for register access during
this window, the commands fail because the GPU is not yet stable.
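The window can be made concrete with a small user-space model in C.
Everything in it is hypothetical: gpu_state, pick_path_old(),
pick_path_gated(), flush_ok(), early_resume() and ring_tests_passed()
are illustrative names, not amdgpu functions. As a simplifying
assumption, an MMIO flush always succeeds in the model, while a KIQ
flush succeeds only once the hardware is up. It contrasts a path choice
driven by the KIQ ready bit alone with one that also requires a
stability flag such as resume_gpu_stable.

```c
/*
 * Hypothetical user-space model of the TLB-flush path choice during
 * S4 resume. None of these identifiers exist in amdgpu; they only
 * mirror the states described in this cover letter.
 */
#include <assert.h>
#include <stdbool.h>

enum flush_path { FLUSH_MMIO, FLUSH_KIQ };

struct gpu_state {
	bool kiq_sched_ready;   /* stands in for the KIQ ring.sched.ready */
	bool hw_up;             /* GPU hardware fully initialized */
	bool resume_gpu_stable; /* stability flag proposed by this series */
};

/* Modelled old behaviour: the ring's ready bit alone selects KIQ. */
static enum flush_path pick_path_old(const struct gpu_state *s)
{
	return s->kiq_sched_ready ? FLUSH_KIQ : FLUSH_MMIO;
}

/* Modelled gated behaviour: KIQ additionally requires the flag. */
static enum flush_path pick_path_gated(const struct gpu_state *s)
{
	return (s->kiq_sched_ready && s->resume_gpu_stable) ? FLUSH_KIQ
							    : FLUSH_MMIO;
}

/* Model assumption: MMIO always works, KIQ only once hardware is up. */
static bool flush_ok(const struct gpu_state *s, enum flush_path p)
{
	assert(p == FLUSH_MMIO || p == FLUSH_KIQ);
	return p == FLUSH_MMIO || s->hw_up;
}

/* Early-resume window: ring already marked ready, hardware not up. */
static struct gpu_state early_resume(void)
{
	struct gpu_state s = { .kiq_sched_ready = true, .hw_up = false,
			       .resume_gpu_stable = false };
	return s;
}

/* Modelled point where ring tests pass and the flag is set. */
static void ring_tests_passed(struct gpu_state *s)
{
	s->hw_up = true;
	s->resume_gpu_stable = true;
}
```

In the early-resume state, pick_path_old() chooses KIQ and the modelled
flush fails, while pick_path_gated() falls back to MMIO and succeeds.
After ring_tests_passed(), both predicates choose KIQ and the flush
succeeds, so the gate only costs the faster path during the unstable
window.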
Solution:
---------
This patch introduces a resume_gpu_stable flag that:
- Starts as false during resume
- Forces TLB invalidation to use the reliable MMIO path initially
- Is set to true after the ring tests pass in gfx_v9_0_cp_resume()
- Allows switching to the faster KIQ path once the GPU is confirmed
  stable

This ensures TLB flushes work correctly during early resume while still
benefiting from KIQ-based invalidation once the GPU is fully
operational.

Testing:
--------
Tested on AMD Cezanne (Renoir) with ROCm workloads across multiple
hibernation cycles. With the patch applied, no TLB flush failures were
observed and hibernation worked reliably for compute workloads.

Impact:
-------
The issue affects all AMD GPUs that use KIQ for TLB invalidation and is
most visible on systems with active compute workloads (ROCm, OpenCL).

Ionut Nechita (1):
  drm/amdgpu: Fix TLB flush failures after hibernation resume

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    |  9 +++++++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      |  6 +++++-
 5 files changed, 29 insertions(+), 3 deletions(-)

-- 
2.52.0