public inbox for linux-kernel@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Zhigang Luo <Zhigang.Luo@amd.com>,
	Felix Kuehling <felix.kuehling@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	Felix.Kuehling@amd.com, christian.koenig@amd.com,
	Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.8 33/43] amd/amdkfd: sync all devices to wait all processes being evicted
Date: Mon, 22 Apr 2024 19:14:19 -0400
Message-ID: <20240422231521.1592991-33-sashal@kernel.org>
In-Reply-To: <20240422231521.1592991-1-sashal@kernel.org>

From: Zhigang Luo <Zhigang.Luo@amd.com>

[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]

If more than one device is resetting in parallel, the first device
calls kfd_suspend_all_processes() to evict all processes on all
devices, and this call takes time to finish. The other devices start
their reset and recovery without waiting for it. If a process has not
been evicted before recovery begins, it is restored and then causes a
page fault.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
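
The locking pattern this patch moves to can be illustrated with a small
userspace sketch. It is not the kernel code: pthreads stand in for
devices resetting in parallel, and every name in it (processes_mutex,
locked, evicted, suspend_all_processes, device_suspend, device_reset,
run_parallel_resets) is hypothetical. The sleep is an assumed stand-in
for the time eviction takes. Because the mutex is held across the slow
eviction, every later caller blocks until the first caller has finished.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t processes_mutex = PTHREAD_MUTEX_INITIALIZER;
static int locked;		/* hypothetical stand-in for kfd_locked */
static bool evicted;		/* set once the slow eviction has finished */
static int saw_unevicted;	/* callers that returned before eviction */

/* Hypothetical stand-in for the slow eviction of all processes. */
static void suspend_all_processes(void)
{
	usleep(50 * 1000);	/* assumed delay: eviction takes time */
	evicted = true;
}

/* The mutex stays held while the first caller performs the eviction. */
static void device_suspend(void)
{
	pthread_mutex_lock(&processes_mutex);
	if (++locked == 1)
		suspend_all_processes();
	pthread_mutex_unlock(&processes_mutex);
}

/* One simulated device: suspend, then check that eviction completed. */
static void *device_reset(void *arg)
{
	(void)arg;
	device_suspend();

	pthread_mutex_lock(&processes_mutex);
	if (!evicted)
		saw_unevicted++;
	pthread_mutex_unlock(&processes_mutex);
	return NULL;
}

/* Run up to 16 simulated resets in parallel; return the early count. */
int run_parallel_resets(int ndev)
{
	pthread_t threads[16];
	int i;

	if (ndev > 16)
		ndev = 16;
	for (i = 0; i < ndev; i++)
		pthread_create(&threads[i], NULL, device_reset, NULL);
	for (i = 0; i < ndev; i++)
		pthread_join(threads[i], NULL);
	return saw_unevicted;
}
```

Calling run_parallel_resets(4) from a main() should return 0 with this
ordering. Dropping the mutex before suspend_all_processes(), as the old
code did, would let the other threads return while evicted is still
false.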

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc2243..fcf6558d019e5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -944,7 +944,6 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 {
 	struct kfd_node *node;
 	int i;
-	int count;
 
 	if (!kfd->init_complete)
 		return;
@@ -952,12 +951,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 	/* for runtime suspend, skip locking kfd */
 	if (!run_pm) {
 		mutex_lock(&kfd_processes_mutex);
-		count = ++kfd_locked;
-		mutex_unlock(&kfd_processes_mutex);
-
 		/* For first KFD device suspend all the KFD processes */
-		if (count == 1)
+		if (++kfd_locked == 1)
 			kfd_suspend_all_processes();
+		mutex_unlock(&kfd_processes_mutex);
 	}
 
 	for (i = 0; i < kfd->num_nodes; i++) {
@@ -968,7 +965,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 
 int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 {
-	int ret, count, i;
+	int ret, i;
 
 	if (!kfd->init_complete)
 		return 0;
@@ -982,12 +979,10 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 	/* for runtime resume, skip unlocking kfd */
 	if (!run_pm) {
 		mutex_lock(&kfd_processes_mutex);
-		count = --kfd_locked;
-		mutex_unlock(&kfd_processes_mutex);
-
-		WARN_ONCE(count < 0, "KFD suspend / resume ref. error");
-		if (count == 0)
+		if (--kfd_locked == 0)
 			ret = kfd_resume_all_processes();
+		WARN_ONCE(kfd_locked < 0, "KFD suspend / resume ref. error");
+		mutex_unlock(&kfd_processes_mutex);
 	}
 
 	return ret;
-- 
2.43.0

