From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Zhigang Luo <Zhigang.Luo@amd.com>,
	Felix Kuehling <felix.kuehling@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	Felix.Kuehling@amd.com, christian.koenig@amd.com,
	Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.6 21/29] amd/amdkfd: sync all devices to wait all processes being evicted
Date: Mon, 22 Apr 2024 19:17:02 -0400
Message-ID: <20240422231730.1601976-21-sashal@kernel.org>
In-Reply-To: <20240422231730.1601976-1-sashal@kernel.org>

From: Zhigang Luo <Zhigang.Luo@amd.com>

[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]

If more than one device is being reset in parallel, the first device
calls kfd_suspend_all_processes() to evict all processes on all
devices, and this call takes time to finish. The other devices start
reset and recovery without waiting. If a process has not been evicted
before recovery begins, it is restored, which then causes a page
fault.

Hold kfd_processes_mutex across the eviction so that every device
waits until all processes have been evicted.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
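
Illustrative note, not part of the upstream commit: the ordering
change in the diff below can be sketched in user space with pthreads.
This is only a sketch under stated assumptions. The names
processes_mutex, locked, evicted, suspend_all_processes(),
resume_all_processes(), device_suspend(), device_resume() and
run_parallel_suspend() are hypothetical stand-ins, and the sleep merely
imitates a slow eviction. It shows that when the mutex stays held
across the slow call, a second caller cannot leave the critical
section before eviction has finished.

```c
/* User-space sketch of the patched ordering; not kernel code. */
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t processes_mutex = PTHREAD_MUTEX_INITIALIZER;
static int locked;   /* stand-in for kfd_locked */
static int evicted;  /* 1 once the slow eviction has completed */

/* Hypothetical stand-in for kfd_suspend_all_processes(): slow on purpose. */
static void suspend_all_processes(void)
{
	struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms */

	nanosleep(&delay, NULL);
	evicted = 1;
}

/* Hypothetical stand-in for kfd_resume_all_processes(). */
static void resume_all_processes(void)
{
	evicted = 0;
}

/*
 * Patched ordering: the mutex stays held across the eviction, so a second
 * caller blocks in pthread_mutex_lock() until the first has finished.
 * Returns whether eviction was complete when this caller left the lock.
 */
static int device_suspend(void)
{
	int seen;

	pthread_mutex_lock(&processes_mutex);
	if (++locked == 1)
		suspend_all_processes();
	seen = evicted;
	pthread_mutex_unlock(&processes_mutex);
	return seen;
}

static void device_resume(void)
{
	pthread_mutex_lock(&processes_mutex);
	if (--locked == 0)
		resume_all_processes();
	assert(locked >= 0); /* plays the role of the WARN_ONCE() check */
	pthread_mutex_unlock(&processes_mutex);
}

static void *reset_thread(void *arg)
{
	*(int *)arg = device_suspend();
	return NULL;
}

/* Two "devices" suspend in parallel; returns 1 if both saw eviction done. */
int run_parallel_suspend(void)
{
	pthread_t t[2];
	int seen[2] = { 0, 0 };
	int i;

	for (i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, reset_thread, &seen[i]);
	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);

	device_resume();
	device_resume();
	return seen[0] && seen[1];
}
```

With the pre-patch ordering (increment, unlock, then call the slow
function), the second thread would skip the eviction and could return
from device_suspend() while evicted is still 0, which corresponds to
the window described in the commit message.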

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 93ce181eb3baa..913c70a0ef44f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -935,7 +935,6 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 {
 	struct kfd_node *node;
 	int i;
-	int count;
 
 	if (!kfd->init_complete)
 		return;
@@ -943,12 +942,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 	/* for runtime suspend, skip locking kfd */
 	if (!run_pm) {
 		mutex_lock(&kfd_processes_mutex);
-		count = ++kfd_locked;
-		mutex_unlock(&kfd_processes_mutex);
-
 		/* For first KFD device suspend all the KFD processes */
-		if (count == 1)
+		if (++kfd_locked == 1)
 			kfd_suspend_all_processes();
+		mutex_unlock(&kfd_processes_mutex);
 	}
 
 	for (i = 0; i < kfd->num_nodes; i++) {
@@ -959,7 +956,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 
 int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 {
-	int ret, count, i;
+	int ret, i;
 
 	if (!kfd->init_complete)
 		return 0;
@@ -973,12 +970,10 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 	/* for runtime resume, skip unlocking kfd */
 	if (!run_pm) {
 		mutex_lock(&kfd_processes_mutex);
-		count = --kfd_locked;
-		mutex_unlock(&kfd_processes_mutex);
-
-		WARN_ONCE(count < 0, "KFD suspend / resume ref. error");
-		if (count == 0)
+		if (--kfd_locked == 0)
 			ret = kfd_resume_all_processes();
+		WARN_ONCE(kfd_locked < 0, "KFD suspend / resume ref. error");
+		mutex_unlock(&kfd_processes_mutex);
 	}
 
 	return ret;
-- 
2.43.0

