From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 305E8296BCF; Thu, 12 Mar 2026 20:19:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773346750; cv=none; b=fmdgKtuxDzpjnCJkFimn/6tQ7R0ZXgtVzrTVDYP7vDGsh2kmXFYL17i1hAzU47YwELhw68azcG4Y8CKTvRPypKihb7HsZdfilmgSPYRtIfDc9HHU9UcIUqsd8jZMCWzM0TgwCBCtSElGufYzqGPPQYza0/ZB+TeIBpaluMqyYys= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773346750; c=relaxed/simple; bh=9BKmaPkcvpsjYNk3twfiSDJ7CfNM9FVceuWwPcDtbg0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=J0twCkjtiVd+FnD80AZW9dDCsd7L1JABSpOj6/HU9qSczJiPJPu76iZKQYsyOr9jzXStCh2txnJBcCh80zz+uMN/bauL505GGvpz+LESuYMlMYrnDSGOYg9N3oXG82AMSKf8U3XpcYga4JBuWwAQEJeUPZZ5p3IgHpw7CKbM6dI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=UyeiTedB; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="UyeiTedB" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 73A4AC4CEF7; Thu, 12 Mar 2026 20:19:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1773346750; bh=9BKmaPkcvpsjYNk3twfiSDJ7CfNM9FVceuWwPcDtbg0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UyeiTedB5c0g4tQf2sSREKjit3rYTZvA6VlKfYrXg1BFaV0CZoUslyBYTrCMoNn2w awcvRn8vSZsz5kw3EF5OevtEMYOdru+NpQCaLJIAthlfOC62ppCmglXuYPNoENh9Qf FAwf8tukIuGorqZrf1RtovNcsmmhE9ZvixuDyUwQ= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Cal Peake , Alex Deucher , Mario Limonciello , Sasha Levin Subject: [PATCH 6.12 113/265] drm/amd: Fix hang on amdgpu unload by using pci_dev_is_disconnected() Date: Thu, 12 Mar 2026 21:08:20 +0100 Message-ID: <20260312201022.320156415@linuxfoundation.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260312201018.128816016@linuxfoundation.org> References: <20260312201018.128816016@linuxfoundation.org> User-Agent: quilt/0.69 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit 6.12-stable review patch. If anyone has any objections, please let me know. ------------------ From: Mario Limonciello [ Upstream commit f7afda7fcd169a9168695247d07ad94cf7b9798f ] The commit 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") introduced early KFD cleanup when drm_dev_is_unplugged() returns true. However, this causes hangs during normal module unload (rmmod amdgpu). The issue occurs because drm_dev_unplug() is called in amdgpu_pci_remove() for all removal scenarios, not just surprise disconnects. This was done intentionally in commit 39934d3ed572 ("Revert "drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled"") to fix IGT PCI software unplug test failures. As a result, drm_dev_is_unplugged() returns true even during normal module unload, triggering the early KFD cleanup inappropriately. The correct check should distinguish between: - Actual surprise disconnect (eGPU unplugged): pci_dev_is_disconnected() returns true - Normal module unload (rmmod): pci_dev_is_disconnected() returns false Replace drm_dev_is_unplugged() with pci_dev_is_disconnected() to ensure the early cleanup only happens during true hardware disconnect events. Cc: stable@vger.kernel.org Reported-by: Cal Peake Closes: https://lore.kernel.org/all/b0c22deb-c0fa-3343-33cf-fd9a77d7db99@absolutedigital.net/ Fixes: 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") Acked-by: Alex Deucher Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index cab75f5c9f2fd..361184355e232 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4648,7 +4648,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) * before ip_fini_early to prevent kfd locking refcount issues by calling * amdgpu_amdkfd_suspend() */ - if (drm_dev_is_unplugged(adev_to_drm(adev))) + if (pci_dev_is_disconnected(adev->pdev)) amdgpu_amdkfd_device_fini_sw(adev); amdgpu_device_ip_fini_early(adev); @@ -4660,7 +4660,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_gart_dummy_page_fini(adev); - if (drm_dev_is_unplugged(adev_to_drm(adev))) + if (pci_dev_is_disconnected(adev->pdev)) amdgpu_device_unmap_mmio(adev); } -- 2.51.0