From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9B06C761AF for ; Sat, 1 Apr 2023 01:45:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233746AbjDABpP (ORCPT ); Fri, 31 Mar 2023 21:45:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46084 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233749AbjDABo0 (ORCPT ); Fri, 31 Mar 2023 21:44:26 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B2E323685; Fri, 31 Mar 2023 18:43:43 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 7AABDB83301; Sat, 1 Apr 2023 01:43:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 96DB2C433D2; Sat, 1 Apr 2023 01:43:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1680313421; bh=7H5totHyxq4yVx5iYXOEPlYPqoU+uIYlMPC64nUrdwM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=XtQbddcQYVSSB7Ww2/TNrj9z9NeZJttypjjwTqxhm4hCHbNTAOgqCSAaRxJzJnuqB 8z7estf+dD3t2PWe6k9Vu9adKJLr/qty2HNAQnSlln7f6VliHU90lGF3p8uHHq37Mp nB0XOWGBTbvjP01/SOez/XhobyEJrn6cVZ2ePKYsj5JjDf4NGEto7BZ6jQ5SuhC17q S8tfvtUGHPwydu6Yk73/BRJz01Hy2SlDQF18CM19pE4YGUQEVjSc3hxGKeI3L3p1Mv jfz6QxmU/8kUYr7lb3Xe62L/KBfUHnEfTZtOs5froxSYHt1J5HeaGlnUNek/pINbTs zx/58O/YRi1MQ== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: YuBiao Wang , Luben Tuikov , Alex Deucher , Sasha Levin , christian.koenig@amd.com, Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch, andrey.grodzovsky@amd.com, Jiadong.Zhu@amd.com, gpiccoli@igalia.com, Jack.Xiao@amd.com, Likun.Gao@amd.com, slark_xiao@163.com, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Subject: [PATCH AUTOSEL 6.1 22/24] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs Date: Fri, 31 Mar 2023 21:42:38 -0400 Message-Id: <20230401014242.3356780-22-sashal@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230401014242.3356780-1-sashal@kernel.org> References: <20230401014242.3356780-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: YuBiao Wang [ Upstream commit 033c56474acf567a450f8bafca50e0b610f2b716 ] [Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. The fences in this case are never signaled and next time when we try to free the sa_bo, kernel will hang. [How] During pre_asic_reset, driver will clear job fences and afterwards the fences' refcount will be reduced to 1. For drm_sched_jobs it will be released in job_free_cb, and for non-sched jobs like ib_test, it's meant to be released in sa_bo_free but only when the fences are signaled. So we have to force signal the non_sched bad job's fence during pre_asic_reset or the clear is not complete. Signed-off-by: YuBiao Wang Acked-by: Luben Tuikov Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 6fdb679321d0d..3cc1929285fc0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -624,6 +624,15 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring) ptr = &ring->fence_drv.fences[i]; old = rcu_dereference_protected(*ptr, 1); if (old && old->ops == &amdgpu_job_fence_ops) { + struct amdgpu_job *job; + + /* For non-scheduler bad job, i.e. failed ib test, we need to signal + * it right here or we won't be able to track them in fence_drv + * and they will remain unsignaled during sa_bo free. + */ + job = container_of(old, struct amdgpu_job, hw_fence); + if (!job->base.s_fence && !dma_fence_is_signaled(old)) + dma_fence_signal(old); RCU_INIT_POINTER(*ptr, NULL); dma_fence_put(old); } -- 2.39.2