From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C5DAC77B7C for ; Thu, 11 May 2023 19:42:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239429AbjEKTmB (ORCPT ); Thu, 11 May 2023 15:42:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239427AbjEKTle (ORCPT ); Thu, 11 May 2023 15:41:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AD24D879; Thu, 11 May 2023 12:40:54 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E101465118; Thu, 11 May 2023 19:39:53 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E3D79C433EF; Thu, 11 May 2023 19:39:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1683833993; bh=VvDpMmnOjRzqKFHIoHJVTqhB5W+jslLgc+TiZ7+ROAg=; h=From:To:Cc:Subject:Date:From; b=LzTkXgc13zpSBibBqobUKanAdMPDIgyvoAQA7700FhowjUjLXL4P8qQUx3DA2C4Sl ya1PQnoCp9Tzt8giHlzShtj3YXDC1kqgh87t47SiVPOaQrp9BfUmR6mtmUOjVrgr5E b6F9apCYHt1zfqnF9ITascabE1OpdrT1vQBk22ZgGMF64Xcw4UYYCBM1Quk3D7Z5BB TifoPkTEldjD8Qf+AEqwI6aaA7QUFMU5GXcief81pXB7NrdgzUVtAXphzhpnBahJCB I0KrgyokIOc+rE0UvAg8fKNxI/LDJhBPcC/QJUbaJj+m53MkpCMPd9P1Z77K4WrRvR GKmIBT/Gtf9Pg== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Chong Li , JingWen.Chen2@amd.com, Alex Deucher , Sasha Levin , christian.koenig@amd.com, Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch, Hawking.Zhang@amd.com, mario.limonciello@amd.com, lijo.lazar@amd.com, YiPeng.Chai@amd.com, andrey.grodzovsky@amd.com, Amaranath.Somalapuram@amd.com, Bokun.Zhang@amd.com, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Subject: [PATCH AUTOSEL 6.1 1/9] drm/amdgpu: release gpu full access after "amdgpu_device_ip_late_init" Date: Thu, 11 May 2023 15:39:34 -0400 Message-Id: <20230511193945.623476-1-sashal@kernel.org> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Chong Li [ Upstream commit 38eecbe086a4e52f54b2bbda8feba65d44addbef ] [WHY] Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is an atomic context. We shouldn't access registers through KIQ since "msleep()" may be called in "amdgpu_kiq_rreg()". [HOW] Move function "amdgpu_virt_release_full_gpu()" after function "amdgpu_device_ip_late_init()", to ensure that registers be accessed through RLCG instead of KIQ. Call Trace: show_stack+0x52/0x69 dump_stack_lvl+0x49/0x6d dump_stack+0x10/0x18 __schedule_bug.cold+0x4f/0x6b __schedule+0x473/0x5d0 ? __wake_up_klogd.part.0+0x40/0x70 ? vprintk_emit+0xbe/0x1f0 schedule+0x68/0x110 schedule_timeout+0x87/0x160 ? timer_migration_handler+0xa0/0xa0 msleep+0x2d/0x50 amdgpu_kiq_rreg+0x18d/0x1f0 [amdgpu] amdgpu_device_rreg.part.0+0x59/0xd0 [amdgpu] amdgpu_device_rreg+0x3a/0x50 [amdgpu] amdgpu_sriov_rreg+0x3c/0xb0 [amdgpu] gfx_v10_0_set_gfx_eop_interrupt_state.constprop.0+0x16c/0x190 [amdgpu] gfx_v10_0_set_eop_interrupt_state+0xa5/0xb0 [amdgpu] amdgpu_irq_update+0x53/0x80 [amdgpu] amdgpu_irq_get+0x7c/0xb0 [amdgpu] amdgpu_fence_driver_hw_init+0x58/0x90 [amdgpu] amdgpu_device_init.cold+0x16b7/0x2022 [amdgpu] Signed-off-by: Chong Li Reviewed-by: JingWen.Chen2@amd.com Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 32 ++++++++++++---------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 9df5dcedaf3e2..494e8ce52af22 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2511,8 +2511,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) amdgpu_fru_get_product_info(adev); init_failed: - if (amdgpu_sriov_vf(adev)) - amdgpu_virt_release_full_gpu(adev, true); return r; } @@ -3837,18 +3835,6 @@ int amdgpu_device_init(struct amdgpu_device *adev, r = amdgpu_device_ip_init(adev); if (r) { - /* failed in exclusive mode due to timeout */ - if (amdgpu_sriov_vf(adev) && - !amdgpu_sriov_runtime(adev) && - amdgpu_virt_mmio_blocked(adev) && - !amdgpu_virt_wait_reset(adev)) { - dev_err(adev->dev, "VF exclusive mode timeout\n"); - /* Don't send request since VF is inactive. */ - adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME; - adev->virt.ops = NULL; - r = -EAGAIN; - goto release_ras_con; - } dev_err(adev->dev, "amdgpu_device_ip_init failed\n"); amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_AMDGPU_INIT_FAIL, 0, 0); goto release_ras_con; @@ -3920,8 +3906,10 @@ int amdgpu_device_init(struct amdgpu_device *adev, msecs_to_jiffies(AMDGPU_RESUME_MS)); } - if (amdgpu_sriov_vf(adev)) + if (amdgpu_sriov_vf(adev)) { + amdgpu_virt_release_full_gpu(adev, true); flush_delayed_work(&adev->delayed_init_work); + } r = sysfs_create_files(&adev->dev->kobj, amdgpu_dev_attributes); if (r) @@ -3958,6 +3946,20 @@ int amdgpu_device_init(struct amdgpu_device *adev, return 0; release_ras_con: + if (amdgpu_sriov_vf(adev)) + amdgpu_virt_release_full_gpu(adev, true); + + /* failed in exclusive mode due to timeout */ + if (amdgpu_sriov_vf(adev) && + !amdgpu_sriov_runtime(adev) && + amdgpu_virt_mmio_blocked(adev) && + !amdgpu_virt_wait_reset(adev)) { + dev_err(adev->dev, "VF exclusive mode timeout\n"); + /* Don't send request since VF is inactive. */ + adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME; + adev->virt.ops = NULL; + r = -EAGAIN; + } amdgpu_release_ras_context(adev); failed: -- 2.39.2