From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Lang Yu" <Lang.Yu@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Andrey Grodzovsky" <andrey.grodzovsky@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Sasha Levin" <sashal@kernel.org>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.12 59/63] drm/amd/amdgpu: fix a potential deadlock in gpu reset
Date: Mon, 24 May 2021 10:46:16 -0400
Message-ID: <20210524144620.2497249-59-sashal@kernel.org>
In-Reply-To: <20210524144620.2497249-1-sashal@kernel.org>
From: Lang Yu <Lang.Yu@amd.com>
[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]
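When amdgpu_ib_ring_tests() failed, the reset logic called
amdgpu_device_ip_suspend() twice, and a deadlock occurred. The error
path suspended the IP blocks, and the -EAGAIN retry then suspended them
again from amdgpu_device_pre_asic_reset(). On the reset path,
dm_suspend() acquires adev->dm.dc_lock and holds it until the matching
resume. The second suspend therefore tries to take a mutex that the task
already holds.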
Deadlock log:
[ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[ 806.290952] [drm] free PSP TMR buffer
[ 806.319406] ============================================
[ 806.320315] WARNING: possible recursive locking detected
[ 806.321225] 5.11.0-custom #1 Tainted: G W OEL
[ 806.322135] --------------------------------------------
[ 806.323043] cat/2593 is trying to acquire lock:
[ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.325668]
but task is already holding lock:
[ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.328430]
other info that might help us debug this:
[ 806.329539] Possible unsafe locking scenario:
[ 806.330549] CPU0
[ 806.330983] ----
[ 806.331416] lock(&adev->dm.dc_lock);
[ 806.332086] lock(&adev->dm.dc_lock);
[ 806.332738]
*** DEADLOCK ***
[ 806.333747] May be due to missing lock nesting notation
[ 806.334899] 3 locks held by cat/2593:
[ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.340869]
stack backtrace:
[ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1
[ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[ 806.344413] Call Trace:
[ 806.344849] dump_stack+0x93/0xbd
[ 806.345435] __lock_acquire.cold+0x18a/0x2cf
[ 806.346179] lock_acquire+0xca/0x390
[ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.347813] __mutex_lock+0x9b/0x930
[ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50
[ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.354064] mutex_lock_nested+0x1b/0x20
[ 806.354747] ? mutex_lock_nested+0x1b/0x20
[ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5eee251e3335..85d90e857693 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4356,7 +4356,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 			r = amdgpu_ib_ring_tests(tmp_adev);
 			if (r) {
 				dev_err(tmp_adev->dev, "ib ring test failed (%d).\n", r);
-				r = amdgpu_device_ip_suspend(tmp_adev);
 				need_full_reset = true;
 				r = -EAGAIN;
 				goto end;
--
2.30.2