From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Lang Yu" <Lang.Yu@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Andrey Grodzovsky" <andrey.grodzovsky@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Sasha Levin" <sashal@kernel.org>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.10 58/62] drm/amd/amdgpu: fix a potential deadlock in gpu reset
Date: Mon, 24 May 2021 10:47:39 -0400
Message-ID: <20210524144744.2497894-58-sashal@kernel.org>
In-Reply-To: <20210524144744.2497894-1-sashal@kernel.org>
From: Lang Yu <Lang.Yu@amd.com>
[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]
When amdgpu_ib_ring_tests() fails, the reset logic calls
amdgpu_device_ip_suspend() twice, and the second call deadlocks
on adev->dm.dc_lock.
Deadlock log:
[ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[ 806.290952] [drm] free PSP TMR buffer
[ 806.319406] ============================================
[ 806.320315] WARNING: possible recursive locking detected
[ 806.321225] 5.11.0-custom #1 Tainted: G W OEL
[ 806.322135] --------------------------------------------
[ 806.323043] cat/2593 is trying to acquire lock:
[ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.325668]
but task is already holding lock:
[ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.328430]
other info that might help us debug this:
[ 806.329539] Possible unsafe locking scenario:
[ 806.330549] CPU0
[ 806.330983] ----
[ 806.331416] lock(&adev->dm.dc_lock);
[ 806.332086] lock(&adev->dm.dc_lock);
[ 806.332738]
*** DEADLOCK ***
[ 806.333747] May be due to missing lock nesting notation
[ 806.334899] 3 locks held by cat/2593:
[ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.340869]
stack backtrace:
[ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1
[ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[ 806.344413] Call Trace:
[ 806.344849] dump_stack+0x93/0xbd
[ 806.345435] __lock_acquire.cold+0x18a/0x2cf
[ 806.346179] lock_acquire+0xca/0x390
[ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.347813] __mutex_lock+0x9b/0x930
[ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50
[ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.354064] mutex_lock_nested+0x1b/0x20
[ 806.354747] ? mutex_lock_nested+0x1b/0x20
[ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7f2689d4b86d..87c7c45f1bb7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4368,7 +4368,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
r = amdgpu_ib_ring_tests(tmp_adev);
if (r) {
dev_err(tmp_adev->dev, "ib ring test failed (%d).\n", r);
- r = amdgpu_device_ip_suspend(tmp_adev);
need_full_reset = true;
r = -EAGAIN;
goto end;
--
2.30.2
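The double-suspend path described in the commit message can be sketched in userspace. This is a simplified model, not driver code: every model_* name is hypothetical, the lock is a plain flag that returns -EDEADLK where a real mutex would block forever, and the retry flow (the failed IB ring test returns -EAGAIN, after which the pre-reset step suspends again) is inferred from the backtrace and the diff rather than copied from amdgpu_device.c. It also assumes the first suspend leaves dc_lock held, as the lockdep report suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-recursive lock such as adev->dm.dc_lock. */
struct model_lock {
	bool held;
};

/* Hypothetical device model; not the real struct amdgpu_device. */
struct model_dev {
	struct model_lock dc_lock;
};

/*
 * Take the lock. A real mutex would block forever on a second acquire by
 * the same task; the model reports -EDEADLK instead so the demo terminates.
 */
static int model_lock_acquire(struct model_lock *lock)
{
	assert(lock != NULL);
	if (lock->held)
		return -EDEADLK;
	lock->held = true;
	return 0;
}

/* Models amdgpu_device_ip_suspend(): assumed to leave dc_lock held. */
static int model_ip_suspend(struct model_dev *dev)
{
	return model_lock_acquire(&dev->dc_lock);
}

/* Models the failing IB ring test branch in amdgpu_do_asic_reset(). */
static int model_do_asic_reset(struct model_dev *dev, bool suspend_on_failure)
{
	if (suspend_on_failure)
		(void)model_ip_suspend(dev); /* the call this patch removes */
	return -EAGAIN; /* ask the caller to retry with a full reset */
}

/* Models the retry: the pre-reset step suspends the IP blocks again. */
int model_gpu_recover(bool suspend_on_failure)
{
	struct model_dev dev = { .dc_lock = { .held = false } };
	int r = model_do_asic_reset(&dev, suspend_on_failure);

	if (r == -EAGAIN)
		r = model_ip_suspend(&dev); /* via the pre-reset step */
	return r;
}
```

In this model, model_gpu_recover(true) corresponds to the code before the patch and returns -EDEADLK, while model_gpu_recover(false) corresponds to the patched code, where only the retry suspends and the lock is taken once.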