public inbox for stable@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Lang Yu" <Lang.Yu@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Andrey Grodzovsky" <andrey.grodzovsky@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Sasha Levin" <sashal@kernel.org>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.12 59/63] drm/amd/amdgpu: fix a potential deadlock in gpu reset
Date: Mon, 24 May 2021 10:46:16 -0400
Message-ID: <20210524144620.2497249-59-sashal@kernel.org>
In-Reply-To: <20210524144620.2497249-1-sashal@kernel.org>

From: Lang Yu <Lang.Yu@amd.com>

[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]

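When amdgpu_ib_ring_tests() failed, the reset logic called
amdgpu_device_ip_suspend() twice: once in amdgpu_do_asic_reset() and
again in amdgpu_device_pre_asic_reset() when the reset was retried.
The second call deadlocked.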
Deadlock log:

[  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[  806.290952] [drm] free PSP TMR buffer

[  806.319406] ============================================
[  806.320315] WARNING: possible recursive locking detected
[  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
[  806.322135] --------------------------------------------
[  806.323043] cat/2593 is trying to acquire lock:
[  806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.325668]
               but task is already holding lock:
[  806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.328430]
               other info that might help us debug this:
[  806.329539]  Possible unsafe locking scenario:

[  806.330549]        CPU0
[  806.330983]        ----
[  806.331416]   lock(&adev->dm.dc_lock);
[  806.332086]   lock(&adev->dm.dc_lock);
[  806.332738]
                *** DEADLOCK ***

[  806.333747]  May be due to missing lock nesting notation

[  806.334899] 3 locks held by cat/2593:
[  806.335537]  #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[  806.337009]  #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  806.339018]  #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.340869]
               stack backtrace:
[  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
[  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[  806.344413] Call Trace:
[  806.344849]  dump_stack+0x93/0xbd
[  806.345435]  __lock_acquire.cold+0x18a/0x2cf
[  806.346179]  lock_acquire+0xca/0x390
[  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.347813]  __mutex_lock+0x9b/0x930
[  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
[  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.354064]  mutex_lock_nested+0x1b/0x20
[  806.354747]  ? mutex_lock_nested+0x1b/0x20
[  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5eee251e3335..85d90e857693 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4356,7 +4356,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 			r = amdgpu_ib_ring_tests(tmp_adev);
 			if (r) {
 				dev_err(tmp_adev->dev, "ib ring test failed (%d).\n", r);
-				r = amdgpu_device_ip_suspend(tmp_adev);
 				need_full_reset = true;
 				r = -EAGAIN;
 				goto end;
-- 
2.30.2



Thread overview: 68+ messages
2021-05-24 14:45 [PATCH AUTOSEL 5.12 01/63] platform/x86: hp_accel: Avoid invoking _INI to speed up resume Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 02/63] gpio: cadence: Add missing MODULE_DEVICE_TABLE Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 03/63] Revert "crypto: cavium/nitrox - add an error message to explain the failure of pci_request_mem_regions" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 04/63] Revert "media: usb: gspca: add a missed check for goto_low_power" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 05/63] Revert "ALSA: sb: fix a missing check of snd_ctl_add" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 06/63] Revert "serial: max310x: pass return value of spi_register_driver" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 07/63] serial: max310x: unregister uart driver in case of failure and abort Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 08/63] Revert "net: fujitsu: fix a potential NULL pointer dereference" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 09/63] net: fujitsu: fix potential null-ptr-deref Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 10/63] Revert "net/smc: fix a NULL pointer dereference" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 11/63] net/smc: properly handle workqueue allocation failure Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 12/63] Revert "net: caif: replace BUG_ON with recovery code" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 13/63] net: caif: remove BUG_ON(dev == NULL) in caif_xmit Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 14/63] Revert "char: hpet: fix a missing check of ioremap" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 15/63] char: hpet: add checks after calling ioremap Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 16/63] Revert "ALSA: gus: add a check of the status of snd_ctl_add" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 17/63] ALSA: sb8: Add a comment note regarding an unused pointer Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 18/63] Revert "ALSA: usx2y: Fix potential NULL pointer dereference" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 19/63] Revert "isdn: mISDNinfineon: fix " Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 20/63] isdn: mISDNinfineon: check/cleanup ioremap failure correctly in setup_io Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 21/63] Revert "ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 22/63] ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd() Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 23/63] Revert "isdn: mISDN: Fix potential NULL pointer dereference of kzalloc" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 24/63] isdn: mISDN: correctly handle ph_info allocation failure in hfcsusb_ph_info Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 25/63] Revert "dmaengine: qcom_hidma: Check for driver register failure" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 26/63] dmaengine: qcom_hidma: comment platform_driver_register call Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 27/63] Revert "libertas: add checks for the return value of sysfs_create_group" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 28/63] libertas: register sysfs groups properly Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 29/63] Revert "ASoC: rt5645: fix a NULL pointer dereference" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 30/63] ASoC: rt5645: add error checking to rt5645_probe function Sasha Levin
2021-05-25 14:01   ` Mark Brown
2021-05-25 14:44     ` Greg Kroah-Hartman
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 31/63] Revert "ASoC: cs43130: fix a NULL pointer dereference" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 32/63] ASoC: cs43130: handle errors in cs43130_probe() properly Sasha Levin
2021-05-25 14:00   ` Mark Brown
2021-05-25 14:43     ` Greg Kroah-Hartman
2021-05-25 22:17       ` Mark Brown
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 33/63] Revert "media: dvb: Add check on sp8870_readreg" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 34/63] media: dvb: Add check on sp8870_readreg return Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 35/63] Revert "media: gspca: mt9m111: Check write_bridge for timeout" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 36/63] media: gspca: mt9m111: Check write_bridge for timeout Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 37/63] Revert "media: gspca: Check the return value of write_bridge for timeout" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 38/63] media: gspca: properly check for errors in po1030_probe() Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 39/63] Revert "net: liquidio: fix a NULL pointer dereference" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 40/63] net: liquidio: Add missing null pointer checks Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 41/63] Revert "brcmfmac: add a check for the status of usb_register" Sasha Levin
2021-05-24 14:45 ` [PATCH AUTOSEL 5.12 42/63] brcmfmac: properly check for bus register errors Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 43/63] cdrom: gdrom: initialize global variable at init time Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 44/63] btrfs: return whole extents in fiemap Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 45/63] scsi: ufs: ufs-mediatek: Fix power down spec violation Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 46/63] scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 47/63] openrisc: Define memory barrier mb Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 48/63] scsi: pm80xx: Fix drives missing during rmmod/insmod loop Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 49/63] btrfs: release path before starting transaction when cloning inline extent Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 50/63] btrfs: do not BUG_ON in link_to_fixup_dir Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 51/63] ALSA: dice: disable double_pcm_frames mode for M-Audio Profire 610, 2626 and Avid M-Box 3 Pro Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 52/63] platform/x86: hp-wireless: add AMD's hardware id to the supported list Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 53/63] platform/x86: intel_punit_ipc: Append MODULE_DEVICE_TABLE for ACPI Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 54/63] platform/x86: touchscreen_dmi: Add info for the Mediacom Winpad 7.0 W700 tablet Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 55/63] SMB3: incorrect file id in requests compounded with open Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 56/63] drm/amd/display: Disconnect non-DP with no EDID Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 57/63] drm/amd/amdgpu: fix refcount leak Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 58/63] drm/amdgpu: Fix a use-after-free Sasha Levin
2021-05-24 14:46 ` Sasha Levin [this message]
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 60/63] drm/amdgpu: stop touching sched.ready in the backend Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 61/63] platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 62/63] block: fix a race between del_gendisk and BLKRRPART Sasha Levin
2021-05-24 14:46 ` [PATCH AUTOSEL 5.12 63/63] linux/bits.h: fix compilation error with GENMASK Sasha Levin
