From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76A70C47085 for ; Mon, 24 May 2021 14:50:12 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 40F8B61483 for ; Mon, 24 May 2021 14:50:12 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 40F8B61483
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=dri-devel-bounces@lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E8C876E86A; Mon, 24 May 2021 14:50:09 +0000 (UTC)
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by gabe.freedesktop.org (Postfix) with ESMTPS id ABD0E6E870; Mon, 24 May 2021 14:50:07 +0000 (UTC)
Received: by
	mail.kernel.org (Postfix) with ESMTPSA id 8F54A6147F; Mon, 24 May 2021 14:50:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1621867807; bh=akenY2+nrnz0yRofVdRHzxhKQS7tviJ6AFW4Q7dwg0w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=i6uSDwfDNM1D+V6fPYQLpAvulCPBYb9KlmbHFhkdEQ68Sr+lcgcaa783zud8bThvz PmcdehJ0eJDQNiMRjM+cA+1/jQj+//WfqEmiIciHrP+Ao1KyH3UGnBPbiQUwvIzkvI kJK1h2W+ziiWl98tkmzPaDm6OCGFKBzB6WjSFM+jmJoYccHkQFbzZRFvwIW4mZxNdG dvO8ouzpcq9X216iKbK1xqJ7OQdC5Jhoi0nVT0e4bUeueEQOJHOa7hzJISbO4AmTDy Iplrmq8G1QIn5JoC74XCaJ/ERruH5X/POksw+zZPlpCpWRFZPF2qZdr9zTyZB6fH41 V+RPHrpcgMXYw==
From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH AUTOSEL 5.4 52/52] drm/amd/amdgpu: fix a potential deadlock in gpu reset
Date: Mon, 24 May 2021 10:49:02 -0400
Message-Id: <20210524144903.2498518-52-sashal@kernel.org>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210524144903.2498518-1-sashal@kernel.org>
References: <20210524144903.2498518-1-sashal@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Sasha Levin , amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Alex Deucher , Lang Yu , =?UTF-8?q?Christian=20K=C3=B6nig?=
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

From: Lang Yu

[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]

When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.
Deadlock log:

[  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[  806.290952] [drm] free PSP TMR buffer

[  806.319406] ============================================
[  806.320315] WARNING: possible recursive locking detected
[  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
[  806.322135] --------------------------------------------
[  806.323043] cat/2593 is trying to acquire lock:
[  806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.325668]
               but task is already holding lock:
[  806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.328430]
               other info that might help us debug this:
[  806.329539]  Possible unsafe locking scenario:

[  806.330549]        CPU0
[  806.330983]        ----
[  806.331416]   lock(&adev->dm.dc_lock);
[  806.332086]   lock(&adev->dm.dc_lock);
[  806.332738]
                *** DEADLOCK ***

[  806.333747]  May be due to missing lock nesting notation

[  806.334899] 3 locks held by cat/2593:
[  806.335537]  #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[  806.337009]  #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  806.339018]  #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.340869]
               stack backtrace:
[  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
[  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[  806.344413] Call Trace:
[  806.344849]  dump_stack+0x93/0xbd
[  806.345435]  __lock_acquire.cold+0x18a/0x2cf
[  806.346179]  lock_acquire+0xca/0x390
[  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.347813]  __mutex_lock+0x9b/0x930
[  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
[  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.354064]  mutex_lock_nested+0x1b/0x20
[  806.354747]  ? mutex_lock_nested+0x1b/0x20
[  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]

Signed-off-by: Lang Yu
Acked-by: Christian König
Reviewed-by: Andrey Grodzovsky
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3b3fc9a426e9..765f9a6c4640 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3704,7 +3704,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 			r = amdgpu_ib_ring_tests(tmp_adev);
 			if (r) {
 				dev_err(tmp_adev->dev, "ib ring test failed (%d).\n", r);
-				r = amdgpu_device_ip_suspend(tmp_adev);
 				need_full_reset = true;
 				r = -EAGAIN;
 				goto end;
-- 
2.30.2