From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2197CC43464 for ; Fri, 18 Sep 2020 02:06:38 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C8F1823770 for ; Fri, 18 Sep 2020 02:06:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="tbY0crE8" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C8F1823770 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=dri-devel-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4862B6E107; Fri, 18 Sep 2020 02:06:37 +0000 (UTC) Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by gabe.freedesktop.org (Postfix) with ESMTPS id F257C6E107 for ; Fri, 18 Sep 2020 02:06:35 +0000 (UTC) Received: from sasha-vm.mshome.net (c-73-47-72-35.hsd1.nh.comcast.net [73.47.72.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2BCFD23770; Fri, 18 Sep 2020 02:06:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1600394795; bh=ffDT/WKFTgV67vs+6P5VHGIfYyVvsSViiPmyQrIFwe0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tbY0crE8KOFG/32kJXkzQyL1R8WNS+Q/niwTxJpKAxaWGHJ5gDm1fLNSdOfsrm+2u pH/wmDPj5/sEQGd5PqMYwIAhXLxLx9nxzUHOUl0YFIZCsZnDq0tucZdiifFd9Pc1un xC82E5N/MT29JsVD1V6G7hGLr3dEdeuCuhRPuNAQ= From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH AUTOSEL 5.4 265/330] drm/amd/powerplay: try to do a graceful shutdown on SW CTF Date: Thu, 17 Sep 2020 22:00:05 -0400 Message-Id: <20200918020110.2063155-265-sashal@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org> References: <20200918020110.2063155-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alex Deucher , Sasha Levin , Evan Quan , dri-devel@lists.freedesktop.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Evan Quan [ Upstream commit 9495220577416632675959caf122e968469ffd16 ] Normally this(SW CTF) should not happen. And by doing graceful shutdown we can prevent further damage. Signed-off-by: Evan Quan Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/powerplay/hwmgr/smu_helper.c | 21 +++++++++++++++---- drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 7 +++++++ 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c index d09690fca4520..414added3d02c 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c @@ -22,6 +22,7 @@ */ #include +#include #include "hwmgr.h" #include "pp_debug.h" @@ -593,12 +594,18 @@ int phm_irq_process(struct amdgpu_device *adev, uint32_t src_id = entry->src_id; if (client_id == AMDGPU_IRQ_CLIENTID_LEGACY) { - if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH) + if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH) { pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); - else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW) + /* + * SW CTF just occurred. + * Try to do a graceful shutdown to prevent further damage. + */ + dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n"); + orderly_poweroff(true); + } else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW) pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), @@ -609,12 +616,18 @@ int phm_irq_process(struct amdgpu_device *adev, PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); } else if (client_id == SOC15_IH_CLIENTID_THM) { - if (src_id == 0) + if (src_id == 0) { pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); - else + /* + * SW CTF just occurred. + * Try to do a graceful shutdown to prevent further damage. + */ + dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n"); + orderly_poweroff(true); + } else pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c index c4d8c52c6b9ca..6c4405622c9bb 100644 --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "pp_debug.h" #include "amdgpu.h" @@ -1538,6 +1539,12 @@ static int smu_v11_0_irq_process(struct amdgpu_device *adev, PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); + /* + * SW CTF just occurred. + * Try to do a graceful shutdown to prevent further damage. + */ + dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n"); + orderly_poweroff(true); break; case THM_11_0__SRCID__THM_DIG_THERM_H2L: pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n", -- 2.25.1 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3DE13C43463 for ; Fri, 18 Sep 2020 02:06:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0552B2396E for ; Fri, 18 Sep 2020 02:06:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1600394805; bh=ffDT/WKFTgV67vs+6P5VHGIfYyVvsSViiPmyQrIFwe0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=zTnD9Sh6x2ygB2YLIeXNTlHvTxunZkNvqUUKjS0l2KZZ2WaKWbfRsfbbpjKKJ0Ela 0DsSsrLmiOclLB7aNmYFT0cnNdKtfbH9nYhxg/lKlrG6C1465IPkW4a669LU7BzxCK x1H/JTloD+IQ5Hwq3UDfbeCDTUa+AjVR1geKpRHw= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727812AbgIRCGn (ORCPT ); Thu, 17 Sep 2020 22:06:43 -0400 Received: from mail.kernel.org ([198.145.29.99]:56036 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726205AbgIRCGg (ORCPT ); Thu, 17 Sep 2020 22:06:36 -0400 Received: from sasha-vm.mshome.net (c-73-47-72-35.hsd1.nh.comcast.net [73.47.72.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2BCFD23770; Fri, 18 Sep 2020 02:06:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1600394795; bh=ffDT/WKFTgV67vs+6P5VHGIfYyVvsSViiPmyQrIFwe0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tbY0crE8KOFG/32kJXkzQyL1R8WNS+Q/niwTxJpKAxaWGHJ5gDm1fLNSdOfsrm+2u pH/wmDPj5/sEQGd5PqMYwIAhXLxLx9nxzUHOUl0YFIZCsZnDq0tucZdiifFd9Pc1un xC82E5N/MT29JsVD1V6G7hGLr3dEdeuCuhRPuNAQ= From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Evan Quan , Alex Deucher , Sasha Levin , dri-devel@lists.freedesktop.org Subject: [PATCH AUTOSEL 5.4 265/330] drm/amd/powerplay: try to do a graceful shutdown on SW CTF Date: Thu, 17 Sep 2020 22:00:05 -0400 Message-Id: <20200918020110.2063155-265-sashal@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org> References: <20200918020110.2063155-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Evan Quan [ Upstream commit 9495220577416632675959caf122e968469ffd16 ] Normally this(SW CTF) should not happen. And by doing graceful shutdown we can prevent further damage. Signed-off-by: Evan Quan Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/powerplay/hwmgr/smu_helper.c | 21 +++++++++++++++---- drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 7 +++++++ 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c index d09690fca4520..414added3d02c 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c @@ -22,6 +22,7 @@ */ #include +#include #include "hwmgr.h" #include "pp_debug.h" @@ -593,12 +594,18 @@ int phm_irq_process(struct amdgpu_device *adev, uint32_t src_id = entry->src_id; if (client_id == AMDGPU_IRQ_CLIENTID_LEGACY) { - if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH) + if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH) { pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); - else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW) + /* + * SW CTF just occurred. + * Try to do a graceful shutdown to prevent further damage. + */ + dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n"); + orderly_poweroff(true); + } else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW) pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), @@ -609,12 +616,18 @@ int phm_irq_process(struct amdgpu_device *adev, PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); } else if (client_id == SOC15_IH_CLIENTID_THM) { - if (src_id == 0) + if (src_id == 0) { pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); - else + /* + * SW CTF just occurred. + * Try to do a graceful shutdown to prevent further damage. + */ + dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n"); + orderly_poweroff(true); + } else pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n", PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c index c4d8c52c6b9ca..6c4405622c9bb 100644 --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "pp_debug.h" #include "amdgpu.h" @@ -1538,6 +1539,12 @@ static int smu_v11_0_irq_process(struct amdgpu_device *adev, PCI_BUS_NUM(adev->pdev->devfn), PCI_SLOT(adev->pdev->devfn), PCI_FUNC(adev->pdev->devfn)); + /* + * SW CTF just occurred. + * Try to do a graceful shutdown to prevent further damage. + */ + dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n"); + orderly_poweroff(true); break; case THM_11_0__SRCID__THM_DIG_THERM_H2L: pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n", -- 2.25.1