From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Candice Li <candice.li@amd.com>,
Hawking Zhang <Hawking.Zhang@amd.com>,
Alex Deucher <alexander.deucher@amd.com>,
Sasha Levin <sashal@kernel.org>,
christian.koenig@amd.com, airlied@linux.ie,
dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.18 05/47] drm/amdgpu: Resolve RAS GFX error count issue after cold boot on Arcturus
Date: Mon, 13 Jun 2022 22:03:58 -0400
Message-ID: <20220614020441.1098348-5-sashal@kernel.org>
In-Reply-To: <20220614020441.1098348-1-sashal@kernel.org>
From: Candice Li <candice.li@amd.com>
[ Upstream commit 2a460963350ec6b1534d28d7f943b5f84815aff2 ]
Adjust the sequence for ras late init and separate ras reset error status
from query status.
v2: squash in fix from Candice
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 ++++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 27 ++++++++++++++++++++-----
2 files changed, 28 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 28a736c507bb..bd3b32e5ba9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -625,17 +625,20 @@ int amdgpu_get_gfx_off_status(struct amdgpu_device *adev, uint32_t *value)
int amdgpu_gfx_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block)
{
int r;
- r = amdgpu_ras_block_late_init(adev, ras_block);
- if (r)
- return r;
if (amdgpu_ras_is_supported(adev, ras_block->block)) {
if (!amdgpu_persistent_edc_harvesting_supported(adev))
amdgpu_ras_reset_error_status(adev, AMDGPU_RAS_BLOCK__GFX);
+ r = amdgpu_ras_block_late_init(adev, ras_block);
+ if (r)
+ return r;
+
r = amdgpu_irq_get(adev, &adev->gfx.cp_ecc_error_irq, 0);
if (r)
goto late_fini;
+ } else {
+ amdgpu_ras_feature_enable_on_boot(adev, ras_block, 0);
}
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 424c22a841f4..3f96dadf2698 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -195,6 +195,13 @@ static ssize_t amdgpu_ras_debugfs_read(struct file *f, char __user *buf,
if (amdgpu_ras_query_error_status(obj->adev, &info))
return -EINVAL;
+ /* Hardware counter will be reset automatically after the query on Vega20 and Arcturus */
+ if (obj->adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 2) &&
+ obj->adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 4)) {
+ if (amdgpu_ras_reset_error_status(obj->adev, info.head.block))
+ dev_warn(obj->adev->dev, "Failed to reset error counter and error status");
+ }
+
s = snprintf(val, sizeof(val), "%s: %lu\n%s: %lu\n",
"ue", info.ue_count,
"ce", info.ce_count);
@@ -548,9 +555,10 @@ static ssize_t amdgpu_ras_sysfs_read(struct device *dev,
if (amdgpu_ras_query_error_status(obj->adev, &info))
return -EINVAL;
- if (obj->adev->asic_type == CHIP_ALDEBARAN) {
+ if (obj->adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 2) &&
+ obj->adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 4)) {
if (amdgpu_ras_reset_error_status(obj->adev, info.head.block))
- DRM_WARN("Failed to reset error counter and error status");
+ dev_warn(obj->adev->dev, "Failed to reset error counter and error status");
}
return sysfs_emit(buf, "%s: %lu\n%s: %lu\n", "ue", info.ue_count,
@@ -1023,9 +1031,6 @@ int amdgpu_ras_query_error_status(struct amdgpu_device *adev,
}
}
- if (!amdgpu_persistent_edc_harvesting_supported(adev))
- amdgpu_ras_reset_error_status(adev, info->head.block);
-
return 0;
}
@@ -1145,6 +1150,12 @@ int amdgpu_ras_query_error_count(struct amdgpu_device *adev,
if (res)
return res;
+ if (adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 2) &&
+ adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 4)) {
+ if (amdgpu_ras_reset_error_status(adev, info.head.block))
+ dev_warn(adev->dev, "Failed to reset error counter and error status");
+ }
+
ce += info.ce_count;
ue += info.ue_count;
}
@@ -1705,6 +1716,12 @@ static void amdgpu_ras_log_on_err_counter(struct amdgpu_device *adev)
continue;
amdgpu_ras_query_error_status(adev, &info);
+
+ if (adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 2) &&
+ adev->ip_versions[MP0_HWIP][0] != IP_VERSION(11, 0, 4)) {
+ if (amdgpu_ras_reset_error_status(adev, info.head.block))
+ dev_warn(adev->dev, "Failed to reset error counter and error status");
+ }
}
}
--
2.35.1
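
Illustrative note (not part of the upstream commit): the patch takes the
reset out of amdgpu_ras_query_error_status() and has each caller reset
explicitly unless the MP0 IP version is 11.0.2 or 11.0.4, where, per the
comment added in the debugfs path, the hardware counter is reset
automatically after the query. The following is a minimal user-space sketch
of that caller-side pattern. Every name prefixed with sketch_ or SKETCH_ is
hypothetical, the version packing is an assumption made for illustration,
and the model does not emulate the hardware's automatic reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's IP_VERSION() packing (assumed layout). */
#define SKETCH_IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/* Hypothetical model of one RAS block's error counters. */
struct sketch_ras_block {
	unsigned long ue_count;	/* uncorrectable errors */
	unsigned long ce_count;	/* correctable errors */
};

/* Query only: copy the counters out and leave the source state untouched. */
static void sketch_query(const struct sketch_ras_block *hw,
			 struct sketch_ras_block *out)
{
	assert(hw && out);
	*out = *hw;
}

/* Reset only: clear the counters as a separate, explicit step. */
static void sketch_reset(struct sketch_ras_block *hw)
{
	assert(hw);
	hw->ue_count = 0;
	hw->ce_count = 0;
}

/* True when software must reset explicitly (MP0 is neither 11.0.2 nor 11.0.4). */
static bool sketch_needs_explicit_reset(uint32_t mp0_version)
{
	return mp0_version != SKETCH_IP_VERSION(11, 0, 2) &&
	       mp0_version != SKETCH_IP_VERSION(11, 0, 4);
}

/* Caller-side pattern: query first, then reset only where it is needed. */
static void sketch_read_counters(struct sketch_ras_block *hw,
				 uint32_t mp0_version,
				 struct sketch_ras_block *out)
{
	sketch_query(hw, out);
	if (sketch_needs_explicit_reset(mp0_version))
		sketch_reset(hw);
}
```

Calling sketch_read_counters() with any other version value returns the
counts and then clears the model state. With 11.0.2 or 11.0.4 the sketch
leaves the state alone, since on those parts the clearing is left to the
hardware.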