AMD-GFX Archive on lore.kernel.org
* [PATCH 1/4] drm/amdgpu: Warn when bad pages approaches threshold
@ 2021-10-19 17:50 Kent Russell
  2021-10-19 17:50 ` [PATCH 2/4] drm/amdgpu: Clarify error when hitting bad page threshold Kent Russell
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Kent Russell @ 2021-10-19 17:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Kent Russell, Luben Tuikov, Mukul Joshi

Currently, dmesg gives no warning when the number of bad pages
approaches the threshold for page retirement. Warn when the number of
bad pages reaches 90% of the threshold or more, so that users can check
and plan ahead instead of waiting until the GPU is full of bad pages.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 98732518543e..8270aad23a06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1077,6 +1077,16 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
 		if (res)
 			DRM_ERROR("RAS table incorrect checksum or error:%d\n",
 				  res);
+
+		/* threshold = -1 is automatic, threshold = 0 means that page
+		 * retirement is disabled.
+		 */
+		if (amdgpu_bad_page_threshold > 0 &&
+		    control->ras_num_recs >= 0 &&
+		    control->ras_num_recs >= (amdgpu_bad_page_threshold * 9 / 10))
+			DRM_WARN("RAS records:%u approaching threshold:%d\n",
+					control->ras_num_recs,
+					amdgpu_bad_page_threshold);
 	} else if (hdr->header == RAS_TABLE_HDR_BAD &&
 		   amdgpu_bad_page_threshold != 0) {
 		res = __verify_ras_table_checksum(control);
-- 
2.25.1
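
For readers who want to try the 90% check outside the driver, here is a
minimal standalone sketch. The helper name ras_near_threshold() and its
parameters are hypothetical and not part of this patch; it only mirrors
the condition added above, assuming the same meaning for the threshold
values (-1 is automatic, 0 disables page retirement).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the
 * number of RAS records is at or above 90% of the bad page threshold.
 */
static bool ras_near_threshold(uint32_t num_recs, int threshold)
{
	/* -1 means automatic and 0 means retirement is disabled: no warning. */
	if (threshold <= 0)
		return false;

	/* Widen to 64 bits so that threshold * 9 cannot overflow. */
	return (uint64_t)num_recs >= ((uint64_t)threshold * 9) / 10;
}
```

With a threshold of 100, the helper returns false at 89 records and true
at 90. Note that the integer division rounds down, so for very small
thresholds the check fires early: with a threshold of 1, the limit is 0
and the condition already holds at 0 records.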



Thread overview: 13+ messages
2021-10-19 17:50 [PATCH 1/4] drm/amdgpu: Warn when bad pages approaches threshold Kent Russell
2021-10-19 17:50 ` [PATCH 2/4] drm/amdgpu: Clarify error when hitting bad page threshold Kent Russell
2021-10-19 18:47   ` Luben Tuikov
2021-10-19 17:50 ` [PATCH 3/4] drm/amdgpu: Add kernel parameter for ignoring " Kent Russell
2021-10-19 18:13   ` Felix Kuehling
2021-10-19 18:23     ` Russell, Kent
2021-10-20 10:55   ` Christian König
2021-10-20 14:56     ` Russell, Kent
2021-10-20 15:02       ` Christian König
2021-10-19 17:50 ` [PATCH 4/4] drm/amdgpu: Implement ignore_bad_page_threshold parameter Kent Russell
2021-10-19 18:08 ` [PATCH 1/4] drm/amdgpu: Warn when bad pages approaches threshold Felix Kuehling
2021-10-19 18:22   ` Russell, Kent
2021-10-19 18:42     ` Luben Tuikov
