From: <Trigger.Huang@amd.com>
To: <amd-gfx@lists.freedesktop.org>
Cc: <sunil.khatri@amd.com>, <alexander.deucher@amd.com>,
Trigger Huang <Trigger.Huang@amd.com>
Subject: [PATCH 1/2] drm/amdgpu: skip printing vram_lost if needed
Date: Mon, 19 Aug 2024 17:53:30 +0800
Message-ID: <20240819095331.460721-2-Trigger.Huang@amd.com>
In-Reply-To: <20240819095331.460721-1-Trigger.Huang@amd.com>
From: Trigger Huang <Trigger.Huang@amd.com>
The VRAM lost status can only be obtained after a GPU reset occurs, but
sometimes a dev core dump happens before the GPU reset. So a new
argument is added to tell the dev core dump implementation whether to
skip printing the vram_lost status in the dump.
This patch also decouples the core dump function from the GPU reset
function by replacing the amdgpu_reset_context argument with an
amdgpu_job, which specifies the context for the core dump.
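A minimal user-space sketch of the decoupling, assuming simplified stand-in types. fake_vm, fake_job, fake_reset_context, fake_dump, fake_coredump, after_reset and before_reset are all hypothetical names, not the amdgpu structures, and pasid stands in for the task info the real code looks up. It shows that once the dump takes a job pointer (possibly NULL) instead of a reset context, a caller that has no reset context yet can still produce a dump.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for amdgpu_vm / amdgpu_job / amdgpu_reset_context;
 * only the fields this sketch needs are modeled. */
struct fake_vm { int pasid; };
struct fake_job { struct fake_vm *vm; };
struct fake_reset_context { struct fake_job *job; /* flags etc. omitted */ };

/* Hypothetical stand-in for the recorded dump info. */
struct fake_dump {
	bool skip_vram_check;
	bool vram_lost;
	bool has_task_info;
	int pasid;
};

/* New-style entry point: depends only on the job, not on a reset context. */
static void fake_coredump(struct fake_dump *d, bool skip_vram_check,
			  bool vram_lost, const struct fake_job *job)
{
	d->skip_vram_check = skip_vram_check;
	d->vram_lost = vram_lost;
	d->has_task_info = false;
	d->pasid = 0;
	/* The job may be NULL, and a job may have no VM: guard both. */
	if (job && job->vm) {
		d->has_task_info = true;
		d->pasid = job->vm->pasid;
	}
}

/* Reset path: the VRAM check has run, so its result is meaningful. */
static void after_reset(struct fake_dump *d,
			const struct fake_reset_context *rc, bool vram_lost)
{
	fake_coredump(d, false, vram_lost, rc->job);
}

/* Hypothetical pre-reset path: no reset context exists yet and VRAM has
 * not been checked, so the check is reported as skipped. */
static void before_reset(struct fake_dump *d, const struct fake_job *job)
{
	fake_coredump(d, true, false, job);
}
```

The reset path simply unwraps rc->job, which mirrors the one-line change in amdgpu_do_asic_reset().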
V2: Inform the user if the VRAM lost check is skipped, so users don't
assume VRAM wasn't lost (Alex)
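The V2 change gives the dump three possible VRAM outcomes instead of two. A small user-space model of the if/else chain added to amdgpu_devcoredump_read() may make the precedence clearer; vram_status_line() is a hypothetical helper, and the message strings are copied from the diff. A skipped check always wins, so an unchecked dump never looks like "VRAM survived".

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model (not kernel code) of the message selection added to
 * amdgpu_devcoredump_read(). Returns the line to print, or NULL when
 * nothing about VRAM is printed.
 */
static const char *vram_status_line(bool skip_vram_check, bool vram_lost)
{
	/* The check never ran (dump taken before a reset): say so explicitly
	 * instead of staying silent. vram_lost is ignored in this case. */
	if (skip_vram_check)
		return "VRAM lost check is skipped!\n";
	/* The check ran after a reset and found VRAM was lost. */
	if (vram_lost)
		return "VRAM is lost due to GPU reset!\n";
	/* The check ran and VRAM survived: print nothing, as before. */
	return NULL;
}
```

Silence therefore only means "checked and not lost".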
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 20 ++++++++++---------
.../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h | 7 +++----
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
3 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
index cf2b4dd4d865..5ac59b62020c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
@@ -28,8 +28,8 @@
#include "atom.h"
#ifndef CONFIG_DEV_COREDUMP
-void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
+void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check,
+ bool vram_lost, struct amdgpu_job *job)
{
}
#else
@@ -315,7 +315,9 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count,
}
}
- if (coredump->reset_vram_lost)
+ if (coredump->skip_vram_check)
+ drm_printf(&p, "VRAM lost check is skipped!\n");
+ else if (coredump->reset_vram_lost)
drm_printf(&p, "VRAM is lost due to GPU reset!\n");
return count - iter.remain;
@@ -326,12 +328,11 @@ static void amdgpu_devcoredump_free(void *data)
kfree(data);
}
-void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context)
+void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check,
+ bool vram_lost, struct amdgpu_job *job)
{
- struct amdgpu_coredump_info *coredump;
struct drm_device *dev = adev_to_drm(adev);
- struct amdgpu_job *job = reset_context->job;
+ struct amdgpu_coredump_info *coredump;
struct drm_sched_job *s_job;
coredump = kzalloc(sizeof(*coredump), GFP_NOWAIT);
@@ -341,11 +342,12 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
return;
}
+ coredump->skip_vram_check = skip_vram_check;
coredump->reset_vram_lost = vram_lost;
- if (reset_context->job && reset_context->job->vm) {
+ if (job && job->vm) {
+ struct amdgpu_vm *vm = job->vm;
struct amdgpu_task_info *ti;
- struct amdgpu_vm *vm = reset_context->job->vm;
ti = amdgpu_vm_get_task_info_vm(vm);
if (ti) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h
index 52459512cb2b..ef9772c6bcc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h
@@ -26,7 +26,6 @@
#define __AMDGPU_DEV_COREDUMP_H__
#include "amdgpu.h"
-#include "amdgpu_reset.h"
#ifdef CONFIG_DEV_COREDUMP
@@ -36,12 +35,12 @@ struct amdgpu_coredump_info {
struct amdgpu_device *adev;
struct amdgpu_task_info reset_task_info;
struct timespec64 reset_time;
+ bool skip_vram_check;
bool reset_vram_lost;
struct amdgpu_ring *ring;
};
#endif
-void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
- struct amdgpu_reset_context *reset_context);
-
+void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check,
+ bool vram_lost, struct amdgpu_job *job);
#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ad97f03f1358..59a443abc11e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5468,7 +5468,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags))
- amdgpu_coredump(tmp_adev, vram_lost, reset_context);
+ amdgpu_coredump(tmp_adev, false, vram_lost, reset_context->job);
if (vram_lost) {
DRM_INFO("VRAM is lost due to GPU reset!\n");
--
2.34.1
Thread overview: 12+ messages
2024-08-19 9:53 [PATCH 0/2] Improve the dev coredump Trigger.Huang
2024-08-19 9:53 ` Trigger.Huang [this message]
2024-08-19 9:53 ` [PATCH 2/2] drm/amdgpu: Do core dump immediately when job tmo Trigger.Huang
2024-08-19 10:30 ` Khatri, Sunil
2024-08-20 7:30 ` Huang, Trigger
2024-08-20 14:06 ` Alex Deucher
2024-08-20 15:07 ` Khatri, Sunil
2024-08-20 15:29 ` Alex Deucher
2024-08-20 15:31 ` Khatri, Sunil
2024-08-20 16:01 ` Alex Deucher
2024-08-20 16:54 ` Khatri, Sunil
2024-08-21 8:19 ` Huang, Trigger