From: Sunil Khatri <sunil.khatri@amd.com>
To: "Alex Deucher" <alexander.deucher@amd.com>,
"Liu Leo" <Leo.Liu@amd.com>,
"Christian König" <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org, Sunil Khatri <sunil.khatri@amd.com>
Subject: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's
Date: Fri, 26 Jul 2024 18:17:51 +0530 [thread overview]
Message-ID: <20240726124751.3121312-2-sunil.khatri@amd.com> (raw)
In-Reply-To: <20240726124751.3121312-1-sunil.khatri@amd.com>
Problem:
IP dump right now is done post suspend of
all IP's which for some IP's could change power
state and software state too which we do not want
to reflect in the dump as it might not be same at
the time of hang.
Solution:
IP should be dumped as close to the HW state when
the GPU was in hung state without trying to reinitialize
any resource.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 60 +++++++++++-----------
1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 730dae77570c..74f6f15e73b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5277,11 +5277,29 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
return ret;
}
+static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
+{
+ int i;
+
+ lockdep_assert_held(&adev->reset_domain->sem);
+
+ for (i = 0; i < adev->reset_info.num_regs; i++) {
+ adev->reset_info.reset_dump_reg_value[i] =
+ RREG32(adev->reset_info.reset_dump_reg_list[i]);
+
+ trace_amdgpu_reset_reg_dumps(adev->reset_info.reset_dump_reg_list[i],
+ adev->reset_info.reset_dump_reg_value[i]);
+ }
+
+ return 0;
+}
+
int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
struct amdgpu_reset_context *reset_context)
{
int i, r = 0;
struct amdgpu_job *job = NULL;
+ struct amdgpu_device *tmp_adev = reset_context->reset_req_dev;
bool need_full_reset =
test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
@@ -5340,6 +5358,18 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
}
}
+ if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags)) {
+ amdgpu_reset_reg_dumps(tmp_adev);
+
+ dev_info(tmp_adev->dev, "Dumping IP State\n");
+ /* Trigger ip dump before we reset the asic */
+ for (i = 0; i < tmp_adev->num_ip_blocks; i++)
+ if (tmp_adev->ip_blocks[i].version->funcs->dump_ip_state)
+ tmp_adev->ip_blocks[i].version->funcs->dump_ip_state(
+ (void *)tmp_adev);
+ dev_info(tmp_adev->dev, "Dumping IP State Completed\n");
+ }
+
if (need_full_reset)
r = amdgpu_device_ip_suspend(adev);
if (need_full_reset)
@@ -5352,47 +5382,17 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
return r;
}
-static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
-{
- int i;
-
- lockdep_assert_held(&adev->reset_domain->sem);
-
- for (i = 0; i < adev->reset_info.num_regs; i++) {
- adev->reset_info.reset_dump_reg_value[i] =
- RREG32(adev->reset_info.reset_dump_reg_list[i]);
-
- trace_amdgpu_reset_reg_dumps(adev->reset_info.reset_dump_reg_list[i],
- adev->reset_info.reset_dump_reg_value[i]);
- }
-
- return 0;
-}
-
int amdgpu_do_asic_reset(struct list_head *device_list_handle,
struct amdgpu_reset_context *reset_context)
{
struct amdgpu_device *tmp_adev = NULL;
bool need_full_reset, skip_hw_reset, vram_lost = false;
int r = 0;
- uint32_t i;
/* Try reset handler method first */
tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
reset_list);
- if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags)) {
- amdgpu_reset_reg_dumps(tmp_adev);
-
- dev_info(tmp_adev->dev, "Dumping IP State\n");
- /* Trigger ip dump before we reset the asic */
- for (i = 0; i < tmp_adev->num_ip_blocks; i++)
- if (tmp_adev->ip_blocks[i].version->funcs->dump_ip_state)
- tmp_adev->ip_blocks[i].version->funcs
- ->dump_ip_state((void *)tmp_adev);
- dev_info(tmp_adev->dev, "Dumping IP State Completed\n");
- }
-
reset_context->reset_device_list = device_list_handle;
r = amdgpu_reset_perform_reset(tmp_adev, reset_context);
/* If reset handler not implemented, continue; otherwise return */
--
2.34.1
next prev parent reply other threads:[~2024-07-26 12:48 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-26 12:47 [PATCH 1/2] drm/amdgpu: print VCN instance dump for valid instance Sunil Khatri
2024-07-26 12:47 ` Sunil Khatri [this message]
2024-07-26 13:12 ` [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's Alex Deucher
2024-07-26 13:48 ` Lazar, Lijo
2024-07-26 14:23 ` Khatri, Sunil
2024-07-26 14:41 ` Khatri, Sunil
2024-07-26 15:06 ` Lazar, Lijo
2024-07-26 17:16 ` Khatri, Sunil
2024-07-26 18:43 ` Alex Deucher
2024-07-26 19:21 ` Khatri, Sunil
2024-07-29 4:38 ` Lazar, Lijo
2024-07-29 5:38 ` Khatri, Sunil
2024-07-29 5:47 ` Lazar, Lijo
2024-07-29 8:27 ` Khatri, Sunil
2024-07-26 13:12 ` [PATCH 1/2] drm/amdgpu: print VCN instance dump for valid instance Alex Deucher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240726124751.3121312-2-sunil.khatri@amd.com \
--to=sunil.khatri@amd.com \
--cc=Leo.Liu@amd.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox