From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Brost, Matthew" <matthew.brost@intel.com>
Cc: "Vivi, Rodrigo" <rodrigo.vivi@intel.com>
Subject: Re: [PATCH v2] drm/xe: Faster devcoredump
Date: Mon, 29 Jul 2024 10:47:59 +0200 [thread overview]
Message-ID: <abcc60c6-acdc-45f2-8b30-fa30bfd19461@linux.intel.com> (raw)
In-Reply-To: <7783452aa26ca547e2ac80fca1ac752574c49f2d.camel@intel.com>
Hey,
I like speed, so great to have it fixed!
On 2024-07-27 at 00:01, Zanoni, Paulo R wrote:
> On Thu, 2024-07-25 at 22:21 -0700, Matthew Brost wrote:
>> The current algorithm to read out devcoredump is O(N*N) where N is the
>> size of coredump due to usage of the drm_coredump_printer in
>> xe_devcoredump_read. Switch to a O(N) algorithm which prints the
>> devcoredump into a readable format in snapshot work and update
>> xe_devcoredump_read to memcpy from the readable format directly.
>
> I just tested this:
>
> root@martianriver:~# time cp /sys/class/drm/card0/device/devcoredump/data gpu-hang.data
>
> real 0m0.313s
> user 0m0.008s
> sys 0m0.298s
> root@martianriver:~# ls -lh gpu-hang.data
> -rw------- 1 root root 221M Jul 26 14:47 gpu-hang.data
>
> Going from an estimated 221 minutes to 0.3 seconds, I'd say it's an improvement.
>
>>
>> v2:
>> - Fix double free on devcoredump removal (Testing)
>> - Set read_data_size after snap work flush
>> - Adjust remaining in iterator upon realloc (Testing)
>> - Set read_data upon realloc (Testing)
>>
>> Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2408
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_devcoredump.c | 140 +++++++++++++++++-----
>> drivers/gpu/drm/xe/xe_devcoredump.h | 13 ++
>> drivers/gpu/drm/xe/xe_devcoredump_types.h | 4 +
>> drivers/gpu/drm/xe/xe_vm.c | 9 +-
>> drivers/gpu/drm/xe/xe_vm.h | 4 +-
>> 5 files changed, 136 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
>> index d8d8ca2c19d3..6af161250a9e 100644
>> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>> @@ -66,22 +66,9 @@ static struct xe_guc *exec_queue_to_guc(struct xe_exec_queue *q)
>> return &q->gt->uc.guc;
>> }
>>
>> -static void xe_devcoredump_deferred_snap_work(struct work_struct *work)
>> +static void __xe_devcoredump_read(char *buffer, size_t count,
>> + struct xe_devcoredump *coredump)
>> {
>> - struct xe_devcoredump_snapshot *ss = container_of(work, typeof(*ss), work);
>> -
>> - /* keep going if fw fails as we still want to save the memory and SW data */
>> - if (xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL))
>> - xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n");
>> - xe_vm_snapshot_capture_delayed(ss->vm);
>> - xe_guc_exec_queue_snapshot_capture_delayed(ss->ge);
>> - xe_force_wake_put(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL);
Should this put be conditional on the xe_force_wake_get() above succeeding?
>> -}
>> -
>> -static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
>> - size_t count, void *data, size_t datalen)
>> -{
>> - struct xe_devcoredump *coredump = data;
>> struct xe_device *xe;
>> struct xe_devcoredump_snapshot *ss;
>> struct drm_printer p;
>> @@ -89,18 +76,12 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
>> struct timespec64 ts;
>> int i;
>>
>> - if (!coredump)
>> - return -ENODEV;
>> -
>> xe = coredump_to_xe(coredump);
>> ss = &coredump->snapshot;
>>
>> - /* Ensure delayed work is captured before continuing */
>> - flush_work(&ss->work);
>> -
>> iter.data = buffer;
>> iter.offset = 0;
>> - iter.start = offset;
>> + iter.start = 0;
>> iter.remain = count;
>>
>> p = drm_coredump_printer(&iter);
>> @@ -129,15 +110,86 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
>> xe_hw_engine_snapshot_print(coredump->snapshot.hwe[i],
>> &p);
>> drm_printf(&p, "\n**** VM state ****\n");
>> - xe_vm_snapshot_print(coredump->snapshot.vm, &p);
>> + xe_vm_snapshot_print(ss, coredump->snapshot.vm, &p);
>>
>> - return count - iter.remain;
>> + ss->read_data_size = iter.offset;
>> +}
>> +
>> +static void xe_devcoredump_snapshot_free(struct xe_devcoredump_snapshot *ss)
>> +{
>> + int i;
>> +
>> + xe_guc_ct_snapshot_free(ss->ct);
>> + ss->ct = NULL;
>> +
>> + xe_guc_exec_queue_snapshot_free(ss->ge);
>> + ss->ge = NULL;
>> +
>> + xe_sched_job_snapshot_free(ss->job);
>> + ss->job = NULL;
>> +
>> + for (i = 0; i < XE_NUM_HW_ENGINES; i++)
>> + if (ss->hwe[i]) {
>> + xe_hw_engine_snapshot_free(ss->hwe[i]);
>> + ss->hwe[i] = NULL;
>> + }
>> +
>> + xe_vm_snapshot_free(ss->vm);
>> + ss->vm = NULL;
>> +}
>> +
>> +static void xe_devcoredump_deferred_snap_work(struct work_struct *work)
>> +{
>> + struct xe_devcoredump_snapshot *ss = container_of(work, typeof(*ss), work);
>> + struct xe_devcoredump *coredump = container_of(ss, typeof(*coredump), snapshot);
>> +
>> + /* keep going if fw fails as we still want to save the memory and SW data */
>> + if (xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL))
>> + xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n");
>> + xe_vm_snapshot_capture_delayed(ss->vm);
>> + xe_guc_exec_queue_snapshot_capture_delayed(ss->ge);
>> + xe_force_wake_put(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL);
>> +
>> + ss->read_data = kvmalloc(SZ_16M, GFP_USER);
>> + if (!ss->read_data)
>> + return;
>> +
>> + ss->read_data_size = SZ_16M;
Shouldn't it be easy to make a reasonable approximation of the size up front, instead of reallocating all the time? Or run the print twice, returning the size on the first pass and the data on the second.

In any case, an estimate along the lines of

ss->read_data_size = some_const + VM_DUMP_SIZE * some_other_const;

is likely too fragile, but we should be able to change the code to dump twice to get the accurate number.
Cheers,
~Maarten
Thread overview: 13+ messages
2024-07-26 5:21 [PATCH v2] drm/xe: Faster devcoredump Matthew Brost
2024-07-26 5:25 ` ✓ CI.Patch_applied: success for drm/xe: Faster devcoredump (rev2) Patchwork
2024-07-26 5:26 ` ✓ CI.checkpatch: " Patchwork
2024-07-26 5:27 ` ✓ CI.KUnit: " Patchwork
2024-07-26 5:39 ` ✓ CI.Build: " Patchwork
2024-07-26 5:41 ` ✓ CI.Hooks: " Patchwork
2024-07-26 5:42 ` ✓ CI.checksparse: " Patchwork
2024-07-26 6:03 ` ✓ CI.BAT: " Patchwork
2024-07-26 15:45 ` ✓ CI.FULL: " Patchwork
2024-07-26 22:01 ` [PATCH v2] drm/xe: Faster devcoredump Zanoni, Paulo R
2024-07-29 8:47 ` Maarten Lankhorst [this message]
2024-07-30 22:42 ` Matthew Brost
2024-07-31 21:22 ` Matthew Brost