From: John Harrison <john.c.harrison@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>,
Shuicheng Lin <shuicheng.lin@intel.com>,
<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v3 2/2] drm/xe/devcoredump: Remove IS_ERR_OR_NULL check for kzalloc
Date: Mon, 24 Feb 2025 14:14:39 -0800
Message-ID: <8a5b6c4b-c06b-4bbc-bd15-f1b62ef07436@intel.com>
In-Reply-To: <veg3pgmivlvwgur3fb7k4da4aqad335b5tfrd6sdnq3g6ne3wh@mjhypez7eqn4>
On 2/24/2025 13:38, Lucas De Marchi wrote:
> On Thu, Feb 20, 2025 at 05:36:19PM -0800, John Harrison wrote:
>> On 2/20/2025 15:54, Lucas De Marchi wrote:
>>> On Thu, Feb 20, 2025 at 05:29:56PM +0100, Michal Wajdeczko wrote:
>>>> On 20.02.2025 01:17, Shuicheng Lin wrote:
>>>>> kzalloc returns a valid pointer or NULL if the allocation fails.
>>>>> It never returns an error pointer. It is better to check for NULL
>>>>> directly.
>>>>>
>>>>> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
>>>>> Cc: John Harrison <John.C.Harrison@Intel.com>
>>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/xe_devcoredump.c | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> index 60d15e455017..81b9d9bb3f57 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>> @@ -426,8 +426,8 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffi
>>>>> drm_printf(p, "Offset not word aligned: %zu", offset);
>>>>>
>>>>> line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
>>>>> - if (IS_ERR_OR_NULL(line_buff)) {
>>>>> - drm_printf(p, "Failed to allocate line buffer: %pe", line_buff);
>>>>> + if (!line_buff) {
>>>>> + drm_printf(p, "Failed to allocate line buffer\n");
>>>>
>>>> btw, since this line will be included in the output, where one could
>>>> expect ascii85 data, shouldn't we print that diagnostic message with
>>>> some special prefix to make it clear there is nothing to parse? like
>>>>
>>>> "# Failed to allocate internal data\n"
>>>>
>>>> also since caller may have already provided a prefix, shouldn't we
>>>> also include it in this diagnostic message?
>>>>
>>>> "%s%s# Failed to allocate internal data\n",
>>>> prefix ?: "",
>>>> prefix ? ": " : ""
>>>
>>> or stop printing and return an error. we are missing the `.error: ...`
>>> already that is used in other places.
>>>
>>> $ git grep '\.error: ' -- drivers/gpu/drm/xe
>>> drivers/gpu/drm/xe/xe_vm.c: drm_printf(p, "[0].error: %li\n", PTR_ERR(snap));
>>> drivers/gpu/drm/xe/xe_vm.c: drm_printf(p, "[%llx].error: %li\n", snap->snap[i].ofs,
>> This is the place that should be printing an error. The whole point
>> of this helper is that it wraps up all the blob output. However, do we
>
> note that this is not printing an error in the log. This is adding the
> error message in the place that is supposed to have the *data* for that
> key. That's why there was supposed to be a .error key to accompany this
> behavior. Right now if you look only at the devcoredump you have no
> clue the data is actually an error message, not real data.
Argh! Yes, I was getting myself confused. The '.data' is part of the prefix.
We should trim the prefix down to just the bit in square brackets and
have the helper print the size, data and/or error keys as
appropriate. I'm not sure how that would work with the GuC log
being split across multiple BOs, though. It might be worth pushing support
for split BOs into the helper as well and letting it take care of everything.
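
To make the idea concrete, here is a rough user-space sketch of a helper that takes only the bracketed prefix and chooses the sub-keys itself. It is a model under assumptions, not xe code: emit_blob_keys(), its parameters, the '.length' key name and the snprintf() output buffer are all hypothetical stand-ins (the real helper prints through a struct drm_printer), and 'encoded' stands in for the ascii85 text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical model: the caller passes just "[name]" and the helper
 * emits either the size and data keys or a single error key.
 * encoded == NULL models a failed conversion (e.g. allocation failure).
 * Returns whatever snprintf() returns.
 */
static int emit_blob_keys(char *out, size_t out_len, const char *prefix,
			  size_t size, const char *encoded, int err)
{
	assert(out && prefix);

	if (err || !encoded) {
		/* Report the failure under its own key, never under .data */
		return snprintf(out, out_len, "%s.error: %d\n", prefix,
				err ? err : -ENOMEM);
	}

	return snprintf(out, out_len, "%s.length: 0x%zx\n%s.data: %s\n",
			prefix, size, prefix, encoded);
}
```

With something along these lines, a parser that sees a '.error' key knows there is no payload to decode, instead of finding an error string where it expects ascii85 data. Split BOs are not modelled here.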
>
>
>> need to distinguish between a non-capture-process error (e.g. bad VM
>> object) versus an error in the capture itself (e.g. out of memory
>> converting the binary data to a text string)?
>>
>> Not sure what error routes there are in the VM capture? Are they
>> things that are important to include in the devcoredump because they
>> have significant meaning about what caused the hang? Or are the only
>> possible errors related to the capture process itself - failing to
>> allocate memory to store the capture or such?
>>
>> If the only errors are capture related then yes, just change this
>> line to print "[%prefix].error: %errno\n". But if there is use to
>> distinguish between bad VM objects and failed captures, then maybe
>> this one should be "[%prefix].capture_error: %errno\n" or something?
>
> -ENOMEM vs something else would already be a very good indicative.
But for a VM, can there be an ENOMEM error because the VM itself failed
to allocate (and thus caused the app to dereference a null pointer on
the GT and hit the crash)? Or can the ENOMEM only come from the
devcoredump code trying to cache and/or convert the object into a
dumpable entity?
Maybe there is no value in differentiating where the error came from.
Maybe a bad VM or other object just won't exist and won't be included in
the devcoredump in the first place. But I'm not familiar with that code,
so I just want to make sure we have thought about the possibility.
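
Purely for illustration, if the distinction did turn out to matter, the separate key could look something like the sketch below. Again this is a user-space model under assumptions, not xe code: the enum, emit_error_key() and the buffer-based output are hypothetical, and only the '.error' and '.capture_error' key names come from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tag for where a failure originated. */
enum dump_err_src {
	DUMP_ERR_OBJECT,	/* the object was already bad before the capture */
	DUMP_ERR_CAPTURE,	/* the dump code itself failed, e.g. -ENOMEM */
};

/* Emit "<prefix>.error" or "<prefix>.capture_error" with a negative errno. */
static int emit_error_key(char *out, size_t out_len, const char *prefix,
			  enum dump_err_src src, int err)
{
	const char *key = src == DUMP_ERR_CAPTURE ? "capture_error" : "error";

	assert(out && prefix && err < 0);
	return snprintf(out, out_len, "%s.%s: %d\n", prefix, key, err);
}
```

If the capture path is the only possible source of errors, the enum is unnecessary and a plain '.error' key is enough.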
John.
>
> This discussion can continue. For now applying these patches that are
> orthogonal.
>
> Applied both to drm-xe-next.
>
> thanks,
> Lucas De Marchi
>
>>
>> John.
>>
>>
>>>
>>> Lucas De Marchi
>>>
>>>
>>>
>>>
>>>>
>>>>> return;
>>>>> }
>>>>>
>>>>
>>