Re: [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: John Harrison <john.c.harrison@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>,
	"Souza, Jose" <jose.souza@intel.com>
Cc: "Intel-Xe@Lists.FreeDesktop.Org" <Intel-Xe@lists.freedesktop.org>,
	"Vivi,  Rodrigo" <rodrigo.vivi@intel.com>,
	"Filipchuk, Julia" <julia.filipchuk@intel.com>
Subject: Re: [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function
Date: Thu, 12 Dec 2024 11:14:23 -0800	[thread overview]
Message-ID: <208f50d9-58e8-413e-915f-95ca077e0fa0@intel.com> (raw)
In-Reply-To: <7qhmppwrqj5fsr3tkvufwua3iw7noadrefajrii66x7yydfeba@lqhgd7b5decp>

On 12/12/2024 10:45, Lucas De Marchi wrote:
> On Thu, Dec 12, 2024 at 05:41:41PM +0000, Jose Souza wrote:
>> On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> There is a need to include the GuC log and other large binary objects
>>> in core dumps and via dmesg. So add a helper for dumping to a printer
>>> function via conversion to ASCII85 encoding.
>>>
>>> Another issue with dumping such a large buffer is that it can be slow,
>>> especially if dumping to dmesg over a serial port. So add a yield to
>>> prevent the 'task has been stuck for 120s' kernel hang check feature
>>> from firing.
>>>
>>> v2: Add a prefix to the output string. Fix memory allocation bug.
>>> v3: Correct a string size calculation and clean up a define (review
>>> feedback from Julia F).
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
>>> ---
>>>  drivers/gpu/drm/xe/xe_devcoredump.c | 87 +++++++++++++++++++++++++++++
>>>  drivers/gpu/drm/xe/xe_devcoredump.h |  6 ++
>>>  2 files changed, 93 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c 
>>> b/drivers/gpu/drm/xe/xe_devcoredump.c
>>> index 2690f1d1cde4..0884c49942fe 100644
>>> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>> @@ -6,6 +6,7 @@
>>>  #include "xe_devcoredump.h"
>>>  #include "xe_devcoredump_types.h"
>>>
>>> +#include <linux/ascii85.h>
>>>  #include <linux/devcoredump.h>
>>>  #include <generated/utsrelease.h>
>>>
>>> @@ -315,3 +316,89 @@ int xe_devcoredump_init(struct xe_device *xe)
>>>  }
>>>
>>>  #endif
>>> +
>>> +/**
>>> + * xe_print_blob_ascii85 - print a BLOB to some useful location in 
>>> ASCII85
>>> + *
>>> + * The output is split to multiple lines because some print 
>>> targets, e.g. dmesg
>>> + * cannot handle arbitrarily long lines. Note also that printing to 
>>> dmesg in
>>> + * piece-meal fashion is not possible, each separate call to 
>>> drm_puts() has a
>>> + * line-feed automatically added! Therefore, the entire output line 
>>> must be
>>> + * constructed in a local buffer first, then printed in one atomic 
>>> output call.
>>> + *
>>> + * There is also a scheduler yield call to prevent the 'task has 
>>> been stuck for
>>> + * 120s' kernel hang check feature from firing when printing to a 
>>> slow target
>>> + * such as dmesg over a serial port.
>>> + *
>>> + * TODO: Add compression prior to the ASCII85 encoding to shrink 
>>> huge buffers down.
>>> + *
>>> + * @p: the printer object to output to
>>> + * @prefix: optional prefix to add to output string
>>> + * @blob: the Binary Large OBject to dump out
>>> + * @offset: offset in bytes to skip from the front of the BLOB, 
>>> must be a multiple of sizeof(u32)
>>> + * @size: the size in bytes of the BLOB, must be a multiple of 
>>> sizeof(u32)
>>> + */
>>> +void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
>>> +               const void *blob, size_t offset, size_t size)
>>> +{
>>> +    const u32 *blob32 = (const u32 *)blob;
>>> +    char buff[ASCII85_BUFSZ], *line_buff;
>>> +    size_t line_pos = 0;
>>> +
>>> +#define DMESG_MAX_LINE_LEN    800
>>> +#define MIN_SPACE        (ASCII85_BUFSZ + 2)        /* 85 + "\n\0" */
>>> +
>>> +    if (size & 3)
>>> +        drm_printf(p, "Size not word aligned: %zu", size);
>>> +    if (offset & 3)
>>> +        drm_printf(p, "Offset not word aligned: %zu", size);
>>> +
>>> +    line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
>>> +    if (IS_ERR_OR_NULL(line_buff)) {
>>> +        drm_printf(p, "Failed to allocate line buffer: %pe", 
>>> line_buff);
>>> +        return;
>>> +    }
>>> +
>>> +    blob32 += offset / sizeof(*blob32);
>>> +    size /= sizeof(*blob32);
>>> +
>>> +    if (prefix) {
>>> +        strscpy(line_buff, prefix, DMESG_MAX_LINE_LEN - MIN_SPACE - 
>>> 2);
>>> +        line_pos = strlen(line_buff);
>>> +
>>> +        line_buff[line_pos++] = ':';
>>> +        line_buff[line_pos++] = ' ';
>>> +    }
>>> +
>>> +    while (size--) {
>>> +        u32 val = *(blob32++);
>>> +
>>> +        strscpy(line_buff + line_pos, ascii85_encode(val, buff),
>>> +            DMESG_MAX_LINE_LEN - line_pos);
>>> +        line_pos += strlen(line_buff + line_pos);
>>> +
>>> +        if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
>>> +            line_buff[line_pos++] = '\n';
>>> +            line_buff[line_pos++] = 0;
>>
>> This breaks ascii85 parser that we had up to now.
It should not break the decoding of existing sections. This helper is 
only being used for the new GuC objects at present. As per the TODO, the 
intent is to use it for all blobs in the future. But that was 
deliberately left to a future update (which hasn't been posted yet) to 
avoid breaking the mesa tool.


>> And I think there is not safe way to parse it now, how would the 
>> parser know that the blob reach to end?
The GuC decoder tool simply ignores leading/trailing whitespace and 
keeps decoding until it finds a line with a new field - i.e. anything 
with a non-ASCII85 character such as ':' or '*'. It is totally reliable 
for me.

>
> Just looked at this code to check if we also had a "Size" as I remember
> seeing it in previous related patches, but we don't. If
> we did then you could use that to calculate how much you still had to
> read, ignoring the \n. With the current scheme I think one
> way would be to continue the previous key when the line doesn't start
> with a new key. Awful, yes, and not future proof.
Any new field is guaranteed to have a colon in it and any new section 
header is guaranteed to have a star in it. Sounds pretty reliable to me.

The other option would be to have an explicit prefix at the start of 
each continuation line. E.g. "A85:".

The problem with a size field is that ASCII85 encoding is data dependent 
- it has extremely basic run length encoding of strings of zeros. So you 
only know the size after the encoding is complete. Which means either 
the size field is in the dump after the encoded data (which is therefore 
useless) or you can't stream the encode and must encode to an in-memory 
buffer first, then print the size, then print your pre-encoded data.


>
> John, I explicitly said *we can't break the existent users*,
> pointed to the one in the mesa repo and confirmed with José it would be
> indeed a breakage. Please check again the past email thread at
> https://lore.kernel.org/all/3jexgpnh3br3gqi4ol4c3hx3tyhwevq5nqo5xssyie3xglqohz@e7mnj4dewupu/ 
>
That discussion was a question for Jose not a statement. As there was no 
response for several weeks, I assumed that it wasn't a problem after all.

>
> Did you at least prepare the mesa parser for that additional \n?
>
> We shouldn't have merged the kernel patch as is with the excuse it reads
> better in dmesg, when we also said we should not print that garbage to
> dmesg :(
Not following. Should not print what garbage in dmesg? As I keep saying, 
none of this goes to dmesg by default except in the case of catastrophic 
CT failure. And in that case, it is extremely useful and the only way to 
debug issues.

John.

>
> Lucas De Marchi

next prev parent reply	other threads:[~2024-12-12 19:14 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-03  0:46 [PATCH v9 00/11] drm/xe/guc: Improve GuC log dumping and add to devcoredump John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 01/11] drm/xe/guc: Remove spurious line feed in debug print John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 02/11] drm/xe/devcoredump: Use drm_puts and already cached local variables John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 03/11] drm/xe/devcoredump: Improve section headings and add tile info John.C.Harrison
2024-12-12 18:17   ` Souza, Jose
2024-12-12 18:59     ` John Harrison
2024-12-12 19:31       ` Souza, Jose
2024-12-12 20:06         ` John Harrison
2024-12-12 20:30           ` Souza, Jose
2024-12-12 20:38             ` John Harrison
2024-10-03  0:46 ` [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function John.C.Harrison
2024-12-12 17:41   ` Souza, Jose
2024-12-12 18:45     ` Lucas De Marchi
2024-12-12 19:14       ` John Harrison [this message]
2024-12-12 20:52         ` Lucas De Marchi
2024-12-12 21:04           ` John Harrison
2024-12-13  0:32             ` Lucas De Marchi
2024-12-13 16:36               ` John Harrison
2024-12-13 17:20                 ` Lucas De Marchi
2024-12-13 17:34                   ` John Harrison
2024-12-13 14:18             ` Rodrigo Vivi
2024-12-13 16:42               ` John Harrison
2024-10-03  0:46 ` [PATCH v9 05/11] drm/xe/guc: Copy GuC log prior to dumping John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 06/11] drm/xe/guc: Use a two stage dump for GuC logs and add more info John.C.Harrison
2024-10-08 21:18   ` [v9, " Kees Bakker
2024-10-03  0:46 ` [PATCH v9 07/11] drm/print: Introduce drm_line_printer John.C.Harrison
2024-10-04 13:57   ` Maarten Lankhorst
2024-10-03  0:46 ` [PATCH v9 08/11] drm/xe/guc: Dead CT helper John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 09/11] drm/xe/guc: Dump entire CTB on errors John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 10/11] drm/xe/guc: Add GuC log to devcoredump captures John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 11/11] drm/xe/guc: Add a helper function for dumping GuC log to dmesg John.C.Harrison
2024-10-03  1:15 ` ✓ CI.Patch_applied: success for drm/xe/guc: Improve GuC log dumping and add to devcoredump (rev6) Patchwork
2024-10-03  1:15 ` ✗ CI.checkpatch: warning " Patchwork
2024-10-03  1:17 ` ✓ CI.KUnit: success " Patchwork
2024-10-03  1:28 ` ✓ CI.Build: " Patchwork
2024-10-03  1:30 ` ✓ CI.Hooks: " Patchwork
2024-10-03  1:32 ` ✗ CI.checksparse: warning " Patchwork
2024-10-03  1:49 ` ✓ CI.BAT: success " Patchwork
2024-10-03  2:40 ` ✗ CI.FULL: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2024-10-02 21:14 [PATCH v9 00/11] drm/xe/guc: Improve GuC log dumping and add to devcoredump John.C.Harrison
2024-10-02 21:14 ` [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function John.C.Harrison

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=208f50d9-58e8-413e-915f-95ca077e0fa0@intel.com \
    --to=john.c.harrison@intel.com \
    --cc=Intel-Xe@lists.freedesktop.org \
    --cc=jose.souza@intel.com \
    --cc=julia.filipchuk@intel.com \
    --cc=lucas.demarchi@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox