Re: [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: John Harrison <john.c.harrison@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>,
	"Souza, Jose" <jose.souza@intel.com>,
	"Intel-Xe@Lists.FreeDesktop.Org" <Intel-Xe@lists.freedesktop.org>,
	"Filipchuk, Julia" <julia.filipchuk@intel.com>,
	Matthew Brost <matthew.brost@intel.com>
Subject: Re: [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function
Date: Fri, 13 Dec 2024 09:18:04 -0500	[thread overview]
Message-ID: <Z1xCHJHd4JI3UNVk@intel.com> (raw)
In-Reply-To: <db943f8a-0d92-49bc-827e-46284739e8be@intel.com>

On Thu, Dec 12, 2024 at 01:04:23PM -0800, John Harrison wrote:
> On 12/12/2024 12:52, Lucas De Marchi wrote:
> > On Thu, Dec 12, 2024 at 11:14:23AM -0800, John Harrison wrote:
> > > On 12/12/2024 10:45, Lucas De Marchi wrote:
> > > > On Thu, Dec 12, 2024 at 05:41:41PM +0000, Jose Souza wrote:
> > > > > On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison@Intel.com wrote:
> > > > > > From: John Harrison <John.C.Harrison@Intel.com>
> > > > > > 
> > > > > > There is a need to include the GuC log and other large binary objects
> > > > > > in core dumps and via dmesg. So add a helper for dumping to a printer
> > > > > > function via conversion to ASCII85 encoding.
> > > > > > 
> > > > > > Another issue with dumping such a large buffer is that
> > > > > > it can be slow,
> > > > > > especially if dumping to dmesg over a serial port. So add a yield to
> > > > > > prevent the 'task has been stuck for 120s' kernel hang check feature
> > > > > > from firing.
> > > > > > 
> > > > > > v2: Add a prefix to the output string. Fix memory allocation bug.
> > > > > > v3: Correct a string size calculation and clean up a define (review
> > > > > > feedback from Julia F).
> > > > > > 
> > > > > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > > > > Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/xe/xe_devcoredump.c | 87
> > > > > > +++++++++++++++++++++++++++++
> > > > > >  drivers/gpu/drm/xe/xe_devcoredump.h |  6 ++
> > > > > >  2 files changed, 93 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > > b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > > index 2690f1d1cde4..0884c49942fe 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > > @@ -6,6 +6,7 @@
> > > > > >  #include "xe_devcoredump.h"
> > > > > >  #include "xe_devcoredump_types.h"
> > > > > > 
> > > > > > +#include <linux/ascii85.h>
> > > > > >  #include <linux/devcoredump.h>
> > > > > >  #include <generated/utsrelease.h>
> > > > > > 
> > > > > > @@ -315,3 +316,89 @@ int xe_devcoredump_init(struct xe_device *xe)
> > > > > >  }
> > > > > > 
> > > > > >  #endif
> > > > > > +
> > > > > > +/**
> > > > > > + * xe_print_blob_ascii85 - print a BLOB to some useful
> > > > > > location in ASCII85
> > > > > > + *
> > > > > > + * The output is split to multiple lines because some
> > > > > > print targets, e.g. dmesg
> > > > > > + * cannot handle arbitrarily long lines. Note also that
> > > > > > printing to dmesg in
> > > > > > + * piece-meal fashion is not possible, each separate
> > > > > > call to drm_puts() has a
> > > > > > + * line-feed automatically added! Therefore, the entire
> > > > > > output line must be
> > > > > > + * constructed in a local buffer first, then printed in
> > > > > > one atomic output call.
> > > > > > + *
> > > > > > + * There is also a scheduler yield call to prevent the
> > > > > > 'task has been stuck for
> > > > > > + * 120s' kernel hang check feature from firing when
> > > > > > printing to a slow target
> > > > > > + * such as dmesg over a serial port.
> > > > > > + *
> > > > > > + * TODO: Add compression prior to the ASCII85 encoding
> > > > > > to shrink huge buffers down.
> > > > > > + *
> > > > > > + * @p: the printer object to output to
> > > > > > + * @prefix: optional prefix to add to output string
> > > > > > + * @blob: the Binary Large OBject to dump out
> > > > > > + * @offset: offset in bytes to skip from the front of
> > > > > > the BLOB, must be a multiple of sizeof(u32)
> > > > > > + * @size: the size in bytes of the BLOB, must be a
> > > > > > multiple of sizeof(u32)
> > > > > > + */
> > > > > > +void xe_print_blob_ascii85(struct drm_printer *p, const
> > > > > > char *prefix,
> > > > > > +               const void *blob, size_t offset, size_t size)
> > > > > > +{
> > > > > > +    const u32 *blob32 = (const u32 *)blob;
> > > > > > +    char buff[ASCII85_BUFSZ], *line_buff;
> > > > > > +    size_t line_pos = 0;
> > > > > > +
> > > > > > +#define DMESG_MAX_LINE_LEN    800
> > > > > > +#define MIN_SPACE        (ASCII85_BUFSZ + 2)        /*
> > > > > > 85 + "\n\0" */
> > > > > > +
> > > > > > +    if (size & 3)
> > > > > > +        drm_printf(p, "Size not word aligned: %zu", size);
> > > > > > +    if (offset & 3)
> > > > > > +        drm_printf(p, "Offset not word aligned: %zu", size);
> > > > > > +
> > > > > > +    line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
> > > > > > +    if (IS_ERR_OR_NULL(line_buff)) {
> > > > > > +        drm_printf(p, "Failed to allocate line buffer:
> > > > > > %pe", line_buff);
> > > > > > +        return;
> > > > > > +    }
> > > > > > +
> > > > > > +    blob32 += offset / sizeof(*blob32);
> > > > > > +    size /= sizeof(*blob32);
> > > > > > +
> > > > > > +    if (prefix) {
> > > > > > +        strscpy(line_buff, prefix, DMESG_MAX_LINE_LEN -
> > > > > > MIN_SPACE - 2);
> > > > > > +        line_pos = strlen(line_buff);
> > > > > > +
> > > > > > +        line_buff[line_pos++] = ':';
> > > > > > +        line_buff[line_pos++] = ' ';
> > > > > > +    }
> > > > > > +
> > > > > > +    while (size--) {
> > > > > > +        u32 val = *(blob32++);
> > > > > > +
> > > > > > +        strscpy(line_buff + line_pos, ascii85_encode(val, buff),
> > > > > > +            DMESG_MAX_LINE_LEN - line_pos);
> > > > > > +        line_pos += strlen(line_buff + line_pos);
> > > > > > +
> > > > > > +        if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
> > > > > > +            line_buff[line_pos++] = '\n';
> > > > > > +            line_buff[line_pos++] = 0;
> > > > > 
> > > > > This breaks ascii85 parser that we had up to now.
> > > It should not break the decoding of existing sections. This helper
> > > is only being used for the new GuC objects at present. As per the
> > > TODO, the intent is to use it for all blobs in the future. But that
> > > was deliberately left to a future update (which hasn't been posted
> > > yet) to avoid breaking the mesa tool.
> > 
> > it broke nonetheless because then it fails to decode this line the way
> > it was doing:  each line starts a new key.
> It barfs on any line it doesn't understand? Seems like it could be made to
> be more robust in general.

We do not break userspace.
When we do we do not blame userspace.

I had warned it and was ignored:

"We shouldn't be breaking the current userspace tools. Any change like this would
need to be synchronized between all the current decode tools."

[1] https://lore.kernel.org/intel-xe/ZuLzSMH_hBl9RWdv@intel.com/ and ignored 

Next time, please respect the only Linux rule: No regression.

Revert sent and applying it soon...

> 
> > 
> > > 
> > > 
> > > > > And I think there is not safe way to parse it now, how would
> > > > > the parser know that the blob reach to end?
> > > The GuC decoder tool simply ignores leading/trailing whitespace and
> > > keeps decoding until it finds a line with a new field - i.e.
> > > anything with a non-ASCII85 character such as ':' or '*'. It is
> > > totally
> > 
> > character set for ascii85 is [33, 117] and 122, which includes both ':'
> > and '*'. Hopefully it's hard to hit a collision, but we shouldn't design
> > it in a way that is possible.
> Sorry, getting myself confused - it was a while ago when I was writing this.
> Yes, punctuation marks are part of the ASCII85 character set.
> 
> The GuC decoder looks for white space - a blank line or a line with a space
> in it. Any new field is guaranteed to be "name: data" so will have a space.
> And sections are delimited with blank lines, plus section headers are "****
> name ****" so also guaranteed to have a space.
> 
> John.
> 
> 
> > 
> > 
> > > reliable for me.
> > > 
> > > > 
> > > > Just looked at this code to check if we also had a "Size" as I remember
> > > > seeing it in previous related patches, but we don't. If
> > > > we did then you could use that to calculate how much you still had to
> > > > read, ignoring the \n. With the current scheme I think one
> > > > way would be to continue the previous key when the line doesn't start
> > > > with a new key. Awful, yes, and not future proof.
> > > Any new field is guaranteed to have a colon in it and any new
> > > section header is guaranteed to have a star in it. Sounds pretty
> > > reliable to me.
> > 
> > that's what I said by "continue the previous key when the line doesn't
> > start with a new one".
> > 
> > 
> > > 
> > > The other option would be to have an explicit prefix at the start of
> > > each continuation line. E.g. "A85:".
> > 
> > let's not make it worse than it already is. And not future proof by
> > given the encoding algorithm the meaning of "continuation line".
> > 
> > Lucas De Marchi
> > 
> > > 
> > > The problem with a size field is that ASCII85 encoding is data
> > > dependent - it has extremely basic run length encoding of strings of
> > > zeros. So you only know the size after the encoding is complete.
> > > Which means either the size field is in the dump after the encoded
> > > data (which is therefore useless) or you can't stream the encode and
> > > must encode to an in-memory buffer first, then print the size, then
> > > print your pre-encoded data.
> > > 
> > > 
> > > > 
> > > > John, I explicitly said *we can't break the existent users*,
> > > > pointed to the one in the mesa repo and confirmed with José it would be
> > > > indeed a breakage. Please check again the past email thread at
> > > > https://lore.kernel.org/all/3jexgpnh3br3gqi4ol4c3hx3tyhwevq5nqo5xssyie3xglqohz@e7mnj4dewupu/
> > > > 
> > > > 
> > > That discussion was a question for Jose not a statement. As there
> > > was no response for several weeks, I assumed that it wasn't a
> > > problem after all.
> > > 
> > > > 
> > > > Did you at least prepare the mesa parser for that additional \n?
> > > > 
> > > > We shouldn't have merged the kernel patch as is with the excuse
> > > > it reads
> > > > better in dmesg, when we also said we should not print that garbage to
> > > > dmesg :(
> > > Not following. Should not print what garbage in dmesg? As I keep
> > > saying, none of this goes to dmesg by default except in the case of
> > > catastrophic CT failure. And in that case, it is extremely useful
> > > and the only way to debug issues.
> > > 
> > > John.
> > > 
> > > > 
> > > > Lucas De Marchi
> > > 
>

next prev parent reply	other threads:[~2024-12-13 14:18 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-03  0:46 [PATCH v9 00/11] drm/xe/guc: Improve GuC log dumping and add to devcoredump John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 01/11] drm/xe/guc: Remove spurious line feed in debug print John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 02/11] drm/xe/devcoredump: Use drm_puts and already cached local variables John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 03/11] drm/xe/devcoredump: Improve section headings and add tile info John.C.Harrison
2024-12-12 18:17   ` Souza, Jose
2024-12-12 18:59     ` John Harrison
2024-12-12 19:31       ` Souza, Jose
2024-12-12 20:06         ` John Harrison
2024-12-12 20:30           ` Souza, Jose
2024-12-12 20:38             ` John Harrison
2024-10-03  0:46 ` [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function John.C.Harrison
2024-12-12 17:41   ` Souza, Jose
2024-12-12 18:45     ` Lucas De Marchi
2024-12-12 19:14       ` John Harrison
2024-12-12 20:52         ` Lucas De Marchi
2024-12-12 21:04           ` John Harrison
2024-12-13  0:32             ` Lucas De Marchi
2024-12-13 16:36               ` John Harrison
2024-12-13 17:20                 ` Lucas De Marchi
2024-12-13 17:34                   ` John Harrison
2024-12-13 14:18             ` Rodrigo Vivi [this message]
2024-12-13 16:42               ` John Harrison
2024-10-03  0:46 ` [PATCH v9 05/11] drm/xe/guc: Copy GuC log prior to dumping John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 06/11] drm/xe/guc: Use a two stage dump for GuC logs and add more info John.C.Harrison
2024-10-08 21:18   ` [v9, " Kees Bakker
2024-10-03  0:46 ` [PATCH v9 07/11] drm/print: Introduce drm_line_printer John.C.Harrison
2024-10-04 13:57   ` Maarten Lankhorst
2024-10-03  0:46 ` [PATCH v9 08/11] drm/xe/guc: Dead CT helper John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 09/11] drm/xe/guc: Dump entire CTB on errors John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 10/11] drm/xe/guc: Add GuC log to devcoredump captures John.C.Harrison
2024-10-03  0:46 ` [PATCH v9 11/11] drm/xe/guc: Add a helper function for dumping GuC log to dmesg John.C.Harrison
2024-10-03  1:15 ` ✓ CI.Patch_applied: success for drm/xe/guc: Improve GuC log dumping and add to devcoredump (rev6) Patchwork
2024-10-03  1:15 ` ✗ CI.checkpatch: warning " Patchwork
2024-10-03  1:17 ` ✓ CI.KUnit: success " Patchwork
2024-10-03  1:28 ` ✓ CI.Build: " Patchwork
2024-10-03  1:30 ` ✓ CI.Hooks: " Patchwork
2024-10-03  1:32 ` ✗ CI.checksparse: warning " Patchwork
2024-10-03  1:49 ` ✓ CI.BAT: success " Patchwork
2024-10-03  2:40 ` ✗ CI.FULL: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2024-10-02 21:14 [PATCH v9 00/11] drm/xe/guc: Improve GuC log dumping and add to devcoredump John.C.Harrison
2024-10-02 21:14 ` [PATCH v9 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function John.C.Harrison

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1xCHJHd4JI3UNVk@intel.com \
    --to=rodrigo.vivi@intel.com \
    --cc=Intel-Xe@lists.freedesktop.org \
    --cc=john.c.harrison@intel.com \
    --cc=jose.souza@intel.com \
    --cc=julia.filipchuk@intel.com \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox