Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Souza, Jose" <jose.souza@intel.com>
To: "Harrison, John C" <john.c.harrison@intel.com>,
	"De Marchi, Lucas" <lucas.demarchi@intel.com>
Cc: "Intel-Xe@lists.freedesktop.org" <Intel-Xe@lists.freedesktop.org>,
	"Brost,  Matthew" <matthew.brost@intel.com>
Subject: Re: [PATCH v7 03/10] drm/xe/devcoredump: Add ASCII85 dump helper function
Date: Wed, 11 Sep 2024 19:54:56 +0000	[thread overview]
Message-ID: <0daf2e5c9ece014d5f85b5bcb0cb44dafa2449c0.camel@intel.com> (raw)
In-Reply-To: <7c75dda1-e62b-47ff-a6e2-0da2f080126f@intel.com>

On Wed, 2024-09-11 at 12:35 -0700, John Harrison wrote:
> On 9/11/2024 12:30, Souza, Jose wrote:
> > On Wed, 2024-09-11 at 14:12 -0500, Lucas De Marchi wrote:
> > > On Tue, Sep 10, 2024 at 01:17:11PM GMT, John Harrison wrote:
> > > > On 9/10/2024 12:43, Lucas De Marchi wrote:
> > > > > On Mon, Sep 09, 2024 at 06:31:41PM GMT, John Harrison wrote:
> > > > > > On 9/6/2024 19:06, John Harrison wrote:
> > > > > > > On 9/5/2024 20:04, Lucas De Marchi wrote:
> > > > > > > > On Thu, Sep 05, 2024 at 07:01:33PM GMT, John Harrison wrote:
> > > > > > > > > On 9/5/2024 18:54, Lucas De Marchi wrote:
> > > > > > > > > > On Thu, Sep 05, 2024 at 01:50:58PM GMT,
> > > > > > > > > > John.C.Harrison@Intel.com wrote:
> > > > > > > > > > > From: John Harrison <John.C.Harrison@Intel.com>
> > > > > > > > > > > 
> > > > > > > > > > > There is a need to include the GuC log and other large
> > > > > > > > > > > binary objects
> > > > > > > > > > > in core dumps and via dmesg. So add a helper for dumping
> > > > > > > > > > > to a printer
> > > > > > > > > > > function via conversion to ASCII85 encoding.
> > > > > > > > > > why are we not dumping the binary data directly to devcoredump?
> > > > > > > > > As per earlier comments, there is a WiFi driver or some such
> > > > > > > > > that does exactly that. But all they are dumping is a binary
> > > > > > > > > blob.
> > > > > > > > In your v5 I see you mentioned
> > > > > > > > drivers/net/wireless/ath/ath10k/coredump.c, but that is a
> > > > > > > > precedence for
> > > > > > > > including it as is from the device rather converting it to ASCII85 or
> > > > > > > > something else. It seems odd to do that type of conversion in kernel
> > > > > > > > space when it could be perfectly done in userspace.
> > > > > > > It really can't. An end user could maybe be expected to zip or
> > > > > > > tar a coredump file before attaching it to a bug report but they
> > > > > > > are certainly not going to try to ASCII85 encode random bits of
> > > > > > > it. Whereas, putting that in the kernel means it is just there.
> > > > > > > It is done. And it is pretty trivial - just call a helper
> > > > > > > function and it does everything for you. Also, I very much doubt
> > > > > > > you can spew raw binary data via dmesg. Even if the kernel would
> > > > > > > print it for you (which I doubt), the user tools like syslogd
> > > > > > > and dmesg itself are going to filter it to make it ASCII safe.
> > > > > > > 
> > > > > > > The i915 error dumps have been ASCII85 encoded using the
> > > > > > > kernel's ASCII85 encoding helper function since forever. This
> > > > > > > patch is just a wrapper around the kernel's existing
> > > > > > > implementation in order to make it more compatible with printing
> > > > > > > to dmesg. This is not creating a new precedent. It already
> > > > > > > exists.
> > > > > > > 
> > > > > > > > $ git grep ascii85.h
> > > > > > > > drivers/gpu/drm/i915/i915_gpu_error.c:#include <linux/ascii85.h>
> > > > > > > > drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:#include <linux/ascii85.h>
> > > > > > > > drivers/gpu/drm/msm/adreno/adreno_gpu.c:#include <linux/ascii85.h>
> > > > > > > > drivers/gpu/drm/xe/xe_lrc.c:#include <linux/ascii85.h>
> > > > > > > > drivers/gpu/drm/xe/xe_vm.c:#include <linux/ascii85.h>
> > > > > > > And the list of drivers which dump raw binary data in a coredump
> > > > > > > file is... ath10k. ASCII85 wins 3 to 1.
> > > > > > > 
> > > > > > > 
> > > > > > > > > We want the devcoredump file to still be human readable.
> > > > > > > > > That won't be the case if you stuff binary data in the
> > > > > > > > > middle of it. Most obvious problem - the zeros in the data
> > > > > > > > > will terminate your text file at that point. Potentially
> > > > > > > > > bigger problem for end users - random fake ANSI codes will
> > > > > > > > > destroy your terminal window if you try to cat the file to
> > > > > > > > > read it.
> > > > > > > > Users don't get a coredump and cat it to the terminal.
> > > > > > > > =(lk%A8`T7AKYH#FD,6++EqOABHUhsG%5H2ARoq#E$/V$Bl7Q+@<5pmBe<q;Bk;0mCj@.3DIal2FD5Q-+E_RBART+X@VfTuGA2/4Dfp.E@3BN0DfB9.+E1b0F(KAV+:8
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Lucas De Marchi
> > > > > > > They might. Either intentionally or accidentally. I've certainly
> > > > > > > done it myself. And people will certainly want to look at it in
> > > > > > > any random choice of text editor, pager, etc. 'cos you know, it
> > > > > > > is meant to be read by humans. If it is full of binary data then
> > > > > > > that becomes even more difficult than simply being full of ASCII
> > > > > > > gibberish. No matter what you are doing, the ASCII version is
> > > > > > > safer and easier to look at the rest of the file around it.
> > > > > > > 
> > > > > > > I don't understand why you are so desperate to have raw binary
> > > > > > > data in the middle of a text file. The disadvantages are
> > > > > > > multiple but the only advantage is a slightly smaller file. And
> > > > > > > the true route to smaller files is to add compression like we
> > > > > > > have in i915.
> > > > > > > 
> > > > > > > John.
> > > > > > > 
> > > > > > PS: Also meant to add that one of the important uses cases for
> > > > > > dumping logs to dmesg is for the really hard to repro bugs that
> > > > > > show up in CI extremely rarely. We get the driver to dump an error
> > > > > > capture to dmesg and pull that out from the CI logs. Even if you
> > > > > > could get binary data through dmesg, pretty sure the CI tools
> > > > > > would also not be happy with it. Anything non-printable will get
> > > > > > munged for sure when turning it into a web page.
> > > > > I think that's the main source of confusion on what we are discussing.
> > > > > I was not talking about dmesg at all. I'm only complaining about feeding
> > > > > ascii85-encoded data into a *devcoredump* when apparently there isn't a
> > > > > good reason to do so. I'd rather copy the binary data to the
> > > > > devcoredump.
> > > > But the intent is to dump a devcoredump to dmesg. It makes much sense
> > > It seems like an awful idea to dump hundreds of MB to dmesg.  When we
> > > talked about printing to dmesg it was about **GuC log** and on very
> > > initial states of driver probe where we didn't actually have a good
> > > interface for that. And the log wouldn't be so big. If we can already
> > > capture the devcoredump, what would be the reason to dump to dmesg
> > > (other than the non-valid "our CI captures dmesg, and doesn't
> > > capture devcoredump", which should be fixed).
> > > 
> > > If any sysadmin have their serial console flooded by such garbage there
> > > are 2 reactions: 1) someone got in control of my machine; 2) something
> > > went really bad with this machine. It's not "fear not, wait for it to
> > > complete, it's just normal debug data I will attach to an issue in
> > > gitlab".  And I'm mentioning a serial console here due to that
> > > cond_resched() added, which is only needed because you are trying to do
> > > in kernel space what should be in userspace.
> > > 
> > > Oh well... looking at this the main reason to use ascii85 I can see is
> > > because we already have parts of *our* devcoredump using it, and
> > > userspace relying on that. That's new to me. Let's stop bringing dmesg
> > > into this discussion.
> > > 
> > > > to have a single implementation that can be used for multiple
> > > > purposes. Otherwise you are duplicating a lot of code unnecessarily.
> > > > 
> > > > And I still think it is a *very* bad idea to be including binary data
> > > > in a text file. The devcoredump is supposed to be human readable. It
> > > no, it's not. devcoredump doesn't dictate the format, it's up to the
> > > drivers to do that. See their documentation.
> > > 
> > > > is supposed to be obtained by end users and passed around. Having
> > > > binary data in there just makes everything more complex and error
> > > > prone. We want this to be as simple, easy and safe as possible.
> > > > 
> > > > > For dmesg, there's a reason to encode it as you pointed out... but
> > > > > no users shouldn't actually see it - we should be getting all of those
> > > > > cases in CI. For the escape scenarios, yeah... better having it
> > > > > ascii85-encoded.
> > > > > 
> > > > > What you are adding to devcoredump also doesn't even seem to be an
> > > > > ascii85 representation, but a multiple lines that should be concatenated
> > > > > to form the ascii85 representation. For dmesg it makes sense. Not for
> > > > > devcoredump.  We should also probabaly need a length field (correctly
> > > > > accounting for the additional characters for each line) so we don't
> > > > > have an implicit dependency on what's the next field to know how much to
> > > > > parse.
> > > > The decoding is pretty trivial given that line feeds are not part of
> > > > the ASCII85 character set and so can just be dropped. Besides The
> > > > output is already not 'pure' ASCII85 because the ASCII85 data is
> > > > embedded within a devcoredump. There is all sorts of other text about,
> > > > including on the start of the line. There are multiple ASCII85 blobs
> > > > in there that need to be decoded separately. This is nothing new to my
> > > > patch set. All of that is already there. And as per comments on the
> > > > previous devcoredump patches from Matthew B, the object data can many
> > > > hundreds of MBs in size. Yet no-one batted an eyelid when that was
> > > > added. So why the sudden paranoia about adding a couple of MB of GuC
> > > > log in the same form?
> > > I suppose you are talking about commit 4d5242a003bb ("drm/xe: Implement capture of
> > > HWSP and HWCTX").  Probably because I haven't seen that commit doing an
> > > ascii85 encoding before, otherwise I'd have similar review feedback.
> > > 
> > > Looking at this just now, so I will also have to balance the previous
> > > users and existing userspace consuming it.
> > > 
> > > +José, would it be ok from the userspace POV to start adding the \n?
> > > Then we can at least have all fields in our devcoredump to follow the
> > > same format. Are these the decoder parts on the mesa side?
> > > 
> > > 	src/intel/tools/aubinator_error_decode.c
> > > 	src/intel/tools/error2hangdump.c?
> > > 
> > >   From a quick look, read_xe_data_file() already continues the previous
> > > topic when it reads a newline, but the parsers for HWCTX and HWSP
> > > seems to expect to to have the entire topic in a single line. But I may
> > > be missing something.
> > Sorry I'm not following up this thread.
> > Add a '\n' where exactly?
> To break very long ASCII85 streams across multiple lines. That will 
> allow the devcoredump file to output via dmesg for the situations where 
> reading from sysfs is not possible.
> 
> This patch is adding an ASCII85 encoding helper that is more friendly to 
> output via dmesg. The patch does not currently change the existing 
> ASCII85 encodings of VMs and hw contexts. However, the intention would 
> be to update that code to use this helper eventually.

It will break the parser and I don't think we are allowed to break it at this point.

What I can suggest is add another sysfs with a coredump that skips the binary dumps so it is readable by humans.

> 
> John.
> 
> > 
> > > Lucas De Marchi
> > > 
> > > > And again, arbitrarily long lines (potentially many thousands of
> > > > characters wide) in a text file can cause problems. Having it line
> > > > wrapped gets rid of those potential problems and so is safer. Anything
> > > > that reduces the risk of an error report being broken is a good thing
> > > > IMHO. Robustness is worthwhile!
> > > > 
> > > > John.
> > > > 
> > > > > Lucas De Marchi
> > > > > 
> > > > > > John.
> > > > > > 
> 


  reply	other threads:[~2024-09-11 19:55 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-05 20:50 [PATCH v7 00/10] drm/xe/guc: Improve GuC log dumping and add to devcoredump John.C.Harrison
2024-09-05 20:50 ` [PATCH v7 01/10] drm/xe/guc: Remove spurious line feed in debug print John.C.Harrison
2024-09-10 19:32   ` Julia Filipchuk
2024-09-05 20:50 ` [PATCH v7 02/10] drm/xe/devcoredump: Add a section heading for the submission backend John.C.Harrison
2024-09-10 19:33   ` Julia Filipchuk
2024-09-05 20:50 ` [PATCH v7 03/10] drm/xe/devcoredump: Add ASCII85 dump helper function John.C.Harrison
2024-09-06  1:54   ` Lucas De Marchi
2024-09-06  2:01     ` John Harrison
2024-09-06  3:04       ` Lucas De Marchi
2024-09-07  2:06         ` John Harrison
2024-09-10  1:31           ` John Harrison
2024-09-10 19:43             ` Lucas De Marchi
2024-09-10 20:17               ` John Harrison
2024-09-11 19:12                 ` Lucas De Marchi
2024-09-11 19:30                   ` Souza, Jose
2024-09-11 19:35                     ` John Harrison
2024-09-11 19:54                       ` Souza, Jose [this message]
2024-09-11 19:59                         ` John Harrison
2024-09-12 13:57                           ` Rodrigo Vivi
2024-09-12 18:06                             ` John Harrison
2024-09-16 15:32                               ` Rodrigo Vivi
2024-09-16 17:46                                 ` John Harrison
2024-09-11 19:31                   ` John Harrison
2024-09-10 19:33   ` Julia Filipchuk
2024-09-11  1:27     ` John Harrison
2024-09-05 20:50 ` [PATCH v7 04/10] drm/xe/guc: Copy GuC log prior to dumping John.C.Harrison
2024-09-11  0:48   ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 05/10] drm/xe/guc: Use a two stage dump for GuC logs and add more info John.C.Harrison
2024-09-11  0:48   ` Julia Filipchuk
2024-09-11  1:14     ` John Harrison
2024-09-05 20:51 ` [PATCH v7 06/10] drm/print: Introduce drm_line_printer John.C.Harrison
2024-09-05 20:51 ` [PATCH v7 07/10] drm/xe/guc: Dead CT helper John.C.Harrison
2024-09-11  0:09   ` John Harrison
2024-09-11 19:55   ` Julia Filipchuk
2024-09-11 20:13     ` John Harrison
2024-09-11 20:57       ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 08/10] drm/xe/guc: Dump entire CTB on errors John.C.Harrison
2024-09-11 20:12   ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 09/10] drm/xe/guc: Add GuC log to devcoredump captures John.C.Harrison
2024-09-11 20:25   ` Julia Filipchuk
2024-09-05 20:51 ` [PATCH v7 10/10] drm/xe/guc: Add a helper function for dumping GuC log to dmesg John.C.Harrison
2024-09-11 20:36   ` Julia Filipchuk
2024-09-11 20:41     ` John Harrison
2024-09-05 20:57 ` ✓ CI.Patch_applied: success for drm/xe/guc: Improve GuC log dumping and add to devcoredump (rev2) Patchwork
2024-09-05 20:57 ` ✗ CI.checkpatch: warning " Patchwork
2024-09-05 20:58 ` ✓ CI.KUnit: success " Patchwork
2024-09-05 21:10 ` ✓ CI.Build: " Patchwork
2024-09-05 21:13 ` ✓ CI.Hooks: " Patchwork
2024-09-05 21:14 ` ✗ CI.checksparse: warning " Patchwork
2024-09-05 22:06 ` ✗ CI.BAT: failure " Patchwork
2024-09-08  0:02 ` ✗ CI.FULL: " Patchwork
2024-09-12  9:16 ` [PATCH v7 00/10] drm/xe/guc: Improve GuC log dumping and add to devcoredump Jani Nikula
2024-09-12 18:50   ` John Harrison
2024-09-13  7:26     ` Jani Nikula

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0daf2e5c9ece014d5f85b5bcb0cb44dafa2449c0.camel@intel.com \
    --to=jose.souza@intel.com \
    --cc=Intel-Xe@lists.freedesktop.org \
    --cc=john.c.harrison@intel.com \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox