Intel-XE Archive on lore.kernel.org
From: "Souza, Jose" <jose.souza@intel.com>
To: "Intel-Xe@Lists.FreeDesktop.Org" <Intel-Xe@Lists.FreeDesktop.Org>,
	"Harrison, John C" <john.c.harrison@intel.com>,
	"Vivi, Rodrigo" <rodrigo.vivi@intel.com>,
	"De Marchi, Lucas" <lucas.demarchi@intel.com>
Cc: "Filipchuk, Julia" <julia.filipchuk@intel.com>
Subject: Re: [PATCH v9 03/11] drm/xe/devcoredump: Improve section headings and add tile info
Date: Thu, 12 Dec 2024 20:30:59 +0000
Message-ID: <2e25469bbaab8d1ee3d70b1bbbf295faa6220dd8.camel@intel.com>
In-Reply-To: <ea46ec11-b7a5-4103-8fdb-493e2489a688@intel.com>

On Thu, 2024-12-12 at 12:06 -0800, John Harrison wrote:
> On 12/12/2024 11:31, Souza, Jose wrote:
> > On Thu, 2024-12-12 at 10:59 -0800, John Harrison wrote:
> > > On 12/12/2024 10:17, Souza, Jose wrote:
> > > > On Wed, 2024-10-02 at 17:46 -0700, John.C.Harrison@Intel.com wrote:
> > > > > From: John Harrison <John.C.Harrison@Intel.com>
> > > > > 
> > > > > The xe_guc_exec_queue_snapshot is not really a GuC internal thing and
> > > > > is definitely not a GuC CT thing. So give it its own section heading.
> > > > > The snapshot itself is really a capture of the submission backend's
> > > > > internal state. Although all it currently prints out is the submission
> > > > > contexts. So label it as 'Contexts'. If more general state is added
> > > > > later then it could be change to 'Submission backend' or some such.
> > > > > 
> > > > > Further, everything from the GuC CT section onwards is GT specific but
> > > > > there was no indication of which GT it was related to (and that is
> > > > > impossible to work out from the other fields that are given). So add a
> > > > > GT section heading. Also include the tile id of the GT, because again
> > > > > significant information.
> > > > > 
> > > > > Lastly, drop a couple of unnecessary line feeds within sections.
> > > > > 
> > > > > v2: Add GT section heading, add tile id to device section.
> > > > > 
> > > > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > > > Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
> > > > > ---
> > > > >    drivers/gpu/drm/xe/xe_devcoredump.c       | 5 +++++
> > > > >    drivers/gpu/drm/xe/xe_devcoredump_types.h | 3 ++-
> > > > >    drivers/gpu/drm/xe/xe_device.c            | 1 +
> > > > >    drivers/gpu/drm/xe/xe_guc_submit.c        | 2 +-
> > > > >    drivers/gpu/drm/xe/xe_hw_engine.c         | 1 -
> > > > >    5 files changed, 9 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > index d23719d5c2a3..2690f1d1cde4 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > @@ -96,8 +96,13 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count,
> > > > >    	drm_printf(&p, "Process: %s\n", ss->process_name);
> > > > >    	xe_device_snapshot_print(xe, &p);
> > > > >    
> > > > > +	drm_printf(&p, "\n**** GT #%d ****\n", ss->gt->info.id);
> > > > > +	drm_printf(&p, "\tTile: %d\n", ss->gt->tile->id);
> > > > > +
> > > > >    	drm_puts(&p, "\n**** GuC CT ****\n");
> > > > >    	xe_guc_ct_snapshot_print(ss->ct, &p);
> > > > > +
> > > > > +	drm_puts(&p, "\n**** Contexts ****\n");
> > > > >    	xe_guc_exec_queue_snapshot_print(ss->ge, &p);
> > > > This broke Mesa parser!
> > > > It can't now parse the exec_queue context because it was expected to be on the '**** GuC CT ****' section.
> > > Then the mesa parse needs to be updated. That was clearly a bug - exec
> > > queue contexts are absolutely not GuC CT data and should not be in the
> > > GuC CT section.
> > Don't matter if it is a bug or not, it broke the parser.
> > If this is not reverted we will have older Kernel versions that don't work with newer Mesa and newer Kernel versions that don't with old Mesa.
> Debug tools cannot count as UAPI that must never change.

That is not my understanding from previous threads.

Imagine that a big customer files a bug with us and attaches a devcoredump from an older kernel version.
The devcoredump parser will not work. If the developer is aware of this "contract" break, they can check out an older UMD version, build it, and then parse
the devcoredump, then check out the main/master branch again and work on the fix... Not viable at all.

At least the UMD teams should be notified. At the moment Mesa debugging is blocked because of these patches.
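
To make the compatibility concern concrete, here is a minimal sketch, assuming a
user-space parser that walks the devcoredump text line by line. It is not the Mesa
parser; section_name(), may_hold_exec_queues() and count_exec_queues() are
hypothetical names used only for illustration. It accepts "GuC ID:" entries under
either the older '**** GuC CT ****' heading or the newer '**** Contexts ****'
heading, so one build could read dumps from kernels on both sides of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if the line looks like "**** Name ****", copy Name to out. */
static bool section_name(const char *line, size_t len, char *out, size_t out_len)
{
	static const char open[] = "**** ";
	static const char close[] = " ****";
	const size_t open_len = sizeof(open) - 1;
	const size_t close_len = sizeof(close) - 1;
	size_t name_len;

	if (len <= open_len + close_len)
		return false;
	if (memcmp(line, open, open_len) != 0)
		return false;
	if (memcmp(line + len - close_len, close, close_len) != 0)
		return false;

	name_len = len - open_len - close_len;
	if (name_len >= out_len)
		return false;
	memcpy(out, line + open_len, name_len);
	out[name_len] = '\0';
	return true;
}

/* Sections that may hold exec queue entries: new layout first, then the old one. */
static bool may_hold_exec_queues(const char *section)
{
	return strcmp(section, "Contexts") == 0 || strcmp(section, "GuC CT") == 0;
}

/* Count "GuC ID:" entries in a NUL-terminated dump, in either layout. */
int count_exec_queues(const char *dump)
{
	char section[64] = "";
	int count = 0;

	assert(dump != NULL);

	while (*dump) {
		const char *eol = strchr(dump, '\n');
		size_t len = eol ? (size_t)(eol - dump) : strlen(dump);

		/* A heading line updates the current section; any other line is data. */
		if (!section_name(dump, len, section, sizeof(section)) &&
		    may_hold_exec_queues(section) &&
		    len >= 7 && memcmp(dump, "GuC ID:", 7) == 0)
			count++;

		dump += len;
		if (*dump == '\n')
			dump++;
	}
	return count;
}
```

Matching on the entry prefix rather than on a single fixed heading is the design
choice here: a renamed or newly added section then only needs one more string in
may_hold_exec_queues(), not a parser tied to one kernel version.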

> 
> The devcoredump contains much information that is essentially the 
> internals of the kernel. It is going to change. That is about the only 
> guarantee that we can make about it. And saying that we must 
> intentionally break the output of a developer only debug feature in 
> order to support older mesa is plain wrong. End users do not care about 
> debug tools. All user applications will still work just perfectly.
> 
> We can start adding version numbers to the devcoredump format if we 
> really need to. But that was already shot down as a bad idea. It is 
> debug information and not UAPI. So version incompatibilities are 
> expected from time to time.
> 
> John.
> 
> 
> > 
> > > John.
> > > 
> > > > >    
> > > > >    	drm_puts(&p, "\n**** Job ****\n");
> > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > > > index 440d05d77a5a..3cc2f095fdfb 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > > > @@ -37,7 +37,8 @@ struct xe_devcoredump_snapshot {
> > > > >    	/* GuC snapshots */
> > > > >    	/** @ct: GuC CT snapshot */
> > > > >    	struct xe_guc_ct_snapshot *ct;
> > > > > -	/** @ge: Guc Engine snapshot */
> > > > > +
> > > > > +	/** @ge: GuC Submission Engine snapshot */
> > > > >    	struct xe_guc_submit_exec_queue_snapshot *ge;
> > > > >    
> > > > >    	/** @hwe: HW Engine snapshot array */
> > > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > > index 09a7ad830e69..030cf703e970 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > > @@ -961,6 +961,7 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p)
> > > > >    
> > > > >    	for_each_gt(gt, xe, id) {
> > > > >    		drm_printf(p, "GT id: %u\n", id);
> > > > > +		drm_printf(p, "\tTile: %u\n", gt->tile->id);
> > > > >    		drm_printf(p, "\tType: %s\n",
> > > > >    			   gt->info.type == XE_GT_TYPE_MAIN ? "main" : "media");
> > > > >    		drm_printf(p, "\tIP ver: %u.%u.%u\n",
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > index 0ac4a19ec9cc..8690df699170 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > @@ -2240,7 +2240,7 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
> > > > >    	if (!snapshot)
> > > > >    		return;
> > > > >    
> > > > > -	drm_printf(p, "\nGuC ID: %d\n", snapshot->guc.id);
> > > > > +	drm_printf(p, "GuC ID: %d\n", snapshot->guc.id);
> > > > >    	drm_printf(p, "\tName: %s\n", snapshot->name);
> > > > >    	drm_printf(p, "\tClass: %d\n", snapshot->class);
> > > > >    	drm_printf(p, "\tLogical mask: 0x%x\n", snapshot->logical_mask);
> > > > > diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > index ea6d9ef7fab6..6c9c27304cdc 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > @@ -1084,7 +1084,6 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot,
> > > > >    	if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE)
> > > > >    		drm_printf(p, "\tRCU_MODE: 0x%08x\n",
> > > > >    			   snapshot->reg.rcu_mode);
> > > > > -	drm_puts(p, "\n");
> > > > >    }
> > > > >    
> > > > >    /**
> 


