public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
Cc: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH v5 06/10] drm/i915/guc: Update GuC's log-buffer-state access for error capture.
Date: Tue, 8 Feb 2022 19:34:37 -0800	[thread overview]
Message-ID: <20220209033436.GA25278@jons-linux-dev-box> (raw)
In-Reply-To: <f3cdc0a3a56eb452216b74bf0a12b112896b47f7.camel@intel.com>

On Tue, Feb 08, 2022 at 02:55:07PM -0800, Teres Alexis, Alan Previn wrote:
> 
> On Tue, 2022-02-08 at 14:18 -0800, Matthew Brost wrote:
> > On Tue, Feb 08, 2022 at 11:38:13AM -0800, Teres Alexis, Alan Previn wrote:
> > > Hi Matt, thank you for taking the time to review the codes.
> > > Answering your question inline below.
> > >
> > >
> > > On Fri, 2022-02-04 at 10:19 -0800, Matthew Brost wrote:
> > > > On Wed, Jan 26, 2022 at 02:48:18AM -0800, Alan Previn wrote:
> > > > > GuC log buffer regions for debug-log-events, crash-dumps and
> > > > > error-state-capture are all a single bo allocation that includes
> > > > > the guc_log_buffer_state structures.
> > > > >
> > > > > Since the error-capture region is accessed with high priority at non-
> > > > > deterministic times (as part of gpu coredump) while the debug-log-event
> > > > > region is populated and accessed with different priorities, timings and
> > > > > consumers, let's split out separate locks for buffer-state accesses
> > > > > of each region.
> > > > >
> > > > > Also, ensure a global mapping is made up front for the entire bo
> > > > > throughout GuC operation so that dynamic mapping and unmapping isn't
> > > > > required for error capture log access if relay-logging isn't running.
> > > > >
> > > > > Additionally, while here, make some readibility improvements:
> > > > > 1. change previous function names with "capture_logs" to
> > > > >    "copy_debug_logs" to help make the distinction clearer.
> > > > > 2. Update the guc log region mapping comments to order them
> > > > >    according to the enum definition as per the GuC interface.
> > > > >
> > > > > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
> > > > >  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  47 ++++++
> > > > >  .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   1 +
> > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 135 +++++++++++-------
> > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  16 ++-
> > > > >  5 files changed, 141 insertions(+), 60 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > index 4e819853ec2e..be1ad7fa2bf8 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > @@ -34,6 +34,8 @@ struct intel_guc {
> > > > >     struct intel_uc_fw fw;
> > > > >     /** @log: sub-structure containing GuC log related data and objects */
> > > > >     struct intel_guc_log log;
> > > > > +   /** @log_state: states and locks for each subregion of GuC's log buffer */
> > > > > +   struct intel_guc_log_stats log_state[GUC_MAX_LOG_BUFFER];
> > > >
> > > > Why move this? This still probably should be sub-structure of
> > > > intel_guc_log. Most of the access is from functions that pass in
> > > > intel_guc_log, then retrieve the GuC object, only to access this new
> > > > intel_guc_log_stats object. That layering seems wrong, if the argument
> > > > to a function is intel_guc_log it should really try to access members
> > > > within that object or below. Obv some exceptions exist but here it seems
> > > > clear to me this is in the wrong place.
> > > >
> > > So the reasoning i had was because because intel_guc_log module only managed
> > > guc-relay-logging and guc-log-dumping for log-events but allocates the buffer
> > > for 3 separate subregion/usages (i.e. log-events, crash-dump and error-capture).
> > > That said, I did not want intel_guc_capture and relay-logging (or log-dumping)
> > > to have to contend with the same lock because these two subregions are completely
> > > independant of each other in terms of when they are being accessed and why.
> > >
> >
> > All that is fine, I just think the 'struct intel_guc_log_stats' should
> > be sub-substure of struct intel_guc_log.
> >
> > > However, after the redesign of rev5 (this rev), I now believe the intel_guc_capture
> > > module does not require a lock because its only ever accessing its log
> > > subregion in response to the ctb handler functions that run out of the same queue.
> > > As we know intel_guc_capture reacts to G2H-error-capture-notification and G2H-context-reset
> > > (that trickles into i915_gpu_coredump). At the point of i915_error_state dump,
> > > intel_guc_capture module does not access the buffer - it merely dumps the already-parsed
> > > and engine-dump-node (that was detached from error-capture's internal linked-list
> > > and attached to the gpu_coredump structure in the second G2H above).
> > >
> > > And of course, intel_guc_log only ever accesses the log-events subregion
> > > and never the error-capture regions.
> > >
> > > That said, i could revert the lock structure to the original and not have
> > > intel_guc_capture using locks. What do you think?
> > >
> >
> > Again my comment has nothing to do with locking, it is where the
> > structure lives.
> >
> >
> IMHO, based on my understanding of the codes and the GuC interface,
> i see intel_guc_log and intel_guc_capture as 2 subsystems at equal level
> under the intel_guc framework. Both these subsystems have completely independent
> functions with completely separate input streams with completely separate
> content and usage. They do happen to share the same bo but have independent
> regions. That said, I believe the stats array-structure and even the vma
> ptr should go into the parent - i.e. intel_guc, the parent of both intel_guc_log and
> intel_guc_capture. Alernatively, each subsystem that has its own stats structure.
> 
> However, without the need for the locks, i guess i dont need to change anything
> other than have intel_guc_log skip over the capture region and let intel_guc_capture
> deal with its region independantly without sharing any member variable
> from intel_guc_log.
> 
> I am admittedly new to the GuC infrastructure so please do let me know if I am mistaken
> and if intel_guc_capture is supposed to be a subsystem of intel_guc_log.
> 

If the object lives at the GuC level, operate on it at the GuC level.

e.g.
intel_guc_log_init_early calls mutex init on guc->log_state - that is
wrong and breaks the layering. intel_guc_log_init_early should only
operate on guc_log or below objects, not above it.

The key here is consisteny, if the GuC level owns the object it should
be initialized there + passed into the lower levels if possible. The
lower levels should avoid reaching back to GuC level for objects
whenever possible.

You could have 2 intel_guc_log_stats objects below the guc_log object
and 1 intel_guc_log_stats object for capture at the GuC level. That's
likely the right approach here.

I know this is kinda a nit but actually important to have a well
structured / layered driver. The i915 isn't a great example of this but
we should avoid making this worse.

Matt

> > Matt
> >
> > > ...alan
> > >
> > > > Another nit, I'd personally break this out into multiple patches.
> > > >
> > > > e.g. 1 to rename relay log functions, 1 introducing intel_guc_log_stats
> > > > + lock, and 1 adding intel_guc_capture_output_min_size_est. Maybe I'm
> > > > missing another patch or two in there.
> > > >
> > > > Not a blocker but it does make reviews easier.
> > > >
> > > Will do.
> > >
> > > > Matt
> > > >
> > > > >     /** @ct: the command transport communication channel */
> > > > >     struct intel_guc_ct ct;
> > > > >     /** @slpc: sub-structure containing SLPC related data and objects */
> > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > > > index 70d2ee841289..e7f99d051636 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > > > @@ -651,6 +651,53 @@ int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u3
> > > > >     return PAGE_ALIGN(alloc_size);
> > > > >  }
> > > > >
> > > > > 2.25.1
> > > > >
> 

  reply	other threads:[~2022-02-09  3:40 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26 10:48 [Intel-gfx] [PATCH v5 00/10] Add GuC Error Capture Support Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 01/10] drm/i915/guc: Update GuC ADS size for error capture lists Alan Previn
2022-01-26 18:09   ` Jani Nikula
2022-01-26 18:15   ` Jani Nikula
2022-01-26 22:46   ` Lucas De Marchi
2022-02-03 19:03     ` Matthew Brost
2022-02-03 20:04       ` Lucas De Marchi
2022-02-03 20:37         ` Teres Alexis, Alan Previn
2022-02-03 20:40           ` Teres Alexis, Alan Previn
2022-02-03 21:40             ` Lucas De Marchi
2022-02-03 21:58               ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 02/10] drm/i915/guc: Add XE_LP registers for GuC error state capture Alan Previn
2022-01-26 18:13   ` Jani Nikula
2022-01-26 21:46     ` Teres Alexis, Alan Previn
2022-01-27  9:30       ` Jani Nikula
2022-01-28 16:54         ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 03/10] drm/i915/guc: Add DG2 " Alan Previn
2022-02-05  1:28   ` Umesh Nerlige Ramappa
2022-02-10  5:17     ` Teres Alexis, Alan Previn
2022-02-11 19:24     ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 04/10] drm/i915/guc: Add Gen9 " Alan Previn
2022-02-07 19:14   ` Umesh Nerlige Ramappa
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 05/10] drm/i915/guc: Add GuC's error state capture output structures Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 06/10] drm/i915/guc: Update GuC's log-buffer-state access for error capture Alan Previn
2022-01-27  4:26   ` Teres Alexis, Alan Previn
2022-02-04 18:19   ` Matthew Brost
2022-02-08 19:38     ` Teres Alexis, Alan Previn
2022-02-08 22:18       ` Matthew Brost
2022-02-08 22:55         ` Teres Alexis, Alan Previn
2022-02-09  3:34           ` Matthew Brost [this message]
2022-02-09  3:41             ` Matthew Brost
2022-02-09  3:51             ` Teres Alexis, Alan Previn
2022-02-14 19:20               ` Teres Alexis, Alan Previn
2022-02-15  1:22                 ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 07/10] drm/i915/guc: Extract GuC error capture lists on G2H notification Alan Previn
2022-02-11  1:36   ` Umesh Nerlige Ramappa
2022-02-13 19:47     ` Teres Alexis, Alan Previn
2022-02-17 19:21       ` Umesh Nerlige Ramappa
2022-02-25  7:21         ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 08/10] drm/i915/guc: Plumb GuC-capture into gpu_coredump Alan Previn
2022-02-11  2:11   ` Umesh Nerlige Ramappa
2022-02-12  0:31     ` Teres Alexis, Alan Previn
2022-02-12  0:38     ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 09/10] drm/i915/guc: Follow legacy register names Alan Previn
2022-02-03 19:09   ` Matthew Brost
2022-02-04 18:53     ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 10/10] drm/i915/guc: Print the GuC error capture output register list Alan Previn
2022-02-07 21:43   ` Umesh Nerlige Ramappa
2022-01-26 18:51 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GuC Error Capture Support (rev5) Patchwork
2022-01-26 18:52 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-26 19:25 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220209033436.GA25278@jons-linux-dev-box \
    --to=matthew.brost@intel.com \
    --cc=alan.previn.teres.alexis@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox