All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
Cc: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH v5 06/10] drm/i915/guc: Update GuC's log-buffer-state access for error capture.
Date: Tue, 8 Feb 2022 19:41:03 -0800	[thread overview]
Message-ID: <20220209034103.GA25307@jons-linux-dev-box> (raw)
In-Reply-To: <20220209033436.GA25278@jons-linux-dev-box>

On Tue, Feb 08, 2022 at 07:34:37PM -0800, Matthew Brost wrote:
> On Tue, Feb 08, 2022 at 02:55:07PM -0800, Teres Alexis, Alan Previn wrote:
> > 
> > On Tue, 2022-02-08 at 14:18 -0800, Matthew Brost wrote:
> > > On Tue, Feb 08, 2022 at 11:38:13AM -0800, Teres Alexis, Alan Previn wrote:
> > > > Hi Matt, thank you for taking the time to review the codes.
> > > > Answering your question inline below.
> > > >
> > > >
> > > > On Fri, 2022-02-04 at 10:19 -0800, Matthew Brost wrote:
> > > > > On Wed, Jan 26, 2022 at 02:48:18AM -0800, Alan Previn wrote:
> > > > > > GuC log buffer regions for debug-log-events, crash-dumps and
> > > > > > error-state-capture are all a single bo allocation that includes
> > > > > > the guc_log_buffer_state structures.
> > > > > >
> > > > > > Since the error-capture region is accessed with high priority at non-
> > > > > > deterministic times (as part of gpu coredump) while the debug-log-event
> > > > > > region is populated and accessed with different priorities, timings and
> > > > > > consumers, let's split out separate locks for buffer-state accesses
> > > > > > of each region.
> > > > > >
> > > > > > Also, ensure a global mapping is made up front for the entire bo
> > > > > > throughout GuC operation so that dynamic mapping and unmapping isn't
> > > > > > required for error capture log access if relay-logging isn't running.
> > > > > >
> > > > > > Additionally, while here, make some readibility improvements:
> > > > > > 1. change previous function names with "capture_logs" to
> > > > > >    "copy_debug_logs" to help make the distinction clearer.
> > > > > > 2. Update the guc log region mapping comments to order them
> > > > > >    according to the enum definition as per the GuC interface.
> > > > > >
> > > > > > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
> > > > > >  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  47 ++++++
> > > > > >  .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   1 +
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 135 +++++++++++-------
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  16 ++-
> > > > > >  5 files changed, 141 insertions(+), 60 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > > index 4e819853ec2e..be1ad7fa2bf8 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > > > @@ -34,6 +34,8 @@ struct intel_guc {
> > > > > >     struct intel_uc_fw fw;
> > > > > >     /** @log: sub-structure containing GuC log related data and objects */
> > > > > >     struct intel_guc_log log;
> > > > > > +   /** @log_state: states and locks for each subregion of GuC's log buffer */
> > > > > > +   struct intel_guc_log_stats log_state[GUC_MAX_LOG_BUFFER];
> > > > >
> > > > > Why move this? This still probably should be sub-structure of
> > > > > intel_guc_log. Most of the access is from functions that pass in
> > > > > intel_guc_log, then retrieve the GuC object, only to access this new
> > > > > intel_guc_log_stats object. That layering seems wrong, if the argument
> > > > > to a function is intel_guc_log it should really try to access members
> > > > > within that object or below. Obv some exceptions exist but here it seems
> > > > > clear to me this is in the wrong place.
> > > > >
> > > > So the reasoning i had was because because intel_guc_log module only managed
> > > > guc-relay-logging and guc-log-dumping for log-events but allocates the buffer
> > > > for 3 separate subregion/usages (i.e. log-events, crash-dump and error-capture).
> > > > That said, I did not want intel_guc_capture and relay-logging (or log-dumping)
> > > > to have to contend with the same lock because these two subregions are completely
> > > > independant of each other in terms of when they are being accessed and why.
> > > >
> > >
> > > All that is fine, I just think the 'struct intel_guc_log_stats' should
> > > be sub-substure of struct intel_guc_log.
> > >
> > > > However, after the redesign of rev5 (this rev), I now believe the intel_guc_capture
> > > > module does not require a lock because its only ever accessing its log
> > > > subregion in response to the ctb handler functions that run out of the same queue.
> > > > As we know intel_guc_capture reacts to G2H-error-capture-notification and G2H-context-reset
> > > > (that trickles into i915_gpu_coredump). At the point of i915_error_state dump,
> > > > intel_guc_capture module does not access the buffer - it merely dumps the already-parsed
> > > > and engine-dump-node (that was detached from error-capture's internal linked-list
> > > > and attached to the gpu_coredump structure in the second G2H above).
> > > >
> > > > And of course, intel_guc_log only ever accesses the log-events subregion
> > > > and never the error-capture regions.
> > > >
> > > > That said, i could revert the lock structure to the original and not have
> > > > intel_guc_capture using locks. What do you think?
> > > >
> > >
> > > Again my comment has nothing to do with locking, it is where the
> > > structure lives.
> > >
> > >
> > IMHO, based on my understanding of the codes and the GuC interface,
> > i see intel_guc_log and intel_guc_capture as 2 subsystems at equal level
> > under the intel_guc framework. Both these subsystems have completely independent
> > functions with completely separate input streams with completely separate
> > content and usage. They do happen to share the same bo but have independent
> > regions. That said, I believe the stats array-structure and even the vma
> > ptr should go into the parent - i.e. intel_guc, the parent of both intel_guc_log and
> > intel_guc_capture. Alernatively, each subsystem that has its own stats structure.
> > 
> > However, without the need for the locks, i guess i dont need to change anything
> > other than have intel_guc_log skip over the capture region and let intel_guc_capture
> > deal with its region independantly without sharing any member variable
> > from intel_guc_log.
> > 
> > I am admittedly new to the GuC infrastructure so please do let me know if I am mistaken
> > and if intel_guc_capture is supposed to be a subsystem of intel_guc_log.
> > 
> 
> If the object lives at the GuC level, operate on it at the GuC level.
> 
> e.g.
> intel_guc_log_init_early calls mutex init on guc->log_state - that is
> wrong and breaks the layering. intel_guc_log_init_early should only
> operate on guc_log or below objects, not above it.
> 
> The key here is consisteny, if the GuC level owns the object it should
> be initialized there + passed into the lower levels if possible. The
> lower levels should avoid reaching back to GuC level for objects
> whenever possible.
> 

Let me clarifiy this even more, reaching back to get an object to pass
to another fucntion - ok, reaching back to read something - meh,
reaching back to modify something - no.

Matt 

> You could have 2 intel_guc_log_stats objects below the guc_log object
> and 1 intel_guc_log_stats object for capture at the GuC level. That's
> likely the right approach here.
> 
> I know this is kinda a nit but actually important to have a well
> structured / layered driver. The i915 isn't a great example of this but
> we should avoid making this worse.
> 
> Matt
> 
> > > Matt
> > >
> > > > ...alan
> > > >
> > > > > Another nit, I'd personally break this out into multiple patches.
> > > > >
> > > > > e.g. 1 to rename relay log functions, 1 introducing intel_guc_log_stats
> > > > > + lock, and 1 adding intel_guc_capture_output_min_size_est. Maybe I'm
> > > > > missing another patch or two in there.
> > > > >
> > > > > Not a blocker but it does make reviews easier.
> > > > >
> > > > Will do.
> > > >
> > > > > Matt
> > > > >
> > > > > >     /** @ct: the command transport communication channel */
> > > > > >     struct intel_guc_ct ct;
> > > > > >     /** @slpc: sub-structure containing SLPC related data and objects */
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > > > > index 70d2ee841289..e7f99d051636 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > > > > @@ -651,6 +651,53 @@ int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u3
> > > > > >     return PAGE_ALIGN(alloc_size);
> > > > > >  }
> > > > > >
> > > > > > 2.25.1
> > > > > >
> > 

  reply	other threads:[~2022-02-09  3:47 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26 10:48 [Intel-gfx] [PATCH v5 00/10] Add GuC Error Capture Support Alan Previn
2022-01-26 10:48 ` Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 01/10] drm/i915/guc: Update GuC ADS size for error capture lists Alan Previn
2022-01-26 18:09   ` Jani Nikula
2022-01-26 18:15   ` Jani Nikula
2022-01-26 22:46   ` Lucas De Marchi
2022-02-03 19:03     ` Matthew Brost
2022-02-03 20:04       ` Lucas De Marchi
2022-02-03 20:37         ` Teres Alexis, Alan Previn
2022-02-03 20:40           ` Teres Alexis, Alan Previn
2022-02-03 21:40             ` Lucas De Marchi
2022-02-03 21:58               ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 02/10] drm/i915/guc: Add XE_LP registers for GuC error state capture Alan Previn
2022-01-26 18:13   ` Jani Nikula
2022-01-26 21:46     ` Teres Alexis, Alan Previn
2022-01-27  9:30       ` Jani Nikula
2022-01-28 16:54         ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 03/10] drm/i915/guc: Add DG2 " Alan Previn
2022-02-05  1:28   ` Umesh Nerlige Ramappa
2022-02-10  5:17     ` Teres Alexis, Alan Previn
2022-02-11 19:24     ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 04/10] drm/i915/guc: Add Gen9 " Alan Previn
2022-02-07 19:14   ` Umesh Nerlige Ramappa
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 05/10] drm/i915/guc: Add GuC's error state capture output structures Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 06/10] drm/i915/guc: Update GuC's log-buffer-state access for error capture Alan Previn
2022-01-27  4:26   ` Teres Alexis, Alan Previn
2022-02-04 18:19   ` Matthew Brost
2022-02-08 19:38     ` Teres Alexis, Alan Previn
2022-02-08 22:18       ` Matthew Brost
2022-02-08 22:55         ` Teres Alexis, Alan Previn
2022-02-09  3:34           ` Matthew Brost
2022-02-09  3:41             ` Matthew Brost [this message]
2022-02-09  3:51             ` Teres Alexis, Alan Previn
2022-02-14 19:20               ` Teres Alexis, Alan Previn
2022-02-15  1:22                 ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 07/10] drm/i915/guc: Extract GuC error capture lists on G2H notification Alan Previn
2022-02-11  1:36   ` Umesh Nerlige Ramappa
2022-02-13 19:47     ` Teres Alexis, Alan Previn
2022-02-17 19:21       ` Umesh Nerlige Ramappa
2022-02-25  7:21         ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 08/10] drm/i915/guc: Plumb GuC-capture into gpu_coredump Alan Previn
2022-02-11  2:11   ` Umesh Nerlige Ramappa
2022-02-12  0:31     ` Teres Alexis, Alan Previn
2022-02-12  0:38     ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 09/10] drm/i915/guc: Follow legacy register names Alan Previn
2022-02-03 19:09   ` Matthew Brost
2022-02-04 18:53     ` Teres Alexis, Alan Previn
2022-01-26 10:48 ` [Intel-gfx] [PATCH v5 10/10] drm/i915/guc: Print the GuC error capture output register list Alan Previn
2022-02-07 21:43   ` Umesh Nerlige Ramappa
2022-01-26 18:51 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GuC Error Capture Support (rev5) Patchwork
2022-01-26 18:52 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-26 19:25 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220209034103.GA25307@jons-linux-dev-box \
    --to=matthew.brost@intel.com \
    --cc=alan.previn.teres.alexis@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.