Intel-GFX Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: daniel.vetter@intel.com, intel-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 13/20] drm/i915/guc: Relax CTB response timeout
Date: Fri, 4 Jun 2021 11:35:39 -0700
Message-ID: <20210604183539.GA26392@sdutt-i7>
In-Reply-To: <YLnlQyPJZygHTHxk@phenom.ffwll.local>

On Fri, Jun 04, 2021 at 10:33:07AM +0200, Daniel Vetter wrote:
> On Wed, Jun 02, 2021 at 10:16:23PM -0700, Matthew Brost wrote:
> > From: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > 
> > In an upcoming patch we will allow more CTB requests to be sent in
> > parallel to the GuC for processing, so we shouldn't assume any more
> > that the GuC will always reply within 10ms.
> > 
> > Use a bigger value from CONFIG_DRM_I915_GUC_CTB_TIMEOUT instead.
> > 
> > v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
> > 
> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 
> So this is a rant, but for upstream we really need to do better than
> internal:
> 
> - The driver must work by default in the optimal configuration.
> 
> - Any config change that we haven't validated _must_ taint the kernel
>   (this is especially for module options, but also for config settings)
> 
> - Config options need a real reason beyond "was useful for bring-up".
> 
> Our internal tree is an absolute disaster right now, with multi-line
> kernel configs (different on each platform) and bespoke kernel config or
> the driver just fails. We're the experts on our own hw, we should know how
> it works, not offload that to users essentially asking them "how shitty do
> you think Intel hw is in responding timely".
> 
> Yes I know there's a lot of these there already, they don't make a lot of
> sense either.
> 
> Except if there's a real reason for this (aside from us just offloading
> testing to our users instead of doing it ourselves properly) I think we
> should hardcode this, with a comment explaining why. Maybe with a switch
> between the PF/VF case once that's landed.
> 
> > ---
> >  drivers/gpu/drm/i915/Kconfig.profile      | 10 ++++++++++
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  5 ++++-
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > index 39328567c200..0d5475b5f28a 100644
> > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -38,6 +38,16 @@ config DRM_I915_USERFAULT_AUTOSUSPEND
> >  	  May be 0 to disable the extra delay and solely use the device level
> >  	  runtime pm autosuspend delay tunable.
> >  
> > +config DRM_I915_GUC_CTB_TIMEOUT
> > +	int "How long to wait for the GuC to make forward progress on CTBs (ms)"
> > +	default 1500 # milliseconds
> > +	range 10 60000
> 
> Also range is definitely off, drm/scheduler will probably nuke you
> beforehand :-)
> 
> That's kinda another issue I have with all these kconfig knobs: Maybe we
> need a knob for "relax with reset attempts, my workloads overload my gpus
> routinely", which then scales _all_ timeouts proportionally. But letting
> the user set them all, with silly combinations like resetting the
> workload before heartbeat or stuff like that, doesn't make much sense.
>

Yes, with the code as is, the user could do some wacky stuff that doesn't
make sense at all.
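
To make the "one knob that scales all timeouts proportionally" idea concrete, here is a minimal userspace sketch. It is not i915 code: the struct, the function name, the clamp, and every value except the 2500 ms heartbeat default (taken from the quoted Kconfig.profile) are assumptions for illustration only.

```c
#include <stdint.h>

/* All values are in milliseconds. */
struct timeouts_ms {
	uint32_t ctb_response;        /* placeholder value, not from the patch */
	uint32_t heartbeat_interval;  /* 2500 is the default in the quoted Kconfig.profile */
	uint32_t reset_after;         /* placeholder value, not from the patch */
};

/* Hypothetical baseline set, ordered so that a reset never precedes a heartbeat. */
static const struct timeouts_ms baseline = {
	.ctb_response = 1000,
	.heartbeat_interval = 2500,
	.reset_after = 7500,
};

/* Hypothetical upper bound on the single "relax" knob. */
#define RELAX_FACTOR_MAX 8u

/*
 * Scale every timeout by the same factor. Because all fields are multiplied
 * together, their relative ordering stays the same as in the baseline set,
 * so the user cannot configure a reset that fires before the heartbeat.
 */
static struct timeouts_ms relax_timeouts(uint32_t factor)
{
	struct timeouts_ms t = baseline;

	if (factor < 1)
		factor = 1;
	if (factor > RELAX_FACTOR_MAX)
		factor = RELAX_FACTOR_MAX;

	t.ctb_response *= factor;
	t.heartbeat_interval *= factor;
	t.reset_after *= factor;
	return t;
}
```

With a single factor there is only one thing to validate per platform, instead of one range per knob.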
 
> Anyway, tiny patch so hopefully I can leave this one out for now until
> we've closed this.

No issue leaving this out, as blocking CTBs are never really used until
SRIOV anyway, aside from setup / debugging. That being said, we might
still want a higher hardcoded value in the meantime, perhaps around a
second. I can follow up on that if needed.
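
As a rough illustration of the hardcoded alternative, the sketch below models the two-phase wait from the quoted patch (a short fast phase, then a slower bounded phase) in plain userspace C. The constant names, the 1000 ms value, and the fake request type are assumptions made up for this example; polls stand in for elapsed time, and none of this is the actual i915 implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical hardcoded limits. The fast phase mirrors the 10us spin in
 * the quoted code; the slow phase uses "around a second" as floated above.
 * Hardcoding keeps the choice with the driver rather than with the user.
 */
#define CTB_FAST_POLLS        10u    /* stands in for wait_for_us(done, 10) */
#define CTB_SLOW_TIMEOUT_MS   1000u  /* stands in for wait_for(done, timeout) */

/* Fake request: reports completion after a fixed number of polls. */
struct fake_request {
	uint32_t polls_until_done;
};

static bool fake_request_done(struct fake_request *req)
{
	if (req->polls_until_done == 0)
		return true;
	req->polls_until_done--;
	return false;
}

/* Returns 0 once the request completes, or -ETIMEDOUT if it never does. */
static int wait_for_response(struct fake_request *req)
{
	uint32_t i;

	/* Fast phase: a handful of back-to-back polls. */
	for (i = 0; i < CTB_FAST_POLLS; i++)
		if (fake_request_done(req))
			return 0;

	/* Slow phase: one poll per simulated millisecond, bounded. */
	for (i = 0; i < CTB_SLOW_TIMEOUT_MS; i++)
		if (fake_request_done(req))
			return 0;

	return -ETIMEDOUT;
}
```

A request that completes within the budget returns 0; one that never completes in the fast or slow phase returns -ETIMEDOUT.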

Matt

> -Daniel
> 
> > +	help
> > +	  Configures the default timeout waiting for the GuC to make forward
> > +	  progress on CTBs, e.g. waiting for a response to a request.
> > +
> > +	  A range of 10 ms to 60000 ms is allowed.
> > +
> >  config DRM_I915_HEARTBEAT_INTERVAL
> >  	int "Interval between heartbeat pulses (ms)"
> >  	default 2500 # milliseconds
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 916c2b80c841..cf1fb09ef766 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -436,6 +436,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >   */
> >  static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  {
> > +	long timeout;
> >  	int err;
> >  
> >  	/*
> > @@ -443,10 +444,12 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  	 * up to that length of time, then switch to a slower sleep-wait loop.
> >  	 * No GuC command should ever take longer than 10ms.
> >  	 */
> > +	timeout = CONFIG_DRM_I915_GUC_CTB_TIMEOUT;
> > +
> >  #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
> >  	err = wait_for_us(done, 10);
> >  	if (err)
> > -		err = wait_for(done, 10);
> > +		err = wait_for(done, timeout);
> >  #undef done
> >  
> >  	if (unlikely(err))
> > -- 
> > 2.28.0
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
