From: Daniel Vetter <daniel@ffwll.ch>
To: Matthew Brost <matthew.brost@intel.com>
Cc: daniel.vetter@intel.com, intel-gfx@lists.freedesktop.org,
dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 13/20] drm/i915/guc: Relax CTB response timeout
Date: Fri, 4 Jun 2021 10:33:07 +0200
Message-ID: <YLnlQyPJZygHTHxk@phenom.ffwll.local>
In-Reply-To: <20210603051630.2635-14-matthew.brost@intel.com>
On Wed, Jun 02, 2021 at 10:16:23PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko <michal.wajdeczko@intel.com>
>
> In an upcoming patch we will allow more CTB requests to be sent in
> parallel to the GuC for processing, so we shouldn't assume any more
> that GuC will always reply within 10ms.
>
> Use a bigger value from CONFIG_DRM_I915_GUC_CTB_TIMEOUT instead.
>
> v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
>
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>

So this is a rant, but for upstream we really need to do better than
internal:
- The driver must work by default in the optimal configuration.
- Any config change that we haven't validated _must_ taint the kernel
  (this applies especially to module options, but also to config settings).
- Config options need a real reason beyond "was useful for bring-up".

Our internal tree is an absolute disaster right now, with multi-line
kernel configs (different on each platform) and bespoke kernel configs, or
the driver just fails. We're the experts on our own hw; we should know how
it works, not offload that to users, essentially asking them "how shitty do
you think Intel hw is at responding in a timely manner?"

Yes, I know there are a lot of these already; they don't make a lot of
sense either.

Unless there's a real reason for this (aside from us just offloading
testing to our users instead of doing it ourselves properly), I think we
should hardcode this, with a comment explaining why. Maybe with a switch
between the PF/VF case once that's landed.

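
For illustration only, a minimal sketch of what "hardcode this, with a
comment explaining why, plus a PF/VF switch" could look like. The macro
names and the helper guc_ctb_timeout_ms() are hypothetical; only the
1500 ms value comes from this patch's proposed Kconfig default, and the VF
value is a placeholder, not a validated number.

```c
#include <stdbool.h>

/*
 * Hypothetical hardcoded CTB response timeouts in milliseconds.
 * 1500 ms is the default proposed for CONFIG_DRM_I915_GUC_CTB_TIMEOUT in
 * this patch. The VF value is a placeholder assumed to need more slack.
 * The real comment here should explain why these values were chosen.
 */
#define GUC_CTB_TIMEOUT_MS_PF 1500L
#define GUC_CTB_TIMEOUT_MS_VF 3000L /* placeholder, not validated */

/* Hypothetical helper: pick the timeout from PF/VF mode, no Kconfig knob. */
static long guc_ctb_timeout_ms(bool is_vf)
{
	return is_vf ? GUC_CTB_TIMEOUT_MS_VF : GUC_CTB_TIMEOUT_MS_PF;
}
```

The point of the design is that the value lives next to its justification
in the source, so changing it means changing (and reviewing) the code
rather than flipping a config option nobody has validated.
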
> ---
> drivers/gpu/drm/i915/Kconfig.profile | 10 ++++++++++
> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++++-
> 2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> index 39328567c200..0d5475b5f28a 100644
> --- a/drivers/gpu/drm/i915/Kconfig.profile
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -38,6 +38,16 @@ config DRM_I915_USERFAULT_AUTOSUSPEND
> May be 0 to disable the extra delay and solely use the device level
> runtime pm autosuspend delay tunable.
>
> +config DRM_I915_GUC_CTB_TIMEOUT
> + int "How long to wait for the GuC to make forward progress on CTBs (ms)"
> + default 1500 # milliseconds
> + range 10 60000

Also, the range is definitely off: drm/scheduler will probably nuke you
beforehand :-)

That's kinda another issue I have with all these kconfig knobs: maybe we
need a knob for "relax on reset attempts, my workloads overload my GPUs
routinely", which then scales _all_ timeouts proportionally. But letting
the user set them all individually, with silly combinations like resetting
the workload before the heartbeat or stuff like that, doesn't make much
sense.
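
A rough sketch of the single "relax" knob idea, assuming one percentage
that scales every base timeout. The struct, the enum names, and
i915_scaled_timeouts() are hypothetical; the base values are the two
Kconfig defaults quoted in this thread. The ordering check is illustrative,
not a documented i915 requirement: it just shows that proportional scaling
cannot produce the inconsistent combinations that independent knobs allow.

```c
#include <assert.h>

/*
 * Base timeouts in ms, taken from the Kconfig defaults quoted in this thread:
 * DRM_I915_GUC_CTB_TIMEOUT (1500) and DRM_I915_HEARTBEAT_INTERVAL (2500).
 */
enum {
	BASE_GUC_CTB_TIMEOUT_MS = 1500,
	BASE_HEARTBEAT_INTERVAL_MS = 2500,
};

/* Hypothetical bundle of all timeouts derived from one knob. */
struct i915_timeouts {
	long guc_ctb_ms;
	long heartbeat_interval_ms;
};

/*
 * Hypothetical single knob: scale_pct is a percentage applied to every base
 * timeout, so users can't pick inconsistent combinations. Values below 100
 * are clamped, i.e. the knob can only relax timeouts, never tighten them.
 */
static struct i915_timeouts i915_scaled_timeouts(long scale_pct)
{
	struct i915_timeouts t;

	if (scale_pct < 100)
		scale_pct = 100;

	t.guc_ctb_ms = BASE_GUC_CTB_TIMEOUT_MS * scale_pct / 100;
	t.heartbeat_interval_ms = BASE_HEARTBEAT_INTERVAL_MS * scale_pct / 100;

	/* Proportional scaling preserves the ordering of the base values. */
	assert(t.guc_ctb_ms <= t.heartbeat_interval_ms);
	return t;
}
```

For example, a scale of 200 doubles both values (3000 ms and 5000 ms)
instead of letting one be raised past the other.
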

Anyway, it's a tiny patch, so hopefully I can leave this one out for now
until we've closed this.
-Daniel

> + help
> Configures the default timeout waiting for the GuC to make forward
> progress on CTBs, e.g. waiting for a response to a request.
> +
> + A range of 10 ms to 60000 ms is allowed.
> +
> config DRM_I915_HEARTBEAT_INTERVAL
> int "Interval between heartbeat pulses (ms)"
> default 2500 # milliseconds
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 916c2b80c841..cf1fb09ef766 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -436,6 +436,7 @@ static int ct_write(struct intel_guc_ct *ct,
> */
> static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> {
> + long timeout;
> int err;
>
> /*
> @@ -443,10 +444,12 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> * up to that length of time, then switch to a slower sleep-wait loop.
> * No GuC command should ever take longer than 10ms.
> */
> + timeout = CONFIG_DRM_I915_GUC_CTB_TIMEOUT;
> +
> #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
> err = wait_for_us(done, 10);
> if (err)
> - err = wait_for(done, 10);
> + err = wait_for(done, timeout);
> #undef done
>
> if (unlikely(err))
> --
> 2.28.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
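
For readers less familiar with the wait_for_us()/wait_for() pair used in
wait_for_ct_request_update(), here is a userspace sketch of the same
spin-then-sleep pattern: busy-wait for 10 us, then sleep-poll up to the
timeout. This is not the i915 implementation; poll_until(),
wait_two_phase(), flag_is_set() and the fixed 100 us sleep interval are
assumptions of this sketch, and the real kernel macros differ in detail.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Monotonic clock in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Poll cond(arg) until it is true or timeout_ns has elapsed. sleep_ns == 0
 * busy-waits; otherwise sleep that long between polls. The clock is sampled
 * before cond() so a completion racing with the deadline still counts.
 * Returns 0 on success, -ETIMEDOUT otherwise.
 */
static int poll_until(bool (*cond)(void *), void *arg,
		      long long timeout_ns, long sleep_ns)
{
	const long long deadline = now_ns() + timeout_ns;

	for (;;) {
		bool expired = now_ns() >= deadline;

		if (cond(arg))
			return 0;
		if (expired)
			return -ETIMEDOUT;
		if (sleep_ns > 0) {
			struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };

			nanosleep(&ts, NULL);
		}
	}
}

/* Example condition: a flag updated elsewhere (cf. READ_ONCE(req->status)). */
static bool flag_is_set(void *arg)
{
	return *(volatile bool *)arg;
}

/*
 * Two-phase wait as in the patch: spin for 10 us, then sleep-poll for up to
 * timeout_ms. The 100 us sleep interval is an assumption of this sketch.
 */
static int wait_two_phase(bool (*cond)(void *), void *arg, long timeout_ms)
{
	int err = poll_until(cond, arg, 10 * 1000LL, 0);

	if (err)
		err = poll_until(cond, arg, timeout_ms * 1000000LL, 100 * 1000L);
	return err;
}
```

The short spin catches the common fast reply without a context switch;
only the second phase's bound is what this patch changes from 10 ms to the
configurable value.
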
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch