Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Alan Previn <alan.previn.teres.alexis@intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: Mousumi Jana <mousumi.jana@intel.com>,
	intel.com@freedesktop.org, dri-devel@lists.freedesktop.org,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending
Date: Wed, 27 Sep 2023 10:02:33 +0100	[thread overview]
Message-ID: <9ca17c5c-7bb4-ff6b-69cb-3983299729c1@linux.intel.com> (raw)
In-Reply-To: <20230926190518.105393-4-alan.previn.teres.alexis@intel.com>


On 26/09/2023 20:05, Alan Previn wrote:
> When suspending, add a timeout when calling
> intel_gt_pm_wait_for_idle else if we have a lost
> G2H event that holds a wakeref (which would be
> indicative of a bug elsewhere in the driver),
> driver will at least complete the suspend-resume
> cycle, (albeit not hitting all the targets for
> low power hw counters), instead of hanging in the kernel.
> 
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Tested-by: Mousumi Jana <mousumi.jana@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c     |  6 +++++-
>   drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  7 ++++++-
>   drivers/gpu/drm/i915/intel_wakeref.c      | 14 ++++++++++----
>   drivers/gpu/drm/i915/intel_wakeref.h      |  6 ++++--
>   5 files changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 84a75c95f3f7..9c6151b78e1d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -687,7 +687,7 @@ void intel_engines_release(struct intel_gt *gt)
>   		if (!engine->release)
>   			continue;
>   
> -		intel_wakeref_wait_for_idle(&engine->wakeref);
> +		intel_wakeref_wait_for_idle(&engine->wakeref, 0);
>   		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
>   
>   		engine->release(engine);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index 59b5658a17fb..820217c06dc7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -289,6 +289,7 @@ int intel_gt_resume(struct intel_gt *gt)
>   
>   static void wait_for_suspend(struct intel_gt *gt)
>   {
> +	int timeout_ms = CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT ? : 10000;

CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT is in ns so assigning it to _ms is 
a bit to arbitrary.

Why not the existing I915_GT_SUSPEND_IDLE_TIMEOUT for instance?

>   	/*
>   	 * On rare occasions, we've observed the fence completion trigger
>   	 * free_engines asynchronously via rcu_call. Ensure those are done.
> @@ -308,7 +309,10 @@ static void wait_for_suspend(struct intel_gt *gt)
>   		intel_gt_retire_requests(gt);
>   	}
>   
> -	intel_gt_pm_wait_for_idle(gt);
> +	/* we are suspending, so we shouldn't be waiting forever */
> +	if (intel_gt_pm_wait_timeout_for_idle(gt, timeout_ms) == -ETIMEDOUT)
> +		gt_warn(gt, "bailing from %s after %d milisec timeout\n",
> +			__func__, timeout_ms);

Does the timeout in intel_gt_pm_wait_timeout_for_idle always comes in 
pair with the timeout first in intel_gt_wait_for_idle?

Also, is the timeout here hit from the intel_gt_suspend_prepare, 
intel_gt_suspend_late, or can be both?

Main concern is that we need to be sure there are no possible 
ill-effects, like letting the GPU/GuC scribble on some memory we 
unmapped (or will unmap), having let the suspend continue after timing 
out, and not perhaps doing the forced wedge like wait_for_suspend() does 
on the existing timeout path.

Would it be possible to handle the lost G2H events directly in the 
respective component instead of here? Like apply the timeout during the 
step which explicitly idles the CT for suspend (presumably that 
exists?), and so cleanup from there once declared a lost event.

Regards,

Tvrtko

>   }
>   
>   void intel_gt_suspend_prepare(struct intel_gt *gt)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> index 6c9a46452364..5358acc2b5b1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> @@ -68,7 +68,12 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
>   
>   static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
>   {
> -	return intel_wakeref_wait_for_idle(&gt->wakeref);
> +	return intel_wakeref_wait_for_idle(&gt->wakeref, 0);
> +}
> +
> +static inline int intel_gt_pm_wait_timeout_for_idle(struct intel_gt *gt, int timeout_ms)
> +{
> +	return intel_wakeref_wait_for_idle(&gt->wakeref, timeout_ms);
>   }
>   
>   void intel_gt_pm_init_early(struct intel_gt *gt);
> diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
> index 718f2f1b6174..383a37521415 100644
> --- a/drivers/gpu/drm/i915/intel_wakeref.c
> +++ b/drivers/gpu/drm/i915/intel_wakeref.c
> @@ -111,14 +111,20 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
>   			 "wakeref.work", &key->work, 0);
>   }
>   
> -int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
> +int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms)
>   {
> -	int err;
> +	int err = 0;
>   
>   	might_sleep();
>   
> -	err = wait_var_event_killable(&wf->wakeref,
> -				      !intel_wakeref_is_active(wf));
> +	if (!timeout_ms)
> +		err = wait_var_event_killable(&wf->wakeref,
> +					      !intel_wakeref_is_active(wf));
> +	else if (wait_var_event_timeout(&wf->wakeref,
> +					!intel_wakeref_is_active(wf),
> +					msecs_to_jiffies(timeout_ms)) < 1)
> +		err = -ETIMEDOUT;
> +
>   	if (err)
>   		return err;
>   
> diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
> index ec881b097368..302694a780d2 100644
> --- a/drivers/gpu/drm/i915/intel_wakeref.h
> +++ b/drivers/gpu/drm/i915/intel_wakeref.h
> @@ -251,15 +251,17 @@ __intel_wakeref_defer_park(struct intel_wakeref *wf)
>   /**
>    * intel_wakeref_wait_for_idle: Wait until the wakeref is idle
>    * @wf: the wakeref
> + * @timeout_ms: Timeout in ms, 0 means never timeout.
>    *
>    * Wait for the earlier asynchronous release of the wakeref. Note
>    * this will wait for any third party as well, so make sure you only wait
>    * when you have control over the wakeref and trust no one else is acquiring
>    * it.
>    *
> - * Return: 0 on success, error code if killed.
> + * Returns 0 on success, -ETIMEDOUT upon a timeout, or the unlikely
> + * error propagation from wait_var_event_killable if timeout_ms is 0.
>    */
> -int intel_wakeref_wait_for_idle(struct intel_wakeref *wf);
> +int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms);
>   
>   struct intel_wakeref_auto {
>   	struct drm_i915_private *i915;

  reply	other threads:[~2023-09-27  9:02 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-26 19:05 [Intel-gfx] [PATCH v4 0/3] Resolve suspend-resume racing with GuC destroy-context-worker Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 1/3] drm/i915/guc: Flush context destruction worker at suspend Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 2/3] drm/i915/guc: Close deregister-context race against CT-loss Alan Previn
2023-10-04  6:34   ` Gupta, Anshuman
2023-10-04 16:46     ` Teres Alexis, Alan Previn
2023-09-26 19:05 ` [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending Alan Previn
2023-09-27  9:02   ` Tvrtko Ursulin [this message]
2023-09-27 16:36     ` Teres Alexis, Alan Previn
2023-09-28 12:46       ` Tvrtko Ursulin
2023-10-04 17:59         ` Teres Alexis, Alan Previn
2023-10-25 12:58           ` Tvrtko Ursulin
2023-11-13 17:57             ` Teres Alexis, Alan Previn
2023-11-14 17:27               ` Tvrtko Ursulin
2023-11-14 17:36                 ` Rodrigo Vivi
2023-11-14 19:34                   ` Teres Alexis, Alan Previn
2023-11-14 17:37                 ` Teres Alexis, Alan Previn
2023-11-14 17:52                   ` Tvrtko Ursulin
2023-11-14 19:48                     ` Teres Alexis, Alan Previn
2023-11-16 10:19                       ` Tvrtko Ursulin
2023-09-27  1:23 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Resolve suspend-resume racing with GuC destroy-context-worker (rev4) Patchwork
2023-09-27  1:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-09-27 13:17 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9ca17c5c-7bb4-ff6b-69cb-3983299729c1@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=alan.previn.teres.alexis@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel.com@freedesktop.org \
    --cc=mousumi.jana@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox