From: Jani Nikula <jani.nikula@linux.intel.com>
To: John.C.Harrison@Intel.com, Intel-GFX@Lists.FreeDesktop.Org
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Don't wait forever in drop_caches
Date: Wed, 02 Nov 2022 14:12:00 +0200
Message-ID: <87k04d7dyn.fsf@intel.com>
In-Reply-To: <20221101235053.1650364-1-John.C.Harrison@Intel.com>

On Tue, 01 Nov 2022, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> At the end of each test, IGT does a drop caches call via sysfs with

sysfs?

> special flags set. One of the possible paths waits for idle with an
> infinite timeout. That causes problems for debugging issues when CI
> catches a "can't go idle" test failure. Best case, the CI system times
> out (after 90s), attempts a bunch of state dump actions and then
> reboots the system to recover it. Worst case, the CI system can't do
> anything at all and then times out (after 1000s) and simply reboots.
> Sometimes a serial port log of dmesg might be available, sometimes not.
>
> So rather than making life hard for ourselves, change the timeout to
> be 10s rather than infinite. Also, trigger the standard
> wedge/reset/recover sequence so that testing can continue with a
> working system (if possible).
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ae987e92251dd..9d916fbbfc27c 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -641,6 +641,9 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
>  		  DROP_RESET_ACTIVE | \
>  		  DROP_RESET_SEQNO | \
>  		  DROP_RCU)
> +
> +#define DROP_IDLE_TIMEOUT	(HZ * 10)

I915_IDLE_ENGINES_TIMEOUT is defined in i915_drv.h. It's also only used
here.

I915_GEM_IDLE_TIMEOUT is defined in i915_gem.h. It's only used in
gt/intel_gt.c.

I915_GT_SUSPEND_IDLE_TIMEOUT is defined and used only in intel_gt_pm.c.

I915_IDLE_ENGINES_TIMEOUT is in ms, the rest are in jiffies.

My head spins.
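
To make the unit mismatch above concrete, here is a minimal user-space sketch of converting between milliseconds and jiffies. It is an illustration only: `SKETCH_HZ` and the `sketch_*` helpers are hypothetical names, the tick rate of 250 is an assumption (the kernel's `HZ` is a build-time configuration value), and the conversion is a simplified stand-in for the kernel's `msecs_to_jiffies()`, not its actual implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed tick rate for illustration; the real HZ depends on kernel config. */
#define SKETCH_HZ 250UL

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
static unsigned long sketch_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999UL) / 1000UL;
}

/* Inverse helper: whole milliseconds covered by a tick count. */
static unsigned long sketch_jiffies_to_msecs(unsigned long jiffies, unsigned long hz)
{
	assert(hz > 0);
	return jiffies * 1000UL / hz;
}

/* Print one timeout in both units so a ms/jiffies mix-up is easy to spot. */
static void sketch_print_timeout(const char *name, unsigned long jiffies,
				 unsigned long hz)
{
	printf("%-24s %6lu jiffies = %6lu ms\n", name, jiffies,
	       sketch_jiffies_to_msecs(jiffies, hz));
}
```

With these helpers, a timeout written as `HZ * 10` (as in the proposed `DROP_IDLE_TIMEOUT`) is 10 seconds regardless of the tick rate, whereas a constant stored in ms has to go through a conversion before it can be passed to an interface that expects jiffies.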


BR,
Jani.


> +
>  static int
>  i915_drop_caches_get(void *data, u64 *val)
>  {
> @@ -661,7 +664,9 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>  		intel_gt_retire_requests(gt);
>  
>  	if (val & (DROP_IDLE | DROP_ACTIVE)) {
> -		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
> +		ret = intel_gt_wait_for_idle(gt, DROP_IDLE_TIMEOUT);
> +		if (ret == -ETIME)
> +			intel_gt_set_wedged(gt);
>  		if (ret)
>  			return ret;
>  	}

-- 
Jani Nikula, Intel Open Source Graphics Center
