Intel-GFX Archive on lore.kernel.org
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 19/24] drm/i915/selftests: Be a little more lenient for reset workers
Date: Fri, 28 Feb 2020 17:38:42 +0200
Message-ID: <87wo867md9.fsf@gaia.fi.intel.com>
In-Reply-To: <20200228082330.2411941-19-chris@chris-wilson.co.uk>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Give the reset worker a kick before losing hope when waiting for hang
> recovery, as the CPU scheduler is a little unreliable.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/gt/selftest_lrc.c | 74 ++++++++++++++++++--------
>  1 file changed, 52 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index 95da6b880e3f..af5b3da6d894 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -90,6 +90,48 @@ static int wait_for_submit(struct intel_engine_cs *engine,
>  	return -ETIME;
>  }
>  
> +static int wait_for_reset(struct intel_engine_cs *engine,
> +			  struct i915_request *rq,
> +			  unsigned long timeout)
> +{
> +	timeout += jiffies;
> +	do {
> +		cond_resched();
> +		intel_engine_flush_submission(engine);
> +
> +		if (READ_ONCE(engine->execlists.pending[0]))
> +			continue;
> +
> +		if (i915_request_completed(rq))
> +			break;
> +
> +		if (READ_ONCE(rq->fence.error))
> +			break;
> +	} while (time_before(jiffies, timeout));
> +
> +	flush_scheduled_work();
> +
> +	if (rq->fence.error != -EIO) {
> +		pr_err("%s: hanging request %llx:%lld not reset\n",
> +		       engine->name,
> +		       rq->fence.context,
> +		       rq->fence.seqno);
> +		return -EINVAL;
> +	}
> +
> +	/* Give the request a jiffie to complete after flushing the worker */
> +	if (i915_request_wait(rq, 0,
> +			      max(0l, (long)(timeout - jiffies)) + 1) < 0) {
> +		pr_err("%s: hanging request %llx:%lld did not complete\n",
> +		       engine->name,
> +		       rq->fence.context,
> +		       rq->fence.seqno);
> +		return -ETIME;
> +	}
> +
> +	return 0;
> +}
> +
>  static int live_sanitycheck(void *arg)
>  {
>  	struct intel_gt *gt = arg;
> @@ -1805,14 +1847,9 @@ static int __cancel_active0(struct live_preempt_cancel *arg)
>  	if (err)
>  		goto out;
>  
> -	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> -		err = -EIO;
> -		goto out;
> -	}
> -
> -	if (rq->fence.error != -EIO) {
> -		pr_err("Cancelled inflight0 request did not report -EIO\n");
> -		err = -EINVAL;
> +	err = wait_for_reset(arg->engine, rq, HZ / 2);
> +	if (err) {
> +		pr_err("Cancelled inflight0 request did not reset\n");
>  		goto out;
>  	}
>  
> @@ -1870,10 +1907,9 @@ static int __cancel_active1(struct live_preempt_cancel *arg)
>  		goto out;
>  
>  	igt_spinner_end(&arg->a.spin);
> -	if (i915_request_wait(rq[1], 0, HZ / 5) < 0) {
> -		err = -EIO;
> +	err = wait_for_reset(arg->engine, rq[1], HZ / 2);
> +	if (err)
>  		goto out;
> -	}
>  
>  	if (rq[0]->fence.error != 0) {
>  		pr_err("Normal inflight0 request did not complete\n");
> @@ -1953,10 +1989,9 @@ static int __cancel_queued(struct live_preempt_cancel *arg)
>  	if (err)
>  		goto out;
>  
> -	if (i915_request_wait(rq[2], 0, HZ / 5) < 0) {
> -		err = -EIO;
> +	err = wait_for_reset(arg->engine, rq[2], HZ / 2);
> +	if (err)
>  		goto out;
> -	}
>  
>  	if (rq[0]->fence.error != -EIO) {
>  		pr_err("Cancelled inflight0 request did not report -EIO\n");
> @@ -2014,14 +2049,9 @@ static int __cancel_hostile(struct live_preempt_cancel *arg)
>  	if (err)
>  		goto out;
>  
> -	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> -		err = -EIO;
> -		goto out;
> -	}
> -
> -	if (rq->fence.error != -EIO) {
> -		pr_err("Cancelled inflight0 request did not report -EIO\n");
> -		err = -EINVAL;
> +	err = wait_for_reset(arg->engine, rq, HZ / 2);
> +	if (err) {
> +		pr_err("Cancelled inflight0 request did not reset\n");
>  		goto out;
>  	}
>  
> -- 
> 2.25.1
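
For readers following the deadline arithmetic in wait_for_reset(), here is a
minimal userspace sketch of the two idioms it relies on: a wrap-safe
"is now before the deadline" comparison in the style of time_before(), and the
"remaining time, clamped at zero, plus one tick" budget passed to the final
i915_request_wait(). The helper names time_before_ul() and
remaining_plus_one() are hypothetical stand-ins written for this note; they
are not kernel APIs, and plain unsigned long values stand in for jiffies.

```c
#include <limits.h>

/*
 * Hypothetical userspace stand-in for the kernel's time_before(a, b).
 * The unsigned subtraction wraps modulo 2^N, and reinterpreting the
 * difference as signed gives the right answer as long as the two
 * timestamps are less than LONG_MAX ticks apart. The unsigned-to-signed
 * conversion is implementation-defined in ISO C; this sketch assumes
 * two's complement, as the kernel does.
 */
static int time_before_ul(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Mirrors "max(0l, (long)(timeout - jiffies)) + 1" from the patch:
 * whatever is left of the budget, never negative, plus one extra tick
 * so the wait can still succeed when the deadline has already passed.
 */
static long remaining_plus_one(unsigned long deadline, unsigned long now)
{
	long left = (long)(deadline - now);

	return (left > 0 ? left : 0) + 1;
}
```

With a start tick of ULONG_MAX - 5 and a budget of 10 ticks, the deadline
wraps around to 4; the comparison still reports the start as "before" the
deadline, and a caller arriving 3 ticks late still gets a 1-tick wait.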