public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/i915/selftests: Exercise resetting during non-user payloads
Date: Fri, 15 Feb 2019 17:18:08 +0200	[thread overview]
Message-ID: <878syhytan.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20190215102732.15520-1-chris@chris-wilson.co.uk>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> In selftests/live_hangcheck, we have a lot of tests for resetting simple
> spinners, but nothing quite prepared us for how the GPU reacted to
> triggering a reset outside of the safe spinner. These two subtests fill
> the ring with plain old empty, non-spinning requests, and then triggers
> a reset. Without a user-payload to blame, these requests will exercise
> the 'non-started' paths and mostly be replayed verbatim.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  .../gpu/drm/i915/selftests/intel_hangcheck.c  | 217 ++++++++++++++++++
>  1 file changed, 217 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 92475596ff40..f75cb56ff8ad 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -415,6 +415,221 @@ static bool wait_for_idle(struct intel_engine_cs *engine)
>  	return wait_for(intel_engine_is_idle(engine), IGT_IDLE_TIMEOUT) == 0;
>  }
>  
> +static int igt_reset_nop(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_context *ctx;
> +	unsigned int reset_count, count;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	struct drm_file *file;
> +	IGT_TIMEOUT(end_time);
> +	int err = 0;
> +
> +	/* Check that we can reset during non-user portions of requests */
> +
> +	file = mock_file(i915);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	ctx = live_context(i915, file);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	if (IS_ERR(ctx)) {
> +		err = PTR_ERR(ctx);
> +		goto out;
> +	}
> +
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	reset_count = i915_reset_count(&i915->gpu_error);
> +	count = 0;
> +	do {
> +		mutex_lock(&i915->drm.struct_mutex);
> +		for_each_engine(engine, i915, id) {
> +			int i;
> +
> +			for (i = 0; i < 16; i++) {
> +				struct i915_request *rq;
> +
> +				rq = i915_request_alloc(engine, ctx);
> +				if (IS_ERR(rq)) {
> +					err = PTR_ERR(rq);
> +					break;
> +				}
> +
> +				i915_request_add(rq);
> +			}
> +		}
> +		mutex_unlock(&i915->drm.struct_mutex);
> +
> +		igt_global_reset_lock(i915);
> +		i915_reset(i915, ALL_ENGINES, NULL);
> +		igt_global_reset_unlock(i915);
> +		if (i915_terminally_wedged(&i915->gpu_error)) {
> +			err = -EIO;
> +			break;
> +		}
> +
> +		if (i915_reset_count(&i915->gpu_error) !=
> +		    reset_count + ++count) {
> +			pr_err("Full GPU reset not recorded!\n");
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		if (!i915_reset_flush(i915)) {
> +			struct drm_printer p =
> +				drm_info_printer(i915->drm.dev);
> +
> +			pr_err("%s failed to idle after reset\n",
> +			       engine->name);
> +			intel_engine_dump(engine, &p,
> +					  "%s\n", engine->name);
> +
> +			err = -EIO;
> +			break;
> +		}
> +
> +		err = igt_flush_test(i915, 0);
> +		if (err)
> +			break;
> +	} while (time_before(jiffies, end_time));
> +	pr_info("%s: %d resets\n", __func__, count);
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	err = igt_flush_test(i915, I915_WAIT_LOCKED);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	intel_runtime_pm_put(i915, wakeref);
> +
> +out:
> +	mock_file_free(i915, file);
> +	if (i915_terminally_wedged(&i915->gpu_error))
> +		err = -EIO;
> +	return err;
> +}
> +
> +static int igt_reset_nop_engine(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_context *ctx;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	struct drm_file *file;
> +	int err = 0;

err = 0 for the gcc as it seems unnecessary. Regardless
I didn't spot anything out of place in this patch. 

The amount of 16 requests feels low as a number but
we should have plenty of time to hit the button during.

Nice addition to reset tests. And as you mentioned
we should have plenty of resets going in on idle engines
on other tests. Not that someone would object one explicit
test into this file :)

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> +
> +	/* Check that we can engine-reset during non-user portions */
> +
> +	if (!intel_has_reset_engine(i915))
> +		return 0;
> +
> +	file = mock_file(i915);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	ctx = live_context(i915, file);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	if (IS_ERR(ctx)) {
> +		err = PTR_ERR(ctx);
> +		goto out;
> +	}
> +
> +	wakeref = intel_runtime_pm_get(i915);
> +	for_each_engine(engine, i915, id) {
> +		unsigned int reset_count, reset_engine_count;
> +		unsigned int count;
> +		IGT_TIMEOUT(end_time);
> +
> +		reset_count = i915_reset_count(&i915->gpu_error);
> +		reset_engine_count = i915_reset_engine_count(&i915->gpu_error,
> +							     engine);
> +		count = 0;
> +
> +		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
> +		do {
> +			int i;
> +
> +			if (!wait_for_idle(engine)) {
> +				pr_err("%s failed to idle before reset\n",
> +				       engine->name);
> +				err = -EIO;
> +				break;
> +			}
> +
> +			mutex_lock(&i915->drm.struct_mutex);
> +			for (i = 0; i < 16; i++) {
> +				struct i915_request *rq;
> +
> +				rq = i915_request_alloc(engine, ctx);
> +				if (IS_ERR(rq)) {
> +					err = PTR_ERR(rq);
> +					break;
> +				}
> +
> +				i915_request_add(rq);
> +			}
> +			mutex_unlock(&i915->drm.struct_mutex);
> +
> +			err = i915_reset_engine(engine, NULL);
> +			if (err) {
> +				pr_err("i915_reset_engine failed\n");
> +				break;
> +			}
> +
> +			if (i915_reset_count(&i915->gpu_error) != reset_count) {
> +				pr_err("Full GPU reset recorded! (engine reset expected)\n");
> +				err = -EINVAL;
> +				break;
> +			}
> +
> +			if (i915_reset_engine_count(&i915->gpu_error, engine) !=
> +			    reset_engine_count + ++count) {
> +				pr_err("%s engine reset not recorded!\n",
> +				       engine->name);
> +				err = -EINVAL;
> +				break;
> +			}
> +
> +			if (!i915_reset_flush(i915)) {
> +				struct drm_printer p =
> +					drm_info_printer(i915->drm.dev);
> +
> +				pr_err("%s failed to idle after reset\n",
> +				       engine->name);
> +				intel_engine_dump(engine, &p,
> +						  "%s\n", engine->name);
> +
> +				err = -EIO;
> +				break;
> +			}
> +		} while (time_before(jiffies, end_time));
> +		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
> +		pr_info("%s(%s): %d resets\n", __func__, engine->name, count);
> +
> +		if (err)
> +			break;
> +
> +		err = igt_flush_test(i915, 0);
> +		if (err)
> +			break;
> +	}
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	err = igt_flush_test(i915, I915_WAIT_LOCKED);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	intel_runtime_pm_put(i915, wakeref);
> +out:
> +	mock_file_free(i915, file);
> +	if (i915_terminally_wedged(&i915->gpu_error))
> +		err = -EIO;
> +	return err;
> +}
> +
>  static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>  {
>  	struct intel_engine_cs *engine;
> @@ -1647,6 +1862,8 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
>  		SUBTEST(igt_global_reset), /* attempt to recover GPU first */
>  		SUBTEST(igt_wedged_reset),
>  		SUBTEST(igt_hang_sanitycheck),
> +		SUBTEST(igt_reset_nop),
> +		SUBTEST(igt_reset_nop_engine),
>  		SUBTEST(igt_reset_idle_engine),
>  		SUBTEST(igt_reset_active_engine),
>  		SUBTEST(igt_reset_engines),
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2019-02-15 15:18 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-15 10:27 [PATCH 1/2] drm/i915/selftests: Exercise resetting during non-user payloads Chris Wilson
2019-02-15 10:27 ` [PATCH 2/2] drm/i915/selftests: Drop unnecessary struct_mutex around i915_reset() Chris Wilson
2019-02-15 15:21   ` Mika Kuoppala
2019-02-15 15:28     ` Chris Wilson
2019-02-15 15:42       ` Mika Kuoppala
2019-02-15 15:18 ` Mika Kuoppala [this message]
2019-02-15 15:33   ` [PATCH 1/2] drm/i915/selftests: Exercise resetting during non-user payloads Chris Wilson
2019-02-15 17:35 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] " Patchwork
2019-02-15 18:11 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-02-15 18:41   ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878syhytan.fsf@gaia.fi.intel.com \
    --to=mika.kuoppala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox