From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 07/25] drm/i915: Cancel context if it hangs after it is closed
Date: Mon, 11 Nov 2019 12:54:14 +0200
Message-ID: <878som65ux.fsf@gaia.fi.intel.com>
In-Reply-To: <20191110185806.17413-7-chris@chris-wilson.co.uk>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we detect a hang in a closed context, just flush all of its requests
> and cancel any remaining execution along the context. Note that after
> closing the context, the last reference to the context may be dropped,
> leaving it only valid under RCU.

Sounds good. But is there a window in which userspace could start
to see -EIO if it resubmits to a closed context?

In other words, after userspace calls gem_ctx_destroy(ctx_handle),
would we return -EINVAL because ctx_handle is stale before we
ever check the banned status and return -EIO?
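
To make the ordering in question concrete, here is a minimal userspace
sketch. It is not the i915 code: every toy_* name is hypothetical, the
handle table is a plain array, and the -EINVAL for a stale handle simply
follows the wording above. It assumes that submission looks the handle up
first and tests the ban second, which is exactly the assumption being
asked about.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a GEM context; not the real struct i915_gem_context. */
struct toy_ctx {
	bool closed;
	bool banned;
};

#define TOY_MAX_HANDLES 4

/* Hypothetical per-file handle table: a NULL slot means the handle is stale. */
static struct toy_ctx *toy_handles[TOY_MAX_HANDLES];

/* Models destroying a handle: mark the context closed and clear its slot. */
static void toy_ctx_destroy(unsigned int handle)
{
	if (handle < TOY_MAX_HANDLES && toy_handles[handle]) {
		toy_handles[handle]->closed = true;
		toy_handles[handle] = NULL;
	}
}

/* Mirrors the shape of the patch: a hang in a closed context bans it outright. */
static bool toy_mark_guilty(struct toy_ctx *ctx)
{
	assert(ctx != NULL);

	if (ctx->closed) {
		ctx->banned = true;
		return true;
	}

	return false; /* the real code goes on to score open contexts */
}

/* Assumed ordering on submission: handle lookup first, ban check second. */
static int toy_submit(unsigned int handle)
{
	struct toy_ctx *ctx;

	if (handle >= TOY_MAX_HANDLES || !toy_handles[handle])
		return -EINVAL; /* stale handle: rejected before the ban check */

	ctx = toy_handles[handle];
	if (ctx->banned)
		return -EIO;

	return 0;
}
```

Under that assumed ordering, a context that is destroyed and then banned
by a later hang never surfaces -EIO through its old handle, because
toy_submit() fails the lookup before it reaches the ban check.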

-Mika

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index f03e000051c1..a6b0d00c3a51 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -81,6 +81,11 @@ static bool context_mark_guilty(struct i915_gem_context *ctx)
>  	bool banned;
>  	int i;
>  
> +	if (i915_gem_context_is_closed(ctx)) {
> +		i915_gem_context_set_banned(ctx);
> +		return true;
> +	}
> +
>  	atomic_inc(&ctx->guilty_count);
>  
>  	/* Cool contexts are too cool to be banned! (Used for reset testing.) */
> @@ -124,6 +129,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
>  
>  	GEM_BUG_ON(i915_request_completed(rq));
>  
> +	rcu_read_lock(); /* protect the GEM context */
>  	if (guilty) {
>  		i915_request_skip(rq, -EIO);
>  		if (context_mark_guilty(rq->gem_context))
> @@ -132,6 +138,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
>  		dma_fence_set_error(&rq->fence, -EAGAIN);
>  		context_mark_innocent(rq->gem_context);
>  	}
> +	rcu_read_unlock();
>  }
>  
>  static bool i915_in_reset(struct pci_dev *pdev)
> -- 
> 2.24.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
