Intel-GFX Archive on lore.kernel.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/gt: Be defensive in the face of false CS events
Date: Fri, 10 Jul 2020 13:30:09 +0100
Message-ID: <5b0e8dfd-43d2-a491-9134-e3b0e1a0ac5b@linux.intel.com>
In-Reply-To: <20200710121609.6775-1-chris@chris-wilson.co.uk>


On 10/07/2020 13:16, Chris Wilson wrote:
> If the HW throws a curve ball and reports either an event before it is
> possible, or just a completely impossible event, we have to grin and
> bear it. The first few events, we will likely not notice as we would be
> expecting some event, but as soon as we stop expecting an event and yet
> they still keep coming, then we enter into undefined state territory.
> In which case, bail out, stop processing the events, and reset the
> engine and our set of queued requests to recover.
> 
> The sporadic hangs and warnings will continue to plague CI, but at least
> system stability should not be compromised.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index fbcfeaed6441..c86324d2d2bb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2567,6 +2567,7 @@ static void process_csb(struct intel_engine_cs *engine)
>   	tail = READ_ONCE(*execlists->csb_write);
>   	if (unlikely(head == tail))
>   		return;
> +	execlists->csb_head = tail;

This deserves a comment...

>   
>   	/*
>   	 * Hopefully paired with a wmb() in HW!
> @@ -2613,6 +2614,9 @@ static void process_csb(struct intel_engine_cs *engine)
>   		if (promote) {
>   			struct i915_request * const *old = execlists->active;
>   
> +			if (GEM_WARN_ON(!*execlists->pending))
> +				break;
> +

... but why break rather than continue? Is the thinking that nothing good
can come of processing further entries, so the break simply expedites the
hang? If so, we have to be confident we can cope with whatever i915 state
results from skipping entries that may have been valid.

The answer will determine what kind of comment to put above. Something
like "Assume we always consume all CSB entries; if things are really bad,
we treat the rest as invalid upon finding the first bad entry"?
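
To make the question concrete, here is a toy userspace model of the
pattern under discussion: the consumer index is claimed up front, and the
loop bails on the first impossible event. It is only a sketch. The names
toy_engine, toy_process_events, RING_SIZE and the two event types are
invented for illustration and are not the driver's structures; the real
process_csb() tracks far more state.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8u /* hypothetical size, not the real CSB entry count */

enum toy_event { EV_PROMOTE, EV_COMPLETE };

struct toy_engine {
	enum toy_event ring[RING_SIZE];
	unsigned int head;    /* index of the last entry consumed */
	unsigned int pending; /* submissions waiting to be promoted */
	unsigned int active;  /* submissions currently executing */
	bool needs_reset;     /* set when an impossible event is seen */
};

/*
 * Drain events up to and including 'tail'. Returns the number of events
 * that were actually applied to the engine state.
 */
static unsigned int toy_process_events(struct toy_engine *e, unsigned int tail)
{
	unsigned int head = e->head;
	unsigned int applied = 0;

	assert(tail < RING_SIZE);
	if (head == tail)
		return 0;

	/*
	 * Claim every entry up to tail before looking at any of them.
	 * If we bail out below, the skipped entries are never replayed:
	 * they are treated as invalid and recovery is left to a reset.
	 */
	e->head = tail;

	do {
		head = (head + 1) % RING_SIZE;

		switch (e->ring[head]) {
		case EV_PROMOTE:
			if (!e->pending) { /* nothing was submitted */
				e->needs_reset = true;
				return applied;
			}
			e->active = e->pending;
			e->pending = 0;
			break;
		case EV_COMPLETE:
			if (!e->active) { /* nothing was running */
				e->needs_reset = true;
				return applied;
			}
			e->active--;
			break;
		}
		applied++;
	} while (head != tail);

	return applied;
}
```

With one pending submission and the events promote, complete, complete,
promote queued in slots 1 to 4, the call toy_process_events(&e, 4) applies
the first two events, flags the second completion as impossible and stops.
The final promote is dropped even though e.head already equals 4, which is
exactly the "skipping maybe valid entries" case: with continue instead of
the early return it would have been examined as well.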

Regards,

Tvrtko

>   			ring_set_paused(engine, 0);
>   
>   			/* Point active to the new ELSP; prevent overwriting */
> @@ -2635,7 +2639,8 @@ static void process_csb(struct intel_engine_cs *engine)
>   
>   			WRITE_ONCE(execlists->pending[0], NULL);
>   		} else {
> -			GEM_BUG_ON(!*execlists->active);
> +			if (GEM_WARN_ON(!*execlists->active))
> +				break;
>   
>   			/* port0 completed, advanced to port1 */
>   			trace_ports(execlists, "completed", execlists->active);
> @@ -2686,7 +2691,6 @@ static void process_csb(struct intel_engine_cs *engine)
>   		}
>   	} while (head != tail);
>   
> -	execlists->csb_head = head;
>   	set_timeslice(engine);
>   
>   	/*
> 

