All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2 07/15] drm/i915: Protect the request->global_seqno with the engine->timeline lock
Date: Wed, 22 Feb 2017 12:29:00 +0000	[thread overview]
Message-ID: <20f64a9b-9a35-3619-e19f-cb9b4e7c4b99@linux.intel.com> (raw)
In-Reply-To: <20170222114610.5819-8-chris@chris-wilson.co.uk>


On 22/02/2017 11:46, Chris Wilson wrote:
> A request is assigned a global seqno only when it is on the hardware
> execution queue. The global seqno can be used to maintain a list of
> requests on the same engine in retirement order, for example for
> constructing a priority queue for waiting. Prior to its execution, or
> if it is subsequently removed in the event of preemption, its global
> seqno is zero. As both insertion and removal from the execution queue
> may operate in IRQ context, it is not guarded by the usual struct_mutex
> BKL. Instead those relying on the global seqno must be prepared for its
> value to change between reads. Only when the request is complete can
> the global seqno be stable (due to the memory barriers on submitting
> the commands to the hardware to write the breadcrumb, if the HWS shows
> that it has passed the global seqno and the global seqno is unchanged
> after the read, it is indeed complete).

Missed some questions I've raised on this one in the previous round.

I never got round re-reading it if you were waiting for that by any chance.

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h          | 16 ++++++--
>  drivers/gpu/drm/i915/i915_gem.c          | 16 +++++---
>  drivers/gpu/drm/i915/i915_gem_request.c  | 46 ++++++++++++++--------
>  drivers/gpu/drm/i915/i915_gem_request.h  | 66 +++++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/intel_breadcrumbs.c | 11 ++++--
>  5 files changed, 114 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 994929660027..3e4feeaeef60 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -4022,14 +4022,24 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
>  }
>
>  static inline bool
> -__i915_request_irq_complete(struct drm_i915_gem_request *req)
> +__i915_request_irq_complete(const struct drm_i915_gem_request *req)
>  {
>  	struct intel_engine_cs *engine = req->engine;
> +	u32 seqno = i915_gem_request_global_seqno(req);
> +
> +	/* The request was dequeued before we were awoken. We check after
> +	 * inspecting the hw to confirm that this was the same request
> +	 * that generated the HWS update. The memory barriers within
> +	 * the request execution are sufficient to ensure that a check
> +	 * after reading the value from hw matches this request.
> +	 */
> +	if (!seqno)
> +		return false;
>
>  	/* Before we do the heavier coherent read of the seqno,
>  	 * check the value (hopefully) in the CPU cacheline.
>  	 */
> -	if (__i915_gem_request_completed(req))
> +	if (__i915_gem_request_completed(req, seqno))
>  		return true;
>
>  	/* Ensure our read of the seqno is coherent so that we
> @@ -4080,7 +4090,7 @@ __i915_request_irq_complete(struct drm_i915_gem_request *req)
>  			wake_up_process(tsk);
>  		rcu_read_unlock();
>
> -		if (__i915_gem_request_completed(req))
> +		if (__i915_gem_request_completed(req, seqno))
>  			return true;
>  	}
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index fce2cec8e665..f950cedb6516 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -397,7 +397,7 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>  	if (flags & I915_WAIT_LOCKED && i915_gem_request_completed(rq))
>  		i915_gem_request_retire_upto(rq);
>
> -	if (rps && rq->global_seqno == intel_engine_last_submit(rq->engine)) {
> +	if (rps && i915_gem_request_global_seqno(rq) == intel_engine_last_submit(rq->engine)) {
>  		/* The GPU is now idle and this client has stalled.
>  		 * Since no other client has submitted a request in the
>  		 * meantime, assume that this client is the only one
> @@ -2609,7 +2609,8 @@ static void i915_gem_context_mark_innocent(struct i915_gem_context *ctx)
>  struct drm_i915_gem_request *
>  i915_gem_find_active_request(struct intel_engine_cs *engine)
>  {
> -	struct drm_i915_gem_request *request;
> +	struct drm_i915_gem_request *request, *active = NULL;
> +	unsigned long flags;
>
>  	/* We are called by the error capture and reset at a random
>  	 * point in time. In particular, note that neither is crucially
> @@ -2619,17 +2620,22 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
>  	 * extra delay for a recent interrupt is pointless. Hence, we do
>  	 * not need an engine->irq_seqno_barrier() before the seqno reads.
>  	 */
> +	spin_lock_irqsave(&engine->timeline->lock, flags);
>  	list_for_each_entry(request, &engine->timeline->requests, link) {
> -		if (__i915_gem_request_completed(request))
> +		if (__i915_gem_request_completed(request,
> +						 request->global_seqno))
>  			continue;
>
>  		GEM_BUG_ON(request->engine != engine);
>  		GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>  				    &request->fence.flags));
> -		return request;
> +
> +		active = request;
> +		break;
>  	}
> +	spin_unlock_irqrestore(&engine->timeline->lock, flags);
>
> -	return NULL;
> +	return active;
>  }
>
>  static bool engine_stalled(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 477e8fc125ce..d18f450977e0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -418,7 +418,6 @@ void __i915_gem_request_submit(struct drm_i915_gem_request *request)
>  		intel_engine_enable_signaling(request);
>  	spin_unlock(&request->lock);
>
> -	GEM_BUG_ON(!request->global_seqno);
>  	engine->emit_breadcrumb(request,
>  				request->ring->vaddr + request->postfix);
>
> @@ -505,7 +504,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>  	/* Move the oldest request to the slab-cache (if not in use!) */
>  	req = list_first_entry_or_null(&engine->timeline->requests,
>  				       typeof(*req), link);
> -	if (req && __i915_gem_request_completed(req))
> +	if (req && i915_gem_request_completed(req))
>  		i915_gem_request_retire(req);
>
>  	/* Beware: Dragons be flying overhead.
> @@ -611,6 +610,7 @@ static int
>  i915_gem_request_await_request(struct drm_i915_gem_request *to,
>  			       struct drm_i915_gem_request *from)
>  {
> +	u32 seqno;
>  	int ret;
>
>  	GEM_BUG_ON(to == from);
> @@ -633,14 +633,15 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
>  		return ret < 0 ? ret : 0;
>  	}
>
> -	if (!from->global_seqno) {
> +	seqno = i915_gem_request_global_seqno(from);
> +	if (!seqno) {
>  		ret = i915_sw_fence_await_dma_fence(&to->submit,
>  						    &from->fence, 0,
>  						    GFP_KERNEL);
>  		return ret < 0 ? ret : 0;
>  	}
>
> -	if (from->global_seqno <= to->timeline->sync_seqno[from->engine->id])
> +	if (seqno <= to->timeline->sync_seqno[from->engine->id])
>  		return 0;
>
>  	trace_i915_gem_ring_sync_to(to, from);
> @@ -658,7 +659,7 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
>  			return ret;
>  	}
>
> -	to->timeline->sync_seqno[from->engine->id] = from->global_seqno;
> +	to->timeline->sync_seqno[from->engine->id] = seqno;
>  	return 0;
>  }
>
> @@ -939,7 +940,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>  }
>
>  bool __i915_spin_request(const struct drm_i915_gem_request *req,
> -			 int state, unsigned long timeout_us)
> +			 u32 seqno, int state, unsigned long timeout_us)
>  {
>  	struct intel_engine_cs *engine = req->engine;
>  	unsigned int irq, cpu;
> @@ -957,7 +958,11 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
>  	irq = atomic_read(&engine->irq_count);
>  	timeout_us += local_clock_us(&cpu);
>  	do {
> -		if (__i915_gem_request_completed(req))
> +		if (seqno != i915_gem_request_global_seqno(req))
> +			break;
> +
> +		if (i915_seqno_passed(intel_engine_get_seqno(req->engine),
> +				      seqno))
>  			return true;
>
>  		/* Seqno are meant to be ordered *before* the interrupt. If
> @@ -1029,11 +1034,15 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>  	if (flags & I915_WAIT_LOCKED)
>  		add_wait_queue(errq, &reset);
>
> +	intel_wait_init(&wait, i915_gem_request_global_seqno(req));
> +
>  	reset_wait_queue(&req->execute, &exec);
> -	if (!req->global_seqno) {
> +	if (!wait.seqno) {
>  		do {
>  			set_current_state(state);
> -			if (req->global_seqno)
> +
> +			wait.seqno = i915_gem_request_global_seqno(req);
> +			if (wait.seqno)
>  				break;
>
>  			if (flags & I915_WAIT_LOCKED &&
> @@ -1061,7 +1070,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>  		if (timeout < 0)
>  			goto complete;
>
> -		GEM_BUG_ON(!req->global_seqno);
> +		GEM_BUG_ON(!wait.seqno);
>  	}
>  	GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
>
> @@ -1070,7 +1079,6 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>  		goto complete;
>
>  	set_current_state(state);
> -	intel_wait_init(&wait, req->global_seqno);
>  	if (intel_engine_add_wait(req->engine, &wait))
>  		/* In order to check that we haven't missed the interrupt
>  		 * as we enabled it, we need to kick ourselves to do a
> @@ -1091,7 +1099,8 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>
>  		timeout = io_schedule_timeout(timeout);
>
> -		if (intel_wait_complete(&wait))
> +		if (intel_wait_complete(&wait) &&
> +		    i915_gem_request_global_seqno(req) == wait.seqno)
>  			break;
>
>  		set_current_state(state);
> @@ -1142,14 +1151,21 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>  static void engine_retire_requests(struct intel_engine_cs *engine)
>  {
>  	struct drm_i915_gem_request *request, *next;
> +	u32 seqno = intel_engine_get_seqno(engine);
> +	LIST_HEAD(retire);
>
> +	spin_lock_irq(&engine->timeline->lock);
>  	list_for_each_entry_safe(request, next,
>  				 &engine->timeline->requests, link) {
> -		if (!__i915_gem_request_completed(request))
> -			return;
> +		if (!i915_seqno_passed(seqno, request->global_seqno))
> +			break;
>
> -		i915_gem_request_retire(request);
> +		list_move_tail(&request->link, &retire);
>  	}
> +	spin_unlock_irq(&engine->timeline->lock);
> +
> +	list_for_each_entry_safe(request, next, &retire, link)
> +		i915_gem_request_retire(request);
>  }
>
>  void i915_gem_retire_requests(struct drm_i915_private *dev_priv)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 467d3e13fce0..b81f6709905c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -135,6 +135,11 @@ struct drm_i915_gem_request {
>  	struct i915_priotree priotree;
>  	struct i915_dependency dep;
>
> +	/** GEM sequence number associated with this request on the
> +	 * global execution timeline. It is zero when the request is not
> +	 * on the HW queue (i.e. not on the engine timeline list).
> +	 * Its value is guarded by the timeline spinlock.
> +	 */
>  	u32 global_seqno;
>
>  	/** Position in the ring of the start of the request */
> @@ -229,6 +234,30 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>  	*pdst = src;
>  }
>
> +/**
> + * i915_gem_request_global_seqno - report the current global seqno
> + * @request - the request
> + *
> + * A request is assigned a global seqno only when it is on the hardware
> + * execution queue. The global seqno can be used to maintain a list of
> + * requests on the same engine in retirement order, for example for
> + * constructing a priority queue for waiting. Prior to its execution, or
> + * if it is subsequently removed in the event of preemption, its global
> + * seqno is zero. As both insertion and removal from the execution queue
> + * may operate in IRQ context, it is not guarded by the usual struct_mutex
> + * BKL. Instead those relying on the global seqno must be prepared for its
> + * value to change between reads. Only when the request is complete can
> + * the global seqno be stable (due to the memory barriers on submitting
> + * the commands to the hardware to write the breadcrumb, if the HWS shows
> + * that it has passed the global seqno and the global seqno is unchanged
> + * after the read, it is indeed complete).
> + */
> +static u32
> +i915_gem_request_global_seqno(const struct drm_i915_gem_request *request)
> +{
> +	return READ_ONCE(request->global_seqno);
> +}
> +
>  int
>  i915_gem_request_await_object(struct drm_i915_gem_request *to,
>  			      struct drm_i915_gem_object *obj,
> @@ -269,46 +298,55 @@ static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
>  }
>
>  static inline bool
> -__i915_gem_request_started(const struct drm_i915_gem_request *req)
> +__i915_gem_request_started(const struct drm_i915_gem_request *req, u32 seqno)
>  {
> -	GEM_BUG_ON(!req->global_seqno);
> +	GEM_BUG_ON(!seqno);
>  	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
> -				 req->global_seqno - 1);
> +				 seqno - 1);
>  }
>
>  static inline bool
>  i915_gem_request_started(const struct drm_i915_gem_request *req)
>  {
> -	if (!req->global_seqno)
> +	u32 seqno;
> +
> +	seqno = i915_gem_request_global_seqno(req);
> +	if (!seqno)
>  		return false;
>
> -	return __i915_gem_request_started(req);
> +	return __i915_gem_request_started(req, seqno);
>  }
>
>  static inline bool
> -__i915_gem_request_completed(const struct drm_i915_gem_request *req)
> +__i915_gem_request_completed(const struct drm_i915_gem_request *req, u32 seqno)
>  {
> -	GEM_BUG_ON(!req->global_seqno);
> -	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
> -				 req->global_seqno);
> +	GEM_BUG_ON(!seqno);
> +	return i915_seqno_passed(intel_engine_get_seqno(req->engine), seqno) &&
> +		seqno == i915_gem_request_global_seqno(req);
>  }
>
>  static inline bool
>  i915_gem_request_completed(const struct drm_i915_gem_request *req)
>  {
> -	if (!req->global_seqno)
> +	u32 seqno;
> +
> +	seqno = i915_gem_request_global_seqno(req);
> +	if (!seqno)
>  		return false;
>
> -	return __i915_gem_request_completed(req);
> +	return __i915_gem_request_completed(req, seqno);
>  }
>
>  bool __i915_spin_request(const struct drm_i915_gem_request *request,
> -			 int state, unsigned long timeout_us);
> +			 u32 seqno, int state, unsigned long timeout_us);
>  static inline bool i915_spin_request(const struct drm_i915_gem_request *request,
>  				     int state, unsigned long timeout_us)
>  {
> -	return (__i915_gem_request_started(request) &&
> -		__i915_spin_request(request, state, timeout_us));
> +	u32 seqno;
> +
> +	seqno = i915_gem_request_global_seqno(request);
> +	return (__i915_gem_request_started(request, seqno) &&
> +		__i915_spin_request(request, seqno, state, timeout_us));
>  }
>
>  /* We treat requests as fences. This is not be to confused with our
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 4f4e703d1b14..d5bf4c0b2deb 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -545,6 +545,7 @@ void intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>  	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>  	struct rb_node *parent, **p;
>  	bool first, wakeup;
> +	u32 seqno;
>
>  	/* Note that we may be called from an interrupt handler on another
>  	 * device (e.g. nouveau signaling a fence completion causing us
> @@ -555,11 +556,13 @@ void intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>
>  	/* locked by dma_fence_enable_sw_signaling() (irqsafe fence->lock) */
>  	assert_spin_locked(&request->lock);
> -	if (!request->global_seqno)
> +
> +	seqno = i915_gem_request_global_seqno(request);
> +	if (!seqno)
>  		return;
>
>  	request->signaling.wait.tsk = b->signaler;
> -	request->signaling.wait.seqno = request->global_seqno;
> +	request->signaling.wait.seqno = seqno;
>  	i915_gem_request_get(request);
>
>  	spin_lock(&b->lock);
> @@ -583,8 +586,8 @@ void intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>  	p = &b->signals.rb_node;
>  	while (*p) {
>  		parent = *p;
> -		if (i915_seqno_passed(request->global_seqno,
> -				      to_signaler(parent)->global_seqno)) {
> +		if (i915_seqno_passed(seqno,
> +				      to_signaler(parent)->signaling.wait.seqno)) {
>  			p = &parent->rb_right;
>  			first = false;
>  		} else {
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2017-02-22 12:29 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-22 11:45 Pre-emption pre-enablement Chris Wilson
2017-02-22 11:45 ` [PATCH v2 01/15] drm/i915: Keep a global seqno per-engine Chris Wilson
2017-02-22 12:24   ` Tvrtko Ursulin
2017-02-22 11:45 ` [PATCH v2 02/15] drm/i915: Move reeerve_seqno() next to unreserve_seqno() Chris Wilson
2017-02-22 12:23   ` Joonas Lahtinen
2017-02-22 11:45 ` [PATCH v2 03/15] drm/i915: Use a local to shorten req->i915->gpu_error.wait_queue Chris Wilson
2017-02-22 11:45 ` [PATCH v2 04/15] drm/i915: Add ourselves to the gpu error waitqueue for the entire wait Chris Wilson
2017-02-22 11:46 ` [PATCH v2 05/15] drm/i915: Inline __i915_gem_request_wait_for_execute() Chris Wilson
2017-02-22 11:46 ` [PATCH v2 06/15] drm/i915: Deconstruct execute fence Chris Wilson
2017-02-22 11:46 ` [PATCH v2 07/15] drm/i915: Protect the request->global_seqno with the engine->timeline lock Chris Wilson
2017-02-22 12:29   ` Tvrtko Ursulin [this message]
2017-02-22 12:45     ` Chris Wilson
2017-02-22 13:13       ` Tvrtko Ursulin
2017-02-22 11:46 ` [PATCH v2 08/15] drm/i915: Take a reference whilst processing the signaler request Chris Wilson
2017-02-22 12:35   ` Tvrtko Ursulin
2017-02-22 11:46 ` [PATCH v2 09/15] drm/i915: Allow an request to be cancelled Chris Wilson
2017-02-22 13:08   ` Tvrtko Ursulin
2017-02-22 11:46 ` [PATCH v2 10/15] drm/i915: Remove the preempted request from the execution queue Chris Wilson
2017-02-22 13:33   ` Tvrtko Ursulin
2017-02-22 13:40     ` Chris Wilson
2017-02-22 13:50       ` Tvrtko Ursulin
2017-02-22 18:53     ` [PATCH v3] " Chris Wilson
2017-02-22 11:46 ` [PATCH v2 11/15] drm/i915: Exercise request cancellation using a mock selftest Chris Wilson
2017-02-22 13:46   ` Tvrtko Ursulin
2017-02-22 13:59     ` Chris Wilson
2017-02-22 14:03       ` Chris Wilson
2017-02-22 14:05       ` Tvrtko Ursulin
2017-02-22 11:46 ` [PATCH v2 12/15] drm/i915: Replace reset_wait_queue with default_wake_function Chris Wilson
2017-02-22 14:13   ` Tvrtko Ursulin
2017-02-22 11:46 ` [PATCH v2 13/15] drm/i915: Refactor direct GPU reset from request waiters Chris Wilson
2017-02-22 14:16   ` Tvrtko Ursulin
2017-02-22 14:26     ` Chris Wilson
2017-02-22 15:07       ` Tvrtko Ursulin
2017-02-22 11:46 ` [PATCH v2 14/15] drm/i915: Immediately process a reset before starting waiting Chris Wilson
2017-02-22 11:46 ` [PATCH v2 15/15] drm/i915: Remove one level of indention from wait-for-execute Chris Wilson
2017-02-22 14:22   ` Tvrtko Ursulin
2017-02-22 13:22 ` ✗ Fi.CI.BAT: failure for series starting with [v2,01/15] drm/i915: Keep a global seqno per-engine Patchwork
2017-02-22 20:52 ` ✗ Fi.CI.BAT: warning for series starting with [v2,01/15] drm/i915: Keep a global seqno per-engine (rev2) Patchwork
2017-02-22 20:56   ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20f64a9b-9a35-3619-e19f-cb9b4e7c4b99@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.