intel-gfx.lists.freedesktop.org archive mirror
* [PATCH 1/3] drm/i915: Tidy max/display priority macro
@ 2017-01-21  9:25 Chris Wilson
  2017-01-21  9:25 ` [PATCH 2/3] drm/i915: Priority boost for locked waits Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Chris Wilson @ 2017-01-21  9:25 UTC (permalink / raw)
  To: intel-gfx

Convert I915_PRIORITY_DISPLAY to an enum for easier ranking with new
priorities in later patches.
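
As an aside, the enum value is consumed exactly as the macro was. For
illustration only (this call site is not part of the patch), the display
path boosts requests blocking a flip with:

        err = i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);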

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         | 1 -
 drivers/gpu/drm/i915/i915_gem_request.h | 4 ++++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1b340eb0db19..562e8bfa5501 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3401,7 +3401,6 @@ int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 				  unsigned int flags,
 				  int priority);
-#define I915_PRIORITY_DISPLAY I915_PRIORITY_MAX
 
 int __must_check
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 316c86c98b6a..da71ae280b2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -73,6 +73,10 @@ struct i915_priotree {
 #define I915_PRIORITY_MIN (-I915_PRIORITY_MAX)
 };
 
+enum {
+	I915_PRIORITY_DISPLAY = I915_PRIORITY_MAX
+};
+
 struct i915_gem_capture_list {
 	struct i915_gem_capture_list *next;
 	struct i915_vma *vma;
-- 
2.11.0


* [PATCH 2/3] drm/i915: Priority boost for locked waits
  2017-01-21  9:25 [PATCH 1/3] drm/i915: Tidy max/display priority macro Chris Wilson
@ 2017-01-21  9:25 ` Chris Wilson
  2017-01-23  8:03   ` Joonas Lahtinen
  2017-01-23 10:43   ` Tvrtko Ursulin
  2017-01-21  9:25 ` [PATCH 3/3] drm/i915: Priority boost switching to an idle ring Chris Wilson
  2017-01-23  8:31 ` [PATCH 1/3] drm/i915: Tidy max/display priority macro Joonas Lahtinen
  2 siblings, 2 replies; 17+ messages in thread
From: Chris Wilson @ 2017-01-21  9:25 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

We always try to do an unlocked wait before resorting to a blocking
wait under the mutex - so we very rarely have to sleep under the
struct_mutex. However, when we do, we want that wait to be as short as
possible as the struct_mutex is our BKL that will stall the driver and
all clients.

There should be no impact on typical workloads.

v2: Move down a layer to apply to all waits.
v3: Make the priority boost explicit. This makes the paths where we want
boosting under the mutex clear and prevents uselessly boosting the
priority when we are waiting for idle.
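
For illustration only, a condensed view of the two rules added here (both
taken from the hunks below): waits that can block other clients while
holding struct_mutex opt in with the new flag, whereas waiting for the
GPU to idle must never be boosted.

        /* boost whilst we hold struct_mutex ... */
        ret = i915_gem_object_wait(obj,
                                   I915_WAIT_INTERRUPTIBLE |
                                   I915_WAIT_LOCKED |
                                   I915_WAIT_PRIORITY |
                                   I915_WAIT_ALL,
                                   MAX_SCHEDULE_TIMEOUT, NULL);

        /* ... while i915_gem_wait_for_idle() asserts it is never asked to */
        GEM_BUG_ON(flags & I915_WAIT_PRIORITY);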

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c         | 10 +++++++++-
 drivers/gpu/drm/i915/i915_gem_request.c | 10 ++++++++++
 drivers/gpu/drm/i915/i915_gem_request.h | 10 +++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.c |  4 +++-
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d1fd3ef7fad4..0c1694fef903 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -330,6 +330,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
+				   I915_WAIT_PRIORITY |
 				   I915_WAIT_ALL,
 				   MAX_SCHEDULE_TIMEOUT,
 				   NULL);
@@ -755,7 +756,8 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
-				   I915_WAIT_LOCKED,
+				   I915_WAIT_LOCKED |
+				   I915_WAIT_PRIORITY,
 				   MAX_SCHEDULE_TIMEOUT,
 				   NULL);
 	if (ret)
@@ -806,6 +808,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
+				   I915_WAIT_PRIORITY |
 				   I915_WAIT_ALL,
 				   MAX_SCHEDULE_TIMEOUT,
 				   NULL);
@@ -3074,6 +3077,8 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
 {
 	int ret;
 
+	GEM_BUG_ON(flags & I915_WAIT_PRIORITY);
+
 	if (flags & I915_WAIT_LOCKED) {
 		struct i915_gem_timeline *tl;
 
@@ -3163,6 +3168,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
+				   I915_WAIT_PRIORITY |
 				   (write ? I915_WAIT_ALL : 0),
 				   MAX_SCHEDULE_TIMEOUT,
 				   NULL);
@@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		ret = i915_gem_object_wait(obj,
 					   I915_WAIT_INTERRUPTIBLE |
 					   I915_WAIT_LOCKED |
+					   I915_WAIT_PRIORITY |
 					   I915_WAIT_ALL,
 					   MAX_SCHEDULE_TIMEOUT,
 					   NULL);
@@ -3566,6 +3573,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
+				   I915_WAIT_PRIORITY |
 				   (write ? I915_WAIT_ALL : 0),
 				   MAX_SCHEDULE_TIMEOUT,
 				   NULL);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index bacb875a6ef3..9910b565d1a0 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -1046,6 +1046,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 		   !!lockdep_is_held(&req->i915->drm.struct_mutex) !=
 		   !!(flags & I915_WAIT_LOCKED));
 #endif
+	GEM_BUG_ON((flags & I915_WAIT_PRIORITY) && !(flags & I915_WAIT_LOCKED));
 	GEM_BUG_ON(timeout < 0);
 
 	if (i915_gem_request_completed(req))
@@ -1056,6 +1057,15 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 
 	trace_i915_gem_request_wait_begin(req);
 
+	/* Very rarely do we wait whilst holding the mutex. We try to always
+	 * do an unlocked wait before using a locked wait. However, when we
+	 * have to resort to a locked wait, we want that wait to be as short
+	 * as possible as the struct_mutex is our BKL that will stall the
+	 * driver and all clients.
+	 */
+	if (flags & I915_WAIT_PRIORITY && req->engine->schedule)
+		req->engine->schedule(req, I915_PRIORITY_LOCKED);
+
 	add_wait_queue(&req->execute, &exec);
 	if (flags & I915_WAIT_LOCKED) {
 		add_wait_queue(errq, &reset);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index da71ae280b2a..ba83c507613b 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -74,7 +74,8 @@ struct i915_priotree {
 };
 
 enum {
-	I915_PRIORITY_DISPLAY = I915_PRIORITY_MAX
+	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
+	I915_PRIORITY_DISPLAY
 };
 
 struct i915_gem_capture_list {
@@ -301,7 +302,8 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 	__attribute__((nonnull(1)));
 #define I915_WAIT_INTERRUPTIBLE	BIT(0)
 #define I915_WAIT_LOCKED	BIT(1) /* struct_mutex held, handle GPU reset */
-#define I915_WAIT_ALL		BIT(2) /* used by i915_gem_object_wait() */
+#define I915_WAIT_PRIORITY	BIT(2) /* struct_mutex held, boost priority */
+#define I915_WAIT_ALL		BIT(3) /* used by i915_gem_object_wait() */
 
 static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine);
 
@@ -726,7 +728,9 @@ i915_gem_active_retire(struct i915_gem_active *active,
 		return 0;
 
 	ret = i915_wait_request(request,
-				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				I915_WAIT_INTERRUPTIBLE |
+				I915_WAIT_LOCKED |
+				I915_WAIT_PRIORITY,
 				MAX_SCHEDULE_TIMEOUT);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cf4ec937d923..2f7651ef45d5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 		return -ENOSPC;
 
 	timeout = i915_wait_request(target,
-				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				    I915_WAIT_INTERRUPTIBLE |
+				    I915_WAIT_LOCKED |
+				    I915_WAIT_PRIORITY,
 				    MAX_SCHEDULE_TIMEOUT);
 	if (timeout < 0)
 		return timeout;
-- 
2.11.0


* [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-21  9:25 [PATCH 1/3] drm/i915: Tidy max/display priority macro Chris Wilson
  2017-01-21  9:25 ` [PATCH 2/3] drm/i915: Priority boost for locked waits Chris Wilson
@ 2017-01-21  9:25 ` Chris Wilson
  2017-01-21 10:09   ` Chris Wilson
                     ` (2 more replies)
  2017-01-23  8:31 ` [PATCH 1/3] drm/i915: Tidy max/display priority macro Joonas Lahtinen
  2 siblings, 3 replies; 17+ messages in thread
From: Chris Wilson @ 2017-01-21  9:25 UTC (permalink / raw)
  To: intel-gfx

In order to maximise concurrency between engines, if we queue a request
to a currently idle ring, reorder its dependencies to execute that request
as early as possible and ideally improve occupancy of multiple engines
simultaneously.
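
To make the intent concrete, a sketch of the scenario being targeted
(rq_A/rq_B below are hypothetical; the check itself is the hunk in
intel_lrc.c):

        /* rq_A is queued to an idle render ring but depends on rq_B, which
         * sits behind other work on another engine. Boosting rq_A here also
         * raises rq_B via the dependency walk in execlists_schedule(), so
         * rq_B is submitted sooner and the render ring stalls for less time.
         */
        if (execlists_elsp_ready(request->engine))
                prio = max(prio, I915_PRIORITY_STALL);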

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.h | 5 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index ba83c507613b..7ba9cc53abe9 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -74,8 +74,9 @@ struct i915_priotree {
 };
 
 enum {
-	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
-	I915_PRIORITY_DISPLAY
+	I915_PRIORITY_STALL = I915_PRIORITY_MAX,
+	I915_PRIORITY_LOCKED,
+	I915_PRIORITY_DISPLAY,
 };
 
 struct i915_gem_capture_list {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 50bec759989f..b46cb1bb32b8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -691,6 +691,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
 	struct i915_dependency stack;
 	LIST_HEAD(dfs);
 
+	if (execlists_elsp_ready(request->engine))
+		prio = max(prio, I915_PRIORITY_STALL);
+
 	if (prio <= READ_ONCE(request->priotree.priority))
 		return;
 
-- 
2.11.0


* Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-21  9:25 ` [PATCH 3/3] drm/i915: Priority boost switching to an idle ring Chris Wilson
@ 2017-01-21 10:09   ` Chris Wilson
  2017-01-23  8:39   ` Joonas Lahtinen
  2017-01-23 10:51   ` Tvrtko Ursulin
  2 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2017-01-21 10:09 UTC (permalink / raw)
  To: intel-gfx

On Sat, Jan 21, 2017 at 09:25:26AM +0000, Chris Wilson wrote:
> In order to maximise concurrency between engines, if we queue a request
> to a currently idle ring, reorder its dependencies to execute that request
> as early as possible and ideally improve occupancy of multiple engines
> simultaneously.

This is more of a toy, but I'm interested in seeing how it fares
nevertheless. There may be a large delay between scheduling and
submitting (depending on how quickly its dependencies run, and whether it
has any external deps) and so querying hw state at scheduling time is
optimistic. Furthermore it can be easily exploited to prioritise bad
clients. Still, I think the actual impact will be very small, and so am
interested in the experiment on real machines - where it will be dwarfed
by display boosting.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 2/3] drm/i915: Priority boost for locked waits
  2017-01-21  9:25 ` [PATCH 2/3] drm/i915: Priority boost for locked waits Chris Wilson
@ 2017-01-23  8:03   ` Joonas Lahtinen
  2017-01-23 10:43   ` Tvrtko Ursulin
  1 sibling, 0 replies; 17+ messages in thread
From: Joonas Lahtinen @ 2017-01-23  8:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

On la, 2017-01-21 at 09:25 +0000, Chris Wilson wrote:
> We always try to do an unlocked wait before resorting to a blocking
> wait under the mutex - so we very rarely have to sleep under the
> struct_mutex. However, when we do, we want that wait to be as short as
> possible as the struct_mutex is our BKL that will stall the driver and
> all clients.
> 
> There should be no impact on typical workloads.
> 
> v2: Move down a layer to apply to all waits.
> v3: Make the priority boost explicit. This makes the paths where we want
> boosting under the mutex clear and prevents uselessly boosting the
> priority when we are waiting for idle.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 1/3] drm/i915: Tidy max/display priority macro
  2017-01-21  9:25 [PATCH 1/3] drm/i915: Tidy max/display priority macro Chris Wilson
  2017-01-21  9:25 ` [PATCH 2/3] drm/i915: Priority boost for locked waits Chris Wilson
  2017-01-21  9:25 ` [PATCH 3/3] drm/i915: Priority boost switching to an idle ring Chris Wilson
@ 2017-01-23  8:31 ` Joonas Lahtinen
  2 siblings, 0 replies; 17+ messages in thread
From: Joonas Lahtinen @ 2017-01-23  8:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On la, 2017-01-21 at 09:25 +0000, Chris Wilson wrote:
> Convert I915_PRIORITY_DISPLAY to an enum for easier ranking with new
> priorities in later patches.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-21  9:25 ` [PATCH 3/3] drm/i915: Priority boost switching to an idle ring Chris Wilson
  2017-01-21 10:09   ` Chris Wilson
@ 2017-01-23  8:39   ` Joonas Lahtinen
  2017-01-23  8:48     ` Chris Wilson
  2017-01-23 10:51   ` Tvrtko Ursulin
  2 siblings, 1 reply; 17+ messages in thread
From: Joonas Lahtinen @ 2017-01-23  8:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On la, 2017-01-21 at 09:25 +0000, Chris Wilson wrote:
> In order to maximise concurrency between engines, if we queue a request
> to a currently idle ring, reorder its dependencies to execute that request
> as early as possible and ideally improve occupancy of multiple engines
> simultaneously.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Any specific testcase where this helped?

Code itself does what the text says;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-23  8:39   ` Joonas Lahtinen
@ 2017-01-23  8:48     ` Chris Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2017-01-23  8:48 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Mon, Jan 23, 2017 at 10:39:51AM +0200, Joonas Lahtinen wrote:
> On la, 2017-01-21 at 09:25 +0000, Chris Wilson wrote:
> > In order to maximise concurrency between engines, if we queue a request
> > to a currently idle ring, reorder its dependencies to execute that request
> > as early as possible and ideally improve occupancy of multiple engines
> > simultaneously.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Any specific testcase where this helped?

Measurably? Not yet. I can observe it having some effect by watching the
priority queue when using mesa. But I don't hold much hope for the actual
test case under investigation - manual inspection there suggests that the
dependency chain should extend beyond the stall across engines.

(It is a solution searching for a problem... Eeek.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 2/3] drm/i915: Priority boost for locked waits
  2017-01-21  9:25 ` [PATCH 2/3] drm/i915: Priority boost for locked waits Chris Wilson
  2017-01-23  8:03   ` Joonas Lahtinen
@ 2017-01-23 10:43   ` Tvrtko Ursulin
  2017-01-23 10:51     ` Chris Wilson
  1 sibling, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2017-01-23 10:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter


On 21/01/2017 09:25, Chris Wilson wrote:
> We always try to do an unlocked wait before resorting to a blocking
> wait under the mutex - so we very rarely have to sleep under the
> struct_mutex. However, when we do, we want that wait to be as short as
> possible as the struct_mutex is our BKL that will stall the driver and
> all clients.
>
> There should be no impact on typical workloads.
>
> v2: Move down a layer to apply to all waits.
> v3: Make the priority boost explicit. This makes the paths where we want
> boosting under the mutex clear and prevents uselessly boosting the
> priority when we are waiting for idle.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 10 +++++++++-
>  drivers/gpu/drm/i915/i915_gem_request.c | 10 ++++++++++
>  drivers/gpu/drm/i915/i915_gem_request.h | 10 +++++++---
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  4 +++-
>  4 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d1fd3ef7fad4..0c1694fef903 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -330,6 +330,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
>  	ret = i915_gem_object_wait(obj,
>  				   I915_WAIT_INTERRUPTIBLE |
>  				   I915_WAIT_LOCKED |
> +				   I915_WAIT_PRIORITY |
>  				   I915_WAIT_ALL,
>  				   MAX_SCHEDULE_TIMEOUT,
>  				   NULL);
> @@ -755,7 +756,8 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>
>  	ret = i915_gem_object_wait(obj,
>  				   I915_WAIT_INTERRUPTIBLE |
> -				   I915_WAIT_LOCKED,
> +				   I915_WAIT_LOCKED |
> +				   I915_WAIT_PRIORITY,
>  				   MAX_SCHEDULE_TIMEOUT,
>  				   NULL);
>  	if (ret)
> @@ -806,6 +808,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
>  	ret = i915_gem_object_wait(obj,
>  				   I915_WAIT_INTERRUPTIBLE |
>  				   I915_WAIT_LOCKED |
> +				   I915_WAIT_PRIORITY |
>  				   I915_WAIT_ALL,
>  				   MAX_SCHEDULE_TIMEOUT,
>  				   NULL);
> @@ -3074,6 +3077,8 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
>  {
>  	int ret;
>
> +	GEM_BUG_ON(flags & I915_WAIT_PRIORITY);
> +
>  	if (flags & I915_WAIT_LOCKED) {
>  		struct i915_gem_timeline *tl;
>
> @@ -3163,6 +3168,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  	ret = i915_gem_object_wait(obj,
>  				   I915_WAIT_INTERRUPTIBLE |
>  				   I915_WAIT_LOCKED |
> +				   I915_WAIT_PRIORITY |
>  				   (write ? I915_WAIT_ALL : 0),
>  				   MAX_SCHEDULE_TIMEOUT,
>  				   NULL);
> @@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		ret = i915_gem_object_wait(obj,
>  					   I915_WAIT_INTERRUPTIBLE |
>  					   I915_WAIT_LOCKED |
> +					   I915_WAIT_PRIORITY |
>  					   I915_WAIT_ALL,
>  					   MAX_SCHEDULE_TIMEOUT,
>  					   NULL);

As mentioned before, is this not a concern? Is it not letting any 
userspace boost their prio to max by just calling set cache level after 
execbuf?

> @@ -3566,6 +3573,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>  	ret = i915_gem_object_wait(obj,
>  				   I915_WAIT_INTERRUPTIBLE |
>  				   I915_WAIT_LOCKED |
> +				   I915_WAIT_PRIORITY |
>  				   (write ? I915_WAIT_ALL : 0),
>  				   MAX_SCHEDULE_TIMEOUT,
>  				   NULL);
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index bacb875a6ef3..9910b565d1a0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -1046,6 +1046,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>  		   !!lockdep_is_held(&req->i915->drm.struct_mutex) !=
>  		   !!(flags & I915_WAIT_LOCKED));
>  #endif
> +	GEM_BUG_ON((flags & I915_WAIT_PRIORITY) && !(flags & I915_WAIT_LOCKED));
>  	GEM_BUG_ON(timeout < 0);
>
>  	if (i915_gem_request_completed(req))
> @@ -1056,6 +1057,15 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>
>  	trace_i915_gem_request_wait_begin(req);
>
> +	/* Very rarely do we wait whilst holding the mutex. We try to always
> +	 * do an unlocked wait before using a locked wait. However, when we
> +	 * have to resort to a locked wait, we want that wait to be as short
> +	 * as possible as the struct_mutex is our BKL that will stall the
> +	 * driver and all clients.
> +	 */
> +	if (flags & I915_WAIT_PRIORITY && req->engine->schedule)
> +		req->engine->schedule(req, I915_PRIORITY_LOCKED);
> +
>  	add_wait_queue(&req->execute, &exec);
>  	if (flags & I915_WAIT_LOCKED) {
>  		add_wait_queue(errq, &reset);
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index da71ae280b2a..ba83c507613b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -74,7 +74,8 @@ struct i915_priotree {
>  };
>
>  enum {
> -	I915_PRIORITY_DISPLAY = I915_PRIORITY_MAX
> +	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
> +	I915_PRIORITY_DISPLAY
>  };
>
>  struct i915_gem_capture_list {
> @@ -301,7 +302,8 @@ long i915_wait_request(struct drm_i915_gem_request *req,
>  	__attribute__((nonnull(1)));
>  #define I915_WAIT_INTERRUPTIBLE	BIT(0)
>  #define I915_WAIT_LOCKED	BIT(1) /* struct_mutex held, handle GPU reset */
> -#define I915_WAIT_ALL		BIT(2) /* used by i915_gem_object_wait() */
> +#define I915_WAIT_PRIORITY	BIT(2) /* struct_mutex held, boost priority */
> +#define I915_WAIT_ALL		BIT(3) /* used by i915_gem_object_wait() */
>
>  static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine);
>
> @@ -726,7 +728,9 @@ i915_gem_active_retire(struct i915_gem_active *active,
>  		return 0;
>
>  	ret = i915_wait_request(request,
> -				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> +				I915_WAIT_INTERRUPTIBLE |
> +				I915_WAIT_LOCKED |
> +				I915_WAIT_PRIORITY,
>  				MAX_SCHEDULE_TIMEOUT);
>  	if (ret < 0)
>  		return ret;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index cf4ec937d923..2f7651ef45d5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>  		return -ENOSPC;
>
>  	timeout = i915_wait_request(target,
> -				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> +				    I915_WAIT_INTERRUPTIBLE |
> +				    I915_WAIT_LOCKED |
> +				    I915_WAIT_PRIORITY,
>  				    MAX_SCHEDULE_TIMEOUT);

This one also looks worrying unless I am missing something. Allowing 
clients who fill the ring to promote their priority?

>  	if (timeout < 0)
>  		return timeout;
>

Regards,

Tvrtko

* Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-21  9:25 ` [PATCH 3/3] drm/i915: Priority boost switching to an idle ring Chris Wilson
  2017-01-21 10:09   ` Chris Wilson
  2017-01-23  8:39   ` Joonas Lahtinen
@ 2017-01-23 10:51   ` Tvrtko Ursulin
  2017-01-23 11:16     ` Chris Wilson
  2 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2017-01-23 10:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2017 09:25, Chris Wilson wrote:
> In order to maximise concurrency between engines, if we queue a request
> to a currently idle ring, reorder its dependencies to execute that request
> as early as possible and ideally improve occupancy of multiple engines
> simultaneously.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_request.h | 5 +++--
>  drivers/gpu/drm/i915/intel_lrc.c        | 3 +++
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index ba83c507613b..7ba9cc53abe9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -74,8 +74,9 @@ struct i915_priotree {
>  };
>
>  enum {
> -	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
> -	I915_PRIORITY_DISPLAY
> +	I915_PRIORITY_STALL = I915_PRIORITY_MAX,
> +	I915_PRIORITY_LOCKED,
> +	I915_PRIORITY_DISPLAY,
>  };
>
>  struct i915_gem_capture_list {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 50bec759989f..b46cb1bb32b8 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -691,6 +691,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>  	struct i915_dependency stack;
>  	LIST_HEAD(dfs);
>
> +	if (execlists_elsp_ready(request->engine))
> +		prio = max(prio, I915_PRIORITY_STALL);
> +

It would have to be execlists_elsp_idle for it to match with the commit 
message.

But even then the idea worries me sufficiently that I would refrain from 
adding it.

I don't like the name of I915_PRIORITY_STALL either, since I think about 
stalls as ELSP transitioning to idle and no runnable requests. It could 
be I915_PRIORITY_SUBMIT, but I just can't find a good story for when 
this might be a good idea.

Perhaps if we combined it with there being no other runnable requests on the 
engine it might be passable?

Regards,

Tvrtko

>  	if (prio <= READ_ONCE(request->priotree.priority))
>  		return;
>
>

* Re: [PATCH 2/3] drm/i915: Priority boost for locked waits
  2017-01-23 10:43   ` Tvrtko Ursulin
@ 2017-01-23 10:51     ` Chris Wilson
  2017-01-23 11:41       ` Tvrtko Ursulin
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-01-23 10:51 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx

On Mon, Jan 23, 2017 at 10:43:10AM +0000, Tvrtko Ursulin wrote:
> >@@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > 		ret = i915_gem_object_wait(obj,
> > 					   I915_WAIT_INTERRUPTIBLE |
> > 					   I915_WAIT_LOCKED |
> >+					   I915_WAIT_PRIORITY |
> > 					   I915_WAIT_ALL,
> > 					   MAX_SCHEDULE_TIMEOUT,
> > 					   NULL);
> 
> As mentioned before, is this not a concern? Is it not letting any
> userspace boost their prio to max by just calling set cache level
> after execbuf?

Not any more, set-cache-ioctl now does an explicit unlocked wait first
before hitting this wait. Also, the likely cause is a page-flip
after execbuf on a fresh bo, which is a stall we don't want.
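
For reference, the shape of that path is now roughly (a sketch, not the
exact code):

        /* do the potentially long wait without holding struct_mutex ... */
        ret = i915_gem_object_wait(obj,
                                   I915_WAIT_INTERRUPTIBLE | I915_WAIT_ALL,
                                   MAX_SCHEDULE_TIMEOUT, NULL);
        if (ret)
                return ret;

        /* ... so the boosted wait under the mutex is almost always a no-op */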

> >--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >@@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> > 		return -ENOSPC;
> >
> > 	timeout = i915_wait_request(target,
> >-				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> >+				    I915_WAIT_INTERRUPTIBLE |
> >+				    I915_WAIT_LOCKED |
> >+				    I915_WAIT_PRIORITY,
> > 				    MAX_SCHEDULE_TIMEOUT);
> 
> This one also looks worrying unless I am missing something. Allowing
> clients who fill the ring to promote their priority?

Yes. They only boost priority for very, very old requests and more
importantly these clients are now stalling the entire *system* and not
just themselves anymore. So there is an implicit priority inversion
through struct_mutex. The only long term solution is avoiding
inter-client locks - we still may have inversion on any shared resource,
most likely objects, but we can at least reduce the contention by
splitting and avoiding struct_mutex.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-23 10:51   ` Tvrtko Ursulin
@ 2017-01-23 11:16     ` Chris Wilson
  2017-01-23 11:50       ` Tvrtko Ursulin
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-01-23 11:16 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Jan 23, 2017 at 10:51:28AM +0000, Tvrtko Ursulin wrote:
> 
> On 21/01/2017 09:25, Chris Wilson wrote:
> >In order to maximise concurrency between engines, if we queue a request
> >to a currently idle ring, reorder its dependencies to execute that request
> >as early as possible and ideally improve occupancy of multiple engines
> >simultaneously.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_gem_request.h | 5 +++--
> > drivers/gpu/drm/i915/intel_lrc.c        | 3 +++
> > 2 files changed, 6 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> >index ba83c507613b..7ba9cc53abe9 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.h
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.h
> >@@ -74,8 +74,9 @@ struct i915_priotree {
> > };
> >
> > enum {
> >-	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
> >-	I915_PRIORITY_DISPLAY
> >+	I915_PRIORITY_STALL = I915_PRIORITY_MAX,
> >+	I915_PRIORITY_LOCKED,
> >+	I915_PRIORITY_DISPLAY,
> > };
> >
> > struct i915_gem_capture_list {
> >diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >index 50bec759989f..b46cb1bb32b8 100644
> >--- a/drivers/gpu/drm/i915/intel_lrc.c
> >+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >@@ -691,6 +691,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
> > 	struct i915_dependency stack;
> > 	LIST_HEAD(dfs);
> >
> >+	if (execlists_elsp_ready(request->engine))
> >+		prio = max(prio, I915_PRIORITY_STALL);
> >+
> 
> It would have to be execlists_elsp_idle for it to match with the
> commit message.
> 
> But even then the idea worries me sufficiently that I would refrain
> from adding it.
> 
> I don't like the name of I915_PRIORITY_STALL either, since I think
> about stalls as ELSP transitioning to idle and no runnable requests.
> It could be I915_PRIORITY_SUBMIT, but I just can't find a good story
> for when this might be a good idea.

That's the case we are trying to address. This engine is stalled (no
requests to run), so we are boosting the priority of its deps on the
other engines so that stall time is minimised. That's why I chose STALL,
because it relates to the currently stalled engine.

> Perhaps if we combined it with there being no other runnable requests on the
> engine it might be passable?

We don't have that information yet (at least not trivially). But it
still runs into the problem that the first request added to the engine's
implicit timeline may not be the first request actually ready-to-submit
due to external third parties.

In hopefully the near future, we could write:

     if (!request->engine->timeline->active_seqno)

which would only apply the magic boost to the very first rq added to
this engine after a period of idleness. Hmm, that's a bit too weak as we
try to avoid retiring.
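
Spelled out (hypothetical, that bookkeeping doesn't exist yet) it would be:

        if (!request->engine->timeline->active_seqno)
                prio = max(prio, I915_PRIORITY_STALL);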

Not keen on that. I'd like to push for keeping ready() first, and idle()
as a compromise if need be. The idea is that ready() is avoiding the
stall on the second submission slot - but that may be too preemptive. I
can see plenty of arguments and counter-arguments.

The key problem is that this heuristic is simply too speculative. We lack
the ideal information to choose the request with the shortest dep path, or
to choose the overall schedule to minimise delays.

After all that I still regard this as an interesting toy that may prove
useful, or be counterproductive. I also suspect it will only be
interesting in the short term.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 2/3] drm/i915: Priority boost for locked waits
  2017-01-23 10:51     ` Chris Wilson
@ 2017-01-23 11:41       ` Tvrtko Ursulin
  2017-01-23 11:50         ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2017-01-23 11:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Daniel Vetter


On 23/01/2017 10:51, Chris Wilson wrote:
> On Mon, Jan 23, 2017 at 10:43:10AM +0000, Tvrtko Ursulin wrote:
>>> @@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>> 		ret = i915_gem_object_wait(obj,
>>> 					   I915_WAIT_INTERRUPTIBLE |
>>> 					   I915_WAIT_LOCKED |
>>> +					   I915_WAIT_PRIORITY |
>>> 					   I915_WAIT_ALL,
>>> 					   MAX_SCHEDULE_TIMEOUT,
>>> 					   NULL);
>>
>> As mentioned before, is this not a concern? Is it not letting any
>> userspace boost their prio to max by just calling set cache level
>> after execbuf?
>
> Not any more, set-cache-ioctl now does an explicit unlocked wait first
> before hitting this wait. Also, the likely cause is a page-flip
> after execbuf on a fresh bo, which is a stall we don't want.

Ok I've missed that change.

>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>>> 		return -ENOSPC;
>>>
>>> 	timeout = i915_wait_request(target,
>>> -				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
>>> +				    I915_WAIT_INTERRUPTIBLE |
>>> +				    I915_WAIT_LOCKED |
>>> +				    I915_WAIT_PRIORITY,
>>> 				    MAX_SCHEDULE_TIMEOUT);
>>
>> This one also looks worrying unless I am missing something. Allowing
>> clients who fill the ring to promote their priority?
>
> Yes. They only boost priority for very, very old requests and more
> importantly these clients are now stalling the entire *system* and not
> just themselves anymore. So there is an implicit priority inversion
> through struct_mutex. The only long term solution is avoiding
> inter-client locks - we still may have inversion on any shared resource,
> most likely objects, but we can at least reduce the contention by
> splitting and avoid struct_mutex.

How do you know they are stalling the entire system - haven't they just 
filled up their ringbuff? So the target request will be one of theirs.

Once the scheduler is able to do fair timeslicing or something, we
should especially not allow clients to prioritise themselves by just
filling their ringbuf.

Regards,

Tvrtko

* Re: [PATCH 2/3] drm/i915: Priority boost for locked waits
  2017-01-23 11:41       ` Tvrtko Ursulin
@ 2017-01-23 11:50         ` Chris Wilson
  2017-01-23 11:56           ` Tvrtko Ursulin
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-01-23 11:50 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx

On Mon, Jan 23, 2017 at 11:41:03AM +0000, Tvrtko Ursulin wrote:
> 
> On 23/01/2017 10:51, Chris Wilson wrote:
> >On Mon, Jan 23, 2017 at 10:43:10AM +0000, Tvrtko Ursulin wrote:
> >>>@@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >>>		ret = i915_gem_object_wait(obj,
> >>>					   I915_WAIT_INTERRUPTIBLE |
> >>>					   I915_WAIT_LOCKED |
> >>>+					   I915_WAIT_PRIORITY |
> >>>					   I915_WAIT_ALL,
> >>>					   MAX_SCHEDULE_TIMEOUT,
> >>>					   NULL);
> >>
> >>As mentioned before, is this not a concern? Is it not letting any
> >>userspace boost their prio to max by just calling set cache level
> >>after execbuf?
> >
> >Not any more, set-cache-ioctl now does an explicit unlocked wait first
> >before hitting this wait. Also, the likely cause is a page-flip
> >after execbuf on a fresh bo, which is a stall we don't want.
> 
> Ok I've missed that change.
> 
> >>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>>@@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> >>>		return -ENOSPC;
> >>>
> >>>	timeout = i915_wait_request(target,
> >>>-				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> >>>+				    I915_WAIT_INTERRUPTIBLE |
> >>>+				    I915_WAIT_LOCKED |
> >>>+				    I915_WAIT_PRIORITY,
> >>>				    MAX_SCHEDULE_TIMEOUT);
> >>
> >>This one also looks worrying unless I am missing something. Allowing
> >>clients who fill the ring to promote their priority?
> >
> >Yes. They only boost priority for very, very old requests and more
> >importantly these clients are now stalling the entire *system* and not
> >just themselves anymore. So there is an implicit priority inversion
> >through struct_mutex. The only long term solution is avoiding
> >inter-client locks - we still may have inversion on any shared resource,
> >most likely objects, but we can at least reduce the contention by
> >splitting and avoiding struct_mutex.
> 
> How do you know they are stalling the entire system - haven't they
> just filled up their ringbuff? So the target request will be one of
> theirs.

struct_mutex is our BKL. I view anything taking it with suspicion,
because it stops other clients, pageflips, eventually everything.
 
> Once the scheduler is able to do fair timeslicing or something, we
> should especially not allow clients to prioritise themselves by just
> filling their ringbuf.

That is still missing the impact on the system of holding struct_mutex
for any period of time.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-23 11:16     ` Chris Wilson
@ 2017-01-23 11:50       ` Tvrtko Ursulin
  2017-01-23 12:02         ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2017-01-23 11:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/01/2017 11:16, Chris Wilson wrote:
> On Mon, Jan 23, 2017 at 10:51:28AM +0000, Tvrtko Ursulin wrote:
>>
>> On 21/01/2017 09:25, Chris Wilson wrote:
>>> In order to maximise concurrency between engines, if we queue a request
>>> to a currently idle ring, reorder its dependencies to execute that request
>>> as early as possible and ideally improve occupancy of multiple engines
>>> simultaneously.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>> drivers/gpu/drm/i915/i915_gem_request.h | 5 +++--
>>> drivers/gpu/drm/i915/intel_lrc.c        | 3 +++
>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
>>> index ba83c507613b..7ba9cc53abe9 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_request.h
>>> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
>>> @@ -74,8 +74,9 @@ struct i915_priotree {
>>> };
>>>
>>> enum {
>>> -	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
>>> -	I915_PRIORITY_DISPLAY
>>> +	I915_PRIORITY_STALL = I915_PRIORITY_MAX,
>>> +	I915_PRIORITY_LOCKED,
>>> +	I915_PRIORITY_DISPLAY,
>>> };
>>>
>>> struct i915_gem_capture_list {
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 50bec759989f..b46cb1bb32b8 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -691,6 +691,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> 	struct i915_dependency stack;
>>> 	LIST_HEAD(dfs);
>>>
>>> +	if (execlists_elsp_ready(request->engine))
>>> +		prio = max(prio, I915_PRIORITY_STALL);
>>> +
>>
>> It would have to be execlists_elsp_idle for it to match with the
>> commit message.
>>
>> But even then the idea worries me sufficiently that I would refrain
>> from adding it.
>>
>> I don't like the name of I915_PRIORITY_STALL either, since I think
>> about stalls as ELSP transitioning to idle and no runnable requests.
>> It could be I915_PRIORITY_SUBMIT, but I just can't find a good story
>> for when this might be a good idea.
>
> That's the case we are trying to address. This engine is stalled (no
> requests to run), so we are boosting the priority of its deps on the
> other engines so that stall time is minimised. That's why I chose STALL,
> because it relates to the currently stalled engine.

Here we don't know that there are no other requests to run - perhaps
they have just been queued up before this one and are not in the elsp yet
(tasklet pending).

Hm, and the ports are also not protected by struct_mutex but by the
engine->timeline->lock, so this looks to be open to a race.

> >>Perhaps if we combined it with there being no other runnable requests on the
>> engine it might be passable?
>
> >We don't have that information yet (at least not trivially). But it
> still runs into the problem that the first request added to the engine's
> implicit timeline may not be the first request actually ready-to-submit
> due to external third parties.

Would it not be the engine->execlist_first?

>
> In hopefully the near future, we could write:
>
>      if (!request->engine->timeline->active_seqno)
>
> which would only apply the magic boost to the very first rq added to
> this engine after a period of idleness. Hmm, that's a bit too weak as we
> try to avoid retiring.
>
> Not keen on that. I'd like to push for keeping ready() first, and idle()
> as a compromise if need be. The idea is that ready() is avoiding the
> stall on the second submission slot - but that may be too preemptive. I
> can see plenty of arguments and counter-arguments.
>
> The key problem is that this heuristic is simply too speculative. We lack
> the ideal information to choose the request with the shortest dep path, or
> to choose the overall schedule to minimise delays.
>
> After all that I still regard this as an interesting toy that may prove
> useful, or be counterproductive. I also regard that it should only be
> interesting in the short term.

If in doubt leave it out. :)

Regards,

Tvrtko

* Re: [PATCH 2/3] drm/i915: Priority boost for locked waits
  2017-01-23 11:50         ` Chris Wilson
@ 2017-01-23 11:56           ` Tvrtko Ursulin
  0 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2017-01-23 11:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Daniel Vetter


On 23/01/2017 11:50, Chris Wilson wrote:
> On Mon, Jan 23, 2017 at 11:41:03AM +0000, Tvrtko Ursulin wrote:
>>
>> On 23/01/2017 10:51, Chris Wilson wrote:
>>> On Mon, Jan 23, 2017 at 10:43:10AM +0000, Tvrtko Ursulin wrote:
>>>>> @@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>>> 		ret = i915_gem_object_wait(obj,
>>>>> 					   I915_WAIT_INTERRUPTIBLE |
>>>>> 					   I915_WAIT_LOCKED |
>>>>> +					   I915_WAIT_PRIORITY |
>>>>> 					   I915_WAIT_ALL,
>>>>> 					   MAX_SCHEDULE_TIMEOUT,
>>>>> 					   NULL);
>>>>
>>>> As mentioned before, is this not a concern? Is it not letting any
>>>> userspace boost their prio to max by just calling set cache level
>>>> after execbuf?
>>>
>>> Not any more, set-cache-ioctl now does an explicit unlocked wait first
>>> before hitting this wait. Also, the likely cause is though page-flip
>>> after execbuf on a fresh bo, which is a stall we don't want.
>>
>> Ok I've missed that change.
>>
>>>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> @@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>>>>> 		return -ENOSPC;
>>>>>
>>>>> 	timeout = i915_wait_request(target,
>>>>> -				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
>>>>> +				    I915_WAIT_INTERRUPTIBLE |
>>>>> +				    I915_WAIT_LOCKED |
>>>>> +				    I915_WAIT_PRIORITY,
>>>>> 				    MAX_SCHEDULE_TIMEOUT);
>>>>
>>>> This one also looks worrying unless I am missing something. Allowing
>>>> clients who fill the ring to promote their priority?
>>>
>>> Yes. They only boost priority for very, very old requests and more
>>> importantly these clients are now stalling the entire *system* and not
>>> just themselves anymore. So there is an implicit priority inversion
>>> through struct_mutex. The only long term solution is avoiding
>>> inter-client locks - we still may have inversion on any shared resource,
>>> most likely objects, but we can at least reduce the contention by
>>> splitting and avoiding struct_mutex.
>>
>> How do you know they are stalling the entire system - haven't they
>> just filled up their ringbuff? So the target request will be one of
>> theirs.
>
> struct_mutex is our BKL. I view anything taking it with suspicion,
> because it stops other clients, pageflips, eventually everything.
>
>> Once the scheduler is able to do fair timeslicing or something, we
>> should especially not allow clients to prioritise themselves by just
>> filling their ringbuf.
>
> That is still missing the impact on the system of holding struct_mutex
> for any period of time.

True, but I don't think priority boosting is a solution for that. I 
think we should rather think about approaches where the clients who fill 
up their ringbufs wait outside the struct_mutex then.

Regards,

Tvrtko

* Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring
  2017-01-23 11:50       ` Tvrtko Ursulin
@ 2017-01-23 12:02         ` Chris Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2017-01-23 12:02 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Jan 23, 2017 at 11:50:47AM +0000, Tvrtko Ursulin wrote:
> 
> On 23/01/2017 11:16, Chris Wilson wrote:
> >On Mon, Jan 23, 2017 at 10:51:28AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 21/01/2017 09:25, Chris Wilson wrote:
> >>>In order to maximise concurrency between engines, if we queue a request
> >>>to a currently idle ring, reorder its dependencies to execute that request
> >>>as early as possible and ideally improve occupancy of multiple engines
> >>>simultaneously.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>---
> >>>drivers/gpu/drm/i915/i915_gem_request.h | 5 +++--
> >>>drivers/gpu/drm/i915/intel_lrc.c        | 3 +++
> >>>2 files changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> >>>index ba83c507613b..7ba9cc53abe9 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_request.h
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_request.h
> >>>@@ -74,8 +74,9 @@ struct i915_priotree {
> >>>};
> >>>
> >>>enum {
> >>>-	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
> >>>-	I915_PRIORITY_DISPLAY
> >>>+	I915_PRIORITY_STALL = I915_PRIORITY_MAX,
> >>>+	I915_PRIORITY_LOCKED,
> >>>+	I915_PRIORITY_DISPLAY,
> >>>};
> >>>
> >>>struct i915_gem_capture_list {
> >>>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >>>index 50bec759989f..b46cb1bb32b8 100644
> >>>--- a/drivers/gpu/drm/i915/intel_lrc.c
> >>>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>>@@ -691,6 +691,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
> >>>	struct i915_dependency stack;
> >>>	LIST_HEAD(dfs);
> >>>
> >>>+	if (execlists_elsp_ready(request->engine))
> >>>+		prio = max(prio, I915_PRIORITY_STALL);
> >>>+
> >>
> >>It would have to be execlists_elsp_idle for it to match with the
> >>commit message.
> >>
> >>But even then the idea worries me sufficiently that I would refrain
> >>from adding it.
> >>
> >>I don't like the name of I915_PRIORITY_STALL either, since I think
> >>about stalls as ELSP transitioning to idle and no runnable requests.
> >>It could be I915_PRIORITY_SUBMIT, but I just can't find a good story
> >>for when this might be a good idea.
> >
> >That's the case we are trying to address. This engine is stalled (no
> >requests to run), so we are boosting the priority of its deps on the
> >other engines so that stall time is minimised. That's why I choose STALL
> >because it related to the currently stalled engine.
> 
> Here we don't know that there are no other requests to run - perhaps
> they have just been queued up before this one and are not in the elsp yet
> (tasklet pending).
>
> Hm, and the ports are also not protected by struct_mutex but by the
> engine->timeline->lock, so this looks to be open to a race.

That race is immaterial (a READ_ONCE()) versus the whole schedule() being
run before the request is ready to submit - and in doing so we have no
knowledge that this request is the ideal one on which to bump the priority.

And yes, this can and will bump multiple requests, whose deps then
compete in a fifo (sadly this may be a first bumped, first out).

> >>Perhaps if we combined it with there being no other runnable requests on the
> >>engine it might be passable?
> >
> >We don't have that information yet (trivially available at least). But it
> >still runs into the problem that the first request added to the engine's
> >implicit timeline may not be the first request actually ready-to-submit
> >due to external third parties.
> 
> Would it not be the engine->execlist_first?

No. That is requests queued/ready for hw, beyond those already
submitted. On the one hand, that is more relaxed than checking
elsp_ready(); on the other, elsp can be ready even if we have queued requests.

Hmm. Using first is better here in that regard, but you still will not
like it since it is too eager.
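
Something like this, purely for the sake of argument (illustrative only):

        /* only boost when nothing else is queued for this engine */
        if (!READ_ONCE(request->engine->execlist_first))
                prio = max(prio, I915_PRIORITY_STALL);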

> >In hopefully the near future, we could write:
> >
> >     if (!request->engine->timeline->active_seqno)
> >
> >which would only apply the magic boost to the very first rq added to
> >this engine after a period of idleness. Hmm, that's a bit too weak as we
> >try to avoid retiring.
> >
> >Not keen on that. I'd like to push for keeping ready() first, and idle()
> >as a compromise if need be. The idea is that ready() is avoiding the
> >stall on the second submission slot - but that maybe too preemptive. I
> >can see plenty of arguments and counter-arguments.
> >
> >The key problem is that this heuristic is simply too speculative. But
> >ideal information to choose the request with the shortest dep path, or
> >to choose the overall schedule to minimise delays.
> >
> >After all that I still regard this as an interesting toy that may prove
> useful, or be counterproductive. I also suspect it will only be
> interesting in the short term.
> 
> If in doubt leave it out. :)

Solution in search of a problem. If we can find someplace it does good,
then we give it a spin, otherwise it remains on the toy pile.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Thread overview: 17+ messages

2017-01-21  9:25 [PATCH 1/3] drm/i915: Tidy max/display priority macro Chris Wilson
2017-01-21  9:25 ` [PATCH 2/3] drm/i915: Priority boost for locked waits Chris Wilson
2017-01-23  8:03   ` Joonas Lahtinen
2017-01-23 10:43   ` Tvrtko Ursulin
2017-01-23 10:51     ` Chris Wilson
2017-01-23 11:41       ` Tvrtko Ursulin
2017-01-23 11:50         ` Chris Wilson
2017-01-23 11:56           ` Tvrtko Ursulin
2017-01-21  9:25 ` [PATCH 3/3] drm/i915: Priority boost switching to an idle ring Chris Wilson
2017-01-21 10:09   ` Chris Wilson
2017-01-23  8:39   ` Joonas Lahtinen
2017-01-23  8:48     ` Chris Wilson
2017-01-23 10:51   ` Tvrtko Ursulin
2017-01-23 11:16     ` Chris Wilson
2017-01-23 11:50       ` Tvrtko Ursulin
2017-01-23 12:02         ` Chris Wilson
2017-01-23  8:31 ` [PATCH 1/3] drm/i915: Tidy max/display priority macro Joonas Lahtinen
