* [RFC 0/5] Class/instance based execbuf plus more
@ 2017-11-13 13:09 Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 1/5] drm/i915: Select engines via class and instance in execbuffer2 Tvrtko Ursulin
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2017-11-13 13:09 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Now that the engine class concept is in, it is time to re-send the old proposal
of using it for engine selection in execbuf.
Idea is primarily to fix the situation with the current VCS engine selection ABI
by introducing a new, cleaner, method of selecting the VCS engine.
Then there are two new pieces of uAPI proposal, engine capabilities and
concurrent contexts, which for instance enable the VA-API driver to let the i915
balance it's batch buffers dynamically.
This enables better utilization of resources on GT3/GT4 parts where:
a) a single stream can now use both engines
b) it opens the door of extending the i915 scheduler with more advanced
load balancing approaches to support the multiple-streams use cases better.
For instance decoding a single H.264 stream on a GT4 part is now improved from
57 seconds to 40 seconds, with minimal VA-API code base changes:
root@sc:~/ffmpeg# VA_INTEL_CONCURRENT=0 perf stat -a -e i915/vcs0-busy/,i915/vcs1-busy/ ffmpeg -loglevel panic -hwaccel vaapi -hwaccel_output_format vaapi -i ~/bbb_sunflower_1080p_60fps_normal.mp4 -f null -
Performance counter stats for 'system wide':
57,568,097,358 ns i915/vcs0-busy/
0 ns i915/vcs1-busy/
57.585753514 seconds time elapsed
root@sc:~/ffmpeg# VA_INTEL_CONCURRENT=1 perf stat -a -e i915/vcs0-busy/,i915/vcs1-busy/ ffmpeg -loglevel panic -hwaccel vaapi -hwaccel_output_format vaapi -i ~/bbb_sunflower_1080p_60fps_normal.mp4 -f null -
Performance counter stats for 'system wide':
29,152,427,164 ns i915/vcs0-busy/
29,115,272,714 ns i915/vcs1-busy/
40.733992298 seconds time elapsed
I will be sending the proof-of-concept patches for intel-vaapi-driver
separately.
Tvrtko Ursulin (5):
drm/i915: Select engines via class and instance in execbuffer2
drm/i915: Engine capabilities uAPI
drm/i915: Concurrent context uAPI
drm/i915: Re-arrange execbuf so context is known before engine
drm/i915: Per batch buffer VCS balancing
drivers/gpu/drm/i915/i915_drv.h | 7 +-
drivers/gpu/drm/i915/i915_gem.c | 2 +-
drivers/gpu/drm/i915/i915_gem_context.c | 14 +++
drivers/gpu/drm/i915/i915_gem_context.h | 20 +++++
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 134 ++++++++++++++++++++++-------
drivers/gpu/drm/i915/intel_engine_cs.c | 3 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +
include/uapi/drm/i915_drm.h | 34 +++++++-
8 files changed, 180 insertions(+), 36 deletions(-)
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC 1/5] drm/i915: Select engines via class and instance in execbuffer2
2017-11-13 13:09 [RFC 0/5] Class/instance based execbuf plus more Tvrtko Ursulin
@ 2017-11-13 13:09 ` Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 2/5] drm/i915: Engine capabilities uAPI Tvrtko Ursulin
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2017-11-13 13:09 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Introduce a new way of selecting engines using the engine class
and instance concept.
This is primarily interesting for the VCS engine selection which
is a) currently done via disjoint set of flags, and b) the
current I915_EXEC_BSD flags has different semantics depending on
the underlying hardware which is bad.
Proposed idea here is to reserve 8-bits of flags, to pass in the
engine instance, re-use the existing engine selection bits for
the class selection, and a new flag named
I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
selection API is in use.
The new uAPI also removes access to the weak VCS engine
balancing as currently existing in the driver.
Example usage to send a command to VCS0:
eb.flags = i915_execbuffer2_engine(I915_ENGINE_CLASS_VIDEO_DECODE, 0);
Or to send a command to VCS1:
eb.flags = i915_execbuffer2_engine(I915_ENGINE_CLASS_VIDEO_DECODE, 1);
v2:
* Fix unknown flags mask.
* Use I915_EXEC_RING_MASK for class. (Chris Wilson)
v3:
* Add a map for fast class-instance engine lookup. (Chris Wilson)
v4:
* Update commit to reflect v3.
* Export intel_engine_lookup for other users. (Chris Wilson)
* Split out some warns. (Chris Wilson)
v5:
* Fixed shift and mask logic.
* Rebased to be standalone.
v6:
* Rebased back to follow engine info ioctl.
* Rename helper to intel_engine_lookup_user. (Chris Wilson)
v7:
* Rebased.
v8:
* Rebased after engine class uAPI got in via separate route.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++++++++++
include/uapi/drm/i915_drm.h | 12 +++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 435ed95df144..2e7ddd197cc4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2004,6 +2004,16 @@ gen8_dispatch_bsd_engine(struct drm_i915_private *dev_priv,
return file_priv->bsd_engine;
}
+static struct intel_engine_cs *
+eb_select_engine_class_instance(struct drm_i915_private *i915, u64 eb_flags)
+{
+ u8 class = eb_flags & I915_EXEC_RING_MASK;
+ u8 instance = (eb_flags & I915_EXEC_INSTANCE_MASK) >>
+ I915_EXEC_INSTANCE_SHIFT;
+
+ return intel_engine_lookup_user(i915, class, instance);
+}
+
#define I915_USER_RINGS (4)
static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
@@ -2022,6 +2032,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
struct intel_engine_cs *engine;
+ if (args->flags & I915_EXEC_CLASS_INSTANCE)
+ return eb_select_engine_class_instance(dev_priv, args->flags);
+
if (user_ring_id > I915_USER_RINGS) {
DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
return NULL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8ca906148a96..9ed351f4a304 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1051,7 +1051,12 @@ struct drm_i915_gem_execbuffer2 {
*/
#define I915_EXEC_FENCE_ARRAY (1<<19)
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+#define I915_EXEC_CLASS_INSTANCE (1<<20)
+
+#define I915_EXEC_INSTANCE_SHIFT (21)
+#define I915_EXEC_INSTANCE_MASK (0xff << I915_EXEC_INSTANCE_SHIFT)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-((1 << 29) << 1))
#define I915_EXEC_CONTEXT_ID_MASK (0xffffffff)
#define i915_execbuffer2_set_context_id(eb2, context) \
@@ -1059,6 +1064,11 @@ struct drm_i915_gem_execbuffer2 {
#define i915_execbuffer2_get_context_id(eb2) \
((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
+#define i915_execbuffer2_engine(class, instance) \
+ (I915_EXEC_CLASS_INSTANCE | \
+ (class) | \
+ ((instance) << I915_EXEC_INSTANCE_SHIFT))
+
struct drm_i915_gem_pin {
/** Handle of the buffer to be pinned. */
__u32 handle;
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [RFC 2/5] drm/i915: Engine capabilities uAPI
2017-11-13 13:09 [RFC 0/5] Class/instance based execbuf plus more Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 1/5] drm/i915: Select engines via class and instance in execbuffer2 Tvrtko Ursulin
@ 2017-11-13 13:09 ` Tvrtko Ursulin
2017-11-13 13:13 ` Chris Wilson
2017-11-13 13:17 ` Chris Wilson
2017-11-13 13:09 ` [RFC 3/5] drm/i915: Concurrent context uAPI Tvrtko Ursulin
` (3 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2017-11-13 13:09 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
To add the knowledge that VCS1 engine does not support HEVC,
we introduce the concept of engine capabilities. These are
flags defined in per-engine class space which can be passed
in during execbuf time. The driver is then able to fail the
execbuf in case of mismatch between the requested capabilities
and the selected target engine.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 19 +++++++++++++++----
drivers/gpu/drm/i915/intel_engine_cs.c | 3 +++
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
include/uapi/drm/i915_drm.h | 17 ++++++++++++++++-
4 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 2e7ddd197cc4..f7aabea601b0 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -56,9 +56,10 @@ enum {
#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
#define __EXEC_OBJECT_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
-#define __EXEC_HAS_RELOC BIT(31)
-#define __EXEC_VALIDATED BIT(30)
-#define __EXEC_INTERNAL_FLAGS (~0u << 30)
+#define __EXEC_HAS_RELOC BIT(63)
+#define __EXEC_VALIDATED BIT(62)
+#define __EXEC_INTERNAL_FLAGS (~0ULL << 62)
+
#define UPDATE PIN_OFFSET_FIXED
#define BATCH_OFFSET_BIAS (256*1024)
@@ -2010,8 +2011,16 @@ eb_select_engine_class_instance(struct drm_i915_private *i915, u64 eb_flags)
u8 class = eb_flags & I915_EXEC_RING_MASK;
u8 instance = (eb_flags & I915_EXEC_INSTANCE_MASK) >>
I915_EXEC_INSTANCE_SHIFT;
+ u8 caps = (eb_flags & I915_EXEC_ENGINE_CAP_MASK) >>
+ I915_EXEC_ENGINE_CAP_SHIFT;
+ struct intel_engine_cs *engine;
- return intel_engine_lookup_user(i915, class, instance);
+ engine = intel_engine_lookup_user(i915, class, instance);
+
+ if (engine && ((caps & engine->caps) != caps))
+ return NULL;
+
+ return engine;
}
#define I915_USER_RINGS (4)
@@ -2217,6 +2226,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS &
~__EXEC_OBJECT_UNKNOWN_FLAGS);
+ BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_UNKNOWN_FLAGS);
+
eb.i915 = to_i915(dev);
eb.file = file;
eb.args = args;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 7f4d60fed46f..c75bd2c9d797 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -233,6 +233,9 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
engine->irq_shift = info->irq_shift;
engine->class = info->class;
engine->instance = info->instance;
+ if (INTEL_GEN(dev_priv) >= 8 && engine->class == VIDEO_DECODE_CLASS &&
+ engine->instance == 0)
+ engine->caps = I915_BSD_CAP_HEVC;
engine->uabi_id = info->uabi_id;
engine->uabi_class = class_info->uabi_class;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4660c6f920e5..a84749bcad33 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -300,6 +300,8 @@ struct intel_engine_cs {
u8 class;
u8 instance;
+ u8 caps;
+
u32 context_size;
u32 mmio_base;
unsigned int irq_shift;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9ed351f4a304..ad5fa26fc9e7 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1056,7 +1056,18 @@ struct drm_i915_gem_execbuffer2 {
#define I915_EXEC_INSTANCE_SHIFT (21)
#define I915_EXEC_INSTANCE_MASK (0xff << I915_EXEC_INSTANCE_SHIFT)
-#define __I915_EXEC_UNKNOWN_FLAGS (-((1 << 29) << 1))
+/*
+ * Inform the kernel of what engine capabilities this batch buffer
+ * requires. For example only the first VCS engine has the HEVC block.
+ *
+ * We reserve four bits for the capabilities where each can be shared
+ * between different engines. Eg. first bit can mean one feature for
+ * one engine and something else for the other.
+ */
+#define I915_EXEC_ENGINE_CAP_SHIFT (29)
+#define I915_EXEC_ENGINE_CAP_MASK (0xf << I915_EXEC_ENGINE_CAP_SHIFT)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-((1ULL << 33) << 1))
#define I915_EXEC_CONTEXT_ID_MASK (0xffffffff)
#define i915_execbuffer2_set_context_id(eb2, context) \
@@ -1069,6 +1080,10 @@ struct drm_i915_gem_execbuffer2 {
(class) | \
((instance) << I915_EXEC_INSTANCE_SHIFT))
+#define I915_BSD_CAP_HEVC (1 << 0)
+
+#define I915_ENGINE_CAP_FLAG(v) ((v) << I915_EXEC_ENGINE_CAP_SHIFT)
+
struct drm_i915_gem_pin {
/** Handle of the buffer to be pinned. */
__u32 handle;
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [RFC 3/5] drm/i915: Concurrent context uAPI
2017-11-13 13:09 [RFC 0/5] Class/instance based execbuf plus more Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 1/5] drm/i915: Select engines via class and instance in execbuffer2 Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 2/5] drm/i915: Engine capabilities uAPI Tvrtko Ursulin
@ 2017-11-13 13:09 ` Tvrtko Ursulin
2017-11-13 13:19 ` Chris Wilson
2017-11-13 13:28 ` Chris Wilson
2017-11-13 13:09 ` [RFC 4/5] drm/i915: Re-arrange execbuf so context is known before engine Tvrtko Ursulin
` (2 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2017-11-13 13:09 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When submitting work against existing contexts userspace knows that:
a) context state can be relied upon
b) submitted batches will complete in order.
By adding an option to mark its context as "concurrent" we allow userspace
to notify the kernel that it does not care about both of the above. i915
is then at liberty to break the above guarantees if and when appropriate.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_context.c | 12 ++++++++++++
drivers/gpu/drm/i915/i915_gem_context.h | 18 ++++++++++++++++++
include/uapi/drm/i915_drm.h | 1 +
3 files changed, 31 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2db040695035..3ea1251924ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1043,6 +1043,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
case I915_CONTEXT_PARAM_PRIORITY:
args->value = ctx->priority;
break;
+ case I915_CONTEXT_PARAM_CONCURRENT:
+ args->value = i915_gem_context_is_concurrent(ctx);
+ break;
default:
ret = -EINVAL;
break;
@@ -1118,6 +1121,15 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
}
break;
+ case I915_CONTEXT_PARAM_CONCURRENT:
+ if (args->size)
+ ret = -EINVAL;
+ else if (args->value)
+ i915_gem_context_set_concurrent(ctx);
+ else
+ i915_gem_context_clear_concurrent(ctx);
+ break;
+
default:
ret = -EINVAL;
break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 4bfb72f8e1cb..0531c5eedb2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -115,6 +115,7 @@ struct i915_gem_context {
#define CONTEXT_BANNABLE 3
#define CONTEXT_BANNED 4
#define CONTEXT_FORCE_SINGLE_SUBMISSION 5
+#define CONTEXT_CONCURRENT 6
/**
* @hw_id: - unique identifier for the context
@@ -254,6 +255,23 @@ static inline void i915_gem_context_set_force_single_submission(struct i915_gem_
__set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ctx->flags);
}
+static inline bool
+i915_gem_context_is_concurrent(const struct i915_gem_context *ctx)
+{
+ return test_bit(CONTEXT_CONCURRENT, &ctx->flags);
+}
+
+static inline void i915_gem_context_set_concurrent(struct i915_gem_context *ctx)
+{
+ __set_bit(CONTEXT_CONCURRENT, &ctx->flags);
+}
+
+static inline void
+i915_gem_context_clear_concurrent(struct i915_gem_context *ctx)
+{
+ __clear_bit(CONTEXT_CONCURRENT, &ctx->flags);
+}
+
static inline bool i915_gem_context_is_default(const struct i915_gem_context *c)
{
return c->user_handle == DEFAULT_CONTEXT_HANDLE;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ad5fa26fc9e7..bef377c4b4fc 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1482,6 +1482,7 @@ struct drm_i915_gem_context_param {
#define I915_CONTEXT_MAX_USER_PRIORITY 1023 /* inclusive */
#define I915_CONTEXT_DEFAULT_PRIORITY 0
#define I915_CONTEXT_MIN_USER_PRIORITY -1023 /* inclusive */
+#define I915_CONTEXT_PARAM_CONCURRENT 0x7
__u64 value;
};
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [RFC 4/5] drm/i915: Re-arrange execbuf so context is known before engine
2017-11-13 13:09 [RFC 0/5] Class/instance based execbuf plus more Tvrtko Ursulin
` (2 preceding siblings ...)
2017-11-13 13:09 ` [RFC 3/5] drm/i915: Concurrent context uAPI Tvrtko Ursulin
@ 2017-11-13 13:09 ` Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 5/5] drm/i915: Per batch buffer VCS balancing Tvrtko Ursulin
2017-11-13 13:13 ` ✗ Fi.CI.BAT: failure for Class/instance based execbuf plus more Patchwork
5 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2017-11-13 13:09 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Needed for a following patch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 38 ++++++++++++++++--------------
1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f7aabea601b0..ecb6290ab6ee 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2258,24 +2258,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (args->flags & I915_EXEC_IS_PINNED)
eb.batch_flags |= I915_DISPATCH_PINNED;
- eb.engine = eb_select_engine(eb.i915, file, args);
- if (!eb.engine)
- return -EINVAL;
-
- if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
- if (!HAS_RESOURCE_STREAMER(eb.i915)) {
- DRM_DEBUG("RS is only allowed for Haswell, Gen8 and above\n");
- return -EINVAL;
- }
- if (eb.engine->id != RCS) {
- DRM_DEBUG("RS is not available on %s\n",
- eb.engine->name);
- return -EINVAL;
- }
-
- eb.batch_flags |= I915_DISPATCH_RS;
- }
-
if (args->flags & I915_EXEC_FENCE_IN) {
in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
if (!in_fence)
@@ -2300,6 +2282,25 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (unlikely(err))
goto err_destroy;
+ err = -EINVAL;
+ eb.engine = eb_select_engine(eb.i915, file, args);
+ if (!eb.engine)
+ goto err_engine;
+
+ if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
+ if (!HAS_RESOURCE_STREAMER(eb.i915)) {
+ DRM_DEBUG("RS is only allowed for Haswell, Gen8 and above\n");
+ goto err_engine;
+ }
+ if (eb.engine->id != RCS) {
+ DRM_DEBUG("RS is not available on %s\n",
+ eb.engine->name);
+ goto err_engine;
+ }
+
+ eb.batch_flags |= I915_DISPATCH_RS;
+ }
+
/*
* Take a local wakeref for preparing to dispatch the execbuf as
* we expect to access the hardware fairly frequently in the
@@ -2460,6 +2461,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
mutex_unlock(&dev->struct_mutex);
err_rpm:
intel_runtime_pm_put(eb.i915);
+err_engine:
i915_gem_context_put(eb.ctx);
err_destroy:
eb_destroy(&eb);
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [RFC 5/5] drm/i915: Per batch buffer VCS balancing
2017-11-13 13:09 [RFC 0/5] Class/instance based execbuf plus more Tvrtko Ursulin
` (3 preceding siblings ...)
2017-11-13 13:09 ` [RFC 4/5] drm/i915: Re-arrange execbuf so context is known before engine Tvrtko Ursulin
@ 2017-11-13 13:09 ` Tvrtko Ursulin
2017-11-16 0:21 ` Oscar Mateo
2017-11-13 13:13 ` ✗ Fi.CI.BAT: failure for Class/instance based execbuf plus more Patchwork
5 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2017-11-13 13:09 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Building on top of the three previously introduced new pieces of uAPI,
class/instance based engine selection, engine capabilities and concurrent
contexts, we can extend the execbuf2 uAPI to allow per batch buffer load
balancing on the VCS engines.
Legacy execbuf, with contexts mark as concurrent, can now load balance
between individual batches, instead of only statically per client.
This also means two batches submitted one after another, but both to
I915_EXEC_BSD, can potentially be running in parallel on VCS0 and VCS1
respectively.
For legacy (non-concurrent) contexts, behaviour is unchanged. VCS engine
is assigned to each context at the time of first submission and it
persists for the context lifetime.
In both cases trivial round-robin approach is used to load balance.
If execbuf requires a a particular engine feature, like for example HEVC
which is only available on vcs0, it needs to mark it's execbuf calls
appropriately using the engine capabilities uAPI.
For the class/instance based execbuf we add I915_EXEC_INSTANCE_ANY to
accomplish the same behaviour. This is only allowed to be used on
concurrent contexts or an error will be returned.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 7 +--
drivers/gpu/drm/i915/i915_gem.c | 2 +-
drivers/gpu/drm/i915/i915_gem_context.c | 2 +
drivers/gpu/drm/i915/i915_gem_context.h | 2 +
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 80 +++++++++++++++++++++++-------
include/uapi/drm/i915_drm.h | 6 +++
6 files changed, 78 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bf7216742452..9137127fad22 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1562,9 +1562,6 @@ struct i915_gem_mm {
u64 unordered_timeline;
- /* the indicator for dispatch video commands on two BSD rings */
- atomic_t bsd_engine_dispatch_index;
-
/** Bit 6 swizzling required for X tiling */
uint32_t bit_6_swizzle_x;
/** Bit 6 swizzling required for Y tiling */
@@ -2292,6 +2289,10 @@ struct drm_i915_private {
struct i915_gem_context *preempt_context;
struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
[MAX_ENGINE_INSTANCE + 1];
+
+ /* the indicator for dispatch video commands on two BSD rings */
+ atomic_t vcs_dispatch_index;
+
struct i915_vma *semaphore;
struct drm_dma_handle *status_page_dmah;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c5ddae2cc4ef..f39881bbd4ee 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5300,7 +5300,7 @@ i915_gem_load_init(struct drm_i915_private *dev_priv)
init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
- atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
+ atomic_set(&dev_priv->vcs_dispatch_index, 0);
spin_lock_init(&dev_priv->fb_tracking.lock);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3ea1251924ea..fb7021ee35d5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -275,6 +275,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
list_add_tail(&ctx->link, &dev_priv->contexts.list);
ctx->i915 = dev_priv;
ctx->priority = I915_PRIORITY_NORMAL;
+ atomic_set(&ctx->vcs_dispatch_index,
+ atomic_fetch_xor(1, &dev_priv->vcs_dispatch_index));
INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
INIT_LIST_HEAD(&ctx->handles_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 0531c5eedb2a..638c716afbef 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -192,6 +192,8 @@ struct i915_gem_context {
* context close.
*/
struct list_head handles_list;
+
+ atomic_t vcs_dispatch_index;
};
static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ecb6290ab6ee..a7bf4fedbdbc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -235,6 +235,7 @@ struct i915_execbuffer {
u64 invalid_flags; /** Set of execobj.flags that are invalid */
u32 context_flags; /** Set of execobj.flags to insert from the ctx */
+ bool ctx_concurrent;
u32 batch_start_offset; /** Location within object of batch */
u32 batch_len; /** Length of batch within object */
@@ -675,6 +676,8 @@ static int eb_select_context(struct i915_execbuffer *eb)
eb->ctx = ctx;
eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
+ eb->ctx_concurrent = i915_gem_context_is_concurrent(ctx);
+
eb->context_flags = 0;
if (ctx->flags & CONTEXT_NO_ZEROMAP)
eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;
@@ -1987,26 +1990,63 @@ static int eb_submit(struct i915_execbuffer *eb)
return 0;
}
+static unsigned int select_vcs_engine(struct i915_execbuffer *eb, u64 eb_flags)
+{
+ struct drm_i915_private *i915 = eb->i915;
+ u8 eb_caps = (eb_flags & I915_EXEC_ENGINE_CAP_MASK) >>
+ I915_EXEC_ENGINE_CAP_SHIFT;
+ unsigned int instance;
+
+ if (!HAS_BSD2(i915))
+ return 0;
+
+ instance = atomic_fetch_xor(1, &eb->ctx->vcs_dispatch_index);
+
+ if (instance == 1 &&
+ (eb_caps & i915->engine[_VCS(instance)]->caps) != eb_caps)
+ instance = 0;
+
+ return instance;
+}
+
+static unsigned int
+client_static_vcs_choice(struct drm_i915_private *i915,
+ struct drm_i915_file_private *file_priv)
+{
+ /*
+ * Check whether the file_priv has already selected one engine
+ * and if not select one.
+ */
+ if ((int)file_priv->bsd_engine < 0)
+ file_priv->bsd_engine =
+ atomic_fetch_xor(1, &i915->vcs_dispatch_index);
+
+ return file_priv->bsd_engine;
+}
+
/**
* Find one BSD ring to dispatch the corresponding BSD command.
* The engine index is returned.
*/
static unsigned int
-gen8_dispatch_bsd_engine(struct drm_i915_private *dev_priv,
- struct drm_file *file)
+eb_select_vcs_engine(struct i915_execbuffer *eb,
+ struct drm_i915_file_private *file_priv,
+ u64 eb_flags)
{
- struct drm_i915_file_private *file_priv = file->driver_priv;
-
- /* Check whether the file_priv has already selected one ring. */
- if ((int)file_priv->bsd_engine < 0)
- file_priv->bsd_engine = atomic_fetch_xor(1,
- &dev_priv->mm.bsd_engine_dispatch_index);
-
- return file_priv->bsd_engine;
+ /*
+ * For concurrent contexts do a round-robin engine assignment
+ * for each batch buffer.
+ */
+ if (eb->ctx_concurrent)
+ return select_vcs_engine(eb, eb_flags);
+ else
+ return client_static_vcs_choice(eb->i915, file_priv);
}
static struct intel_engine_cs *
-eb_select_engine_class_instance(struct drm_i915_private *i915, u64 eb_flags)
+eb_select_engine_class_instance(struct i915_execbuffer *eb,
+ struct drm_i915_file_private *file_priv,
+ u64 eb_flags)
{
u8 class = eb_flags & I915_EXEC_RING_MASK;
u8 instance = (eb_flags & I915_EXEC_INSTANCE_MASK) >>
@@ -2015,7 +2055,10 @@ eb_select_engine_class_instance(struct drm_i915_private *i915, u64 eb_flags)
I915_EXEC_ENGINE_CAP_SHIFT;
struct intel_engine_cs *engine;
- engine = intel_engine_lookup_user(i915, class, instance);
+ if (instance == I915_EXEC_INSTANCE_ANY)
+ instance = eb_select_vcs_engine(eb, file_priv, eb_flags);
+
+ engine = intel_engine_lookup_user(eb->i915, class, instance);
if (engine && ((caps & engine->caps) != caps))
return NULL;
@@ -2034,15 +2077,17 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
};
static struct intel_engine_cs *
-eb_select_engine(struct drm_i915_private *dev_priv,
+eb_select_engine(struct i915_execbuffer *eb,
struct drm_file *file,
struct drm_i915_gem_execbuffer2 *args)
{
+ struct drm_i915_private *dev_priv = eb->i915;
unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
struct intel_engine_cs *engine;
if (args->flags & I915_EXEC_CLASS_INSTANCE)
- return eb_select_engine_class_instance(dev_priv, args->flags);
+ return eb_select_engine_class_instance(eb, file->driver_priv,
+ args->flags);
if (user_ring_id > I915_USER_RINGS) {
DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
@@ -2060,7 +2105,8 @@ eb_select_engine(struct drm_i915_private *dev_priv,
unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
- bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
+ bsd_idx = eb_select_vcs_engine(eb, file->driver_priv,
+ args->flags);
} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
bsd_idx <= I915_EXEC_BSD_RING2) {
bsd_idx >>= I915_EXEC_BSD_SHIFT;
@@ -2283,8 +2329,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
goto err_destroy;
err = -EINVAL;
- eb.engine = eb_select_engine(eb.i915, file, args);
- if (!eb.engine)
+ eb.engine = eb_select_engine(&eb, file, args);
+ if (unlikely(!eb.engine))
goto err_engine;
if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bef377c4b4fc..35255d7802db 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1056,6 +1056,12 @@ struct drm_i915_gem_execbuffer2 {
#define I915_EXEC_INSTANCE_SHIFT (21)
#define I915_EXEC_INSTANCE_MASK (0xff << I915_EXEC_INSTANCE_SHIFT)
+/*
+ * Batches sent with instance set to any can be load balanced by the driver
+ * is concurrent context flag is also enabled.
+ */
+#define I915_EXEC_INSTANCE_ANY (0xff)
+
/*
* Inform the kernel of what engine capabilities this batch buffer
* requires. For example only the first VCS engine has the HEVC block.
--
2.14.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* ✗ Fi.CI.BAT: failure for Class/instance based execbuf plus more
2017-11-13 13:09 [RFC 0/5] Class/instance based execbuf plus more Tvrtko Ursulin
` (4 preceding siblings ...)
2017-11-13 13:09 ` [RFC 5/5] drm/i915: Per batch buffer VCS balancing Tvrtko Ursulin
@ 2017-11-13 13:13 ` Patchwork
5 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2017-11-13 13:13 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
== Series Details ==
Series: Class/instance based execbuf plus more
URL : https://patchwork.freedesktop.org/series/33706/
State : failure
== Summary ==
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CHK include/generated/bounds.h
CHK include/generated/timeconst.h
CHK include/generated/asm-offsets.h
CALL scripts/checksyscalls.sh
CHK scripts/mod/devicetable-offsets.h
CHK include/generated/compile.h
CHK kernel/config_data.h
CC [M] drivers/gpu/drm/i915/i915_gem_execbuffer.o
drivers/gpu/drm/i915/i915_gem_execbuffer.c: In function ‘eb_select_engine_class_instance’:
drivers/gpu/drm/i915/i915_gem_execbuffer.c:2018:11: error: implicit declaration of function ‘intel_engine_lookup_user’ [-Werror=implicit-function-declaration]
engine = intel_engine_lookup_user(i915, class, instance);
^~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/i915_gem_execbuffer.c:2018:9: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
engine = intel_engine_lookup_user(i915, class, instance);
^
cc1: all warnings being treated as errors
scripts/Makefile.build:314: recipe for target 'drivers/gpu/drm/i915/i915_gem_execbuffer.o' failed
make[4]: *** [drivers/gpu/drm/i915/i915_gem_execbuffer.o] Error 1
scripts/Makefile.build:573: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:573: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:573: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1024: recipe for target 'drivers' failed
make: *** [drivers] Error 2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 2/5] drm/i915: Engine capabilities uAPI
2017-11-13 13:09 ` [RFC 2/5] drm/i915: Engine capabilities uAPI Tvrtko Ursulin
@ 2017-11-13 13:13 ` Chris Wilson
2017-11-13 13:17 ` Chris Wilson
1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-11-13 13:13 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2017-11-13 13:09:06)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> To add the knowledge that VCS1 engine does not support HEVC,
> we introduce the concept of engine capabilities. These are
> flags defined in per-engine class space which can be passed
> in during execbuf time. The driver is then able to fail the
> execbuf in case of mismatch between the requested capabilities
> and the selected target engine.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 19 +++++++++++++++----
> drivers/gpu/drm/i915/intel_engine_cs.c | 3 +++
> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
> include/uapi/drm/i915_drm.h | 17 ++++++++++++++++-
> 4 files changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 2e7ddd197cc4..f7aabea601b0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -56,9 +56,10 @@ enum {
> #define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
> #define __EXEC_OBJECT_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
>
> -#define __EXEC_HAS_RELOC BIT(31)
> -#define __EXEC_VALIDATED BIT(30)
> -#define __EXEC_INTERNAL_FLAGS (~0u << 30)
> +#define __EXEC_HAS_RELOC BIT(63)
You'll have to switch to BIT_ULL() as well.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 2/5] drm/i915: Engine capabilities uAPI
2017-11-13 13:09 ` [RFC 2/5] drm/i915: Engine capabilities uAPI Tvrtko Ursulin
2017-11-13 13:13 ` Chris Wilson
@ 2017-11-13 13:17 ` Chris Wilson
1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-11-13 13:17 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2017-11-13 13:09:06)
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 7f4d60fed46f..c75bd2c9d797 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -233,6 +233,9 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
> engine->irq_shift = info->irq_shift;
> engine->class = info->class;
> engine->instance = info->instance;
> + if (INTEL_GEN(dev_priv) >= 8 && engine->class == VIDEO_DECODE_CLASS &&
> + engine->instance == 0)
> + engine->caps = I915_BSD_CAP_HEVC;
engine->caps = intel_engine_lookup_caps();
We can leave how best to define the caps for the next cap (or third to
be sure we have a feel for the generality), but for now we can at least
hide the ugly elsewhere.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/5] drm/i915: Concurrent context uAPI
2017-11-13 13:09 ` [RFC 3/5] drm/i915: Concurrent context uAPI Tvrtko Ursulin
@ 2017-11-13 13:19 ` Chris Wilson
2017-11-13 13:23 ` Chris Wilson
2017-11-13 13:28 ` Chris Wilson
1 sibling, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2017-11-13 13:19 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2017-11-13 13:09:07)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> When submitting work against existing contexts userspace knows that:
>
> a) context state can be relied upon
> b) submitted batches will complete in order.
The kernel relies on this as well... I wonder how you've rewritten
i915_gem_active...
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/5] drm/i915: Concurrent context uAPI
2017-11-13 13:19 ` Chris Wilson
@ 2017-11-13 13:23 ` Chris Wilson
0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-11-13 13:23 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Chris Wilson (2017-11-13 13:19:11)
> Quoting Tvrtko Ursulin (2017-11-13 13:09:07)
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > When submitting work against existing contexts userspace knows that:
> >
> > a) context state can be relied upon
> > b) submitted batches will complete in order.
>
> The kernel relies on this as well... I wonder how you've rewritten
> i915_gem_active...
You have to indicate that you are only relaxing the rules for selecting
an engine in execbuf, the ordering is still fixed after that point.
(It's submitted batches will complete in the order they were submitted
to an engine + object dependencies.)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/5] drm/i915: Concurrent context uAPI
2017-11-13 13:09 ` [RFC 3/5] drm/i915: Concurrent context uAPI Tvrtko Ursulin
2017-11-13 13:19 ` Chris Wilson
@ 2017-11-13 13:28 ` Chris Wilson
1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-11-13 13:28 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
Quoting Tvrtko Ursulin (2017-11-13 13:09:07)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> When submitting work against existing contexts userspace knows that:
>
> a) context state can be relied upon
> b) submitted batches will complete in order.
>
> By adding an option to mark its context as "concurrent" we allow userspace
> to notify the kernel that it does not care about both of the above. i915
> is then at liberty to break the above guarantees if and when appropriate.
We want a short paragraph on the merits of ctx vs execbuf, and of the
nature of the different scheduling algorithms that may be enacted. i.e.
the likely direction we will be going in after this, and whether this is
temporary api or the start of grander things.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 5/5] drm/i915: Per batch buffer VCS balancing
2017-11-13 13:09 ` [RFC 5/5] drm/i915: Per batch buffer VCS balancing Tvrtko Ursulin
@ 2017-11-16 0:21 ` Oscar Mateo
2017-11-16 9:57 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Oscar Mateo @ 2017-11-16 0:21 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
On 11/13/2017 05:09 AM, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Building on top of the three previously introduced new pieces of uAPI,
> class/instance based engine selection, engine capabilities and concurrent
> contexts, we can extend the execbuf2 uAPI to allow per batch buffer load
> balancing on the VCS engines.
>
> Legacy execbuf, with contexts mark as concurrent, can now load balance
> between individual batches, instead of only statically per client.
>
> This also means two batches submitted one after another, but both to
> I915_EXEC_BSD, can potentially be running in parallel on VCS0 and VCS1
> respectively.
>
> For legacy (non-concurrent) contexts, behaviour is unchanged. VCS engine
> is assigned to each context at the time of first submission and it
> persists for the context lifetime.
>
> In both cases trivial round-robin approach is used to load balance.
>
> If execbuf requires a a particular engine feature, like for example HEVC
> which is only available on vcs0, it needs to mark it's execbuf calls
> appropriately using the engine capabilities uAPI.
>
> For the class/instance based execbuf we add I915_EXEC_INSTANCE_ANY to
> accomplish the same behaviour. This is only allowed to be used on
> concurrent contexts or an error will be returned.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 7 +--
> drivers/gpu/drm/i915/i915_gem.c | 2 +-
> drivers/gpu/drm/i915/i915_gem_context.c | 2 +
> drivers/gpu/drm/i915/i915_gem_context.h | 2 +
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 80 +++++++++++++++++++++++-------
> include/uapi/drm/i915_drm.h | 6 +++
> 6 files changed, 78 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index bf7216742452..9137127fad22 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1562,9 +1562,6 @@ struct i915_gem_mm {
>
> u64 unordered_timeline;
>
> - /* the indicator for dispatch video commands on two BSD rings */
> - atomic_t bsd_engine_dispatch_index;
> -
> /** Bit 6 swizzling required for X tiling */
> uint32_t bit_6_swizzle_x;
> /** Bit 6 swizzling required for Y tiling */
> @@ -2292,6 +2289,10 @@ struct drm_i915_private {
> struct i915_gem_context *preempt_context;
> struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
> [MAX_ENGINE_INSTANCE + 1];
> +
> + /* the indicator for dispatch video commands on two BSD rings */
> + atomic_t vcs_dispatch_index;
> +
> struct i915_vma *semaphore;
>
> struct drm_dma_handle *status_page_dmah;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c5ddae2cc4ef..f39881bbd4ee 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5300,7 +5300,7 @@ i915_gem_load_init(struct drm_i915_private *dev_priv)
> init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
> init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>
> - atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
> + atomic_set(&dev_priv->vcs_dispatch_index, 0);
>
> spin_lock_init(&dev_priv->fb_tracking.lock);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3ea1251924ea..fb7021ee35d5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -275,6 +275,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
> list_add_tail(&ctx->link, &dev_priv->contexts.list);
> ctx->i915 = dev_priv;
> ctx->priority = I915_PRIORITY_NORMAL;
> + atomic_set(&ctx->vcs_dispatch_index,
> + atomic_fetch_xor(1, &dev_priv->vcs_dispatch_index));
>
> INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
> INIT_LIST_HEAD(&ctx->handles_list);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 0531c5eedb2a..638c716afbef 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -192,6 +192,8 @@ struct i915_gem_context {
> * context close.
> */
> struct list_head handles_list;
> +
> + atomic_t vcs_dispatch_index;
> };
>
> static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index ecb6290ab6ee..a7bf4fedbdbc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -235,6 +235,7 @@ struct i915_execbuffer {
>
> u64 invalid_flags; /** Set of execobj.flags that are invalid */
> u32 context_flags; /** Set of execobj.flags to insert from the ctx */
> + bool ctx_concurrent;
>
> u32 batch_start_offset; /** Location within object of batch */
> u32 batch_len; /** Length of batch within object */
> @@ -675,6 +676,8 @@ static int eb_select_context(struct i915_execbuffer *eb)
> eb->ctx = ctx;
> eb->vm = ctx->ppgtt ? &ctx->ppgtt->base : &eb->i915->ggtt.base;
>
> + eb->ctx_concurrent = i915_gem_context_is_concurrent(ctx);
> +
> eb->context_flags = 0;
> if (ctx->flags & CONTEXT_NO_ZEROMAP)
> eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;
> @@ -1987,26 +1990,63 @@ static int eb_submit(struct i915_execbuffer *eb)
> return 0;
> }
>
> +static unsigned int select_vcs_engine(struct i915_execbuffer *eb, u64 eb_flags)
> +{
> + struct drm_i915_private *i915 = eb->i915;
> + u8 eb_caps = (eb_flags & I915_EXEC_ENGINE_CAP_MASK) >>
> + I915_EXEC_ENGINE_CAP_SHIFT;
> + unsigned int instance;
> +
> + if (!HAS_BSD2(i915))
> + return 0;
> +
> + instance = atomic_fetch_xor(1, &eb->ctx->vcs_dispatch_index);
> +
> + if (instance == 1 &&
> + (eb_caps & i915->engine[_VCS(instance)]->caps) != eb_caps)
> + instance = 0;
> +
> + return instance;
> +}
> +
> +static unsigned int
> +client_static_vcs_choice(struct drm_i915_private *i915,
> + struct drm_i915_file_private *file_priv)
> +{
> + /*
> + * Check whether the file_priv has already selected one engine
> + * and if not select one.
> + */
> + if ((int)file_priv->bsd_engine < 0)
> + file_priv->bsd_engine =
> + atomic_fetch_xor(1, &i915->vcs_dispatch_index);
> +
> + return file_priv->bsd_engine;
> +}
> +
> /**
> * Find one BSD ring to dispatch the corresponding BSD command.
> * The engine index is returned.
> */
> static unsigned int
> -gen8_dispatch_bsd_engine(struct drm_i915_private *dev_priv,
> - struct drm_file *file)
> +eb_select_vcs_engine(struct i915_execbuffer *eb,
> + struct drm_i915_file_private *file_priv,
> + u64 eb_flags)
> {
> - struct drm_i915_file_private *file_priv = file->driver_priv;
> -
> - /* Check whether the file_priv has already selected one ring. */
> - if ((int)file_priv->bsd_engine < 0)
> - file_priv->bsd_engine = atomic_fetch_xor(1,
> - &dev_priv->mm.bsd_engine_dispatch_index);
> -
> - return file_priv->bsd_engine;
> + /*
> + * For concurrent contexts do a round-robin engine assignment
> + * for each batch buffer.
> + */
> + if (eb->ctx_concurrent)
> + return select_vcs_engine(eb, eb_flags);
> + else
> + return client_static_vcs_choice(eb->i915, file_priv);
> }
>
> static struct intel_engine_cs *
> -eb_select_engine_class_instance(struct drm_i915_private *i915, u64 eb_flags)
> +eb_select_engine_class_instance(struct i915_execbuffer *eb,
> + struct drm_i915_file_private *file_priv,
> + u64 eb_flags)
> {
> u8 class = eb_flags & I915_EXEC_RING_MASK;
> u8 instance = (eb_flags & I915_EXEC_INSTANCE_MASK) >>
> @@ -2015,7 +2055,10 @@ eb_select_engine_class_instance(struct drm_i915_private *i915, u64 eb_flags)
> I915_EXEC_ENGINE_CAP_SHIFT;
> struct intel_engine_cs *engine;
>
> - engine = intel_engine_lookup_user(i915, class, instance);
> + if (instance == I915_EXEC_INSTANCE_ANY)
> + instance = eb_select_vcs_engine(eb, file_priv, eb_flags);
At this moment we don't know that this is for the video engine:
if (instance == I915_EXEC_INSTANCE_ANY) {
if (class == I915_EXEC_BSD)
instance = eb_select_vcs_engine(eb, file_priv, eb_flags);
else
instance = 0;
}
> +
> + engine = intel_engine_lookup_user(eb->i915, class, instance);
>
> if (engine && ((caps & engine->caps) != caps))
> return NULL;
> @@ -2034,15 +2077,17 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
> };
>
> static struct intel_engine_cs *
> -eb_select_engine(struct drm_i915_private *dev_priv,
> +eb_select_engine(struct i915_execbuffer *eb,
> struct drm_file *file,
> struct drm_i915_gem_execbuffer2 *args)
> {
> + struct drm_i915_private *dev_priv = eb->i915;
> unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
> struct intel_engine_cs *engine;
>
> if (args->flags & I915_EXEC_CLASS_INSTANCE)
> - return eb_select_engine_class_instance(dev_priv, args->flags);
> + return eb_select_engine_class_instance(eb, file->driver_priv,
> + args->flags);
>
> if (user_ring_id > I915_USER_RINGS) {
> DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
> @@ -2060,7 +2105,8 @@ eb_select_engine(struct drm_i915_private *dev_priv,
> unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
>
> if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
> - bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
> + bsd_idx = eb_select_vcs_engine(eb, file->driver_priv,
> + args->flags);
> } else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
> bsd_idx <= I915_EXEC_BSD_RING2) {
> bsd_idx >>= I915_EXEC_BSD_SHIFT;
> @@ -2283,8 +2329,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> goto err_destroy;
>
> err = -EINVAL;
> - eb.engine = eb_select_engine(eb.i915, file, args);
> - if (!eb.engine)
> + eb.engine = eb_select_engine(&eb, file, args);
> + if (unlikely(!eb.engine))
> goto err_engine;
>
> if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index bef377c4b4fc..35255d7802db 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1056,6 +1056,12 @@ struct drm_i915_gem_execbuffer2 {
> #define I915_EXEC_INSTANCE_SHIFT (21)
> #define I915_EXEC_INSTANCE_MASK (0xff << I915_EXEC_INSTANCE_SHIFT)
>
> +/*
> + * Batches sent with instance set to any can be load balanced by the driver
> + * is concurrent context flag is also enabled.
> + */
"if" concurrent context flag
> +#define I915_EXEC_INSTANCE_ANY (0xff)
> +
> /*
> * Inform the kernel of what engine capabilities this batch buffer
> * requires. For example only the first VCS engine has the HEVC block.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 5/5] drm/i915: Per batch buffer VCS balancing
2017-11-16 0:21 ` Oscar Mateo
@ 2017-11-16 9:57 ` Chris Wilson
0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-11-16 9:57 UTC (permalink / raw)
To: Oscar Mateo, Tvrtko Ursulin, Intel-gfx
Quoting Oscar Mateo (2017-11-16 00:21:40)
>
>
> On 11/13/2017 05:09 AM, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > + if (instance == I915_EXEC_INSTANCE_ANY)
> > + instance = eb_select_vcs_engine(eb, file_priv, eb_flags);
>
> At this moment we don't know that this is for the video engine:
>
> if (instance == I915_EXEC_INSTANCE_ANY) {
> if (class == I915_EXEC_BSD)
> instance = eb_select_vcs_engine(eb, file_priv, eb_flags);
> else
> instance = 0;
> }
Too early for scheduling groups?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-11-16 9:58 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-13 13:09 [RFC 0/5] Class/instance based execbuf plus more Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 1/5] drm/i915: Select engines via class and instance in execbuffer2 Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 2/5] drm/i915: Engine capabilities uAPI Tvrtko Ursulin
2017-11-13 13:13 ` Chris Wilson
2017-11-13 13:17 ` Chris Wilson
2017-11-13 13:09 ` [RFC 3/5] drm/i915: Concurrent context uAPI Tvrtko Ursulin
2017-11-13 13:19 ` Chris Wilson
2017-11-13 13:23 ` Chris Wilson
2017-11-13 13:28 ` Chris Wilson
2017-11-13 13:09 ` [RFC 4/5] drm/i915: Re-arrange execbuf so context is known before engine Tvrtko Ursulin
2017-11-13 13:09 ` [RFC 5/5] drm/i915: Per batch buffer VCS balancing Tvrtko Ursulin
2017-11-16 0:21 ` Oscar Mateo
2017-11-16 9:57 ` Chris Wilson
2017-11-13 13:13 ` ✗ Fi.CI.BAT: failure for Class/instance based execbuf plus more Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.