* [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked()
@ 2016-08-05 9:14 Chris Wilson
2016-08-05 9:14 ` [CI 02/19] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
` (18 more replies)
0 siblings, 19 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
It is useful to be able to wait on pending rendering without grabbing
the struct_mutex. We can do this by using the i915_gem_active_get_rcu()
primitive to acquire a reference to the pending request without
requiring struct_mutex, just the RCU read lock, and then call
i915_wait_request().
v2: Rebase onto new i915_gem_active_get_unlocked() semantics that take
the RCU read lock on behalf of the caller.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_request.h | 40 +++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 6002adc43523..15495d1e48e8 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -569,6 +569,46 @@ i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
}
/**
+ * i915_gem_active_wait_unlocked - waits until the request is completed
+ * @active - the active request on which to wait
+ * @interruptible - whether the wait can be woken by a userspace signal
+ * @timeout - how long to wait at most
+ * @rps - userspace client to charge for a waitboost
+ *
+ * i915_gem_active_wait_unlocked() waits until the request is completed before
+ * returning, without requiring any locks to be held. Note that it does not
+ * retire any requests before returning.
+ *
+ * This function relies on RCU in order to acquire the reference to the active
+ * request without holding any locks. See __i915_gem_active_get_rcu() for the
+ * glory details on how that is managed. Once the reference is acquired, we
+ * can then wait upon the request, and afterwards release our reference,
+ * free of any locking.
+ *
+ * This function wraps i915_wait_request(), see it for the full details on
+ * the arguments.
+ *
+ * Returns 0 if successful, or a negative error code.
+ */
+static inline int
+i915_gem_active_wait_unlocked(const struct i915_gem_active *active,
+ bool interruptible,
+ s64 *timeout,
+ struct intel_rps_client *rps)
+{
+ struct drm_i915_gem_request *request;
+ int ret = 0;
+
+ request = i915_gem_active_get_unlocked(active);
+ if (request) {
+ ret = i915_wait_request(request, interruptible, timeout, rps);
+ i915_gem_request_put(request);
+ }
+
+ return ret;
+}
+
+/**
* i915_gem_active_retire - waits until the request is retired
* @active - the active request on which to wait
*
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 02/19] drm/i915: Convert non-blocking waits for requests over to using RCU
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 03/19] drm/i915: Convert non-blocking userptr " Chris Wilson
` (17 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
become simpler due to the reduced mutex dancing.
v2: Move i915_gem_fault tracepoint back to the start of the function,
before the unlocked wait.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 114 +++++++++++++++++-----------------------
1 file changed, 48 insertions(+), 66 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index aa40d6c35c30..c600f2366d2c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -349,24 +349,20 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
return 0;
}
-/* A nonblocking variant of the above wait. This is a highly dangerous routine
- * as the object state may change during this call.
+/* A nonblocking variant of the above wait. Must be called prior to
+ * acquiring the mutex for the object, as the object state may change
+ * during this call. A reference must be held by the caller for the object.
*/
static __must_check int
-i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
- struct intel_rps_client *rps,
- bool readonly)
+__unsafe_wait_rendering(struct drm_i915_gem_object *obj,
+ struct intel_rps_client *rps,
+ bool readonly)
{
- struct drm_device *dev = obj->base.dev;
- struct drm_i915_gem_request *requests[I915_NUM_ENGINES];
struct i915_gem_active *active;
unsigned long active_mask;
- int ret, i, n = 0;
-
- lockdep_assert_held(&dev->struct_mutex);
- GEM_BUG_ON(!to_i915(dev)->mm.interruptible);
+ int idx;
- active_mask = i915_gem_object_get_active(obj);
+ active_mask = __I915_BO_ACTIVE(obj);
if (!active_mask)
return 0;
@@ -377,25 +373,16 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
active = &obj->last_write;
}
- for_each_active(active_mask, i) {
- struct drm_i915_gem_request *req;
+ for_each_active(active_mask, idx) {
+ int ret;
- req = i915_gem_active_get(&active[i],
- &obj->base.dev->struct_mutex);
- if (req)
- requests[n++] = req;
+ ret = i915_gem_active_wait_unlocked(&active[idx],
+ true, NULL, rps);
+ if (ret)
+ return ret;
}
- mutex_unlock(&dev->struct_mutex);
- ret = 0;
- for (i = 0; ret == 0 && i < n; i++)
- ret = i915_wait_request(requests[i], true, NULL, rps);
- mutex_lock(&dev->struct_mutex);
-
- for (i = 0; i < n; i++)
- i915_gem_request_put(requests[i]);
-
- return ret;
+ return 0;
}
static struct intel_rps_client *to_rps_client(struct drm_file *file)
@@ -1467,10 +1454,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
int ret;
/* Only handle setting domains to types used by the CPU. */
- if (write_domain & I915_GEM_GPU_DOMAINS)
- return -EINVAL;
-
- if (read_domains & I915_GEM_GPU_DOMAINS)
+ if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS)
return -EINVAL;
/* Having something in the write domain implies it's in the read
@@ -1479,25 +1463,21 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
if (write_domain != 0 && read_domains != write_domain)
return -EINVAL;
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- return ret;
-
obj = i915_gem_object_lookup(file, args->handle);
- if (!obj) {
- ret = -ENOENT;
- goto unlock;
- }
+ if (!obj)
+ return -ENOENT;
/* Try to flush the object off the GPU without holding the lock.
* We will repeat the flush holding the lock in the normal manner
* to catch cases where we are gazumped.
*/
- ret = i915_gem_object_wait_rendering__nonblocking(obj,
- to_rps_client(file),
- !write_domain);
+ ret = __unsafe_wait_rendering(obj, to_rps_client(file), !write_domain);
+ if (ret)
+ goto err;
+
+ ret = i915_mutex_lock_interruptible(dev);
if (ret)
- goto unref;
+ goto err;
if (read_domains & I915_GEM_DOMAIN_GTT)
ret = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
@@ -1507,11 +1487,13 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
if (write_domain != 0)
intel_fb_obj_invalidate(obj, write_origin(obj, write_domain));
-unref:
i915_gem_object_put(obj);
-unlock:
mutex_unlock(&dev->struct_mutex);
return ret;
+
+err:
+ i915_gem_object_put_unlocked(obj);
+ return ret;
}
/**
@@ -1648,36 +1630,36 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
struct drm_i915_private *dev_priv = to_i915(dev);
struct i915_ggtt *ggtt = &dev_priv->ggtt;
struct i915_ggtt_view view = i915_ggtt_view_normal;
+ bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
pgoff_t page_offset;
unsigned long pfn;
- int ret = 0;
- bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
-
- intel_runtime_pm_get(dev_priv);
+ int ret;
/* We don't use vmf->pgoff since that has the fake offset */
page_offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >>
PAGE_SHIFT;
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- goto out;
-
trace_i915_gem_object_fault(obj, page_offset, true, write);
/* Try to flush the object off the GPU first without holding the lock.
- * Upon reacquiring the lock, we will perform our sanity checks and then
+ * Upon acquiring the lock, we will perform our sanity checks and then
* repeat the flush holding the lock in the normal manner to catch cases
* where we are gazumped.
*/
- ret = i915_gem_object_wait_rendering__nonblocking(obj, NULL, !write);
+ ret = __unsafe_wait_rendering(obj, NULL, !write);
if (ret)
- goto unlock;
+ goto err;
+
+ intel_runtime_pm_get(dev_priv);
+
+ ret = i915_mutex_lock_interruptible(dev);
+ if (ret)
+ goto err_rpm;
/* Access to snoopable pages through the GTT is incoherent. */
if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(dev)) {
ret = -EFAULT;
- goto unlock;
+ goto err_unlock;
}
/* Use a partial view if the object is bigger than the aperture. */
@@ -1698,15 +1680,15 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
/* Now pin it into the GTT if needed */
ret = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
if (ret)
- goto unlock;
+ goto err_unlock;
ret = i915_gem_object_set_to_gtt_domain(obj, write);
if (ret)
- goto unpin;
+ goto err_unpin;
ret = i915_gem_object_get_fence(obj);
if (ret)
- goto unpin;
+ goto err_unpin;
/* Finally, remap it using the new GTT offset */
pfn = ggtt->mappable_base +
@@ -1751,11 +1733,13 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
(unsigned long)vmf->virtual_address,
pfn + page_offset);
}
-unpin:
+err_unpin:
i915_gem_object_ggtt_unpin_view(obj, &view);
-unlock:
+err_unlock:
mutex_unlock(&dev->struct_mutex);
-out:
+err_rpm:
+ intel_runtime_pm_put(dev_priv);
+err:
switch (ret) {
case -EIO:
/*
@@ -1796,8 +1780,6 @@ out:
ret = VM_FAULT_SIGBUS;
break;
}
-
- intel_runtime_pm_put(dev_priv);
return ret;
}
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 03/19] drm/i915: Convert non-blocking userptr waits for requests over to using RCU
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
2016-08-05 9:14 ` [CI 02/19] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 04/19] drm/i915/userptr: Remove superfluous interruptible=false on waiting Chris Wilson
` (16 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
become simpler due to the reduced mutex dancing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 34 +++++++--------------------------
1 file changed, 7 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 53f64fcc89ef..96ab6161903a 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -63,32 +63,12 @@ struct i915_mmu_object {
static void wait_rendering(struct drm_i915_gem_object *obj)
{
- struct drm_device *dev = obj->base.dev;
- struct drm_i915_gem_request *requests[I915_NUM_ENGINES];
- int i, n;
-
- if (!i915_gem_object_is_active(obj))
- return;
-
- n = 0;
- for (i = 0; i < I915_NUM_ENGINES; i++) {
- struct drm_i915_gem_request *req;
+ unsigned long active = __I915_BO_ACTIVE(obj);
+ int idx;
- req = i915_gem_active_get(&obj->last_read[i],
- &obj->base.dev->struct_mutex);
- if (req)
- requests[n++] = req;
- }
-
- mutex_unlock(&dev->struct_mutex);
-
- for (i = 0; i < n; i++)
- i915_wait_request(requests[i], false, NULL, NULL);
-
- mutex_lock(&dev->struct_mutex);
-
- for (i = 0; i < n; i++)
- i915_gem_request_put(requests[i]);
+ for_each_active(active, idx)
+ i915_gem_active_wait_unlocked(&obj->last_read[idx],
+ false, NULL, NULL);
}
static void cancel_userptr(struct work_struct *work)
@@ -97,6 +77,8 @@ static void cancel_userptr(struct work_struct *work)
struct drm_i915_gem_object *obj = mo->obj;
struct drm_device *dev = obj->base.dev;
+ wait_rendering(obj);
+
mutex_lock(&dev->struct_mutex);
/* Cancel any active worker and force us to re-evaluate gup */
obj->userptr.work = NULL;
@@ -105,8 +87,6 @@ static void cancel_userptr(struct work_struct *work)
struct drm_i915_private *dev_priv = to_i915(dev);
bool was_interruptible;
- wait_rendering(obj);
-
was_interruptible = dev_priv->mm.interruptible;
dev_priv->mm.interruptible = false;
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 04/19] drm/i915/userptr: Remove superfluous interruptible=false on waiting
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
2016-08-05 9:14 ` [CI 02/19] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
2016-08-05 9:14 ` [CI 03/19] drm/i915: Convert non-blocking userptr " Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 05/19] drm/i915: Remove forced stop ring on suspend/unload Chris Wilson
` (15 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
Inside the kthread context, we can't be interrupted by signals so
touching the mm.interruptible flag is pointless and wait-request now
consumes EIO itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 96ab6161903a..57218cca7e05 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -84,16 +84,9 @@ static void cancel_userptr(struct work_struct *work)
obj->userptr.work = NULL;
if (obj->pages != NULL) {
- struct drm_i915_private *dev_priv = to_i915(dev);
- bool was_interruptible;
-
- was_interruptible = dev_priv->mm.interruptible;
- dev_priv->mm.interruptible = false;
-
+ /* We are inside a kthread context and can't be interrupted */
WARN_ON(i915_gem_object_unbind(obj));
WARN_ON(i915_gem_object_put_pages(obj));
-
- dev_priv->mm.interruptible = was_interruptible;
}
i915_gem_object_put(obj);
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 05/19] drm/i915: Remove forced stop ring on suspend/unload
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (2 preceding siblings ...)
2016-08-05 9:14 ` [CI 04/19] drm/i915/userptr: Remove superfluous interruptible=false on waiting Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 06/19] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex Chris Wilson
` (14 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
Before suspending (or unloading), we would first wait upon all rendering
to be completed and then disable the rings. This later step is a remanent
from DRI1 days when we did not use request tracking for all operations
upon the ring. Now that we are sure we are waiting upon the very last
operation by the engine, we can forgo clobbering the ring registers,
though we do keep the assert that the engine is indeed idle before
sleeping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 -
drivers/gpu/drm/i915/i915_gem.c | 18 ------------------
drivers/gpu/drm/i915/intel_lrc.c | 26 --------------------------
drivers/gpu/drm/i915/intel_ringbuffer.c | 16 ----------------
4 files changed, 61 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index db5dc5bd78d8..abdfb97096e2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2004,7 +2004,6 @@ struct drm_i915_private {
/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
struct {
void (*cleanup_engine)(struct intel_engine_cs *engine);
- void (*stop_engine)(struct intel_engine_cs *engine);
/**
* Is the GPU currently considered idle, or busy executing
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c600f2366d2c..8946c52e09fb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4080,16 +4080,6 @@ struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
return NULL;
}
-static void
-i915_gem_stop_engines(struct drm_device *dev)
-{
- struct drm_i915_private *dev_priv = to_i915(dev);
- struct intel_engine_cs *engine;
-
- for_each_engine(engine, dev_priv)
- dev_priv->gt.stop_engine(engine);
-}
-
int
i915_gem_suspend(struct drm_device *dev)
{
@@ -4118,12 +4108,6 @@ i915_gem_suspend(struct drm_device *dev)
i915_gem_retire_requests(dev_priv);
- /* Note that rather than stopping the engines, all we have to do
- * is assert that every RING_HEAD == RING_TAIL (all execution complete)
- * and similar for all logical context images (to ensure they are
- * all ready for hibernation).
- */
- i915_gem_stop_engines(dev);
i915_gem_context_lost(dev_priv);
mutex_unlock(&dev->struct_mutex);
@@ -4308,10 +4292,8 @@ int i915_gem_init(struct drm_device *dev)
if (!i915.enable_execlists) {
dev_priv->gt.cleanup_engine = intel_engine_cleanup;
- dev_priv->gt.stop_engine = intel_engine_stop;
} else {
dev_priv->gt.cleanup_engine = intel_logical_ring_cleanup;
- dev_priv->gt.stop_engine = intel_logical_ring_stop;
}
/* This is just a security blanket to placate dragons.
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a07da548ff49..309c5d9b1c57 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -760,31 +760,6 @@ void intel_execlists_cancel_requests(struct intel_engine_cs *engine)
}
}
-void intel_logical_ring_stop(struct intel_engine_cs *engine)
-{
- struct drm_i915_private *dev_priv = engine->i915;
- int ret;
-
- if (!intel_engine_initialized(engine))
- return;
-
- ret = intel_engine_idle(engine);
- if (ret)
- DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
- engine->name, ret);
-
- /* TODO: Is this correct with Execlists enabled? */
- I915_WRITE_MODE(engine, _MASKED_BIT_ENABLE(STOP_RING));
- if (intel_wait_for_register(dev_priv,
- RING_MI_MODE(engine->mmio_base),
- MODE_IDLE, MODE_IDLE,
- 1000)) {
- DRM_ERROR("%s :timed out trying to stop ring\n", engine->name);
- return;
- }
- I915_WRITE_MODE(engine, _MASKED_BIT_DISABLE(STOP_RING));
-}
-
static int intel_lr_context_pin(struct i915_gem_context *ctx,
struct intel_engine_cs *engine)
{
@@ -1717,7 +1692,6 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
dev_priv = engine->i915;
if (engine->buffer) {
- intel_logical_ring_stop(engine);
WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a862234ccf18..4593a65cae84 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2203,7 +2203,6 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
dev_priv = engine->i915;
if (engine->buffer) {
- intel_engine_stop(engine);
WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) == 0);
intel_ring_unpin(engine->buffer);
@@ -2907,18 +2906,3 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
return intel_init_ring_buffer(engine);
}
-
-void intel_engine_stop(struct intel_engine_cs *engine)
-{
- int ret;
-
- if (!intel_engine_initialized(engine))
- return;
-
- ret = intel_engine_idle(engine);
- if (ret)
- DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
- engine->name, ret);
-
- stop_ring(engine);
-}
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 06/19] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (3 preceding siblings ...)
2016-08-05 9:14 ` [CI 05/19] drm/i915: Remove forced stop ring on suspend/unload Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 07/19] drm/i915: Simplify do_idling() (Ironlake vt-d w/a) Chris Wilson
` (13 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we still need to hold the mutex
current for the i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine, allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.
As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).
v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
drivers/gpu/drm/i915/i915_drv.h | 3 ++-
drivers/gpu/drm/i915/i915_gem.c | 31 ++++++++++++++----------------
drivers/gpu/drm/i915/i915_gem_evict.c | 6 +++---
drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
drivers/gpu/drm/i915/i915_gem_request.c | 7 ++++---
drivers/gpu/drm/i915/i915_gem_request.h | 11 +++++++++++
drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 +-
drivers/gpu/drm/i915/i915_irq.c | 9 +--------
drivers/gpu/drm/i915/intel_engine_cs.c | 8 +++++++-
drivers/gpu/drm/i915/intel_pm.c | 12 ++----------
drivers/gpu/drm/i915/intel_ringbuffer.c | 18 -----------------
drivers/gpu/drm/i915/intel_ringbuffer.h | 33 ++++++++++++++++++++------------
13 files changed, 68 insertions(+), 76 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 24d63e271f4b..1faea382dfeb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4925,7 +4925,7 @@ i915_drop_caches_set(void *data, u64 val)
return ret;
if (val & DROP_ACTIVE) {
- ret = i915_gem_wait_for_idle(dev_priv);
+ ret = i915_gem_wait_for_idle(dev_priv, true);
if (ret)
goto unlock;
}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index abdfb97096e2..6eff31202336 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3233,7 +3233,8 @@ int __must_check i915_gem_init(struct drm_device *dev);
int __must_check i915_gem_init_hw(struct drm_device *dev);
void i915_gem_init_swizzling(struct drm_device *dev);
void i915_gem_cleanup_engines(struct drm_device *dev);
-int __must_check i915_gem_wait_for_idle(struct drm_i915_private *dev_priv);
+int __must_check i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
+ bool interruptible);
int __must_check i915_gem_suspend(struct drm_device *dev);
void i915_gem_resume(struct drm_device *dev);
int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8946c52e09fb..434171a6d586 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2438,13 +2438,18 @@ static void i915_gem_reset_engine_status(struct intel_engine_cs *engine)
static void i915_gem_reset_engine_cleanup(struct intel_engine_cs *engine)
{
+ struct drm_i915_gem_request *request;
struct intel_ring *ring;
+ request = i915_gem_active_peek(&engine->last_request,
+ &engine->i915->drm.struct_mutex);
+
/* Mark all pending requests as complete so that any concurrent
* (lockless) lookup doesn't try and wait upon the request as we
* reset it.
*/
- intel_engine_init_seqno(engine, engine->last_submitted_seqno);
+ if (request)
+ intel_engine_init_seqno(engine, request->fence.seqno);
/*
* Clear the execlists queue up before freeing the requests, as those
@@ -2466,15 +2471,9 @@ static void i915_gem_reset_engine_cleanup(struct intel_engine_cs *engine)
* implicit references on things like e.g. ppgtt address spaces through
* the request.
*/
- if (!list_empty(&engine->request_list)) {
- struct drm_i915_gem_request *request;
-
- request = list_last_entry(&engine->request_list,
- struct drm_i915_gem_request,
- link);
-
+ if (request)
i915_gem_request_retire_upto(request);
- }
+ GEM_BUG_ON(intel_engine_is_active(engine));
/* Having flushed all requests from all queues, we know that all
* ringbuffers must now be empty. However, since we do not reclaim
@@ -2897,18 +2896,17 @@ destroy:
return 0;
}
-int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv)
+int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
+ bool interruptible)
{
struct intel_engine_cs *engine;
int ret;
- lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
for_each_engine(engine, dev_priv) {
if (engine->last_context == NULL)
continue;
- ret = intel_engine_idle(engine);
+ ret = intel_engine_idle(engine, interruptible);
if (ret)
return ret;
}
@@ -4080,11 +4078,10 @@ struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
return NULL;
}
-int
-i915_gem_suspend(struct drm_device *dev)
+int i915_gem_suspend(struct drm_device *dev)
{
struct drm_i915_private *dev_priv = to_i915(dev);
- int ret = 0;
+ int ret;
intel_suspend_gt_powersave(dev_priv);
@@ -4102,7 +4099,7 @@ i915_gem_suspend(struct drm_device *dev)
if (ret)
goto err;
- ret = i915_gem_wait_for_idle(dev_priv);
+ ret = i915_gem_wait_for_idle(dev_priv, true);
if (ret)
goto err;
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 7be425826539..f76c06e92677 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -39,7 +39,7 @@ gpu_is_idle(struct drm_i915_private *dev_priv)
struct intel_engine_cs *engine;
for_each_engine(engine, dev_priv) {
- if (!list_empty(&engine->request_list))
+ if (intel_engine_is_active(engine))
return false;
}
@@ -167,7 +167,7 @@ search_again:
if (ret)
return ret;
- ret = i915_gem_wait_for_idle(dev_priv);
+ ret = i915_gem_wait_for_idle(dev_priv, true);
if (ret)
return ret;
@@ -272,7 +272,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
return ret;
}
- ret = i915_gem_wait_for_idle(dev_priv);
+ ret = i915_gem_wait_for_idle(dev_priv, true);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index db97155074d3..c1d79978f409 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2248,7 +2248,7 @@ static bool do_idling(struct drm_i915_private *dev_priv)
if (unlikely(ggtt->do_idle_maps)) {
dev_priv->mm.interruptible = false;
- if (i915_gem_wait_for_idle(dev_priv)) {
+ if (i915_gem_wait_for_idle(dev_priv, false)) {
DRM_ERROR("Failed to wait for idle; VT'd may hang.\n");
/* Wait a bit, in hopes it avoids the hang */
udelay(10);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 3fecb8f0e041..1f91dc8c4171 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -265,7 +265,7 @@ static int i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
/* Carefully retire all requests without writing to the rings */
for_each_engine(engine, dev_priv) {
- ret = intel_engine_idle(engine);
+ ret = intel_engine_idle(engine, true);
if (ret)
return ret;
}
@@ -486,7 +486,8 @@ void __i915_add_request(struct drm_i915_gem_request *request,
*/
request->emitted_jiffies = jiffies;
request->previous_seqno = engine->last_submitted_seqno;
- smp_store_mb(engine->last_submitted_seqno, request->fence.seqno);
+ engine->last_submitted_seqno = request->fence.seqno;
+ i915_gem_active_set(&engine->last_request, request);
list_add_tail(&request->link, &engine->request_list);
list_add_tail(&request->ring_link, &ring->request_list);
@@ -757,7 +758,7 @@ void i915_gem_retire_requests(struct drm_i915_private *dev_priv)
for_each_engine(engine, dev_priv) {
engine_retire_requests(engine);
- if (list_empty(&engine->request_list))
+ if (!intel_engine_is_active(engine))
dev_priv->gt.active_engines &= ~intel_engine_flag(engine);
}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 15495d1e48e8..3496e28785e7 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -29,6 +29,17 @@
#include "i915_gem.h"
+struct intel_wait {
+ struct rb_node node;
+ struct task_struct *tsk;
+ u32 seqno;
+};
+
+struct intel_signal_node {
+ struct rb_node node;
+ struct intel_wait wait;
+};
+
/**
* Request queue structure.
*
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 1341cb55b6f1..23d70376b104 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -412,7 +412,7 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
return NOTIFY_DONE;
/* Force everything onto the inactive lists */
- ret = i915_gem_wait_for_idle(dev_priv);
+ ret = i915_gem_wait_for_idle(dev_priv, false);
if (ret)
goto out;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e58650096426..006a855877ad 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2805,13 +2805,6 @@ static void gen8_disable_vblank(struct drm_device *dev, unsigned int pipe)
}
static bool
-ring_idle(struct intel_engine_cs *engine, u32 seqno)
-{
- return i915_seqno_passed(seqno,
- READ_ONCE(engine->last_submitted_seqno));
-}
-
-static bool
ipehr_is_semaphore_wait(struct intel_engine_cs *engine, u32 ipehr)
{
if (INTEL_GEN(engine->i915) >= 8) {
@@ -3131,7 +3124,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
user_interrupts = 0;
if (engine->hangcheck.seqno == seqno) {
- if (ring_idle(engine, seqno)) {
+ if (!intel_engine_is_active(engine)) {
engine->hangcheck.action = HANGCHECK_IDLE;
if (busy) {
/* Safeguard against driver failure */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index f495969f749b..e9b301ae2d0c 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -166,6 +166,12 @@ void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
}
+static void intel_engine_init_requests(struct intel_engine_cs *engine)
+{
+ init_request_active(&engine->last_request, NULL);
+ INIT_LIST_HEAD(&engine->request_list);
+}
+
/**
* intel_engines_setup_common - setup engine state not requiring hw access
* @engine: Engine to setup.
@@ -177,13 +183,13 @@ void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
*/
void intel_engine_setup_common(struct intel_engine_cs *engine)
{
- INIT_LIST_HEAD(&engine->request_list);
INIT_LIST_HEAD(&engine->buffers);
INIT_LIST_HEAD(&engine->execlist_queue);
spin_lock_init(&engine->execlist_lock);
engine->fence_context = fence_context_alloc(1);
+ intel_engine_init_requests(engine);
intel_engine_init_hangcheck(engine);
i915_gem_batch_pool_init(engine, &engine->batch_pool);
}
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6bd352a8f30e..eedcacef7d5c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6328,19 +6328,11 @@ EXPORT_SYMBOL_GPL(i915_gpu_lower);
*/
bool i915_gpu_busy(void)
{
- struct drm_i915_private *dev_priv;
- struct intel_engine_cs *engine;
bool ret = false;
spin_lock_irq(&mchdev_lock);
- if (!i915_mch_dev)
- goto out_unlock;
- dev_priv = i915_mch_dev;
-
- for_each_engine(engine, dev_priv)
- ret |= !list_empty(&engine->request_list);
-
-out_unlock:
+ if (i915_mch_dev)
+ ret = i915_mch_dev->gt.awake;
spin_unlock_irq(&mchdev_lock);
return ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4593a65cae84..322274a239e4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2227,24 +2227,6 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
engine->i915 = NULL;
}
-int intel_engine_idle(struct intel_engine_cs *engine)
-{
- struct drm_i915_gem_request *req;
-
- /* Wait upon the last request to be completed */
- if (list_empty(&engine->request_list))
- return 0;
-
- req = list_entry(engine->request_list.prev,
- struct drm_i915_gem_request,
- link);
-
- /* Make sure we do not trigger any retires */
- return i915_wait_request(req,
- req->i915->mm.interruptible,
- NULL, NULL);
-}
-
int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
{
int ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 88952bf10b9d..43e545e44352 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -3,6 +3,7 @@
#include <linux/hashtable.h>
#include "i915_gem_batch_pool.h"
+#include "i915_gem_request.h"
#define I915_CMD_HASH_ORDER 9
@@ -307,6 +308,13 @@ struct intel_engine_cs {
*/
u32 last_submitted_seqno;
+ /* An RCU guarded pointer to the last request. No reference is
+ * held to the request, users must carefully acquire a reference to
+ * the request using i915_gem_active_get_request_rcu(), or hold the
+ * struct_mutex.
+ */
+ struct i915_gem_active last_request;
+
struct i915_gem_context *last_context;
struct intel_engine_hangcheck hangcheck;
@@ -465,7 +473,6 @@ static inline u32 intel_ring_offset(struct intel_ring *ring, u32 value)
int __intel_ring_space(int head, int tail, int size);
void intel_ring_update_space(struct intel_ring *ring);
-int __must_check intel_engine_idle(struct intel_engine_cs *engine);
void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno);
int intel_init_pipe_control(struct intel_engine_cs *engine, int size);
@@ -475,6 +482,14 @@ void intel_engine_setup_common(struct intel_engine_cs *engine);
int intel_engine_init_common(struct intel_engine_cs *engine);
void intel_engine_cleanup_common(struct intel_engine_cs *engine);
+static inline int intel_engine_idle(struct intel_engine_cs *engine,
+ bool interruptible)
+{
+ /* Wait upon the last request to be completed */
+ return i915_gem_active_wait_unlocked(&engine->last_request,
+ interruptible, NULL, NULL);
+}
+
int intel_init_render_ring_buffer(struct intel_engine_cs *engine);
int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
int intel_init_bsd2_ring_buffer(struct intel_engine_cs *engine);
@@ -504,17 +519,6 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
}
/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
-struct intel_wait {
- struct rb_node node;
- struct task_struct *tsk;
- u32 seqno;
-};
-
-struct intel_signal_node {
- struct rb_node node;
- struct intel_wait wait;
-};
-
int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
@@ -561,4 +565,9 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
unsigned int intel_kick_waiters(struct drm_i915_private *i915);
unsigned int intel_kick_signalers(struct drm_i915_private *i915);
+static inline bool intel_engine_is_active(struct intel_engine_cs *engine)
+{
+ return i915_gem_active_isset(&engine->last_request);
+}
+
#endif /* _INTEL_RINGBUFFER_H_ */
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 07/19] drm/i915: Simplify do_idling() (Ironlake vt-d w/a)
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (4 preceding siblings ...)
2016-08-05 9:14 ` [CI 06/19] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 08/19] drm/i915/shrinker: Wait before acquiring struct_mutex under oom Chris Wilson
` (12 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
Now that we pass along the expected interruptible nature for the
wait-for-idle, we do not need to modify the global
i915->mm.interruptible for this single call. (Only the immediate call to
i915_gem_wait_for_idle() takes the interruptible status as the other
action, dma_map_sg(), is independent of i915.ko)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 37 ++++++++-----------------------------
1 file changed, 8 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c1d79978f409..8b4f2f35019c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2241,31 +2241,6 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv)
return false;
}
-static bool do_idling(struct drm_i915_private *dev_priv)
-{
- struct i915_ggtt *ggtt = &dev_priv->ggtt;
- bool ret = dev_priv->mm.interruptible;
-
- if (unlikely(ggtt->do_idle_maps)) {
- dev_priv->mm.interruptible = false;
- if (i915_gem_wait_for_idle(dev_priv, false)) {
- DRM_ERROR("Failed to wait for idle; VT'd may hang.\n");
- /* Wait a bit, in hopes it avoids the hang */
- udelay(10);
- }
- }
-
- return ret;
-}
-
-static void undo_idling(struct drm_i915_private *dev_priv, bool interruptible)
-{
- struct i915_ggtt *ggtt = &dev_priv->ggtt;
-
- if (unlikely(ggtt->do_idle_maps))
- dev_priv->mm.interruptible = interruptible;
-}
-
void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
{
struct intel_engine_cs *engine;
@@ -2713,14 +2688,18 @@ void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj)
{
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = to_i915(dev);
- bool interruptible;
+ struct i915_ggtt *ggtt = &dev_priv->ggtt;
- interruptible = do_idling(dev_priv);
+ if (unlikely(ggtt->do_idle_maps)) {
+ if (i915_gem_wait_for_idle(dev_priv, false)) {
+ DRM_ERROR("Failed to wait for idle; VT'd may hang.\n");
+ /* Wait a bit, in hopes it avoids the hang */
+ udelay(10);
+ }
+ }
dma_unmap_sg(&dev->pdev->dev, obj->pages->sgl, obj->pages->nents,
PCI_DMA_BIDIRECTIONAL);
-
- undo_idling(dev_priv, interruptible);
}
static void i915_gtt_color_adjust(struct drm_mm_node *node,
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 08/19] drm/i915/shrinker: Wait before acquiring struct_mutex under oom
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (5 preceding siblings ...)
2016-08-05 9:14 ` [CI 07/19] drm/i915: Simplify do_idling() (Ironlake vt-d w/a) Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 09/19] drm/i915: Tidy generation of the GTT mmap offset Chris Wilson
` (11 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
We can now wait for the GPU (all engines) to become idle without
requiring the struct_mutex. Inside the shrinker, we need to currently
take the struct_mutex in order to purge objects and to purge the objects
we need the GPU to be idle - causing a stall whilst we hold the
struct_mutex. We can hide most of that stall by performing the wait
before taking the struct_mutex and only doing essential waits for
new rendering on objects to be freed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_shrinker.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 23d70376b104..9b92b6470ccc 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -323,17 +323,22 @@ i915_gem_shrinker_lock_uninterruptible(struct drm_i915_private *dev_priv,
struct shrinker_lock_uninterruptible *slu,
int timeout_ms)
{
- unsigned long timeout = msecs_to_jiffies(timeout_ms) + 1;
+ unsigned long timeout = jiffies + msecs_to_jiffies_timeout(timeout_ms);
+
+ do {
+ if (i915_gem_wait_for_idle(dev_priv, false) == 0 &&
+ i915_gem_shrinker_lock(&dev_priv->drm, &slu->unlock))
+ break;
- while (!i915_gem_shrinker_lock(&dev_priv->drm, &slu->unlock)) {
schedule_timeout_killable(1);
if (fatal_signal_pending(current))
return false;
- if (--timeout == 0) {
+
+ if (time_after(jiffies, timeout)) {
pr_err("Unable to lock GPU to purge memory.\n");
return false;
}
- }
+ } while (1);
slu->was_interruptible = dev_priv->mm.interruptible;
dev_priv->mm.interruptible = false;
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 09/19] drm/i915: Tidy generation of the GTT mmap offset
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (6 preceding siblings ...)
2016-08-05 9:14 ` [CI 08/19] drm/i915/shrinker: Wait before acquiring struct_mutex under oom Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 10/19] drm/i915: Remove unused no-shrinker-steal Chris Wilson
` (10 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
If we make the observation that mmap-offsets are only released when we
free an object, we can then deduce that the shrinker only creates free
space in the mmap arena indirectly by flushing the request list and
freeing expired objects. If we combine this with the lockless
vma-manager and lockless idling, we can avoid taking our big struct_mutex
until we need to actually free the requests.
One side-effect is that we defer the madvise checking until we need the
pages (i.e. the fault handler). This brings us into line with the other
delayed checks (and madvise in general).
v2: s/ret/err/ and use if (!err) rather than if (ret == 0)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 69 +++++++++++++----------------------------
1 file changed, 22 insertions(+), 47 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 434171a6d586..cf988ad75331 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1898,36 +1898,28 @@ u64 i915_gem_get_ggtt_alignment(struct drm_i915_private *dev_priv, u64 size,
static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj)
{
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
- int ret;
-
- dev_priv->mm.shrinker_no_lock_stealing = true;
+ int err;
- ret = drm_gem_create_mmap_offset(&obj->base);
- if (ret != -ENOSPC)
- goto out;
+ err = drm_gem_create_mmap_offset(&obj->base);
+ if (!err)
+ return 0;
- /* Badly fragmented mmap space? The only way we can recover
- * space is by destroying unwanted objects. We can't randomly release
- * mmap_offsets as userspace expects them to be persistent for the
- * lifetime of the objects. The closest we can is to release the
- * offsets on purgeable objects by truncating it and marking it purged,
- * which prevents userspace from ever using that object again.
+ /* We can idle the GPU locklessly to flush stale objects, but in order
+ * to claim that space for ourselves, we need to take the big
+ * struct_mutex to free the requests+objects and allocate our slot.
*/
- i915_gem_shrink(dev_priv,
- obj->base.size >> PAGE_SHIFT,
- I915_SHRINK_BOUND |
- I915_SHRINK_UNBOUND |
- I915_SHRINK_PURGEABLE);
- ret = drm_gem_create_mmap_offset(&obj->base);
- if (ret != -ENOSPC)
- goto out;
+ err = i915_gem_wait_for_idle(dev_priv, true);
+ if (err)
+ return err;
- i915_gem_shrink_all(dev_priv);
- ret = drm_gem_create_mmap_offset(&obj->base);
-out:
- dev_priv->mm.shrinker_no_lock_stealing = false;
+ err = i915_mutex_lock_interruptible(&dev_priv->drm);
+ if (!err) {
+ i915_gem_retire_requests(dev_priv);
+ err = drm_gem_create_mmap_offset(&obj->base);
+ mutex_unlock(&dev_priv->drm.struct_mutex);
+ }
- return ret;
+ return err;
}
static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
@@ -1944,32 +1936,15 @@ i915_gem_mmap_gtt(struct drm_file *file,
struct drm_i915_gem_object *obj;
int ret;
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- return ret;
-
obj = i915_gem_object_lookup(file, handle);
- if (!obj) {
- ret = -ENOENT;
- goto unlock;
- }
-
- if (obj->madv != I915_MADV_WILLNEED) {
- DRM_DEBUG("Attempting to mmap a purgeable buffer\n");
- ret = -EFAULT;
- goto out;
- }
+ if (!obj)
+ return -ENOENT;
ret = i915_gem_object_create_mmap_offset(obj);
- if (ret)
- goto out;
+ if (ret == 0)
+ *offset = drm_vma_node_offset_addr(&obj->base.vma_node);
- *offset = drm_vma_node_offset_addr(&obj->base.vma_node);
-
-out:
- i915_gem_object_put(obj);
-unlock:
- mutex_unlock(&dev->struct_mutex);
+ i915_gem_object_put_unlocked(obj);
return ret;
}
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 10/19] drm/i915: Remove unused no-shrinker-steal
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (7 preceding siblings ...)
2016-08-05 9:14 ` [CI 09/19] drm/i915: Tidy generation of the GTT mmap offset Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 11/19] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
` (9 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
After removing the user of this wart, we can remove the wart entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 -
drivers/gpu/drm/i915/i915_gem_shrinker.c | 3 ---
2 files changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6eff31202336..31a614fe9ed7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1318,7 +1318,6 @@ struct i915_gem_mm {
struct notifier_block oom_notifier;
struct notifier_block vmap_notifier;
struct shrinker shrinker;
- bool shrinker_no_lock_stealing;
/** LRU list of objects with fence regs on them. */
struct list_head fence_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 9b92b6470ccc..b80802b35353 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -244,9 +244,6 @@ static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
if (!mutex_is_locked_by(&dev->struct_mutex, current))
return false;
- if (to_i915(dev)->mm.shrinker_no_lock_stealing)
- return false;
-
*unlock = false;
} else
*unlock = true;
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 11/19] drm/i915: Do a nonblocking wait first in pread/pwrite
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (8 preceding siblings ...)
2016-08-05 9:14 ` [CI 10/19] drm/i915: Remove unused no-shrinker-steal Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 12/19] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
` (8 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
If we try and read or write to an active request, we first must wait
upon the GPU completing that request. Let's do that without holding the
mutex (and so allow someone else to access the GPU whilst we wait). Upon
completion, we will acquire the mutex and only then start the operation
(i.e. we do not rely on state from before the initial wait).
v2: Repaint the goto labels
v3: Move the tracepoints back to the start of the ioctls
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 60 ++++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cf988ad75331..1abaa0262f92 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -956,25 +956,27 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
args->size))
return -EFAULT;
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- return ret;
-
obj = i915_gem_object_lookup(file, args->handle);
- if (!obj) {
- ret = -ENOENT;
- goto unlock;
- }
+ if (!obj)
+ return -ENOENT;
/* Bounds check source. */
if (args->offset > obj->base.size ||
args->size > obj->base.size - args->offset) {
ret = -EINVAL;
- goto out;
+ goto err;
}
trace_i915_gem_object_pread(obj, args->offset, args->size);
+ ret = __unsafe_wait_rendering(obj, to_rps_client(file), true);
+ if (ret)
+ goto err;
+
+ ret = i915_mutex_lock_interruptible(dev);
+ if (ret)
+ goto err;
+
ret = i915_gem_shmem_pread(dev, obj, args, file);
/* pread for non shmem backed objects */
@@ -985,10 +987,13 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
intel_runtime_pm_put(to_i915(dev));
}
-out:
i915_gem_object_put(obj);
-unlock:
mutex_unlock(&dev->struct_mutex);
+
+ return ret;
+
+err:
+ i915_gem_object_put_unlocked(obj);
return ret;
}
@@ -1374,27 +1379,29 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
return -EFAULT;
}
- intel_runtime_pm_get(dev_priv);
-
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- goto put_rpm;
-
obj = i915_gem_object_lookup(file, args->handle);
- if (!obj) {
- ret = -ENOENT;
- goto unlock;
- }
+ if (!obj)
+ return -ENOENT;
/* Bounds check destination. */
if (args->offset > obj->base.size ||
args->size > obj->base.size - args->offset) {
ret = -EINVAL;
- goto out;
+ goto err;
}
trace_i915_gem_object_pwrite(obj, args->offset, args->size);
+ ret = __unsafe_wait_rendering(obj, to_rps_client(file), false);
+ if (ret)
+ goto err;
+
+ intel_runtime_pm_get(dev_priv);
+
+ ret = i915_mutex_lock_interruptible(dev);
+ if (ret)
+ goto err_rpm;
+
ret = -EFAULT;
/* We can only do the GTT pwrite on untiled buffers, as otherwise
* it would end up going through the fenced access, and we'll get
@@ -1419,14 +1426,17 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
ret = -ENODEV;
}
-out:
i915_gem_object_put(obj);
-unlock:
mutex_unlock(&dev->struct_mutex);
-put_rpm:
intel_runtime_pm_put(dev_priv);
return ret;
+
+err_rpm:
+ intel_runtime_pm_put(dev_priv);
+err:
+ i915_gem_object_put_unlocked(obj);
+ return ret;
}
static enum fb_op_origin
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 12/19] drm/i915: Remove (struct_mutex) locking for wait-ioctl
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (9 preceding siblings ...)
2016-08-05 9:14 ` [CI 11/19] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
` (7 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicial use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME), locklessly - this is handled by
i915_gem_active_wait_unlocked().
The impact of this is actually quite small - the return to userspace
following the wait was already lockless and so we don't see much gain in
latency improvement upon completing the wait. What we do achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause - but it is still one less contention
point!
v2: Break up a long line.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 43 ++++++++++++-----------------------------
1 file changed, 12 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1abaa0262f92..ceb00970b2da 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2622,47 +2622,28 @@ int
i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
{
struct drm_i915_gem_wait *args = data;
+ struct intel_rps_client *rps = to_rps_client(file);
struct drm_i915_gem_object *obj;
- struct drm_i915_gem_request *requests[I915_NUM_ENGINES];
- int i, n = 0;
- int ret;
+ unsigned long active;
+ int idx, ret = 0;
if (args->flags != 0)
return -EINVAL;
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- return ret;
-
obj = i915_gem_object_lookup(file, args->bo_handle);
- if (!obj) {
- mutex_unlock(&dev->struct_mutex);
+ if (!obj)
return -ENOENT;
- }
-
- if (!i915_gem_object_is_active(obj))
- goto out;
- for (i = 0; i < I915_NUM_ENGINES; i++) {
- struct drm_i915_gem_request *req;
-
- req = i915_gem_active_get(&obj->last_read[i],
- &obj->base.dev->struct_mutex);
- if (req)
- requests[n++] = req;
+ active = __I915_BO_ACTIVE(obj);
+ for_each_active(active, idx) {
+ s64 *timeout = args->timeout_ns >= 0 ? &args->timeout_ns : NULL;
+ ret = i915_gem_active_wait_unlocked(&obj->last_read[idx], true,
+ timeout, rps);
+ if (ret)
+ break;
}
-out:
- i915_gem_object_put(obj);
- mutex_unlock(&dev->struct_mutex);
-
- for (i = 0; i < n; i++) {
- if (ret == 0)
- ret = i915_wait_request(requests[i], true,
- args->timeout_ns > 0 ? &args->timeout_ns : NULL,
- to_rps_client(file));
- i915_gem_request_put(requests[i]);
- }
+ i915_gem_object_put_unlocked(obj);
return ret;
}
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (10 preceding siblings ...)
2016-08-05 9:14 ` [CI 12/19] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 19:08 ` Daniel Vetter
2016-08-05 9:14 ` [CI 14/19] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
` (6 subsequent siblings)
18 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention that causes.
Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 131 +++++++++++++++++++++++++++++++---------
1 file changed, 101 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ceb00970b2da..b99d64bfb7eb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3736,49 +3736,120 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
}
+static __always_inline unsigned __busy_read_flag(unsigned int id)
+{
+ /* Note that we could alias engines in the execbuf API, but
+ * that would be very unwise as it prevents userspace from
+ * fine control over engine selection. Ahem.
+ *
+ * This should be something like EXEC_MAX_ENGINE instead of
+ * I915_NUM_ENGINES.
+ */
+ BUILD_BUG_ON(I915_NUM_ENGINES > 16);
+ return 0x10000 << id;
+}
+
+static __always_inline unsigned int __busy_write_id(unsigned int id)
+{
+ return id;
+}
+
+static __always_inline unsigned
+__busy_set_if_active(const struct i915_gem_active *active,
+ unsigned int (*flag)(unsigned int id))
+{
+ /* For more discussion about the barriers and locking concerns,
+ * see __i915_gem_active_get_rcu().
+ */
+ do {
+ struct drm_i915_gem_request *request;
+ unsigned int id;
+
+ request = rcu_dereference(active->request);
+ if (!request || i915_gem_request_completed(request))
+ return 0;
+
+ id = request->engine->exec_id;
+
+ /* Check that the pointer wasn't reassigned and overwritten. */
+ if (request == rcu_access_pointer(active->request))
+ return flag(id);
+ } while (1);
+}
+
+static inline unsigned
+busy_check_reader(const struct i915_gem_active *active)
+{
+ return __busy_set_if_active(active, __busy_read_flag);
+}
+
+static inline unsigned
+busy_check_writer(const struct i915_gem_active *active)
+{
+ return __busy_set_if_active(active, __busy_write_id);
+}
+
int
i915_gem_busy_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
{
struct drm_i915_gem_busy *args = data;
struct drm_i915_gem_object *obj;
- int ret;
-
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- return ret;
+ unsigned long active;
obj = i915_gem_object_lookup(file, args->handle);
- if (!obj) {
- ret = -ENOENT;
- goto unlock;
- }
+ if (!obj)
+ return -ENOENT;
- /* Count all active objects as busy, even if they are currently not used
- * by the gpu. Users of this interface expect objects to eventually
- * become non-busy without any further actions.
- */
args->busy = 0;
- if (i915_gem_object_is_active(obj)) {
- struct drm_i915_gem_request *req;
- int i;
+ active = __I915_BO_ACTIVE(obj);
+ if (active) {
+ int idx;
- for (i = 0; i < I915_NUM_ENGINES; i++) {
- req = i915_gem_active_peek(&obj->last_read[i],
- &obj->base.dev->struct_mutex);
- if (req)
- args->busy |= 1 << (16 + req->engine->exec_id);
- }
- req = i915_gem_active_peek(&obj->last_write,
- &obj->base.dev->struct_mutex);
- if (req)
- args->busy |= req->engine->exec_id;
+ /* Yes, the lookups are intentionally racy.
+ *
+ * First, we cannot simply rely on __I915_BO_ACTIVE. We have
+ * to regard the value as stale and as our ABI guarantees
+ * forward progress, we confirm the status of each active
+ * request with the hardware.
+ *
+ * Even though we guard the pointer lookup by RCU, that only
+ * guarantees that the pointer and its contents remain
+ * dereferencable and does *not* mean that the request we
+ * have is the same as the one being tracked by the object.
+ *
+ * Consider that we lookup the request just as it is being
+ * retired and freed. We take a local copy of the pointer,
+ * but before we add its engine into the busy set, the other
+ * thread reallocates it and assigns it to a task on another
+ * engine with a fresh and incomplete seqno.
+ *
+ * So after we lookup the engine's id, we double check that
+ * the active request is the same and only then do we add it
+ * into the busy set.
+ */
+ rcu_read_lock();
+
+ for_each_active(active, idx)
+ args->busy |= busy_check_reader(&obj->last_read[idx]);
+
+ /* For ABI sanity, we only care that the write engine is in
+ * the set of read engines. This is ensured by the ordering
+ * of setting last_read/last_write in i915_vma_move_to_active,
+ * and then in reverse in retire.
+ *
+ * We don't care that the set of active read/write engines
+ * may change during construction of the result, as it is
+ * equally liable to change before userspace can inspect
+ * the result.
+ */
+ args->busy |= busy_check_writer(&obj->last_write);
+
+ rcu_read_unlock();
}
- i915_gem_object_put(obj);
-unlock:
- mutex_unlock(&dev->struct_mutex);
- return ret;
+ i915_gem_object_put_unlocked(obj);
+ return 0;
}
int
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 14/19] drm/i915: Reduce locking inside swfinish ioctl
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (11 preceding siblings ...)
2016-08-05 9:14 ` [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 15/19] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
` (5 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)
v2: Use access once for compiler hints (or not as it is a bitfield)
v3: READ_ONCE, obj->pin_display is not a bitfield anymore
v4: Don't be creative with goto.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b99d64bfb7eb..063c56f5d6f8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1518,26 +1518,23 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
{
struct drm_i915_gem_sw_finish *args = data;
struct drm_i915_gem_object *obj;
- int ret = 0;
-
- ret = i915_mutex_lock_interruptible(dev);
- if (ret)
- return ret;
+ int err = 0;
obj = i915_gem_object_lookup(file, args->handle);
- if (!obj) {
- ret = -ENOENT;
- goto unlock;
- }
+ if (!obj)
+ return -ENOENT;
/* Pinned buffers may be scanout, so flush the cache */
- if (obj->pin_display)
- i915_gem_object_flush_cpu_write_domain(obj);
+ if (READ_ONCE(obj->pin_display)) {
+ err = i915_mutex_lock_interruptible(dev);
+ if (!err) {
+ i915_gem_object_flush_cpu_write_domain(obj);
+ mutex_unlock(&dev->struct_mutex);
+ }
+ }
- i915_gem_object_put(obj);
-unlock:
- mutex_unlock(&dev->struct_mutex);
- return ret;
+ i915_gem_object_put_unlocked(obj);
+ return err;
}
/**
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 15/19] drm/i915: Remove pinned check from madvise ioctl
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (12 preceding siblings ...)
2016-08-05 9:14 ` [CI 14/19] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 19:10 ` Daniel Vetter
2016-08-05 9:14 ` [CI 16/19] drm/i915: Remove locking for get_tiling Chris Wilson
` (4 subsequent siblings)
18 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DONTNEED will not trigger any undue warnings. The check
is therefore superfluous, and by removing it we can remove a linear walk
over all the vma the object has.
Still despite it being an overzealous check, that error code is part of
the current ABI and so we must proceed with caution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 063c56f5d6f8..7e6b7853f9d7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3883,11 +3883,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
goto unlock;
}
- if (i915_gem_obj_is_pinned(obj)) {
- ret = -EINVAL;
- goto out;
- }
-
if (obj->pages &&
obj->tiling_mode != I915_TILING_NONE &&
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
@@ -3906,7 +3901,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
args->retained = obj->madv != __I915_MADV_PURGED;
-out:
i915_gem_object_put(obj);
unlock:
mutex_unlock(&dev->struct_mutex);
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 16/19] drm/i915: Remove locking for get_tiling
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (13 preceding siblings ...)
2016-08-05 9:14 ` [CI 15/19] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 17/19] drm/i915: Document and reject invalid tiling modes Chris Wilson
` (3 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
Since we are not concerned with userspace racing itself with set-tiling
(the order is indeterminant even if we take a lock), then we can safely
read back the single obj->tiling_mode and do the static lookup of
swizzle mode without having to take a lock.
get-tiling is reasonably frequent due to the back-channel passing around
of tiling parameters in DRI2/DRI3.
v2: Make tiling_mode a full unsigned int so that we can trivially use it
with READ_ONCE(). Separating it out into manual control over the flags
field was too noisy for a simple patch. Note that we could use the lower
bits of obj->stride for the tiling mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 15 ++++++++-------
drivers/gpu/drm/i915/i915_gem_tiling.c | 10 +++-------
2 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 31a614fe9ed7..f18d8761305c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2183,10 +2183,6 @@ struct drm_i915_gem_object {
unsigned int madv:2;
/**
- * Current tiling mode for the object.
- */
- unsigned int tiling_mode:2;
- /**
* Whether the tiling parameters for the currently associated fence
* register have changed. Note that for the purposes of tracking
* tiling changes we also treat the unfenced register, the register
@@ -2218,6 +2214,14 @@ struct drm_i915_gem_object {
atomic_t frontbuffer_bits;
+ /**
+ * Current tiling mode for the object.
+ */
+ unsigned int tiling_mode;
+
+ /** Current tiling stride for the object, if it's tiled. */
+ uint32_t stride;
+
unsigned int has_wc_mmap;
/** Count of VMA actually bound by this object */
unsigned int bind_count;
@@ -2245,9 +2249,6 @@ struct drm_i915_gem_object {
struct i915_gem_active last_write;
struct i915_gem_active last_fence;
- /** Current tiling stride for the object, if it's tiled. */
- uint32_t stride;
-
/** References from framebuffers, locks out tiling changes. */
unsigned long framebuffer_references;
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index b7f9875f69b4..c0e01333bddf 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -303,10 +303,8 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
if (!obj)
return -ENOENT;
- mutex_lock(&dev->struct_mutex);
-
- args->tiling_mode = obj->tiling_mode;
- switch (obj->tiling_mode) {
+ args->tiling_mode = READ_ONCE(obj->tiling_mode);
+ switch (args->tiling_mode) {
case I915_TILING_X:
args->swizzle_mode = dev_priv->mm.bit_6_swizzle_x;
break;
@@ -330,8 +328,6 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
if (args->swizzle_mode == I915_BIT_6_SWIZZLE_9_10_17)
args->swizzle_mode = I915_BIT_6_SWIZZLE_9_10;
- i915_gem_object_put(obj);
- mutex_unlock(&dev->struct_mutex);
-
+ i915_gem_object_put_unlocked(obj);
return 0;
}
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 17/19] drm/i915: Document and reject invalid tiling modes
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (14 preceding siblings ...)
2016-08-05 9:14 ` [CI 16/19] drm/i915: Remove locking for get_tiling Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 18/19] drm/i915: Repack fence tiling mode and stride into a single integer Chris Wilson
` (2 subsequent siblings)
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
Through the GTT interface to the fence registers, we can only handle
linear, X and Y tiling. The more esoteric tiling patterns are ignored.
Document that the tiling ABI only supports upto Y tiling, and reject any
attempts to set a tiling mode other than NONE, X or Y.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_tiling.c | 3 +++
include/uapi/drm/i915_drm.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index c0e01333bddf..6817f69947d9 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -68,6 +68,9 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode)
if (tiling_mode == I915_TILING_NONE)
return true;
+ if (tiling_mode > I915_TILING_LAST)
+ return false;
+
if (IS_GEN2(dev) ||
(tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev)))
tile_width = 128;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0f292733cffc..452629de7a57 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -926,6 +926,7 @@ struct drm_i915_gem_caching {
#define I915_TILING_NONE 0
#define I915_TILING_X 1
#define I915_TILING_Y 2
+#define I915_TILING_LAST I915_TILING_Y
#define I915_BIT_6_SWIZZLE_NONE 0
#define I915_BIT_6_SWIZZLE_9 1
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 18/19] drm/i915: Repack fence tiling mode and stride into a single integer
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (15 preceding siblings ...)
2016-08-05 9:14 ` [CI 17/19] drm/i915: Document and reject invalid tiling modes Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:14 ` [CI 19/19] drm/i915: Assert that the request hasn't been retired Chris Wilson
2016-08-05 9:39 ` ✗ Ro.CI.BAT: failure for series starting with [CI,01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Patchwork
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.
v2: New magic
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
drivers/gpu/drm/i915/i915_drv.h | 30 +++++++++++++++++------
drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++-------
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +-
drivers/gpu/drm/i915/i915_gem_fence.c | 39 ++++++++++++++++++------------
drivers/gpu/drm/i915/i915_gem_tiling.c | 19 +++++++++------
drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
drivers/gpu/drm/i915/intel_display.c | 34 +++++++++++++-------------
drivers/gpu/drm/i915/intel_fbc.c | 2 +-
drivers/gpu/drm/i915/intel_overlay.c | 2 +-
drivers/gpu/drm/i915/intel_pm.c | 2 +-
drivers/gpu/drm/i915/intel_sprite.c | 12 ++++-----
12 files changed, 98 insertions(+), 68 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1faea382dfeb..0620a84d00ca 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -101,7 +101,7 @@ static char get_pin_flag(struct drm_i915_gem_object *obj)
static char get_tiling_flag(struct drm_i915_gem_object *obj)
{
- switch (obj->tiling_mode) {
+ switch (i915_gem_object_get_tiling(obj)) {
default:
case I915_TILING_NONE: return ' ';
case I915_TILING_X: return 'X';
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f18d8761305c..feec00f769e1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2214,13 +2214,11 @@ struct drm_i915_gem_object {
atomic_t frontbuffer_bits;
- /**
- * Current tiling mode for the object.
- */
- unsigned int tiling_mode;
-
/** Current tiling stride for the object, if it's tiled. */
- uint32_t stride;
+ unsigned int tiling_and_stride;
+#define FENCE_MINIMUM_STRIDE 128 /* See i915_tiling_ok() */
+#define TILING_MASK (FENCE_MINIMUM_STRIDE-1)
+#define STRIDE_MASK (~TILING_MASK)
unsigned int has_wc_mmap;
/** Count of VMA actually bound by this object */
@@ -2359,6 +2357,24 @@ i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
}
+static inline unsigned int
+i915_gem_object_get_tiling(struct drm_i915_gem_object *obj)
+{
+ return obj->tiling_and_stride & TILING_MASK;
+}
+
+static inline bool
+i915_gem_object_is_tiled(struct drm_i915_gem_object *obj)
+{
+ return i915_gem_object_get_tiling(obj) != I915_TILING_NONE;
+}
+
+static inline unsigned int
+i915_gem_object_get_stride(struct drm_i915_gem_object *obj)
+{
+ return obj->tiling_and_stride & STRIDE_MASK;
+}
+
/*
* Optimised SGL iterator for GEM objects
*/
@@ -3457,7 +3473,7 @@ static inline bool i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_objec
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
return dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_9_10_17 &&
- obj->tiling_mode != I915_TILING_NONE;
+ i915_gem_object_is_tiled(obj);
}
/* i915_debugfs.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e6b7853f9d7..f4f8eaa90f2a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1042,7 +1042,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
int ret;
bool hit_slow_path = false;
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
return -EFAULT;
ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
@@ -1671,7 +1671,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
/* Use a partial view if the object is bigger than the aperture. */
if (obj->base.size >= ggtt->mappable_end &&
- obj->tiling_mode == I915_TILING_NONE) {
+ !i915_gem_object_is_tiled(obj)) {
static const unsigned int chunk_size = 256; // 1 MiB
memset(&view, 0, sizeof(view));
@@ -2189,7 +2189,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
if (i915_gem_object_needs_bit17_swizzle(obj))
i915_gem_object_do_bit_17_swizzle(obj);
- if (obj->tiling_mode != I915_TILING_NONE &&
+ if (i915_gem_object_is_tiled(obj) &&
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES)
i915_gem_object_pin_pages(obj);
@@ -2938,10 +2938,12 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
size = max(size, vma->size);
if (flags & PIN_MAPPABLE)
- size = i915_gem_get_ggtt_size(dev_priv, size, obj->tiling_mode);
+ size = i915_gem_get_ggtt_size(dev_priv, size,
+ i915_gem_object_get_tiling(obj));
min_alignment =
- i915_gem_get_ggtt_alignment(dev_priv, size, obj->tiling_mode,
+ i915_gem_get_ggtt_alignment(dev_priv, size,
+ i915_gem_object_get_tiling(obj),
flags & PIN_MAPPABLE);
if (alignment == 0)
alignment = min_alignment;
@@ -3637,10 +3639,10 @@ void __i915_vma_set_map_and_fenceable(struct i915_vma *vma)
fence_size = i915_gem_get_ggtt_size(dev_priv,
obj->base.size,
- obj->tiling_mode);
+ i915_gem_object_get_tiling(obj));
fence_alignment = i915_gem_get_ggtt_alignment(dev_priv,
obj->base.size,
- obj->tiling_mode,
+ i915_gem_object_get_tiling(obj),
true);
fenceable = (vma->node.size == fence_size &&
@@ -3884,7 +3886,7 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
}
if (obj->pages &&
- obj->tiling_mode != I915_TILING_NONE &&
+ i915_gem_object_is_tiled(obj) &&
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
if (obj->madv == I915_MADV_WILLNEED)
i915_gem_object_unpin_pages(obj);
@@ -4054,7 +4056,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
if (obj->pages && obj->madv == I915_MADV_WILLNEED &&
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
- obj->tiling_mode != I915_TILING_NONE)
+ i915_gem_object_is_tiled(obj))
i915_gem_object_unpin_pages(obj);
if (WARN_ON(obj->pages_pin_count))
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 71834741bd87..c494b79ded20 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -803,7 +803,7 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
need_fence =
entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
- obj->tiling_mode != I915_TILING_NONE;
+ i915_gem_object_is_tiled(obj);
need_mappable = need_fence || need_reloc_mappable(vma);
if (entry->flags & EXEC_OBJECT_PINNED)
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index 3b462da612ca..9e8173fe2a09 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -86,20 +86,22 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
if (obj) {
u32 size = i915_gem_obj_ggtt_size(obj);
+ unsigned int tiling = i915_gem_object_get_tiling(obj);
+ unsigned int stride = i915_gem_object_get_stride(obj);
uint64_t val;
/* Adjust fence size to match tiled area */
- if (obj->tiling_mode != I915_TILING_NONE) {
- uint32_t row_size = obj->stride *
- (obj->tiling_mode == I915_TILING_Y ? 32 : 8);
+ if (tiling != I915_TILING_NONE) {
+ uint32_t row_size = stride *
+ (tiling == I915_TILING_Y ? 32 : 8);
size = (size / row_size) * row_size;
}
val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
0xfffff000) << 32;
val |= i915_gem_obj_ggtt_offset(obj) & 0xfffff000;
- val |= (uint64_t)((obj->stride / 128) - 1) << fence_pitch_shift;
- if (obj->tiling_mode == I915_TILING_Y)
+ val |= (uint64_t)((stride / 128) - 1) << fence_pitch_shift;
+ if (tiling == I915_TILING_Y)
val |= 1 << I965_FENCE_TILING_Y_SHIFT;
val |= I965_FENCE_REG_VALID;
@@ -122,6 +124,8 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
if (obj) {
u32 size = i915_gem_obj_ggtt_size(obj);
+ unsigned int tiling = i915_gem_object_get_tiling(obj);
+ unsigned int stride = i915_gem_object_get_stride(obj);
int pitch_val;
int tile_width;
@@ -131,17 +135,17 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
"object 0x%08llx [fenceable? %d] not 1M or pot-size (0x%08x) aligned\n",
i915_gem_obj_ggtt_offset(obj), obj->map_and_fenceable, size);
- if (obj->tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
+ if (tiling == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
tile_width = 128;
else
tile_width = 512;
/* Note: pitch better be a power of two tile widths */
- pitch_val = obj->stride / tile_width;
+ pitch_val = stride / tile_width;
pitch_val = ffs(pitch_val) - 1;
val = i915_gem_obj_ggtt_offset(obj);
- if (obj->tiling_mode == I915_TILING_Y)
+ if (tiling == I915_TILING_Y)
val |= 1 << I830_FENCE_TILING_Y_SHIFT;
val |= I915_FENCE_SIZE_BITS(size);
val |= pitch_val << I830_FENCE_PITCH_SHIFT;
@@ -161,6 +165,8 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
if (obj) {
u32 size = i915_gem_obj_ggtt_size(obj);
+ unsigned int tiling = i915_gem_object_get_tiling(obj);
+ unsigned int stride = i915_gem_object_get_stride(obj);
uint32_t pitch_val;
WARN((i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK) ||
@@ -169,11 +175,11 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
"object 0x%08llx not 512K or pot-size 0x%08x aligned\n",
i915_gem_obj_ggtt_offset(obj), size);
- pitch_val = obj->stride / 128;
+ pitch_val = stride / 128;
pitch_val = ffs(pitch_val) - 1;
val = i915_gem_obj_ggtt_offset(obj);
- if (obj->tiling_mode == I915_TILING_Y)
+ if (tiling == I915_TILING_Y)
val |= 1 << I830_FENCE_TILING_Y_SHIFT;
val |= I830_FENCE_SIZE_BITS(size);
val |= pitch_val << I830_FENCE_PITCH_SHIFT;
@@ -201,9 +207,12 @@ static void i915_gem_write_fence(struct drm_device *dev, int reg,
if (i915_gem_object_needs_mb(dev_priv->fence_regs[reg].obj))
mb();
- WARN(obj && (!obj->stride || !obj->tiling_mode),
+ WARN(obj &&
+ (!i915_gem_object_get_stride(obj) ||
+ !i915_gem_object_get_tiling(obj)),
"bogus fence setup with stride: 0x%x, tiling mode: %i\n",
- obj->stride, obj->tiling_mode);
+ i915_gem_object_get_stride(obj),
+ i915_gem_object_get_tiling(obj));
if (IS_GEN2(dev))
i830_write_fence_reg(dev, reg, obj);
@@ -248,7 +257,7 @@ static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
{
- if (obj->tiling_mode)
+ if (i915_gem_object_is_tiled(obj))
i915_gem_release_mmap(obj);
/* As we do not have an associated fence register, we will force
@@ -361,7 +370,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
{
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = to_i915(dev);
- bool enable = obj->tiling_mode != I915_TILING_NONE;
+ bool enable = i915_gem_object_is_tiled(obj);
struct drm_i915_fence_reg *reg;
int ret;
@@ -477,7 +486,7 @@ void i915_gem_restore_fences(struct drm_device *dev)
*/
if (reg->obj) {
i915_gem_object_update_fence(reg->obj, reg,
- reg->obj->tiling_mode);
+ i915_gem_object_get_tiling(reg->obj));
} else {
i915_gem_write_fence(dev, i, NULL);
}
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 6817f69947d9..f4b984de83b5 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -170,6 +170,9 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
struct drm_i915_gem_object *obj;
int ret = 0;
+ /* Make sure we don't cross-contaminate obj->tiling_and_stride */
+ BUILD_BUG_ON(I915_TILING_LAST & STRIDE_MASK);
+
obj = i915_gem_object_lookup(file, args->handle);
if (!obj)
return -ENOENT;
@@ -217,8 +220,8 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
}
}
- if (args->tiling_mode != obj->tiling_mode ||
- args->stride != obj->stride) {
+ if (args->tiling_mode != i915_gem_object_get_tiling(obj) ||
+ args->stride != i915_gem_object_get_stride(obj)) {
/* We need to rebind the object if its current allocation
* no longer meets the alignment restrictions for its new
* tiling mode. Otherwise we can just leave it alone, but
@@ -241,7 +244,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
if (args->tiling_mode == I915_TILING_NONE)
i915_gem_object_unpin_pages(obj);
- if (obj->tiling_mode == I915_TILING_NONE)
+ if (!i915_gem_object_is_tiled(obj))
i915_gem_object_pin_pages(obj);
}
@@ -250,16 +253,16 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
&dev->struct_mutex) ||
obj->fence_reg != I915_FENCE_REG_NONE;
- obj->tiling_mode = args->tiling_mode;
- obj->stride = args->stride;
+ obj->tiling_and_stride =
+ args->stride | args->tiling_mode;
/* Force the fence to be reacquired for GTT access */
i915_gem_release_mmap(obj);
}
}
/* we have to maintain this existing ABI... */
- args->stride = obj->stride;
- args->tiling_mode = obj->tiling_mode;
+ args->stride = i915_gem_object_get_stride(obj);
+ args->tiling_mode = i915_gem_object_get_tiling(obj);
/* Try to preallocate memory required to save swizzling on put-pages */
if (i915_gem_object_needs_bit17_swizzle(obj)) {
@@ -306,7 +309,7 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
if (!obj)
return -ENOENT;
- args->tiling_mode = READ_ONCE(obj->tiling_mode);
+ args->tiling_mode = READ_ONCE(obj->tiling_and_stride) & TILING_MASK;
switch (args->tiling_mode) {
case I915_TILING_X:
args->swizzle_mode = dev_priv->mm.bit_6_swizzle_x;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index cc28ad429dd8..eecb87063c88 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -781,7 +781,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
err->pinned = 0;
if (i915_gem_obj_is_pinned(obj))
err->pinned = 1;
- err->tiling = obj->tiling_mode;
+ err->tiling = i915_gem_object_get_tiling(obj);
err->dirty = obj->dirty;
err->purgeable = obj->madv != I915_MADV_WILLNEED;
err->userptr = obj->userptr.mm != NULL;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 9068676943bf..9cbf5431c1e3 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2466,9 +2466,8 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
return false;
}
- obj->tiling_mode = plane_config->tiling;
- if (obj->tiling_mode == I915_TILING_X)
- obj->stride = fb->pitches[0];
+ if (plane_config->tiling == I915_TILING_X)
+ obj->tiling_and_stride = fb->pitches[0] | I915_TILING_X;
mode_cmd.pixel_format = fb->pixel_format;
mode_cmd.width = fb->width;
@@ -2594,7 +2593,7 @@ valid_fb:
intel_state->dst.y2 = plane_state->crtc_y + plane_state->crtc_h;
obj = intel_fb_obj(fb);
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
dev_priv->preserve_bios_swizzle = true;
drm_framebuffer_reference(fb);
@@ -2672,8 +2671,7 @@ static void i9xx_update_primary_plane(struct drm_plane *primary,
BUG();
}
- if (INTEL_INFO(dev)->gen >= 4 &&
- obj->tiling_mode != I915_TILING_NONE)
+ if (INTEL_INFO(dev)->gen >= 4 && i915_gem_object_is_tiled(obj))
dspcntr |= DISPPLANE_TILED;
if (IS_G4X(dev))
@@ -2782,7 +2780,7 @@ static void ironlake_update_primary_plane(struct drm_plane *primary,
BUG();
}
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
dspcntr |= DISPPLANE_TILED;
if (!IS_HASWELL(dev) && !IS_BROADWELL(dev))
@@ -11200,7 +11198,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
intel_ring_emit(ring, fb->pitches[0]);
intel_ring_emit(ring, intel_crtc->flip_work->gtt_offset |
- obj->tiling_mode);
+ i915_gem_object_get_tiling(obj));
/* XXX Enabling the panel-fitter across page-flip is so far
* untested on non-native modes, so ignore it for now.
@@ -11232,7 +11230,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
intel_ring_emit(ring, MI_DISPLAY_FLIP |
MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
- intel_ring_emit(ring, fb->pitches[0] | obj->tiling_mode);
+ intel_ring_emit(ring, fb->pitches[0] | i915_gem_object_get_tiling(obj));
intel_ring_emit(ring, intel_crtc->flip_work->gtt_offset);
/* Contrary to the suggestions in the documentation,
@@ -11335,7 +11333,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
}
intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
- intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
+ intel_ring_emit(ring, fb->pitches[0] | i915_gem_object_get_tiling(obj));
intel_ring_emit(ring, intel_crtc->flip_work->gtt_offset);
intel_ring_emit(ring, (MI_NOOP));
@@ -11442,7 +11440,7 @@ static void ilk_do_mmio_flip(struct intel_crtc *intel_crtc,
dspcntr = I915_READ(reg);
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
dspcntr |= DISPPLANE_TILED;
else
dspcntr &= ~DISPPLANE_TILED;
@@ -11670,7 +11668,8 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev)) {
engine = &dev_priv->engine[BCS];
- if (obj->tiling_mode != intel_fb_obj(work->old_fb)->tiling_mode)
+ if (i915_gem_object_get_tiling(obj) !=
+ i915_gem_object_get_tiling(intel_fb_obj(work->old_fb)))
/* vlv: DISPLAY_FLIP fails to change tiling */
engine = NULL;
} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
@@ -14932,15 +14931,15 @@ static int intel_framebuffer_init(struct drm_device *dev,
if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
/* Enforce that fb modifier and tiling mode match, but only for
* X-tiled. This is needed for FBC. */
- if (!!(obj->tiling_mode == I915_TILING_X) !=
+ if (!!(i915_gem_object_get_tiling(obj) == I915_TILING_X) !=
!!(mode_cmd->modifier[0] == I915_FORMAT_MOD_X_TILED)) {
DRM_DEBUG("tiling_mode doesn't match fb modifier\n");
return -EINVAL;
}
} else {
- if (obj->tiling_mode == I915_TILING_X)
+ if (i915_gem_object_get_tiling(obj) == I915_TILING_X)
mode_cmd->modifier[0] = I915_FORMAT_MOD_X_TILED;
- else if (obj->tiling_mode == I915_TILING_Y) {
+ else if (i915_gem_object_get_tiling(obj) == I915_TILING_Y) {
DRM_DEBUG("No Y tiling for legacy addfb\n");
return -EINVAL;
}
@@ -14984,9 +14983,10 @@ static int intel_framebuffer_init(struct drm_device *dev,
}
if (mode_cmd->modifier[0] == I915_FORMAT_MOD_X_TILED &&
- mode_cmd->pitches[0] != obj->stride) {
+ mode_cmd->pitches[0] != i915_gem_object_get_stride(obj)) {
DRM_DEBUG("pitch (%d) must match tiling stride (%d)\n",
- mode_cmd->pitches[0], obj->stride);
+ mode_cmd->pitches[0],
+ i915_gem_object_get_stride(obj));
return -EINVAL;
}
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index d4be07615aa9..85adc2b92594 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -741,7 +741,7 @@ static void intel_fbc_update_state_cache(struct intel_crtc *crtc,
cache->fb.pixel_format = fb->pixel_format;
cache->fb.stride = fb->pitches[0];
cache->fb.fence_reg = obj->fence_reg;
- cache->fb.tiling_mode = obj->tiling_mode;
+ cache->fb.tiling_mode = i915_gem_object_get_tiling(obj);
}
static bool intel_fbc_can_activate(struct intel_crtc *crtc)
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 413a2038e6d1..90f3ab424e01 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -1129,7 +1129,7 @@ int intel_overlay_put_image_ioctl(struct drm_device *dev, void *data,
drm_modeset_lock_all(dev);
mutex_lock(&dev->struct_mutex);
- if (new_bo->tiling_mode) {
+ if (i915_gem_object_is_tiled(new_bo)) {
DRM_DEBUG_KMS("buffer used for overlay image can not be tiled\n");
ret = -EINVAL;
goto out_unlock;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index eedcacef7d5c..aef0b105eb58 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -1585,7 +1585,7 @@ static void i9xx_update_wm(struct drm_crtc *unused_crtc)
obj = intel_fb_obj(enabled->primary->state->fb);
/* self-refresh seems busted with untiled */
- if (obj->tiling_mode == I915_TILING_NONE)
+ if (!i915_gem_object_is_tiled(obj))
enabled = NULL;
}
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index c47b74d9cfb1..5beafd4bc1c1 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -431,7 +431,7 @@ vlv_update_plane(struct drm_plane *dplane,
*/
sprctl |= SP_GAMMA_ENABLE;
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
sprctl |= SP_TILED;
/* Sizes are 0 based */
@@ -468,7 +468,7 @@ vlv_update_plane(struct drm_plane *dplane,
I915_WRITE(SPSTRIDE(pipe, plane), fb->pitches[0]);
I915_WRITE(SPPOS(pipe, plane), (crtc_y << 16) | crtc_x);
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
I915_WRITE(SPTILEOFF(pipe, plane), (y << 16) | x);
else
I915_WRITE(SPLINOFF(pipe, plane), linear_offset);
@@ -553,7 +553,7 @@ ivb_update_plane(struct drm_plane *plane,
*/
sprctl |= SPRITE_GAMMA_ENABLE;
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
sprctl |= SPRITE_TILED;
if (IS_HASWELL(dev) || IS_BROADWELL(dev))
@@ -607,7 +607,7 @@ ivb_update_plane(struct drm_plane *plane,
* register */
if (IS_HASWELL(dev) || IS_BROADWELL(dev))
I915_WRITE(SPROFFSET(pipe), (y << 16) | x);
- else if (obj->tiling_mode != I915_TILING_NONE)
+ else if (i915_gem_object_is_tiled(obj))
I915_WRITE(SPRTILEOFF(pipe), (y << 16) | x);
else
I915_WRITE(SPRLINOFF(pipe), linear_offset);
@@ -694,7 +694,7 @@ ilk_update_plane(struct drm_plane *plane,
*/
dvscntr |= DVS_GAMMA_ENABLE;
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
dvscntr |= DVS_TILED;
if (IS_GEN6(dev))
@@ -737,7 +737,7 @@ ilk_update_plane(struct drm_plane *plane,
I915_WRITE(DVSSTRIDE(pipe), fb->pitches[0]);
I915_WRITE(DVSPOS(pipe), (crtc_y << 16) | crtc_x);
- if (obj->tiling_mode != I915_TILING_NONE)
+ if (i915_gem_object_is_tiled(obj))
I915_WRITE(DVSTILEOFF(pipe), (y << 16) | x);
else
I915_WRITE(DVSLINOFF(pipe), linear_offset);
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [CI 19/19] drm/i915: Assert that the request hasn't been retired
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (16 preceding siblings ...)
2016-08-05 9:14 ` [CI 18/19] drm/i915: Repack fence tiling mode and stride into a single integer Chris Wilson
@ 2016-08-05 9:14 ` Chris Wilson
2016-08-05 9:39 ` ✗ Ro.CI.BAT: failure for series starting with [CI,01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Patchwork
18 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 9:14 UTC (permalink / raw)
To: intel-gfx
With all callers now not playing tricks with dropping the struct_mutex
between waiting and retiring, we can assert that the request is ready to
be retired.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_request.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 1f91dc8c4171..b317a672040f 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -170,7 +170,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
struct i915_gem_active *active, *next;
trace_i915_gem_request_retire(request);
- list_del_init(&request->link);
+ list_del(&request->link);
/* We know the GPU must have read the request to have
* sent us the seqno + interrupt, so use the position
@@ -228,9 +228,7 @@ void i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
struct drm_i915_gem_request *tmp;
lockdep_assert_held(&req->i915->drm.struct_mutex);
-
- if (list_empty(&req->link))
- return;
+ GEM_BUG_ON(list_empty(&req->link));
do {
tmp = list_first_entry(&engine->request_list,
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 24+ messages in thread
* ✗ Ro.CI.BAT: failure for series starting with [CI,01/19] drm/i915: Introduce i915_gem_active_wait_unlocked()
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
` (17 preceding siblings ...)
2016-08-05 9:14 ` [CI 19/19] drm/i915: Assert that the request hasn't been retired Chris Wilson
@ 2016-08-05 9:39 ` Patchwork
18 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2016-08-05 9:39 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [CI,01/19] drm/i915: Introduce i915_gem_active_wait_unlocked()
URL : https://patchwork.freedesktop.org/series/10706/
State : failure
== Summary ==
Series 10706v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/10706/revisions/1/mbox
Test gem_exec_gttfill:
Subgroup basic:
skip -> PASS (fi-snb-i7-2600)
Test kms_cursor_legacy:
Subgroup basic-flip-vs-cursor-legacy:
fail -> PASS (fi-hsw-i7-4770k)
pass -> FAIL (ro-hsw-i7-4770r)
fail -> PASS (ro-byt-n2820)
fail -> PASS (ro-snb-i7-2620M)
fail -> PASS (ro-bdw-i5-5250u)
Subgroup basic-flip-vs-cursor-varying-size:
pass -> FAIL (ro-skl3-i5-6260u)
pass -> FAIL (ro-snb-i7-2620M)
fi-hsw-i7-4770k total:240 pass:218 dwarn:0 dfail:0 fail:0 skip:22
fi-kbl-qkkr total:240 pass:181 dwarn:29 dfail:0 fail:4 skip:26
fi-skl-i5-6260u total:240 pass:220 dwarn:4 dfail:0 fail:2 skip:14
fi-skl-i7-6700k total:240 pass:204 dwarn:4 dfail:2 fail:2 skip:28
fi-snb-i7-2600 total:240 pass:198 dwarn:0 dfail:0 fail:0 skip:42
ro-bdw-i5-5250u total:240 pass:218 dwarn:5 dfail:0 fail:1 skip:16
ro-bdw-i7-5557U total:240 pass:223 dwarn:1 dfail:0 fail:0 skip:16
ro-bdw-i7-5600u total:240 pass:206 dwarn:1 dfail:0 fail:1 skip:32
ro-bsw-n3050 total:240 pass:195 dwarn:0 dfail:0 fail:3 skip:42
ro-byt-n2820 total:240 pass:198 dwarn:0 dfail:0 fail:2 skip:40
ro-hsw-i3-4010u total:240 pass:214 dwarn:0 dfail:0 fail:0 skip:26
ro-hsw-i7-4770r total:240 pass:213 dwarn:0 dfail:0 fail:1 skip:26
ro-ilk-i7-620lm total:240 pass:173 dwarn:1 dfail:0 fail:1 skip:65
ro-ivb-i7-3770 total:240 pass:205 dwarn:0 dfail:0 fail:0 skip:35
ro-ivb2-i7-3770 total:240 pass:209 dwarn:0 dfail:0 fail:0 skip:31
ro-skl3-i5-6260u total:240 pass:221 dwarn:1 dfail:0 fail:4 skip:14
ro-snb-i7-2620M total:240 pass:197 dwarn:0 dfail:0 fail:2 skip:41
Results at /archive/results/CI_IGT_test/RO_Patchwork_1725/
4e6081a drm-intel-nightly: 2016y-08m-05d-08h-43m-34s UTC integration manifest
5498da9 drm/i915: Assert that the request hasn't been retired
14bd9cd drm/i915: Repack fence tiling mode and stride into a single integer
c17543a drm/i915: Document and reject invalid tiling modes
ddd10fb drm/i915: Remove locking for get_tiling
7f82de8 drm/i915: Remove pinned check from madvise ioctl
bb79226 drm/i915: Reduce locking inside swfinish ioctl
09792f4 drm/i915: Remove (struct_mutex) locking for busy-ioctl
8116d48 drm/i915: Remove (struct_mutex) locking for wait-ioctl
a8bfdc9 drm/i915: Do a nonblocking wait first in pread/pwrite
773b7d7 drm/i915: Remove unused no-shrinker-steal
5af2eb6 drm/i915: Tidy generation of the GTT mmap offset
a9a6a82 drm/i915/shrinker: Wait before acquiring struct_mutex under oom
3f67580 drm/i915: Simplify do_idling() (Ironlake vt-d w/a)
3761c19 drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
11753a3 drm/i915: Remove forced stop ring on suspend/unload
e6e3d8e drm/i915/userptr: Remove superfluous interruptible=false on waiting
c9789a5 drm/i915: Convert non-blocking userptr waits for requests over to using RCU
eb64585 drm/i915: Convert non-blocking waits for requests over to using RCU
5dfb122 drm/i915: Introduce i915_gem_active_wait_unlocked()
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl
2016-08-05 9:14 ` [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
@ 2016-08-05 19:08 ` Daniel Vetter
2016-08-05 19:30 ` Chris Wilson
0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2016-08-05 19:08 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Fri, Aug 05, 2016 at 10:14:18AM +0100, Chris Wilson wrote:
> By applying the same logic as for wait-ioctl, we can query whether a
> request has completed without holding struct_mutex. The biggest impact
> system-wide is removing the flush_active and the contention that causes.
>
> Testcase: igt/gem_busy
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 131 +++++++++++++++++++++++++++++++---------
> 1 file changed, 101 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ceb00970b2da..b99d64bfb7eb 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3736,49 +3736,120 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> }
>
> +static __always_inline unsigned __busy_read_flag(unsigned int id)
> +{
> + /* Note that we could alias engines in the execbuf API, but
> + * that would be very unwise as it prevents userspace from
> + * fine control over engine selection. Ahem.
> + *
> + * This should be something like EXEC_MAX_ENGINE instead of
> + * I915_NUM_ENGINES.
> + */
> + BUILD_BUG_ON(I915_NUM_ENGINES > 16);
> + return 0x10000 << id;
> +}
> +
> +static __always_inline unsigned int __busy_write_id(unsigned int id)
> +{
> + return id;
> +}
> +
> +static __always_inline unsigned
> +__busy_set_if_active(const struct i915_gem_active *active,
> + unsigned int (*flag)(unsigned int id))
> +{
> + /* For more discussion about the barriers and locking concerns,
> + * see __i915_gem_active_get_rcu().
> + */
> + do {
> + struct drm_i915_gem_request *request;
> + unsigned int id;
> +
> + request = rcu_dereference(active->request);
> + if (!request || i915_gem_request_completed(request))
> + return 0;
> +
> + id = request->engine->exec_id;
> +
> + /* Check that the pointer wasn't reassigned and overwritten. */
cf. our discussion in active_get_rcu - there's no fence_get_rcu in sight
anywhere here, hence this needs an smp_rmb(). Also nitpick: The two
rcu_dereference(actove->request) feel a bit silly. If we move the first in
front of the loop, and update the local request pointer (using a tmp) it
would look tidier, and we could even move the loop termination condition
into the while () check (and move the return flag(id) at the end of the
function).
-Daniel
> + if (request == rcu_access_pointer(active->request))
> + return flag(id);
> + } while (1);
> +}
> +
> +static inline unsigned
> +busy_check_reader(const struct i915_gem_active *active)
> +{
> + return __busy_set_if_active(active, __busy_read_flag);
> +}
> +
> +static inline unsigned
> +busy_check_writer(const struct i915_gem_active *active)
> +{
> + return __busy_set_if_active(active, __busy_write_id);
> +}
> +
> int
> i915_gem_busy_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file)
> {
> struct drm_i915_gem_busy *args = data;
> struct drm_i915_gem_object *obj;
> - int ret;
> -
> - ret = i915_mutex_lock_interruptible(dev);
> - if (ret)
> - return ret;
> + unsigned long active;
>
> obj = i915_gem_object_lookup(file, args->handle);
> - if (!obj) {
> - ret = -ENOENT;
> - goto unlock;
> - }
> + if (!obj)
> + return -ENOENT;
>
> - /* Count all active objects as busy, even if they are currently not used
> - * by the gpu. Users of this interface expect objects to eventually
> - * become non-busy without any further actions.
> - */
> args->busy = 0;
> - if (i915_gem_object_is_active(obj)) {
> - struct drm_i915_gem_request *req;
> - int i;
> + active = __I915_BO_ACTIVE(obj);
> + if (active) {
> + int idx;
>
> - for (i = 0; i < I915_NUM_ENGINES; i++) {
> - req = i915_gem_active_peek(&obj->last_read[i],
> - &obj->base.dev->struct_mutex);
> - if (req)
> - args->busy |= 1 << (16 + req->engine->exec_id);
> - }
> - req = i915_gem_active_peek(&obj->last_write,
> - &obj->base.dev->struct_mutex);
> - if (req)
> - args->busy |= req->engine->exec_id;
> + /* Yes, the lookups are intentionally racy.
> + *
> + * First, we cannot simply rely on __I915_BO_ACTIVE. We have
> + * to regard the value as stale and as our ABI guarantees
> + * forward progress, we confirm the status of each active
> + * request with the hardware.
> + *
> + * Even though we guard the pointer lookup by RCU, that only
> + * guarantees that the pointer and its contents remain
> + * dereferencable and does *not* mean that the request we
> + * have is the same as the one being tracked by the object.
> + *
> + * Consider that we lookup the request just as it is being
> + * retired and freed. We take a local copy of the pointer,
> + * but before we add its engine into the busy set, the other
> + * thread reallocates it and assigns it to a task on another
> + * engine with a fresh and incomplete seqno.
> + *
> + * So after we lookup the engine's id, we double check that
> + * the active request is the same and only then do we add it
> + * into the busy set.
> + */
> + rcu_read_lock();
> +
> + for_each_active(active, idx)
> + args->busy |= busy_check_reader(&obj->last_read[idx]);
> +
> + /* For ABI sanity, we only care that the write engine is in
> + * the set of read engines. This is ensured by the ordering
> + * of setting last_read/last_write in i915_vma_move_to_active,
> + * and then in reverse in retire.
> + *
> + * We don't care that the set of active read/write engines
> + * may change during construction of the result, as it is
> + * equally liable to change before userspace can inspect
> + * the result.
> + */
> + args->busy |= busy_check_writer(&obj->last_write);
> +
> + rcu_read_unlock();
> }
>
> - i915_gem_object_put(obj);
> -unlock:
> - mutex_unlock(&dev->struct_mutex);
> - return ret;
> + i915_gem_object_put_unlocked(obj);
> + return 0;
> }
>
> int
> --
> 2.8.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [CI 15/19] drm/i915: Remove pinned check from madvise ioctl
2016-08-05 9:14 ` [CI 15/19] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
@ 2016-08-05 19:10 ` Daniel Vetter
0 siblings, 0 replies; 24+ messages in thread
From: Daniel Vetter @ 2016-08-05 19:10 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Fri, Aug 05, 2016 at 10:14:20AM +0100, Chris Wilson wrote:
> We don't need to incur the overhead of checking whether the object is
> pinned prior to changing its madvise. If the object is pinned, the
> madvise will not take effect until it is unpinned and so we cannot free
> the pages being pointed at by hardware. Marking a pinned object with
> allocated pages as DONTNEED will not trigger any undue warnings. The check
> is therefore superfluous, and by removing it we can remove a linear walk
> over all the vma the object has.
>
> Still despite it being an overzealous check, that error code is part of
> the current ABI and so we must proceed with caution.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
fwiw Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 063c56f5d6f8..7e6b7853f9d7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3883,11 +3883,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> goto unlock;
> }
>
> - if (i915_gem_obj_is_pinned(obj)) {
> - ret = -EINVAL;
> - goto out;
> - }
> -
> if (obj->pages &&
> obj->tiling_mode != I915_TILING_NONE &&
> dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
> @@ -3906,7 +3901,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
>
> args->retained = obj->madv != __I915_MADV_PURGED;
>
> -out:
> i915_gem_object_put(obj);
> unlock:
> mutex_unlock(&dev->struct_mutex);
> --
> 2.8.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl
2016-08-05 19:08 ` Daniel Vetter
@ 2016-08-05 19:30 ` Chris Wilson
2016-08-05 20:12 ` Daniel Vetter
0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-08-05 19:30 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Fri, Aug 05, 2016 at 09:08:34PM +0200, Daniel Vetter wrote:
> On Fri, Aug 05, 2016 at 10:14:18AM +0100, Chris Wilson wrote:
> > By applying the same logic as for wait-ioctl, we can query whether a
> > request has completed without holding struct_mutex. The biggest impact
> > system-wide is removing the flush_active and the contention that causes.
> >
> > Testcase: igt/gem_busy
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel@intel.com>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gem.c | 131 +++++++++++++++++++++++++++++++---------
> > 1 file changed, 101 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ceb00970b2da..b99d64bfb7eb 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3736,49 +3736,120 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> > i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> > }
> >
> > +static __always_inline unsigned __busy_read_flag(unsigned int id)
> > +{
> > + /* Note that we could alias engines in the execbuf API, but
> > + * that would be very unwise as it prevents userspace from
> > + * fine control over engine selection. Ahem.
> > + *
> > + * This should be something like EXEC_MAX_ENGINE instead of
> > + * I915_NUM_ENGINES.
> > + */
> > + BUILD_BUG_ON(I915_NUM_ENGINES > 16);
> > + return 0x10000 << id;
> > +}
> > +
> > +static __always_inline unsigned int __busy_write_id(unsigned int id)
> > +{
> > + return id;
> > +}
> > +
> > +static __always_inline unsigned
> > +__busy_set_if_active(const struct i915_gem_active *active,
> > + unsigned int (*flag)(unsigned int id))
> > +{
> > + /* For more discussion about the barriers and locking concerns,
> > + * see __i915_gem_active_get_rcu().
> > + */
> > + do {
> > + struct drm_i915_gem_request *request;
> > + unsigned int id;
> > +
> > + request = rcu_dereference(active->request);
> > + if (!request || i915_gem_request_completed(request))
> > + return 0;
> > +
> > + id = request->engine->exec_id;
> > +
> > + /* Check that the pointer wasn't reassigned and overwritten. */
>
> cf. our discussion in active_get_rcu - there's no fence_get_rcu in sight
> anywhere here, hence this needs an smp_rmb().
I toyed with smp_rmb().
The rcu_deference() followed by rcu_access_pointer() is ordered.
So I was back with dancing around "where the dependent-reads ordered by
the first rcu_deference ordered in front of the second access which was
itself ordered after the first?" I probably should
have stuck in the smp_rmb() and stopped worrying - it is still going to
be cheaper than the refcount traffic.
> Also nitpick: The two
> rcu_dereference(actove->request) feel a bit silly. If we move the first in
> front of the loop, and update the local request pointer (using a tmp) it
> would look tidier, and we could even move the loop termination condition
> into the while () check (and move the return flag(id) at the end of the
> function).
I was quite content with only having to think of one phase through the
loop and not worry about state being carried forward.
__busy_set_if_active(const struct i915_gem_active *active,
unsigned int (*flag)(unsigned int id))
{
+ struct drm_i915_gem_request *request;
+ unsigned int id;
+
/* For more discussion about the barriers and locking concerns,
* see __i915_gem_active_get_rcu().
*/
+ request = rcu_dereference(active->request);
do {
- struct drm_i915_gem_request *request;
- unsigned int id;
+ struct drm_i915_gem_request *tmp;
- request = rcu_dereference(active->request);
if (!request || i915_gem_request_completed(request))
return 0;
id = request->engine->exec_id;
/* Check that the pointer wasn't reassigned and overwritten. */
- if (request == rcu_access_pointer(active->request))
- return flag(id);
+ tmp = rcu_dereference(active->request);
+ if (tmp == request)
+ break;
+
+ request = tmp;
} while (1);
+
+ return flag(id);
}
is also not as well optimised by gcc, apparently.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl
2016-08-05 19:30 ` Chris Wilson
@ 2016-08-05 20:12 ` Daniel Vetter
0 siblings, 0 replies; 24+ messages in thread
From: Daniel Vetter @ 2016-08-05 20:12 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, intel-gfx
On Fri, Aug 05, 2016 at 08:30:42PM +0100, Chris Wilson wrote:
> On Fri, Aug 05, 2016 at 09:08:34PM +0200, Daniel Vetter wrote:
> > On Fri, Aug 05, 2016 at 10:14:18AM +0100, Chris Wilson wrote:
> > > By applying the same logic as for wait-ioctl, we can query whether a
> > > request has completed without holding struct_mutex. The biggest impact
> > > system-wide is removing the flush_active and the contention that causes.
> > >
> > > Testcase: igt/gem_busy
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Akash Goel <akash.goel@intel.com>
> > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > ---
> > > drivers/gpu/drm/i915/i915_gem.c | 131 +++++++++++++++++++++++++++++++---------
> > > 1 file changed, 101 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index ceb00970b2da..b99d64bfb7eb 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -3736,49 +3736,120 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> > > i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> > > }
> > >
> > > +static __always_inline unsigned __busy_read_flag(unsigned int id)
> > > +{
> > > + /* Note that we could alias engines in the execbuf API, but
> > > + * that would be very unwise as it prevents userspace from
> > > + * fine control over engine selection. Ahem.
> > > + *
> > > + * This should be something like EXEC_MAX_ENGINE instead of
> > > + * I915_NUM_ENGINES.
> > > + */
> > > + BUILD_BUG_ON(I915_NUM_ENGINES > 16);
> > > + return 0x10000 << id;
> > > +}
> > > +
> > > +static __always_inline unsigned int __busy_write_id(unsigned int id)
> > > +{
> > > + return id;
> > > +}
> > > +
> > > +static __always_inline unsigned
> > > +__busy_set_if_active(const struct i915_gem_active *active,
> > > + unsigned int (*flag)(unsigned int id))
> > > +{
> > > + /* For more discussion about the barriers and locking concerns,
> > > + * see __i915_gem_active_get_rcu().
> > > + */
> > > + do {
> > > + struct drm_i915_gem_request *request;
> > > + unsigned int id;
> > > +
> > > + request = rcu_dereference(active->request);
> > > + if (!request || i915_gem_request_completed(request))
> > > + return 0;
> > > +
> > > + id = request->engine->exec_id;
> > > +
> > > + /* Check that the pointer wasn't reassigned and overwritten. */
> >
> > cf. our discussion in active_get_rcu - there's no fence_get_rcu in sight
> > anywhere here, hence this needs an smp_rmb().
>
> I toyed with smp_rmb().
>
> The rcu_deference() followed by rcu_access_pointer() is ordered.
>
> So I was back with dancing around "where the dependent-reads ordered by
> the first rcu_deference ordered in front of the second access which was
> itself ordered after the first?" I probably should
> have stuck in the smp_rmb() and stopped worrying - it is still going to
> be cheaper than the refcount traffic.
It's the read of exec_id vs. the 2nd read of request which isn't ordered,
and which we want to be ordered to ensure we read the right engine id (and
not some bogus thing since the request was recycled meanwhile). And I
think the smp_rmb() is indeed required in there:
1. first active->request lookup
2. 2nd active->request lookup (compiler/cpu is allowed to do that, I think it
could even reorder ahead of 1 since it's not a dependent read)
<- gpu completes request, evil other thread does all the clean&recycles
with new bogus engine
3. sample the engine->exec_id
4. bail out of the loop sinc requests looked up in 1&2 match.
> > Also nitpick: The two
> > rcu_dereference(actove->request) feel a bit silly. If we move the first in
> > front of the loop, and update the local request pointer (using a tmp) it
> > would look tidier, and we could even move the loop termination condition
> > into the while () check (and move the return flag(id) at the end of the
> > function).
>
> I was quite content with only having to think of one phase through the
> loop and not worry about state being carried forward.
>
> __busy_set_if_active(const struct i915_gem_active *active,
> unsigned int (*flag)(unsigned int id))
> {
> + struct drm_i915_gem_request *request;
> + unsigned int id;
> +
> /* For more discussion about the barriers and locking concerns,
> * see __i915_gem_active_get_rcu().
> */
> + request = rcu_dereference(active->request);
> do {
> - struct drm_i915_gem_request *request;
> - unsigned int id;
> + struct drm_i915_gem_request *tmp;
>
> - request = rcu_dereference(active->request);
> if (!request || i915_gem_request_completed(request))
> return 0;
>
> id = request->engine->exec_id;
>
> /* Check that the pointer wasn't reassigned and overwritten. */
> - if (request == rcu_access_pointer(active->request))
> - return flag(id);
> + tmp = rcu_dereference(active->request);
> + if (tmp == request)
> + break;
> +
> + request = tmp;
> } while (1);
> +
> + return flag(id);
> }
>
> is also not as well optimised by gcc, apparently.
Hm yeah, underwhelming. I'm ok with either I guess.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2016-08-05 20:12 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-05 9:14 [CI 01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
2016-08-05 9:14 ` [CI 02/19] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
2016-08-05 9:14 ` [CI 03/19] drm/i915: Convert non-blocking userptr " Chris Wilson
2016-08-05 9:14 ` [CI 04/19] drm/i915/userptr: Remove superfluous interruptible=false on waiting Chris Wilson
2016-08-05 9:14 ` [CI 05/19] drm/i915: Remove forced stop ring on suspend/unload Chris Wilson
2016-08-05 9:14 ` [CI 06/19] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex Chris Wilson
2016-08-05 9:14 ` [CI 07/19] drm/i915: Simplify do_idling() (Ironlake vt-d w/a) Chris Wilson
2016-08-05 9:14 ` [CI 08/19] drm/i915/shrinker: Wait before acquiring struct_mutex under oom Chris Wilson
2016-08-05 9:14 ` [CI 09/19] drm/i915: Tidy generation of the GTT mmap offset Chris Wilson
2016-08-05 9:14 ` [CI 10/19] drm/i915: Remove unused no-shrinker-steal Chris Wilson
2016-08-05 9:14 ` [CI 11/19] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
2016-08-05 9:14 ` [CI 12/19] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
2016-08-05 9:14 ` [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
2016-08-05 19:08 ` Daniel Vetter
2016-08-05 19:30 ` Chris Wilson
2016-08-05 20:12 ` Daniel Vetter
2016-08-05 9:14 ` [CI 14/19] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
2016-08-05 9:14 ` [CI 15/19] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
2016-08-05 19:10 ` Daniel Vetter
2016-08-05 9:14 ` [CI 16/19] drm/i915: Remove locking for get_tiling Chris Wilson
2016-08-05 9:14 ` [CI 17/19] drm/i915: Document and reject invalid tiling modes Chris Wilson
2016-08-05 9:14 ` [CI 18/19] drm/i915: Repack fence tiling mode and stride into a single integer Chris Wilson
2016-08-05 9:14 ` [CI 19/19] drm/i915: Assert that the request hasn't been retired Chris Wilson
2016-08-05 9:39 ` ✗ Ro.CI.BAT: failure for series starting with [CI,01/19] drm/i915: Introduce i915_gem_active_wait_unlocked() Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.