* [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe()
[not found] <20160829070834.22296-1-chris@chris-wilson.co.uk>
@ 2016-08-29 7:08 ` Chris Wilson
2016-09-23 12:59 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes Chris Wilson
` (4 subsequent siblings)
5 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-08-29 7:08 UTC (permalink / raw)
To: dri-devel
Cc: intel-gfx, Chris Wilson, Daniel Vetter, Sumit Semwal, linux-media,
linaro-mm-sig
This variant of fence_get_rcu() takes an RCU-protected pointer to a
fence and carefully returns a reference to the fence, ensuring that it is
not reallocated as it does so. This is required when mixing fences and
SLAB_DESTROY_BY_RCU, although at the moment it serves a more pedagogical
function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
include/linux/fence.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 51 insertions(+), 5 deletions(-)
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 0d763053f97a..c9c5ba98c302 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -183,6 +183,16 @@ void fence_release(struct kref *kref);
void fence_free(struct fence *fence);
/**
+ * fence_put - decreases refcount of the fence
+ * @fence: [in] fence to reduce refcount of
+ */
+static inline void fence_put(struct fence *fence)
+{
+ if (fence)
+ kref_put(&fence->refcount, fence_release);
+}
+
+/**
* fence_get - increases refcount of the fence
* @fence: [in] fence to increase refcount of
*
@@ -210,13 +220,49 @@ static inline struct fence *fence_get_rcu(struct fence *fence)
}
/**
- * fence_put - decreases refcount of the fence
- * @fence: [in] fence to reduce refcount of
+ * fence_get_rcu_safe - acquire a reference to an RCU tracked fence
+ * @fence: [in] pointer to fence to increase refcount of
+ *
+ * Returns NULL if no refcount could be obtained, otherwise the fence.
+ * This function handles acquiring a reference to a fence that may be
+ * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
+ * so long as the caller is using RCU on the pointer to the fence.
+ *
+ * An alternative mechanism is to employ a seqlock to protect a bunch of
+ * fences, such as used by struct reservation_object. When using a seqlock,
+ * the seqlock must be taken before and checked after a reference to the
+ * fence is acquired (as shown here).
+ *
+ * The caller is required to hold the RCU read lock.
*/
-static inline void fence_put(struct fence *fence)
+static inline struct fence *fence_get_rcu_safe(struct fence * __rcu *fencep)
{
- if (fence)
- kref_put(&fence->refcount, fence_release);
+ do {
+ struct fence *fence;
+
+ fence = rcu_dereference(*fencep);
+ if (!fence || !fence_get_rcu(fence))
+ return NULL;
+
+ /* The atomic_inc_not_zero() inside fence_get_rcu()
+ * provides a full memory barrier upon success (such as now).
+ * This is paired with the write barrier from assigning
+ * to the __rcu protected fence pointer so that if that
+ * pointer still matches the current fence, we know we
+ * have successfully acquired a reference to it. If it no
+ * longer matches, we are holding a reference to some other
+ * reallocated pointer. This is possible if the allocator
+ * is using a freelist like SLAB_DESTROY_BY_RCU where the
+ * fence remains valid for the RCU grace period, but it
+ * may be reallocated. When using such allocators, we are
+ * responsible for ensuring the reference we get is to
+ * the right fence, as below.
+ */
+ if (fence == rcu_access_pointer(*fencep))
+ return rcu_pointer_handoff(fence);
+
+ fence_put(fence);
+ } while (1);
}
int fence_signal(struct fence *fence);
--
2.9.3
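The acquire-then-recheck dance that fence_get_rcu_safe() performs can be illustrated outside the kernel. Below is a simplified userspace analogue using C11 atomics; the struct and function names are invented for illustration, and the kernel's kref, RCU and memory-barrier machinery is reduced to plain atomics, so this sketches only the shape of the pattern, not the real implementation:

```c
#include <stdatomic.h>
#include <stddef.h>

struct obj {
	atomic_int refcount;	/* 0 means the object is being torn down */
};

/* Analogue of fence_get_rcu(): take a reference only if at least one
 * is still held, i.e. atomic_inc_not_zero(). */
static int get_unless_zero(struct obj *o)
{
	int ref = atomic_load(&o->refcount);

	while (ref > 0)
		if (atomic_compare_exchange_weak(&o->refcount, &ref, ref + 1))
			return 1;

	return 0;
}

/* Analogue of fence_get_rcu_safe(): after taking the reference, re-read
 * the slot. If the pointer no longer matches, the allocator may have
 * recycled the object under us, so drop the reference and retry. */
static struct obj *get_safe(struct obj *_Atomic *slot)
{
	for (;;) {
		struct obj *o = atomic_load(slot);

		if (!o || !get_unless_zero(o))
			return NULL;

		if (o == atomic_load(slot))
			return o;	/* slot unchanged: reference is valid */

		atomic_fetch_sub(&o->refcount, 1);	/* raced: drop and retry */
	}
}
```

The key point mirrors the comment in the patch: taking the reference alone is not enough when the allocator may recycle objects within a grace period; only the pointer re-check proves the reference is to the object the slot still names.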
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes
[not found] <20160829070834.22296-1-chris@chris-wilson.co.uk>
2016-08-29 7:08 ` [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe() Chris Wilson
@ 2016-08-29 7:08 ` Chris Wilson
2016-09-23 13:03 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() " Chris Wilson
` (3 subsequent siblings)
5 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-08-29 7:08 UTC (permalink / raw)
To: dri-devel
Cc: intel-gfx, Chris Wilson, Daniel Vetter, Maarten Lankhorst,
Christian König, Alex Deucher, Sumit Semwal, linux-media,
linaro-mm-sig
In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU slab, or a similar freelist,
then within an RCU grace period a fence may be freed and reallocated. The
RCU read-side critical section does not prevent this reallocation;
instead, we have to inspect the reservation's seqlock to double-check
whether the fences have been reassigned as we were acquiring our reference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/dma-buf/reservation.c | 71 +++++++++++++++++++------------------------
1 file changed, 31 insertions(+), 40 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 723d8af988e5..10fd441dd4ed 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -280,18 +280,24 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
unsigned *pshared_count,
struct fence ***pshared)
{
- unsigned shared_count = 0;
- unsigned retry = 1;
- struct fence **shared = NULL, *fence_excl = NULL;
- int ret = 0;
+ struct fence **shared = NULL;
+ struct fence *fence_excl;
+ unsigned shared_count;
+ int ret = 1;
- while (retry) {
+ do {
struct reservation_object_list *fobj;
unsigned seq;
+ unsigned i;
- seq = read_seqcount_begin(&obj->seq);
+ shared_count = i = 0;
rcu_read_lock();
+ seq = read_seqcount_begin(&obj->seq);
+
+ fence_excl = rcu_dereference(obj->fence_excl);
+ if (fence_excl && !fence_get_rcu(fence_excl))
+ goto unlock;
fobj = rcu_dereference(obj->fence);
if (fobj) {
@@ -309,52 +315,37 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
}
ret = -ENOMEM;
- shared_count = 0;
break;
}
shared = nshared;
- memcpy(shared, fobj->shared, sz);
shared_count = fobj->shared_count;
- } else
- shared_count = 0;
- fence_excl = rcu_dereference(obj->fence_excl);
-
- retry = read_seqcount_retry(&obj->seq, seq);
- if (retry)
- goto unlock;
-
- if (!fence_excl || fence_get_rcu(fence_excl)) {
- unsigned i;
for (i = 0; i < shared_count; ++i) {
- if (fence_get_rcu(shared[i]))
- continue;
-
- /* uh oh, refcount failed, abort and retry */
- while (i--)
- fence_put(shared[i]);
-
- if (fence_excl) {
- fence_put(fence_excl);
- fence_excl = NULL;
- }
-
- retry = 1;
- break;
+ shared[i] = rcu_dereference(fobj->shared[i]);
+ if (!fence_get_rcu(shared[i]))
+ break;
}
- } else
- retry = 1;
+ }
+
+ if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
+ while (i--)
+ fence_put(shared[i]);
+ fence_put(fence_excl);
+ goto unlock;
+ }
+ ret = 0;
unlock:
rcu_read_unlock();
- }
- *pshared_count = shared_count;
- if (shared_count)
- *pshared = shared;
- else {
- *pshared = NULL;
+ } while (ret);
+
+ if (!shared_count) {
kfree(shared);
+ shared = NULL;
}
+
+ *pshared_count = shared_count;
+ *pshared = shared;
*pfence_excl = fence_excl;
return ret;
--
2.9.3
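The retry loop above hinges on the seqcount read protocol: snapshot the sequence before reading, re-check it after taking the references, and restart if it moved. A minimal userspace analogue (illustrative names, plain C11 atomics standing in for the kernel's seqcount_t) might look like:

```c
#include <stdatomic.h>

/* Userspace analogue of the seqcount read protocol relied on above:
 * a writer makes the counter odd while updating and even again when
 * done; a reader snapshots the counter, copies the data, then verifies
 * that the counter has not moved. Illustrative names, not the kernel API. */
struct demo_seqcount {
	atomic_uint sequence;
};

static unsigned demo_read_begin(struct demo_seqcount *s)
{
	unsigned seq;

	/* Wait out any in-progress writer (odd value), then snapshot. */
	while ((seq = atomic_load(&s->sequence)) & 1)
		;
	return seq;
}

static int demo_read_retry(struct demo_seqcount *s, unsigned seq)
{
	/* Nonzero means a writer ran since demo_read_begin(): retry. */
	return atomic_load(&s->sequence) != seq;
}
```

The new do/while in reservation_object_get_fences_rcu() has exactly this shape: begin, acquire the references, and if the retry check fires, drop everything and start over.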
* [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() after writes
[not found] <20160829070834.22296-1-chris@chris-wilson.co.uk>
2016-08-29 7:08 ` [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe() Chris Wilson
2016-08-29 7:08 ` [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes Chris Wilson
@ 2016-08-29 7:08 ` Chris Wilson
2016-09-23 13:18 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() " Chris Wilson
` (2 subsequent siblings)
5 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-08-29 7:08 UTC (permalink / raw)
To: dri-devel
Cc: intel-gfx, Chris Wilson, Daniel Vetter, Maarten Lankhorst,
Christian König, Alex Deucher, Sumit Semwal, linux-media,
linaro-mm-sig
In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU slab, or a similar freelist,
then within an RCU grace period a fence may be freed and reallocated. The
RCU read-side critical section does not prevent this reallocation;
instead, we have to inspect the reservation's seqlock to double-check
whether the fences have been reassigned as we were acquiring our reference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/dma-buf/reservation.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 10fd441dd4ed..3369e4668e96 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -388,9 +388,6 @@ retry:
if (fobj)
shared_count = fobj->shared_count;
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
for (i = 0; i < shared_count; ++i) {
struct fence *lfence = rcu_dereference(fobj->shared[i]);
@@ -413,9 +410,6 @@ retry:
if (!shared_count) {
struct fence *fence_excl = rcu_dereference(obj->fence_excl);
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
if (fence_excl &&
!test_bit(FENCE_FLAG_SIGNALED_BIT, &fence_excl->flags)) {
if (!fence_get_rcu(fence_excl))
@@ -430,6 +424,11 @@ retry:
rcu_read_unlock();
if (fence) {
+ if (read_seqcount_retry(&obj->seq, seq)) {
+ fence_put(fence);
+ goto retry;
+ }
+
ret = fence_wait_timeout(fence, intr, ret);
fence_put(fence);
if (ret > 0 && wait_all && (i + 1 < shared_count))
--
2.9.3
* [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() after writes
[not found] <20160829070834.22296-1-chris@chris-wilson.co.uk>
` (2 preceding siblings ...)
2016-08-29 7:08 ` [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() " Chris Wilson
@ 2016-08-29 7:08 ` Chris Wilson
2016-09-23 13:43 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single Chris Wilson
2016-08-29 7:08 ` [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
5 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-08-29 7:08 UTC (permalink / raw)
To: dri-devel
Cc: intel-gfx, Chris Wilson, Daniel Vetter, Maarten Lankhorst,
Christian König, Alex Deucher, Sumit Semwal, linux-media,
linaro-mm-sig
In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU slab, or a similar freelist,
then within an RCU grace period a fence may be freed and reallocated. The
RCU read-side critical section does not prevent this reallocation;
instead, we have to inspect the reservation's seqlock to double-check
whether the fences have been reassigned as we were acquiring our reference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/dma-buf/reservation.c | 30 ++++++++++--------------------
1 file changed, 10 insertions(+), 20 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 3369e4668e96..e74493e7332b 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -474,12 +474,13 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
bool test_all)
{
unsigned seq, shared_count;
- int ret = true;
+ int ret;
+ rcu_read_lock();
retry:
+ ret = true;
shared_count = 0;
seq = read_seqcount_begin(&obj->seq);
- rcu_read_lock();
if (test_all) {
unsigned i;
@@ -490,46 +491,35 @@ retry:
if (fobj)
shared_count = fobj->shared_count;
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
for (i = 0; i < shared_count; ++i) {
struct fence *fence = rcu_dereference(fobj->shared[i]);
ret = reservation_object_test_signaled_single(fence);
if (ret < 0)
- goto unlock_retry;
+ goto retry;
else if (!ret)
break;
}
- /*
- * There could be a read_seqcount_retry here, but nothing cares
- * about whether it's the old or newer fence pointers that are
- * signaled. That race could still have happened after checking
- * read_seqcount_retry. If you care, use ww_mutex_lock.
- */
+ if (read_seqcount_retry(&obj->seq, seq))
+ goto retry;
}
if (!shared_count) {
struct fence *fence_excl = rcu_dereference(obj->fence_excl);
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
if (fence_excl) {
ret = reservation_object_test_signaled_single(
fence_excl);
if (ret < 0)
- goto unlock_retry;
+ goto retry;
+
+ if (read_seqcount_retry(&obj->seq, seq))
+ goto retry;
}
}
rcu_read_unlock();
return ret;
-
-unlock_retry:
- rcu_read_unlock();
- goto retry;
}
EXPORT_SYMBOL_GPL(reservation_object_test_signaled_rcu);
--
2.9.3
* [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
[not found] <20160829070834.22296-1-chris@chris-wilson.co.uk>
` (3 preceding siblings ...)
2016-08-29 7:08 ` [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() " Chris Wilson
@ 2016-08-29 7:08 ` Chris Wilson
2016-09-23 13:49 ` [Linaro-mm-sig] " Daniel Vetter
2016-08-29 7:08 ` [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
5 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-08-29 7:08 UTC (permalink / raw)
To: dri-devel
Cc: intel-gfx, Chris Wilson, Sumit Semwal, linux-media, linaro-mm-sig
With the seqlock now extended to cover the lookup of the fence and its
testing, we can perform that testing solely under the seqlock guard and
avoid the effective locking and serialisation of acquiring a reference to
the request. As the fence is RCU-protected, we know it cannot disappear
as we test it; this is the same guarantee that previously made it safe to
acquire the reference. The seqlock tells us whether the fence was replaced
while we were testing it, and hence whether or not we can trust the result
(if not, we simply repeat the test until it is stable).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/dma-buf/reservation.c | 32 ++++----------------------------
1 file changed, 4 insertions(+), 28 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index e74493e7332b..1ddffa5adb5a 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -442,24 +442,6 @@ unlock_retry:
}
EXPORT_SYMBOL_GPL(reservation_object_wait_timeout_rcu);
-
-static inline int
-reservation_object_test_signaled_single(struct fence *passed_fence)
-{
- struct fence *fence, *lfence = passed_fence;
- int ret = 1;
-
- if (!test_bit(FENCE_FLAG_SIGNALED_BIT, &lfence->flags)) {
- fence = fence_get_rcu(lfence);
- if (!fence)
- return -1;
-
- ret = !!fence_is_signaled(fence);
- fence_put(fence);
- }
- return ret;
-}
-
/**
* reservation_object_test_signaled_rcu - Test if a reservation object's
* fences have been signaled.
@@ -474,7 +456,7 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
bool test_all)
{
unsigned seq, shared_count;
- int ret;
+ bool ret;
rcu_read_lock();
retry:
@@ -494,10 +476,8 @@ retry:
for (i = 0; i < shared_count; ++i) {
struct fence *fence = rcu_dereference(fobj->shared[i]);
- ret = reservation_object_test_signaled_single(fence);
- if (ret < 0)
- goto retry;
- else if (!ret)
+ ret = fence_is_signaled(fence);
+ if (!ret)
break;
}
@@ -509,11 +489,7 @@ retry:
struct fence *fence_excl = rcu_dereference(obj->fence_excl);
if (fence_excl) {
- ret = reservation_object_test_signaled_single(
- fence_excl);
- if (ret < 0)
- goto retry;
-
+ ret = fence_is_signaled(fence_excl);
if (read_seqcount_retry(&obj->seq, seq))
goto retry;
}
--
2.9.3
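The idea of testing the fence solely under the seqlock guard, without taking a reference, can be sketched as a userspace analogue (all names and the layout are invented for illustration; the kernel's seqcount and fence flags are reduced to bare atomics):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of testing a fence's signaled bit solely under a
 * seqcount guard: no reference is taken. If a writer replaced the fence
 * while we looked at it, the sequence re-check fails and we retry. */
struct demo_obj {
	atomic_uint seq;		/* even = stable, odd = writer active */
	atomic_bool fence_signaled;	/* stands in for FENCE_FLAG_SIGNALED_BIT */
};

static bool demo_test_signaled(struct demo_obj *obj)
{
	unsigned seq;
	bool ret;

	do {
		/* Wait for a stable (even) snapshot of the sequence. */
		while ((seq = atomic_load(&obj->seq)) & 1)
			;
		ret = atomic_load(&obj->fence_signaled);
	} while (atomic_load(&obj->seq) != seq);	/* replaced: retry */

	return ret;
}
```

RCU guarantees the object stays readable for the grace period; the sequence check is what tells us whether the value we read belongs to the fence currently installed, which is why the refcounted helper can be dropped.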
* [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
[not found] <20160829070834.22296-1-chris@chris-wilson.co.uk>
` (4 preceding siblings ...)
2016-08-29 7:08 ` [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single Chris Wilson
@ 2016-08-29 7:08 ` Chris Wilson
2016-08-29 18:16 ` [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0) Chris Wilson
2016-09-23 13:50 ` [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Daniel Vetter
5 siblings, 2 replies; 23+ messages in thread
From: Chris Wilson @ 2016-08-29 7:08 UTC (permalink / raw)
To: dri-devel
Cc: intel-gfx, Chris Wilson, Sumit Semwal, linux-media, linaro-mm-sig
Currently we install a callback for performing poll on a dma-buf,
irrespective of the timeout. This involves taking a spinlock and doing
unnecessary work, and it greatly reduces the scaling of poll(.timeout=0)
across multiple threads.
We can query whether the poll will block prior to installing the
callback, making the busy-query fast.
Single thread: 60% faster
8 threads on 4 (+4 HT) cores: 600% faster
Still not quite the perfect scaling we get with a native busy ioctl, but
poll(dmabuf) is faster due to the quicker lookup of the object and
avoiding drm_ioctl().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
drivers/dma-buf/dma-buf.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cf04d249a6a4..c7a7bc579941 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
if (!events)
return 0;
+ if (poll_does_not_wait(poll)) {
+ if (events & POLLOUT &&
+ !reservation_object_test_signaled_rcu(resv, true))
+ events &= ~(POLLOUT | POLLIN);
+
+ if (events & POLLIN &&
+ !reservation_object_test_signaled_rcu(resv, false))
+ events &= ~POLLIN;
+
+ return events;
+ }
+
retry:
seq = read_seqcount_begin(&resv->seq);
rcu_read_lock();
--
2.9.3
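From userspace, the fast path added here is exercised by any poll() with a zero timeout. A small helper along these lines shows the busy-query (the helper name is ours; it works on any pollable fd, but on a dma-buf fd it now hits the lockless reservation test instead of installing a fence callback):

```c
#include <poll.h>

/* Nonblocking busy-query: with a timeout of zero, poll() returns
 * immediately rather than sleeping. 'fd' is assumed to be an open,
 * pollable file descriptor (e.g. a dma-buf). Returns 1 if it is
 * idle/writable, 0 if busy, -1 on error. */
static int dmabuf_is_idle(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };

	if (poll(&pfd, 1, 0) < 0)
		return -1;

	return (pfd.revents & POLLOUT) != 0;
}
```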
* [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
2016-08-29 7:08 ` [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
@ 2016-08-29 18:16 ` Chris Wilson
2016-08-29 18:26 ` Gustavo Padovan
2016-09-23 13:50 ` [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Daniel Vetter
1 sibling, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-08-29 18:16 UTC (permalink / raw)
To: dri-devel
Cc: Chris Wilson, Sumit Semwal, Gustavo Padovan, linux-media,
linaro-mm-sig
If we are being polled with a timeout of zero (a nonblocking busy query),
we don't need to install any fence callbacks, as we will not be waiting.
Since we only install the callback once, the overhead comes from the atomic
bit test, which also causes serialisation between threads.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/dma-buf/sync_file.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 486d29c1a830..abb5fdab75fd 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, poll_table *wait)
poll_wait(file, &sync_file->wq, wait);
- if (!test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
+ if (!poll_does_not_wait(wait) &&
+ !test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
if (fence_add_callback(sync_file->fence, &sync_file->cb,
fence_check_cb_func) < 0)
wake_up_all(&sync_file->wq);
--
2.9.3
* Re: [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
2016-08-29 18:16 ` [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0) Chris Wilson
@ 2016-08-29 18:26 ` Gustavo Padovan
2016-09-13 14:46 ` Sumit Semwal
0 siblings, 1 reply; 23+ messages in thread
From: Gustavo Padovan @ 2016-08-29 18:26 UTC (permalink / raw)
To: Chris Wilson; +Cc: dri-devel, Sumit Semwal, linux-media, linaro-mm-sig
Hi Chris,
2016-08-29 Chris Wilson <chris@chris-wilson.co.uk>:
> If we are being polled with a timeout of zero (a nonblocking busy query),
> we don't need to install any fence callbacks, as we will not be waiting.
> Since we only install the callback once, the overhead comes from the atomic
> bit test, which also causes serialisation between threads.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Gustavo Padovan <gustavo@padovan.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
> drivers/dma-buf/sync_file.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
Indeed, we can shortcut this.
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Gustavo
* Re: [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
2016-08-29 18:26 ` Gustavo Padovan
@ 2016-09-13 14:46 ` Sumit Semwal
0 siblings, 0 replies; 23+ messages in thread
From: Sumit Semwal @ 2016-09-13 14:46 UTC (permalink / raw)
To: Gustavo Padovan, Chris Wilson, DRI mailing list, Sumit Semwal,
linux-media@vger.kernel.org, Linaro MM SIG Mailman List
Hi Chris,
On 29 August 2016 at 23:56, Gustavo Padovan <gustavo@padovan.org> wrote:
> Hi Chris,
>
> 2016-08-29 Chris Wilson <chris@chris-wilson.co.uk>:
>
>> If we are being polled with a timeout of zero (a nonblocking busy query),
>> we don't need to install any fence callbacks, as we will not be waiting.
>> Since we only install the callback once, the overhead comes from the atomic
>> bit test, which also causes serialisation between threads.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>> Cc: Gustavo Padovan <gustavo@padovan.org>
>> Cc: linux-media@vger.kernel.org
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: linaro-mm-sig@lists.linaro.org
>> ---
>> drivers/dma-buf/sync_file.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Indeed, we can shortcut this.
>
> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
>
> Gustavo
Thanks; pushed to drm-misc.
Best,
Sumit.
* Re: [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe()
2016-08-29 7:08 ` [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe() Chris Wilson
@ 2016-09-23 12:59 ` Daniel Vetter
2016-09-23 13:34 ` Markus Heiser
0 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:59 UTC (permalink / raw)
To: Chris Wilson
Cc: dri-devel, intel-gfx, Daniel Vetter, Sumit Semwal, linux-media,
linaro-mm-sig
On Mon, Aug 29, 2016 at 08:08:29AM +0100, Chris Wilson wrote:
> This variant of fence_get_rcu() takes an RCU-protected pointer to a
> fence and carefully returns a reference to the fence, ensuring that it is
> not reallocated as it does so. This is required when mixing fences and
> SLAB_DESTROY_BY_RCU, although at the moment it serves a more pedagogical
> function.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
> include/linux/fence.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 51 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/fence.h b/include/linux/fence.h
> index 0d763053f97a..c9c5ba98c302 100644
> --- a/include/linux/fence.h
> +++ b/include/linux/fence.h
> @@ -183,6 +183,16 @@ void fence_release(struct kref *kref);
> void fence_free(struct fence *fence);
>
> /**
> + * fence_put - decreases refcount of the fence
> + * @fence: [in] fence to reduce refcount of
> + */
> +static inline void fence_put(struct fence *fence)
> +{
> + if (fence)
> + kref_put(&fence->refcount, fence_release);
> +}
> +
> +/**
> * fence_get - increases refcount of the fence
> * @fence: [in] fence to increase refcount of
> *
> @@ -210,13 +220,49 @@ static inline struct fence *fence_get_rcu(struct fence *fence)
> }
>
> /**
> - * fence_put - decreases refcount of the fence
> - * @fence: [in] fence to reduce refcount of
> + * fence_get_rcu_safe - acquire a reference to an RCU tracked fence
> + * @fence: [in] pointer to fence to increase refcount of
> + *
> + * Returns NULL if no refcount could be obtained, otherwise the fence.
> + * This function handles acquiring a reference to a fence that may be
> + * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
> + * so long as the caller is using RCU on the pointer to the fence.
> + *
> + * An alternative mechanism is to employ a seqlock to protect a bunch of
> + * fences, such as used by struct reservation_object. When using a seqlock,
> + * the seqlock must be taken before and checked after a reference to the
> + * fence is acquired (as shown here).
> + *
> + * The caller is required to hold the RCU read lock.
Would be good to cross reference the various fence_get functions a bit
better in the docs. But since the docs aren't yet pulled into the rst/html
output, that doesn't matter that much. Hence as-is:
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> */
> -static inline void fence_put(struct fence *fence)
> +static inline struct fence *fence_get_rcu_safe(struct fence * __rcu *fencep)
> {
> - if (fence)
> - kref_put(&fence->refcount, fence_release);
> + do {
> + struct fence *fence;
> +
> + fence = rcu_dereference(*fencep);
> + if (!fence || !fence_get_rcu(fence))
> + return NULL;
> +
> + /* The atomic_inc_not_zero() inside fence_get_rcu()
> + * provides a full memory barrier upon success (such as now).
> + * This is paired with the write barrier from assigning
> + * to the __rcu protected fence pointer so that if that
> + * pointer still matches the current fence, we know we
> + * have successfully acquired a reference to it. If it no
> + * longer matches, we are holding a reference to some other
> + * reallocated pointer. This is possible if the allocator
> + * is using a freelist like SLAB_DESTROY_BY_RCU where the
> + * fence remains valid for the RCU grace period, but it
> + * may be reallocated. When using such allocators, we are
> + * responsible for ensuring the reference we get is to
> + * the right fence, as below.
> + */
> + if (fence == rcu_access_pointer(*fencep))
> + return rcu_pointer_handoff(fence);
> +
> + fence_put(fence);
> + } while (1);
> }
>
> int fence_signal(struct fence *fence);
> --
> 2.9.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes
2016-08-29 7:08 ` [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes Chris Wilson
@ 2016-09-23 13:03 ` Daniel Vetter
0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:03 UTC (permalink / raw)
To: Chris Wilson
Cc: dri-devel, intel-gfx, Daniel Vetter, Maarten Lankhorst,
Christian König, Alex Deucher, Sumit Semwal, linux-media,
linaro-mm-sig
On Mon, Aug 29, 2016 at 08:08:30AM +0100, Chris Wilson wrote:
> In order to be completely generic, we have to double-check the read
> seqlock after acquiring a reference to the fence. If the driver is
> allocating fences from a SLAB_DESTROY_BY_RCU slab, or a similar freelist,
> then within an RCU grace period a fence may be freed and reallocated. The
> RCU read-side critical section does not prevent this reallocation;
> instead, we have to inspect the reservation's seqlock to double-check
> whether the fences have been reassigned as we were acquiring our reference.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
> drivers/dma-buf/reservation.c | 71 +++++++++++++++++++------------------------
> 1 file changed, 31 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 723d8af988e5..10fd441dd4ed 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -280,18 +280,24 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
> unsigned *pshared_count,
> struct fence ***pshared)
> {
> - unsigned shared_count = 0;
> - unsigned retry = 1;
> - struct fence **shared = NULL, *fence_excl = NULL;
> - int ret = 0;
> + struct fence **shared = NULL;
> + struct fence *fence_excl;
> + unsigned shared_count;
> + int ret = 1;
Personally I'd go with ret = -EBUSY here, but that's a bikeshed.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> - while (retry) {
> + do {
> struct reservation_object_list *fobj;
> unsigned seq;
> + unsigned i;
>
> - seq = read_seqcount_begin(&obj->seq);
> + shared_count = i = 0;
>
> rcu_read_lock();
> + seq = read_seqcount_begin(&obj->seq);
> +
> + fence_excl = rcu_dereference(obj->fence_excl);
> + if (fence_excl && !fence_get_rcu(fence_excl))
> + goto unlock;
>
> fobj = rcu_dereference(obj->fence);
> if (fobj) {
> @@ -309,52 +315,37 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
> }
>
> ret = -ENOMEM;
> - shared_count = 0;
> break;
> }
> shared = nshared;
> - memcpy(shared, fobj->shared, sz);
> shared_count = fobj->shared_count;
> - } else
> - shared_count = 0;
> - fence_excl = rcu_dereference(obj->fence_excl);
> -
> - retry = read_seqcount_retry(&obj->seq, seq);
> - if (retry)
> - goto unlock;
> -
> - if (!fence_excl || fence_get_rcu(fence_excl)) {
> - unsigned i;
>
> for (i = 0; i < shared_count; ++i) {
> - if (fence_get_rcu(shared[i]))
> - continue;
> -
> - /* uh oh, refcount failed, abort and retry */
> - while (i--)
> - fence_put(shared[i]);
> -
> - if (fence_excl) {
> - fence_put(fence_excl);
> - fence_excl = NULL;
> - }
> -
> - retry = 1;
> - break;
> + shared[i] = rcu_dereference(fobj->shared[i]);
> + if (!fence_get_rcu(shared[i]))
> + break;
> }
> - } else
> - retry = 1;
> + }
> +
> + if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
> + while (i--)
> + fence_put(shared[i]);
> + fence_put(fence_excl);
> + goto unlock;
> + }
>
> + ret = 0;
> unlock:
> rcu_read_unlock();
> - }
> - *pshared_count = shared_count;
> - if (shared_count)
> - *pshared = shared;
> - else {
> - *pshared = NULL;
> + } while (ret);
> +
> + if (!shared_count) {
> kfree(shared);
> + shared = NULL;
> }
> +
> + *pshared_count = shared_count;
> + *pshared = shared;
> *pfence_excl = fence_excl;
>
> return ret;
> --
> 2.9.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() after writes
2016-08-29 7:08 ` [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() " Chris Wilson
@ 2016-09-23 13:18 ` Daniel Vetter
0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:18 UTC (permalink / raw)
To: Chris Wilson
Cc: dri-devel, intel-gfx, Daniel Vetter, Maarten Lankhorst,
Christian König, Alex Deucher, Sumit Semwal, linux-media,
linaro-mm-sig
On Mon, Aug 29, 2016 at 08:08:31AM +0100, Chris Wilson wrote:
> In order to be completely generic, we have to double check the read
> seqlock after acquiring a reference to the fence. If the driver is
> allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
> within an RCU grace period a fence may be freed and reallocated. The RCU
> read side critical section does not prevent this reallocation, instead
> we have to inspect the reservation's seqlock to double check if the
> fences have been reassigned as we were acquiring our reference.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/dma-buf/reservation.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 10fd441dd4ed..3369e4668e96 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -388,9 +388,6 @@ retry:
> if (fobj)
> shared_count = fobj->shared_count;
>
> - if (read_seqcount_retry(&obj->seq, seq))
> - goto unlock_retry;
> -
> for (i = 0; i < shared_count; ++i) {
> struct fence *lfence = rcu_dereference(fobj->shared[i]);
>
> @@ -413,9 +410,6 @@ retry:
> if (!shared_count) {
> struct fence *fence_excl = rcu_dereference(obj->fence_excl);
>
> - if (read_seqcount_retry(&obj->seq, seq))
> - goto unlock_retry;
> -
> if (fence_excl &&
> !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence_excl->flags)) {
> if (!fence_get_rcu(fence_excl))
> @@ -430,6 +424,11 @@ retry:
>
> rcu_read_unlock();
> if (fence) {
> + if (read_seqcount_retry(&obj->seq, seq)) {
> + fence_put(fence);
> + goto retry;
> + }
> +
> ret = fence_wait_timeout(fence, intr, ret);
> fence_put(fence);
> if (ret > 0 && wait_all && (i + 1 < shared_count))
> --
> 2.9.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe()
2016-09-23 12:59 ` Daniel Vetter
@ 2016-09-23 13:34 ` Markus Heiser
0 siblings, 0 replies; 23+ messages in thread
From: Markus Heiser @ 2016-09-23 13:34 UTC (permalink / raw)
To: Daniel Vetter
Cc: Chris Wilson, dri-devel, intel-gfx, Daniel Vetter, Sumit Semwal,
linux-media, linaro-mm-sig
On 23.09.2016 at 14:59, Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> /**
>> - * fence_put - decreases refcount of the fence
>> - * @fence: [in] fence to reduce refcount of
>> + * fence_get_rcu_safe - acquire a reference to an RCU tracked fence
>> + * @fence: [in] pointer to fence to increase refcount of
>> + *
>> + * Function returns NULL if no refcount could be obtained, or the fence.
>> + * This function handles acquiring a reference to a fence that may be
>> + * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
>> + * so long as the caller is using RCU on the pointer to the fence.
>> + *
>> + * An alternative mechanism is to employ a seqlock to protect a bunch of
>> + * fences, such as used by struct reservation_object. When using a seqlock,
>> + * the seqlock must be taken before and checked after a reference to the
>> + * fence is acquired (as shown here).
>> + *
>> + * The caller is required to hold the RCU read lock.
>
> Would be good to cross reference the various fence_get functions a bit
> better in the docs. But since the docs aren't yet pulled into the rst/html
> output, that doesn't matter that much
Hi Daniel ... I'm working on it ;-)
* http://return42.github.io/sphkerneldoc/linux_src_doc/include/linux/fence_h.html
-- Markus
* Re: [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() after writes
2016-08-29 7:08 ` [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() " Chris Wilson
@ 2016-09-23 13:43 ` Daniel Vetter
0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:43 UTC (permalink / raw)
To: Chris Wilson
Cc: dri-devel, intel-gfx, Daniel Vetter, Maarten Lankhorst,
Christian König, Alex Deucher, Sumit Semwal, linux-media,
linaro-mm-sig
On Mon, Aug 29, 2016 at 08:08:32AM +0100, Chris Wilson wrote:
> In order to be completely generic, we have to double check the read
> seqlock after acquiring a reference to the fence. If the driver is
> allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
> within an RCU grace period a fence may be freed and reallocated. The RCU
> read side critical section does not prevent this reallocation, instead
> we have to inspect the reservation's seqlock to double check if the
> fences have been reassigned as we were acquiring our reference.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/dma-buf/reservation.c | 30 ++++++++++--------------------
> 1 file changed, 10 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 3369e4668e96..e74493e7332b 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -474,12 +474,13 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
> bool test_all)
> {
> unsigned seq, shared_count;
> - int ret = true;
> + int ret;
>
> + rcu_read_lock();
> retry:
> + ret = true;
> shared_count = 0;
> seq = read_seqcount_begin(&obj->seq);
> - rcu_read_lock();
>
> if (test_all) {
> unsigned i;
> @@ -490,46 +491,35 @@ retry:
> if (fobj)
> shared_count = fobj->shared_count;
>
> - if (read_seqcount_retry(&obj->seq, seq))
> - goto unlock_retry;
> -
> for (i = 0; i < shared_count; ++i) {
> struct fence *fence = rcu_dereference(fobj->shared[i]);
>
> ret = reservation_object_test_signaled_single(fence);
> if (ret < 0)
> - goto unlock_retry;
> + goto retry;
> else if (!ret)
> break;
> }
>
> - /*
> - * There could be a read_seqcount_retry here, but nothing cares
> - * about whether it's the old or newer fence pointers that are
> - * signaled. That race could still have happened after checking
> - * read_seqcount_retry. If you care, use ww_mutex_lock.
> - */
> + if (read_seqcount_retry(&obj->seq, seq))
> + goto retry;
> }
>
> if (!shared_count) {
> struct fence *fence_excl = rcu_dereference(obj->fence_excl);
>
> - if (read_seqcount_retry(&obj->seq, seq))
> - goto unlock_retry;
> -
> if (fence_excl) {
> ret = reservation_object_test_signaled_single(
> fence_excl);
> if (ret < 0)
> - goto unlock_retry;
> + goto retry;
> +
> + if (read_seqcount_retry(&obj->seq, seq))
> + goto retry;
> }
> }
>
> rcu_read_unlock();
> return ret;
> -
> -unlock_retry:
> - rcu_read_unlock();
> - goto retry;
> }
> EXPORT_SYMBOL_GPL(reservation_object_test_signaled_rcu);
> --
> 2.9.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [Linaro-mm-sig] [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
2016-08-29 7:08 ` [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single Chris Wilson
@ 2016-09-23 13:49 ` Daniel Vetter
2016-09-23 14:02 ` Chris Wilson
0 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:49 UTC (permalink / raw)
To: Chris Wilson; +Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media
On Mon, Aug 29, 2016 at 08:08:33AM +0100, Chris Wilson wrote:
> With the seqlock now extended to cover the lookup of the fence and its
> testing, we can perform that testing solely under the seqlock guard and
> avoid the effective locking and serialisation of acquiring a reference to
> the request. As the fence is RCU protected we know it cannot disappear
> as we test it, the same guarantee that made it safe to acquire the
> reference previously. The seqlock tests whether the fence was replaced
> as we are testing it telling us whether or not we can trust the result
> (if not, we just repeat the test until stable).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
Not entirely sure this is safe for non-i915 drivers. We might now call
->signalled on a zombie fence (i.e. refcount == 0, but not yet kfreed).
i915 can do that, but other drivers might go boom.
I think in generic code we can't do these kinds of tricks, unfortunately.
-Daniel
> ---
> drivers/dma-buf/reservation.c | 32 ++++----------------------------
> 1 file changed, 4 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index e74493e7332b..1ddffa5adb5a 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -442,24 +442,6 @@ unlock_retry:
> }
> EXPORT_SYMBOL_GPL(reservation_object_wait_timeout_rcu);
>
> -
> -static inline int
> -reservation_object_test_signaled_single(struct fence *passed_fence)
> -{
> - struct fence *fence, *lfence = passed_fence;
> - int ret = 1;
> -
> - if (!test_bit(FENCE_FLAG_SIGNALED_BIT, &lfence->flags)) {
> - fence = fence_get_rcu(lfence);
> - if (!fence)
> - return -1;
> -
> - ret = !!fence_is_signaled(fence);
> - fence_put(fence);
> - }
> - return ret;
> -}
> -
> /**
> * reservation_object_test_signaled_rcu - Test if a reservation object's
> * fences have been signaled.
> @@ -474,7 +456,7 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
> bool test_all)
> {
> unsigned seq, shared_count;
> - int ret;
> + bool ret;
>
> rcu_read_lock();
> retry:
> @@ -494,10 +476,8 @@ retry:
> for (i = 0; i < shared_count; ++i) {
> struct fence *fence = rcu_dereference(fobj->shared[i]);
>
> - ret = reservation_object_test_signaled_single(fence);
> - if (ret < 0)
> - goto retry;
> - else if (!ret)
> + ret = fence_is_signaled(fence);
> + if (!ret)
> break;
> }
>
> @@ -509,11 +489,7 @@ retry:
> struct fence *fence_excl = rcu_dereference(obj->fence_excl);
>
> if (fence_excl) {
> - ret = reservation_object_test_signaled_single(
> - fence_excl);
> - if (ret < 0)
> - goto retry;
> -
> + ret = fence_is_signaled(fence_excl);
> if (read_seqcount_retry(&obj->seq, seq))
> goto retry;
> }
> --
> 2.9.3
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-mm-sig
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
2016-08-29 7:08 ` [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
2016-08-29 18:16 ` [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0) Chris Wilson
@ 2016-09-23 13:50 ` Daniel Vetter
2016-09-23 14:15 ` Chris Wilson
` (2 more replies)
1 sibling, 3 replies; 23+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:50 UTC (permalink / raw)
To: Chris Wilson
Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal
On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
>
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
>
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster
>
> Still not quite the perfect scaling we get with a native busy ioctl, but
> poll(dmabuf) is faster due to the quicker lookup of the object and
> avoiding drm_ioctl().
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Need to strike the r-b here, since Christian König pointed out that
objects won't magically switch signalling on.
-Daniel
> ---
> drivers/dma-buf/dma-buf.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index cf04d249a6a4..c7a7bc579941 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
> if (!events)
> return 0;
>
> + if (poll_does_not_wait(poll)) {
> + if (events & POLLOUT &&
> + !reservation_object_test_signaled_rcu(resv, true))
> + events &= ~(POLLOUT | POLLIN);
> +
> + if (events & POLLIN &&
> + !reservation_object_test_signaled_rcu(resv, false))
> + events &= ~POLLIN;
> +
> + return events;
> + }
> +
> retry:
> seq = read_seqcount_begin(&resv->seq);
> rcu_read_lock();
> --
> 2.9.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [Linaro-mm-sig] [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
2016-09-23 13:49 ` [Linaro-mm-sig] " Daniel Vetter
@ 2016-09-23 14:02 ` Chris Wilson
2016-09-25 20:43 ` Daniel Vetter
0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-09-23 14:02 UTC (permalink / raw)
To: Daniel Vetter; +Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media
On Fri, Sep 23, 2016 at 03:49:26PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:33AM +0100, Chris Wilson wrote:
> > With the seqlock now extended to cover the lookup of the fence and its
> > testing, we can perform that testing solely under the seqlock guard and
> > avoid the effective locking and serialisation of acquiring a reference to
> > the request. As the fence is RCU protected we know it cannot disappear
> > as we test it, the same guarantee that made it safe to acquire the
> > reference previously. The seqlock tests whether the fence was replaced
> > as we are testing it, telling us whether or not we can trust the result
> > (if not, we just repeat the test until stable).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
>
> Not entirely sure this is safe for non-i915 drivers. We might now call
> ->signalled on a zombie fence (i.e. refcount == 0, but not yet kfreed).
> i915 can do that, but other drivers might go boom.
All fences must be under RCU guard, or is that the sticking point? Given
that, the problem is fence reallocation within the RCU grace period. If
we can mandate that fence reallocation must be safe for concurrent
fence->ops->*(), we can use this technique to avoid the serialisation
barrier otherwise required. In the simple stress test, the difference is
an order of magnitude, and test_signaled_rcu is often on a path where
every memory barrier quickly adds up (at least for us).
So is it just that you worry that others using SLAB_DESTROY_BY_RCU won't
ensure their fence is safe during the reallocation?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
2016-09-23 13:50 ` [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Daniel Vetter
@ 2016-09-23 14:15 ` Chris Wilson
2016-09-23 15:06 ` Chris Wilson
2016-09-23 15:20 ` Chris Wilson
2 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-09-23 14:15 UTC (permalink / raw)
To: Daniel Vetter
Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> >
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> >
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> >
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.
The point being here that we don't even want to switch signaling on! :)
Christian's point was that not all fences guarantee forward progress
irrespective of whether signaling is enabled or not, and fences are not
required to guarantee forward progress without signaling even if they
provide an ops->signaled().
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
2016-09-23 13:50 ` [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Daniel Vetter
2016-09-23 14:15 ` Chris Wilson
@ 2016-09-23 15:06 ` Chris Wilson
2016-09-23 15:20 ` Chris Wilson
2 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-09-23 15:06 UTC (permalink / raw)
To: Daniel Vetter
Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> >
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> >
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> >
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.
Propagating a flag through to sync_file is trivial, but not through to
the dma_buf->resv. Looks like dma-buf will be without a fast busy query,
which I guess in the grand scheme of things (i.e. dma-buf itself is not
intended to be used as a fence) is not that important.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
2016-09-23 13:50 ` [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Daniel Vetter
2016-09-23 14:15 ` Chris Wilson
2016-09-23 15:06 ` Chris Wilson
@ 2016-09-23 15:20 ` Chris Wilson
2016-09-23 17:59 ` Christian König
2 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2016-09-23 15:20 UTC (permalink / raw)
To: Daniel Vetter
Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> >
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> >
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> >
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.
Oh, it also means that
commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
Author: Jammy Zhou <Jammy.Zhou@amd.com>
Date: Wed Jan 21 18:35:47 2015 +0800
reservation: wait only with non-zero timeout specified (v3)
When the timeout value passed to reservation_object_wait_timeout_rcu
is zero, no wait should be done if the fences are not signaled.
Return '1' for idle and '0' for busy if the specified timeout is '0'
to keep consistent with the case of non-zero timeout.
v2: call fence_put if not signaled in the case of timeout==0
v3: switch to reservation_object_test_signaled_rcu
Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
is wrong. And reservation_object_test_signaled_rcu() is unreliable.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
2016-09-23 15:20 ` Chris Wilson
@ 2016-09-23 17:59 ` Christian König
2016-09-25 20:44 ` Daniel Vetter
0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2016-09-23 17:59 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, dri-devel, linaro-mm-sig, intel-gfx,
linux-media, Sumit Semwal
On 23.09.2016 at 17:20, Chris Wilson wrote:
> On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
>> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
>>> Currently we install a callback for performing poll on a dma-buf,
>>> irrespective of the timeout. This involves taking a spinlock, as well as
>>> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
>>> multiple threads.
>>>
>>> We can query whether the poll will block prior to installing the
>>> callback to make the busy-query fast.
>>>
>>> Single thread: 60% faster
>>> 8 threads on 4 (+4 HT) cores: 600% faster
>>>
>>> Still not quite the perfect scaling we get with a native busy ioctl, but
>>> poll(dmabuf) is faster due to the quicker lookup of the object and
>>> avoiding drm_ioctl().
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>> Cc: linux-media@vger.kernel.org
>>> Cc: dri-devel@lists.freedesktop.org
>>> Cc: linaro-mm-sig@lists.linaro.org
>>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Need to strike the r-b here, since Christian König pointed out that
>> objects won't magically switch signalling on.
> Oh, it also means that
>
> commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
> Author: Jammy Zhou <Jammy.Zhou@amd.com>
> Date: Wed Jan 21 18:35:47 2015 +0800
>
> reservation: wait only with non-zero timeout specified (v3)
>
> When the timeout value passed to reservation_object_wait_timeout_rcu
> is zero, no wait should be done if the fences are not signaled.
>
> Return '1' for idle and '0' for busy if the specified timeout is '0'
> to keep consistent with the case of non-zero timeout.
>
> v2: call fence_put if not signaled in the case of timeout==0
>
> v3: switch to reservation_object_test_signaled_rcu
>
> Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
>
> is wrong. And reservation_object_test_signaled_rcu() is unreliable.
Oops, indeed, that patch is wrong as well.
I suggest that we just enable the signaling in this case as well.
Regards,
Christian.
> -Chris
>
* Re: [Linaro-mm-sig] [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
2016-09-23 14:02 ` Chris Wilson
@ 2016-09-25 20:43 ` Daniel Vetter
0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2016-09-25 20:43 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, dri-devel, linaro-mm-sig, intel-gfx,
linux-media
On Fri, Sep 23, 2016 at 03:02:32PM +0100, Chris Wilson wrote:
> On Fri, Sep 23, 2016 at 03:49:26PM +0200, Daniel Vetter wrote:
> > On Mon, Aug 29, 2016 at 08:08:33AM +0100, Chris Wilson wrote:
> > > With the seqlock now extended to cover the lookup of the fence and its
> > > testing, we can perform that testing solely under the seqlock guard and
> > > avoid the effective locking and serialisation of acquiring a reference to
> > > the request. As the fence is RCU protected we know it cannot disappear
> > > as we test it, the same guarantee that made it safe to acquire the
> > > reference previously. The seqlock tests whether the fence was replaced
> > > as we are testing it, telling us whether or not we can trust the result
> > > (if not, we just repeat the test until stable).
> > >
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > Cc: linux-media@vger.kernel.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> >
> > Not entirely sure this is safe for non-i915 drivers. We might now call
> > ->signalled on a zombie fence (i.e. refcount == 0, but not yet kfreed).
> > i915 can do that, but other drivers might go boom.
>
> All fences must be under RCU guard, or is that the sticking point? Given
> that, the problem is fence reallocation within the RCU grace period. If
> we can mandate that fence reallocation must be safe for concurrent
> fence->ops->*(), we can use this technique to avoid the serialisation
> barrier otherwise required. In the simple stress test, the difference is
> an order of magnitude, and test_signaled_rcu is often on a path where
> every memory barrier quickly adds up (at least for us).
>
> So is it just that you worry that others using SLAB_DESTROY_BY_RCU won't
> ensure their fence is safe during the reallocation?
Before your patch the rcu-protected part was just the
kref_get_unless_zero(). This was done before calling down into any
fence->ops->* functions, which means the code of these functions was
guaranteed to run on a real fence object, and not a zombie fence in the
process of getting destructed.
Afaiui with your patch we might call into fence->ops->* on these zombie
fences. That would be a behaviour change with rather big implications
(since we'd need to audit all existing implementations, and also make sure
all future ones will be ok too). Or I missed something again.
I think we could still to this trick, at least partially, by making sure
we only inspect generic fence state and never call into fence->ops before
we're guaranteed to have a real fence.
But atm (at least per Christian König) a fence won't eventually get
signalled without calling into ->ops functions, so there's a catch 22.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
2016-09-23 17:59 ` Christian König
@ 2016-09-25 20:44 ` Daniel Vetter
0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2016-09-25 20:44 UTC (permalink / raw)
To: Christian König
Cc: Chris Wilson, Daniel Vetter, dri-devel, linaro-mm-sig, intel-gfx,
linux-media, Sumit Semwal
On Fri, Sep 23, 2016 at 07:59:44PM +0200, Christian König wrote:
> On 23.09.2016 at 17:20, Chris Wilson wrote:
> > On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> > > On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > > > Currently we install a callback for performing poll on a dma-buf,
> > > > irrespective of the timeout. This involves taking a spinlock, as well as
> > > > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > > > multiple threads.
> > > >
> > > > We can query whether the poll will block prior to installing the
> > > > callback to make the busy-query fast.
> > > >
> > > > Single thread: 60% faster
> > > > 8 threads on 4 (+4 HT) cores: 600% faster
> > > >
> > > > Still not quite the perfect scaling we get with a native busy ioctl, but
> > > > poll(dmabuf) is faster due to the quicker lookup of the object and
> > > > avoiding drm_ioctl().
> > > >
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: dri-devel@lists.freedesktop.org
> > > > Cc: linaro-mm-sig@lists.linaro.org
> > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Need to strike the r-b here, since Christian König pointed out that
> > > objects won't magically switch signalling on.
> > Oh, it also means that
> >
> > commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
> > Author: Jammy Zhou <Jammy.Zhou@amd.com>
> > Date: Wed Jan 21 18:35:47 2015 +0800
> >
> > reservation: wait only with non-zero timeout specified (v3)
> > When the timeout value passed to reservation_object_wait_timeout_rcu
> > is zero, no wait should be done if the fences are not signaled.
> > Return '1' for idle and '0' for busy if the specified timeout is '0'
> > to keep consistent with the case of non-zero timeout.
> > v2: call fence_put if not signaled in the case of timeout==0
> > v3: switch to reservation_object_test_signaled_rcu
> > Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> > Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> > Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
> >
> > is wrong. And reservation_object_test_signaled_rcu() is unreliable.
>
> Oops indeed, that patch is wrong as well.
>
> I suggest that we just enable the signaling in this case as well.
Will you/Zhou take care of this corner case? Just so I can't forget about
it ;-)
Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Thread overview: 23+ messages (newest: 2016-09-25 20:44 UTC)
[not found] <20160829070834.22296-1-chris@chris-wilson.co.uk>
2016-08-29 7:08 ` [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe() Chris Wilson
2016-09-23 12:59 ` Daniel Vetter
2016-09-23 13:34 ` Markus Heiser
2016-08-29 7:08 ` [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes Chris Wilson
2016-09-23 13:03 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() " Chris Wilson
2016-09-23 13:18 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() " Chris Wilson
2016-09-23 13:43 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single Chris Wilson
2016-09-23 13:49 ` [Linaro-mm-sig] " Daniel Vetter
2016-09-23 14:02 ` Chris Wilson
2016-09-25 20:43 ` Daniel Vetter
2016-08-29 7:08 ` [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
2016-08-29 18:16 ` [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0) Chris Wilson
2016-08-29 18:26 ` Gustavo Padovan
2016-09-13 14:46 ` Sumit Semwal
2016-09-23 13:50 ` [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Daniel Vetter
2016-09-23 14:15 ` Chris Wilson
2016-09-23 15:06 ` Chris Wilson
2016-09-23 15:20 ` Chris Wilson
2016-09-23 17:59 ` Christian König
2016-09-25 20:44 ` Daniel Vetter