public inbox for linux-media@vger.kernel.org
* [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
@ 2016-08-28 16:37 Chris Wilson
  2016-08-28 17:04 ` [Intel-gfx] " Daniel Vetter
  2016-08-28 20:33 ` Chris Wilson
  0 siblings, 2 replies; 4+ messages in thread
From: Chris Wilson @ 2016-08-28 16:37 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Sumit Semwal, linux-media, dri-devel, linaro-mm-sig

Currently we install a callback for performing poll on a dma-buf,
irrespective of the timeout. This involves taking a spinlock, as well as
unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
multiple threads.

We can query whether the poll will block prior to installing the
callback to make the busy-query fast.

Single thread: 60% faster
8 threads on 4 (+4 HT) cores: 600% faster

Still not quite the perfect scaling we get with a native busy ioctl, but
poll(dmabuf) is faster due to the quicker lookup of the object and
avoiding drm_ioctl().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/dma-buf.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cf04d249a6a4..c7a7bc579941 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 	if (!events)
 		return 0;
 
+	if (poll_does_not_wait(poll)) {
+		if (events & POLLOUT &&
+		    !reservation_object_test_signaled_rcu(resv, true))
+			events &= ~(POLLOUT | POLLIN);
+
+		if (events & POLLIN &&
+		    !reservation_object_test_signaled_rcu(resv, false))
+			events &= ~POLLIN;
+
+		return events;
+	}
+
 retry:
 	seq = read_seqcount_begin(&resv->seq);
 	rcu_read_lock();
-- 
2.9.3
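
For readers who want to see the userspace side of the fast path, here is a minimal sketch of a non-blocking busy query done with poll() and a zero timeout. The helper name fd_ready_events() and the pipe-based demo are illustrative assumptions, not part of the patch; on a real dma-buf fd, going by the hunk above, POLLIN reports that the exclusive fence has signaled and POLLOUT that all fences have signaled.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the subset of `events` that is ready right
 * now, 0 if none is ready, or -1 on error. A timeout of 0 means poll()
 * never sleeps, which is the case the patch short-circuits in the kernel.
 */
static int fd_ready_events(int fd, short events)
{
	struct pollfd pfd = { .fd = fd, .events = events };
	int ret;

	do {
		ret = poll(&pfd, 1, 0);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;

	return pfd.revents & events;
}

/*
 * Stand-in demo: exporting a dma-buf needs a driver, so a pipe is used
 * here purely to exercise the helper.
 */
static void demo_on_pipe(void)
{
	int p[2];

	assert(pipe(p) == 0);
	/* An empty pipe has room, so the write end is ready for POLLOUT. */
	assert(fd_ready_events(p[1], POLLOUT) == POLLOUT);
	/* Nothing has been written, so the read end is not ready for POLLIN. */
	assert(fd_ready_events(p[0], POLLIN) == 0);
	close(p[0]);
	close(p[1]);
}
```

With a dma-buf fd in place of the pipe, a zero return for POLLOUT would mean the buffer is still busy; that is the query whose scaling the numbers in the commit message refer to.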



* Re: [Intel-gfx] [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-08-28 16:37 [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
@ 2016-08-28 17:04 ` Daniel Vetter
  2016-08-28 20:33 ` Chris Wilson
  1 sibling, 0 replies; 4+ messages in thread
From: Daniel Vetter @ 2016-08-28 17:04 UTC (permalink / raw)
  To: Chris Wilson
  Cc: intel-gfx, linaro-mm-sig, linux-media, Sumit Semwal, dri-devel

On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
> 
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
> 
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster
> 
> Still not quite the perfect scaling we get with a native busy ioctl, but
> poll(dmabuf) is faster due to the quicker lookup of the object and
> avoiding drm_ioctl().
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/dma-buf/dma-buf.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index cf04d249a6a4..c7a7bc579941 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
>  	if (!events)
>  		return 0;
>  
> +	if (poll_does_not_wait(poll)) {
> +		if (events & POLLOUT &&
> +		    !reservation_object_test_signaled_rcu(resv, true))
> +			events &= ~(POLLOUT | POLLIN);
> +
> +		if (events & POLLIN &&
> +		    !reservation_object_test_signaled_rcu(resv, false))
> +			events &= ~POLLIN;
> +
> +		return events;
> +	}
> +
>  retry:
>  	seq = read_seqcount_begin(&resv->seq);
>  	rcu_read_lock();
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-08-28 16:37 [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
  2016-08-28 17:04 ` [Intel-gfx] " Daniel Vetter
@ 2016-08-28 20:33 ` Chris Wilson
  2016-08-28 20:50   ` Chris Wilson
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Wilson @ 2016-08-28 20:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sumit Semwal, linux-media, dri-devel, linaro-mm-sig

On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
> 
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
> 
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster

Hmm, this only really applies to the idle case.
reservation_object_test_signaled_rcu() is still a major bottleneck when
busy, due to the dance inside reservation_object_test_signaled_single()
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-08-28 20:33 ` Chris Wilson
@ 2016-08-28 20:50   ` Chris Wilson
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Wilson @ 2016-08-28 20:50 UTC (permalink / raw)
  To: intel-gfx, Sumit Semwal, linux-media, dri-devel, linaro-mm-sig

On Sun, Aug 28, 2016 at 09:33:54PM +0100, Chris Wilson wrote:
> On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> > 
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> > 
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> 
> Hmm, this only really applies to the idle case.
> reservation_object_test_signaled_rcu() is still a major bottleneck when
> busy, due to the dance inside reservation_object_test_signaled_single()

The fix is not difficult; it just requires extending the seqlock to catch
the RCU race (i.e. the earlier patches). I'll resend that series in the
morning.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
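
As background for the seqlock remark above, the following is a rough userspace model of a sequence-counter read/retry loop of the kind dma_buf_poll() enters via read_seqcount_begin(). It is only a sketch under stated assumptions: struct fence_model and its helpers are hypothetical names, it supports a single writer, it uses C11 sequentially consistent atomics rather than the kernel's seqcount_t, and it does not reproduce the series mentioned in this message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one flag guarded by a sequence counter. */
struct fence_model {
	atomic_uint seq;      /* even: stable, odd: write in progress */
	atomic_bool signaled; /* the value readers want a stable view of */
};

static void fence_model_init(struct fence_model *m)
{
	atomic_init(&m->seq, 0);
	atomic_init(&m->signaled, false);
}

/* Single writer: make the counter odd, update, make it even again. */
static void fence_model_set_signaled(struct fence_model *m, bool value)
{
	atomic_fetch_add(&m->seq, 1);
	atomic_store(&m->signaled, value);
	atomic_fetch_add(&m->seq, 1);
}

/* Lockless reader: retry if a write overlapped the read. */
static bool fence_model_test_signaled(struct fence_model *m)
{
	unsigned int seq;
	bool value;

	do {
		/* Wait until no write is in progress. */
		while ((seq = atomic_load(&m->seq)) & 1)
			;
		value = atomic_load(&m->signaled);
		/* A changed counter means the value may be stale: go again. */
	} while (atomic_load(&m->seq) != seq);

	return value;
}
```

The point of the pattern is that readers take no lock and simply redo the read when the counter has moved underneath them.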
