* [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
@ 2016-08-28 16:37 Chris Wilson
2016-08-28 17:04 ` [Intel-gfx] " Daniel Vetter
2016-08-28 20:33 ` Chris Wilson
0 siblings, 2 replies; 4+ messages in thread
From: Chris Wilson @ 2016-08-28 16:37 UTC (permalink / raw)
To: intel-gfx
Cc: Chris Wilson, Sumit Semwal, linux-media, dri-devel, linaro-mm-sig
Currently we install a callback for performing poll on a dma-buf,
irrespective of the timeout. This involves taking a spinlock, as well as
unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
multiple threads.
We can query whether the poll will block prior to installing the
callback to make the busy-query fast.
Single thread: 60% faster
8 threads on 4 (+4 HT) cores: 600% faster
Still not quite the perfect scaling we get with a native busy ioctl, but
poll(dmabuf) is faster due to the quicker lookup of the object and
avoiding drm_ioctl().
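
For readers unfamiliar with the pattern, the busy-query discussed above is an ordinary poll() call with a zero timeout. The following is a minimal userspace sketch, not code from this patch: fd_ready_now() is a hypothetical helper name, and it works on any pollable file descriptor, so a dma-buf fd is only one possible argument.

```c
#include <assert.h>
#include <poll.h>

/*
 * Non-blocking busy query on a file descriptor.
 * Returns 1 if any of `events` is ready now, 0 if a wait would be
 * needed, and -1 if poll() itself failed (errno is set).
 * fd_ready_now() is a hypothetical helper name, not an existing API.
 */
static int fd_ready_now(int fd, short events)
{
	struct pollfd pfd = { .fd = fd, .events = events };
	int ret;

	assert(fd >= 0);

	/* timeout=0: poll() reports the current state and never sleeps. */
	ret = poll(&pfd, 1, 0);
	if (ret < 0)
		return -1;

	/* On ret == 0 revents is zero, so this also covers "not ready". */
	return (pfd.revents & events) ? 1 : 0;
}
```

Assuming the usual dma-buf convention, fd_ready_now(dmabuf_fd, POLLIN) would ask whether pending writes have completed and fd_ready_now(dmabuf_fd, POLLOUT) whether all pending accesses have completed.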
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/dma-buf/dma-buf.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cf04d249a6a4..c7a7bc579941 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 	if (!events)
 		return 0;
 
+	if (poll_does_not_wait(poll)) {
+		if (events & POLLOUT &&
+		    !reservation_object_test_signaled_rcu(resv, true))
+			events &= ~(POLLOUT | POLLIN);
+
+		if (events & POLLIN &&
+		    !reservation_object_test_signaled_rcu(resv, false))
+			events &= ~POLLIN;
+
+		return events;
+	}
+
 retry:
 	seq = read_seqcount_begin(&resv->seq);
 	rcu_read_lock();
--
2.9.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
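
The event-mask logic of the fast path in the patch above can be modelled in userspace as a rough sketch. Everything here is an assumption made for illustration: struct fence_state and test_signaled() are hypothetical stand-ins for the reservation object and reservation_object_test_signaled_rcu(), simplified to "test_all checks every fence, otherwise only the exclusive fence", and fast_poll_mask() is not a kernel function.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fence state of a reservation object. */
struct fence_state {
	bool exclusive_signaled;	/* the exclusive (write) fence is done */
	bool shared_signaled;		/* every shared (read) fence is done */
};

/*
 * Simplified model of reservation_object_test_signaled_rcu(resv, test_all):
 * test_all checks all fences, otherwise only the exclusive fence.
 */
static bool test_signaled(const struct fence_state *s, bool test_all)
{
	if (test_all)
		return s->exclusive_signaled && s->shared_signaled;
	return s->exclusive_signaled;
}

/* Mirrors the branch structure of the poll_does_not_wait() fast path. */
static unsigned int fast_poll_mask(unsigned int events,
				   const struct fence_state *s)
{
	/* A writer needs every fence to be idle. */
	if ((events & POLLOUT) && !test_signaled(s, true))
		events &= ~(POLLOUT | POLLIN);

	/* A reader only needs the exclusive fence to be idle. */
	if ((events & POLLIN) && !test_signaled(s, false))
		events &= ~POLLIN;

	return events;
}
```

In this model a POLLIN-only query succeeds while readers are still active, whereas a combined POLLIN | POLLOUT query returns 0 in the same state, because the first branch clears both bits.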
* Re: [Intel-gfx] [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
2016-08-28 16:37 [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
@ 2016-08-28 17:04 ` Daniel Vetter
2016-08-28 20:33 ` Chris Wilson
1 sibling, 0 replies; 4+ messages in thread
From: Daniel Vetter @ 2016-08-28 17:04 UTC (permalink / raw)
To: Chris Wilson
Cc: intel-gfx, linaro-mm-sig, linux-media, Sumit Semwal, dri-devel
On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
>
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
>
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster
>
> Still not quite the perfect scaling we get with a native busy ioctl, but
> poll(dmabuf) is faster due to the quicker lookup of the object and
> avoiding drm_ioctl().
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/dma-buf/dma-buf.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index cf04d249a6a4..c7a7bc579941 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
>  	if (!events)
>  		return 0;
>  
> +	if (poll_does_not_wait(poll)) {
> +		if (events & POLLOUT &&
> +		    !reservation_object_test_signaled_rcu(resv, true))
> +			events &= ~(POLLOUT | POLLIN);
> +
> +		if (events & POLLIN &&
> +		    !reservation_object_test_signaled_rcu(resv, false))
> +			events &= ~POLLIN;
> +
> +		return events;
> +	}
> +
>  retry:
>  	seq = read_seqcount_begin(&resv->seq);
>  	rcu_read_lock();
> --
> 2.9.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
2016-08-28 16:37 [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
2016-08-28 17:04 ` [Intel-gfx] " Daniel Vetter
@ 2016-08-28 20:33 ` Chris Wilson
2016-08-28 20:50 ` Chris Wilson
1 sibling, 1 reply; 4+ messages in thread
From: Chris Wilson @ 2016-08-28 20:33 UTC (permalink / raw)
To: intel-gfx; +Cc: Sumit Semwal, linux-media, dri-devel, linaro-mm-sig
On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
>
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
>
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster
Hmm, this only really applies to the idle case.
reservation_object_test_signaled_rcu() is still a major bottleneck when
busy, due to the dance inside reservation_object_test_signaled_single()
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] dma-buf: Do a fast lockless check for poll with timeout=0
2016-08-28 20:33 ` Chris Wilson
@ 2016-08-28 20:50 ` Chris Wilson
0 siblings, 0 replies; 4+ messages in thread
From: Chris Wilson @ 2016-08-28 20:50 UTC (permalink / raw)
To: intel-gfx, Sumit Semwal, linux-media, dri-devel, linaro-mm-sig
On Sun, Aug 28, 2016 at 09:33:54PM +0100, Chris Wilson wrote:
> On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> >
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> >
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
>
> Hmm, this only really applies to the idle case.
> reservation_object_test_signaled_rcu() is still a major bottleneck when
> busy, due to the dance inside reservation_object_test_signaled_single()
The fix is not difficult, just requires extending the seqlock to catch
the RCU race (i.e. earlier patches). I'll resend that series in the
morning.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 4+ messages in thread