From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org, Jason Wang <jasowang@redhat.com>,
brouer@redhat.com
Subject: Re: [PATCH 1/3] ptr_ring: batch ring zeroing
Date: Sat, 8 Apr 2017 14:14:08 +0200
Message-ID: <20170408141408.2101017e@redhat.com>
In-Reply-To: <1491544049-19108-1-git-send-email-mst@redhat.com>
On Fri, 7 Apr 2017 08:49:57 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> A known weakness in the ptr_ring design is that it does not handle the
> almost-full ring well: as entries are consumed they are immediately
> reused by the producer, so consumer and producer are writing to a
> shared cache line.
>
> To fix this, add batching to consume calls: as entries are
> consumed do not write NULL into the ring until we get
> a multiple (in current implementation 2x) of cache lines
> away from the producer. At that point, write them all out.
>
> We do the write out in the reverse order to keep
> producer from sharing cache with consumer for as long
> as possible.
>
> Writeout also triggers when the ring wraps around - there's
> no special reason to do this, but it helps keep the code
> a bit simpler.
>
> What should we do if getting away from the producer by 2 cache lines
> would mean we are keeping the ring more than half empty?
> Maybe we should reduce the batching in this case;
> the current patch simply reduces the batching.
>
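The batch-sizing trade-off described above can be sketched as a small standalone helper. The cache-line constant, the function name, and the halving policy here are illustrative assumptions, not the patch's actual code (in the kernel the line size would be SMP_CACHE_BYTES):

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumed; the kernel uses SMP_CACHE_BYTES */

/* Hypothetical sketch: aim for a batch two cache lines' worth of
 * pointers away from the producer, but shrink it so the consumer
 * never withholds more than half of a small ring. */
int compute_batch(int ring_size)
{
	int batch = 2 * CACHE_LINE_BYTES / (int)sizeof(void *);

	while (batch > ring_size / 2)
		batch /= 2;
	if (batch < 1)
		batch = 1;
	return batch;
}
```

With 64-byte cache lines and 8-byte pointers this gives a batch of 16 for large rings, shrinking all the way to 1 for tiny rings so the producer is never starved of more than half the slots.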
> Notes:
> - it is no longer true that a call to consume guarantees
> that the following call to produce will succeed.
> No users seem to assume that.
> - batching can also in theory reduce the signalling rate:
> users that would previously send interrupts to the producer
> to wake it up after consuming each entry would now only
> need to do this once per batch.
> Doing this would be easy by returning a flag to the caller.
> No users seem to do signalling on consume yet, so this was
> not implemented.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Jason, I am curious whether the following gives you some of
> the performance boost that you see with vhost batching
> patches. Is vhost batching on top still helpful?
>
> include/linux/ptr_ring.h | 63 +++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 54 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 6c70444..6b2e0dd 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -34,11 +34,13 @@
> struct ptr_ring {
> int producer ____cacheline_aligned_in_smp;
> spinlock_t producer_lock;
> - int consumer ____cacheline_aligned_in_smp;
> + int consumer_head ____cacheline_aligned_in_smp; /* next valid entry */
> + int consumer_tail; /* next entry to invalidate */
> spinlock_t consumer_lock;
> /* Shared consumer/producer data */
> /* Read-only by both the producer and the consumer */
> int size ____cacheline_aligned_in_smp; /* max entries in queue */
> + int batch; /* number of entries to consume in a batch */
> void **queue;
> };
>
> @@ -170,7 +172,7 @@ static inline int ptr_ring_produce_bh(struct ptr_ring *r, void *ptr)
> static inline void *__ptr_ring_peek(struct ptr_ring *r)
> {
> if (likely(r->size))
> - return r->queue[r->consumer];
> + return r->queue[r->consumer_head];
> return NULL;
> }
>
> @@ -231,9 +233,38 @@ static inline bool ptr_ring_empty_bh(struct ptr_ring *r)
> /* Must only be called after __ptr_ring_peek returned !NULL */
> static inline void __ptr_ring_discard_one(struct ptr_ring *r)
> {
> - r->queue[r->consumer++] = NULL;
> - if (unlikely(r->consumer >= r->size))
> - r->consumer = 0;
> + /* Fundamentally, what we want to do is update consumer
> + * index and zero out the entry so producer can reuse it.
> + * Doing it naively at each consume would be as simple as:
> + * r->queue[r->consumer++] = NULL;
> + * if (unlikely(r->consumer >= r->size))
> + * r->consumer = 0;
> + * but that is suboptimal when the ring is full as producer is writing
> + * out new entries in the same cache line. Defer these updates until a
> + * batch of entries has been consumed.
> + */
> + int head = r->consumer_head++;
> +
> + /* Once we have processed enough entries invalidate them in
> + * the ring all at once so producer can reuse their space in the ring.
> + * We also do this when we reach end of the ring - not mandatory
> + * but helps keep the implementation simple.
> + */
> + if (unlikely(r->consumer_head - r->consumer_tail >= r->batch ||
> + r->consumer_head >= r->size)) {
> + /* Zero out entries in the reverse order: this way we touch the
> + * cache line that producer might currently be reading the last;
> + * producer won't make progress and touch other cache lines
> + * besides the first one until we write out all entries.
> + */
> + while (likely(head >= r->consumer_tail))
> + r->queue[head--] = NULL;
> + r->consumer_tail = r->consumer_head;
> + }
> + if (unlikely(r->consumer_head >= r->size)) {
> + r->consumer_head = 0;
> + r->consumer_tail = 0;
> + }
> }
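The deferred-zeroing logic in __ptr_ring_discard_one above can be exercised as a minimal single-threaded userspace model. Ring size, batch size, and the function names are illustrative; unlike the kernel code there is no locking and no memory barriers, so this only demonstrates the index/NULLing bookkeeping:

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 16
#define BATCH 4  /* entries consumed before a batch of slots is NULLed */

struct ring {
	void *queue[RING_SIZE];
	int producer;
	int consumer_head;  /* next valid entry */
	int consumer_tail;  /* next entry to invalidate */
};

int ring_produce(struct ring *r, void *ptr)
{
	if (r->queue[r->producer])
		return -1;  /* slot not yet zeroed by the consumer: full */
	r->queue[r->producer++] = ptr;
	if (r->producer >= RING_SIZE)
		r->producer = 0;
	return 0;
}

void *ring_consume(struct ring *r)
{
	void *ptr = r->queue[r->consumer_head];
	int head;

	if (!ptr)
		return NULL;
	head = r->consumer_head++;
	/* Defer NULLing until a batch has been consumed or the ring
	 * wraps, then write the NULLs out in reverse order so the
	 * producer keeps hitting the first cache line last. */
	if (r->consumer_head - r->consumer_tail >= BATCH ||
	    r->consumer_head >= RING_SIZE) {
		while (head >= r->consumer_tail)
			r->queue[head--] = NULL;
		r->consumer_tail = r->consumer_head;
	}
	if (r->consumer_head >= RING_SIZE)
		r->consumer_head = r->consumer_tail = 0;
	return ptr;
}
```

Producing into a full ring fails because the consumed-but-not-yet-NULLed slots still look occupied; this is exactly the "consume no longer guarantees the next produce succeeds" note from the commit message.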
I love this idea. I reviewed and discussed it in person with MST
during netdevconf[1] on this laptop. I promised to also run it
through my micro-benchmarks[2] once I return home (hint: ptr_ring is
used in the network stack as skb_array).
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
[1] http://netdevconf.org/2.1/
[2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_bench01.c
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Thread overview: 15+ messages
2017-04-07 5:49 [PATCH 1/3] ptr_ring: batch ring zeroing Michael S. Tsirkin
2017-04-07 5:50 ` [PATCH 2/3] ringtest: support test specific parameters Michael S. Tsirkin
2017-04-07 5:50 ` [PATCH 3/3] ptr_ring: support testing different batching sizes Michael S. Tsirkin
2017-04-08 12:14 ` Jesper Dangaard Brouer [this message]
2017-05-09 13:33 ` [PATCH 1/3] ptr_ring: batch ring zeroing Michael S. Tsirkin
2017-05-10 3:30 ` Jason Wang
2017-05-10 12:22 ` Michael S. Tsirkin
2017-05-10 9:18 ` Jesper Dangaard Brouer
2017-05-10 12:20 ` Michael S. Tsirkin
2017-04-12 8:03 ` Jason Wang
2017-04-14 7:52 ` Jason Wang
2017-04-14 21:00 ` Michael S. Tsirkin
2017-04-18 2:16 ` Jason Wang
2017-04-14 22:50 ` Michael S. Tsirkin
2017-04-18 2:18 ` Jason Wang