Re: [PATCH 09/12] ring: introduce lockless ring buffer

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Xiao Guangrong <guangrong.xiao@gmail.com>
To: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org, mst@redhat.com, peterz@infradead.org,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	stefani@seibold.net, mtosatti@redhat.com,
	Xiao Guangrong <xiaoguangrong@tencent.com>,
	dgilbert@redhat.com, qemu-devel@nongnu.org, wei.w.wang@intel.com,
	jiang.biao2@zte.com.cn, pbonzini@redhat.com,
	paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH 09/12] ring: introduce lockless ring buffer
Date: Thu, 28 Jun 2018 18:02:35 +0800	[thread overview]
Message-ID: <6f871708-d38b-2acf-350a-985c8a23a013@gmail.com> (raw)
In-Reply-To: <20180620045203.GE18985@xz-mi>


CC: Paul, Peter Zijlstra, Stefani, Lai who are all good at memory barrier.


On 06/20/2018 12:52 PM, Peter Xu wrote:
> On Mon, Jun 04, 2018 at 05:55:17PM +0800, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> It's the simple lockless ring buffer implement which supports both
>> single producer vs. single consumer and multiple producers vs.
>> single consumer.
>>
>> Many lessons were learned from Linux Kernel's kfifo (1) and DPDK's
>> rte_ring (2) before i wrote this implement. It corrects some bugs of
>> memory barriers in kfifo and it is the simpler lockless version of
>> rte_ring as currently multiple access is only allowed for producer.
> 
> Could you provide some more information about the kfifo bug?  Any
> pointer would be appreciated.
> 

Sure, i reported one of the memory barrier issue to linux kernel:
    https://lkml.org/lkml/2018/5/11/58

Actually, beside that, there is another memory barrier issue in kfifo,
please consider this case:

    at the beginning
    ring->size = 4
    ring->out = 0
    ring->in = 4

      Consumer                            Producer
  ---------------                     --------------
    index = ring->out; /* index == 0 */
    ring->out++; /* ring->out == 1 */
    < Re-Order >
                                     out = ring->out;
                                     if (ring->in - out >= ring->mask)
                                         return -EFULL;
                                     /* see the ring is not full */
                                     index = ring->in & ring->mask; /* index == 0 */
                                     ring->data[index] = new_data;
　　　　　　　　　　　　　　　　　　　　　ring->in++;

    data = ring->data[index];
    !!!!!! the old data is lost !!!!!!

So we need to make sure:
1) for the consumer, we should read the ring->data[] out before updating ring->out
2) for the producer, we should read ring->out before updating ring->data[]

as followings:
       Producer                                       Consumer
   ------------------------------------         ------------------------
       Reading ring->out                            Reading ring->data[index]
       smp_mb()                                     smp_mb()
       Setting ring->data[index] = data             ring->out++

[ i used atomic_store_release() and atomic_load_acquire() instead of smp_mb() in the
   patch. ]

But i am not sure if we can use smp_acquire__after_ctrl_dep() in the producer?

>>
>> If has single producer vs. single consumer, it is the traditional fifo,
>> If has multiple producers, it uses the algorithm as followings:
>>
>> For the producer, it uses two steps to update the ring:
>>     - first step, occupy the entry in the ring:
>>
>> retry:
>>        in = ring->in
>>        if (cmpxhg(&ring->in, in, in +1) != in)
>>              goto retry;
>>
>>       after that the entry pointed by ring->data[in] has been owned by
>>       the producer.
>>
>>       assert(ring->data[in] == NULL);
>>
>>       Note, no other producer can touch this entry so that this entry
>>       should always be the initialized state.
>>
>>     - second step, write the data to the entry:
>>
>>       ring->data[in] = data;
>>
>> For the consumer, it first checks if there is available entry in the
>> ring and fetches the entry from the ring:
>>
>>       if (!ring_is_empty(ring))
>>            entry = &ring[ring->out];
>>
>>       Note: the ring->out has not been updated so that the entry pointed
>>       by ring->out is completely owned by the consumer.
>>
>> Then it checks if the data is ready:
>>
>> retry:
>>       if (*entry == NULL)
>>              goto retry;
>> That means, the producer has updated the index but haven't written any
>> data to it.
>>
>> Finally, it fetches the valid data out, set the entry to the initialized
>> state and update ring->out to make the entry be usable to the producer:
>>
>>        data = *entry;
>>        *entry = NULL;
>>        ring->out++;
>>
>> Memory barrier is omitted here, please refer to the comment in the code.
>>
>> (1) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/kfifo.h
>> (2) http://dpdk.org/doc/api/rte__ring_8h.html
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>> ---
>>   migration/ring.h | 265 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> If this is a very general implementation, not sure whether we can move
> this to util/ directory so that it can be used even outside migration
> codes.

I thought it too. Currently migration is the only user for it, so i put
it near the code of migration. It's good to me to move it to util/ if you
prefer.

> 
>>   1 file changed, 265 insertions(+)
>>   create mode 100644 migration/ring.h
>>
>> diff --git a/migration/ring.h b/migration/ring.h
>> new file mode 100644
>> index 0000000000..da9b8bdcbb
>> --- /dev/null
>> +++ b/migration/ring.h
>> @@ -0,0 +1,265 @@
>> +/*
>> + * Ring Buffer
>> + *
>> + * Multiple producers and single consumer are supported with lock free.
>> + *
>> + * Copyright (c) 2018 Tencent Inc
>> + *
>> + * Authors:
>> + *  Xiao Guangrong <xiaoguangrong@tencent.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef _RING__
>> +#define _RING__
>> +
>> +#define CACHE_LINE  64
> 
> Is this for x86_64?  Is the cache line size the same for all arch?

64 bytes is just a common size. :)

Does QEMU support pre-configured CACHE_SIZE?

> 
>> +#define cache_aligned __attribute__((__aligned__(CACHE_LINE)))
>> +
>> +#define RING_MULTI_PRODUCER 0x1
>> +
>> +struct Ring {
>> +    unsigned int flags;
>> +    unsigned int size;
>> +    unsigned int mask;
>> +
>> +    unsigned int in cache_aligned;
>> +
>> +    unsigned int out cache_aligned;
>> +
>> +    void *data[0] cache_aligned;
>> +};
>> +typedef struct Ring Ring;
>> +
>> +/*
>> + * allocate and initialize the ring
>> + *
>> + * @size: the number of element, it should be power of 2
>> + * @flags: set to RING_MULTI_PRODUCER if the ring has multiple producer,
>> + *         otherwise set it to 0, i,e. single producer and single consumer.
>> + *
>> + * return the ring.
>> + */
>> +static inline Ring *ring_alloc(unsigned int size, unsigned int flags)
>> +{
>> +    Ring *ring;
>> +
>> +    assert(is_power_of_2(size));
>> +
>> +    ring = g_malloc0(sizeof(*ring) + size * sizeof(void *));
>> +    ring->size = size;
>> +    ring->mask = ring->size - 1;
>> +    ring->flags = flags;
>> +    return ring;
>> +}
>> +
>> +static inline void ring_free(Ring *ring)
>> +{
>> +    g_free(ring);
>> +}
>> +
>> +static inline bool __ring_is_empty(unsigned int in, unsigned int out)
>> +{
>> +    return in == out;
>> +}
> 
> (some of the helpers are a bit confusing to me like this one; I would
>   prefer some of the helpers be directly squashed into code, but it's a
>   personal preference only)
> 

I will carefully consider it in the next version...

>> +
>> +static inline bool ring_is_empty(Ring *ring)
>> +{
>> +    return ring->in == ring->out;
>> +}
>> +
>> +static inline unsigned int ring_len(unsigned int in, unsigned int out)
>> +{
>> +    return in - out;
>> +}
> 
> (this too)
> 
>> +
>> +static inline bool
>> +__ring_is_full(Ring *ring, unsigned int in, unsigned int out)
>> +{
>> +    return ring_len(in, out) > ring->mask;
>> +}
>> +
>> +static inline bool ring_is_full(Ring *ring)
>> +{
>> +    return __ring_is_full(ring, ring->in, ring->out);
>> +}
>> +
>> +static inline unsigned int ring_index(Ring *ring, unsigned int pos)
>> +{
>> +    return pos & ring->mask;
>> +}
>> +
>> +static inline int __ring_put(Ring *ring, void *data)
>> +{
>> +    unsigned int index, out;
>> +
>> +    out = atomic_load_acquire(&ring->out);
>> +    /*
>> +     * smp_mb()
>> +     *
>> +     * should read ring->out before updating the entry, see the comments in
>> +     * __ring_get().
> 
> Nit: here I think it means the comment in [1] below.  Maybe:
> 
>    "see the comments in __ring_get() when calling
>     atomic_store_release()"
> 
> ?

Yes, you are right, i will address your suggestion.

> 
>> +     */
>> +
>> +    if (__ring_is_full(ring, ring->in, out)) {
>> +        return -ENOBUFS;
>> +    }
>> +
>> +    index = ring_index(ring, ring->in);
>> +
>> +    atomic_set(&ring->data[index], data);
>> +
>> +    /*
>> +     * should make sure the entry is updated before increasing ring->in
>> +     * otherwise the consumer will get a entry but its content is useless.
>> +     */
>> +    smp_wmb();
>> +    atomic_set(&ring->in, ring->in + 1);
> 
> Pure question: could we use store_release() instead of a mixture of
> store/release and raw memory barriers in the function?  Or is there
> any performance consideration behind?
> 
> It'll be nice to mention the performance considerations if there is.

I think atomic_mb_read() and atomic_mb_set() is what you are
talking about. These operations speed up read accesses but
slow done write accesses that is not suitable for our case.

> 
>> +    return 0;
>> +}
>> +
>> +static inline void *__ring_get(Ring *ring)
>> +{
>> +    unsigned int index, in;
>> +    void *data;
>> +
>> +    in = atomic_read(&ring->in);
>> +
>> +    /*
>> +     * should read ring->in first to make sure the entry pointed by this
>> +     * index is available, see the comments in __ring_put().
>> +     */
> 
> Nit: similar to above, maybe mention about which comment would be a
> bit nicer.

Yes, will improve it.

next prev parent reply	other threads:[~2018-06-28 10:02 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-04  9:55 [PATCH 00/12] migration: improve multithreads for compression and decompression guangrong.xiao
2018-06-04  9:55 ` [PATCH 01/12] migration: do not wait if no free thread guangrong.xiao
2018-06-11  7:39   ` Peter Xu
2018-06-12  2:42     ` Xiao Guangrong
2018-06-12  3:15       ` Peter Xu
2018-06-13 15:43         ` Dr. David Alan Gilbert
2018-06-14  3:19           ` Xiao Guangrong
2018-06-04  9:55 ` [PATCH 02/12] migration: fix counting normal page for compression guangrong.xiao
2018-06-13 15:51   ` Dr. David Alan Gilbert
2018-06-14  3:32     ` Xiao Guangrong
2018-06-04  9:55 ` [PATCH 03/12] migration: fix counting xbzrle cache_miss_rate guangrong.xiao
2018-06-13 16:09   ` Dr. David Alan Gilbert
2018-06-15 11:30   ` Dr. David Alan Gilbert
2018-06-04  9:55 ` [PATCH 04/12] migration: introduce migration_update_rates guangrong.xiao
2018-06-13 16:17   ` Dr. David Alan Gilbert
2018-06-14  3:35     ` Xiao Guangrong
2018-06-15 11:32     ` Dr. David Alan Gilbert
2018-06-04  9:55 ` [PATCH 05/12] migration: show the statistics of compression guangrong.xiao
2018-06-04 22:31   ` Eric Blake
2018-06-06 12:44     ` Xiao Guangrong
2018-06-13 16:25   ` Dr. David Alan Gilbert
2018-06-14  6:48     ` Xiao Guangrong
2018-07-16 19:01       ` Dr. David Alan Gilbert
2018-07-18  8:51         ` Xiao Guangrong
2018-06-04  9:55 ` [PATCH 06/12] migration: do not detect zero page for compression guangrong.xiao
2018-06-19  7:30   ` Peter Xu
2018-06-28  9:12     ` Xiao Guangrong
2018-06-28  9:36       ` Daniel P. Berrangé
2018-06-29  3:50         ` Xiao Guangrong
2018-06-29  9:54         ` Dr. David Alan Gilbert
2018-06-29  9:42       ` Dr. David Alan Gilbert
2018-07-03  3:53         ` Xiao Guangrong
2018-07-16 18:58           ` Dr. David Alan Gilbert
2018-07-18  8:46             ` Xiao Guangrong
2018-07-22 16:05               ` Michael S. Tsirkin
2018-07-23  7:12                 ` Xiao Guangrong
2018-06-04  9:55 ` [PATCH 07/12] migration: hold the lock only if it is really needed guangrong.xiao
2018-06-19  7:36   ` Peter Xu
2018-06-28  9:33     ` Xiao Guangrong
2018-06-29 11:22       ` Dr. David Alan Gilbert
2018-07-03  6:27         ` Xiao Guangrong
2018-07-11  8:21       ` Peter Xu
2018-07-12  7:47         ` Xiao Guangrong
2018-07-12  8:26           ` Peter Xu
2018-07-18  8:56             ` Xiao Guangrong
2018-07-18 10:18               ` Peter Xu
2018-07-13 17:44           ` Dr. David Alan Gilbert
2018-06-04  9:55 ` [PATCH 08/12] migration: do not flush_compressed_data at the end of each iteration guangrong.xiao
2018-07-13 18:01   ` Dr. David Alan Gilbert
2018-07-18  8:44     ` Xiao Guangrong
2018-06-04  9:55 ` [PATCH 09/12] ring: introduce lockless ring buffer guangrong.xiao
2018-06-20  4:52   ` Peter Xu
2018-06-28 10:02     ` Xiao Guangrong [this message]
2018-06-28 11:55       ` Wei Wang
2018-06-29  3:55         ` Xiao Guangrong
2018-07-03 15:55           ` Paul E. McKenney
2018-06-20  5:55   ` Peter Xu
2018-06-28 14:00     ` Xiao Guangrong
2018-06-20 12:38   ` Michael S. Tsirkin
2018-06-29  7:30     ` Xiao Guangrong
2018-06-29 13:08       ` Michael S. Tsirkin
2018-07-03  7:31         ` Xiao Guangrong
2018-06-28 13:36   ` Jason Wang
2018-06-29  3:59     ` Xiao Guangrong
2018-06-29  6:15       ` Jason Wang
2018-06-29  7:47         ` Xiao Guangrong
2018-06-29  4:23     ` Michael S. Tsirkin
2018-06-29  7:44       ` Xiao Guangrong
2018-06-04  9:55 ` [PATCH 10/12] migration: introduce lockless multithreads model guangrong.xiao
2018-06-20  6:52   ` Peter Xu
2018-06-28 14:25     ` Xiao Guangrong
2018-07-13 16:24     ` Dr. David Alan Gilbert
2018-07-18  7:12       ` Xiao Guangrong
2018-06-04  9:55 ` [PATCH 11/12] migration: use lockless Multithread model for compression guangrong.xiao
2018-06-04  9:55 ` [PATCH 12/12] migration: use lockless Multithread model for decompression guangrong.xiao
2018-06-11  8:00 ` [PATCH 00/12] migration: improve multithreads for compression and decompression Peter Xu
2018-06-12  3:19   ` Xiao Guangrong
2018-06-12  5:36     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6f871708-d38b-2acf-350a-985c8a23a013@gmail.com \
    --to=guangrong.xiao@gmail.com \
    --cc=dgilbert@redhat.com \
    --cc=jiang.biao2@zte.com.cn \
    --cc=jiangshanlai@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefani@seibold.net \
    --cc=wei.w.wang@intel.com \
    --cc=xiaoguangrong@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.