All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Butsykin <pbutsykin@virtuozzo.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, mreitz@redhat.com,
	stefanha@redhat.com, den@openvz.org, jsnow@redhat.com,
	eblake@redhat.com, famz@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v2 05/22] block/pcache: add aio requests into cache
Date: Tue, 6 Sep 2016 19:54:54 +0300	[thread overview]
Message-ID: <57CEF4DE.9030901@virtuozzo.com> (raw)
In-Reply-To: <20160901152813.GE6355@noname.redhat.com>

On 01.09.2016 18:28, Kevin Wolf wrote:
> Am 29.08.2016 um 19:10 hat Pavel Butsykin geschrieben:
>> For storing requests use an rbtree, here are add basic operations on the
>> rbtree to work with  cache nodes.
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>> ---
>>   block/pcache.c | 190 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 189 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/pcache.c b/block/pcache.c
>> index 7f221d6..f5022f9 100644
>> --- a/block/pcache.c
>> +++ b/block/pcache.c
>> @@ -27,6 +27,7 @@
>>   #include "block/raw-aio.h"
>>   #include "qapi/error.h"
>>   #include "qapi/qmp/qstring.h"
>> +#include "qemu/rbtree.h"
>>
>>   #define PCACHE_DEBUG
>>
>> @@ -37,9 +38,53 @@
>>   #define DPRINTF(fmt, ...) do { } while (0)
>>   #endif
>>
>> +typedef struct RbNodeKey {
>> +    uint64_t    num;
>> +    uint32_t    size;
>> +} RbNodeKey;
>> +
>> +typedef struct BlockNode {
>> +    struct RbNode               rb_node;
>> +    union {
>> +        RbNodeKey               key;
>> +        struct {
>> +            uint64_t            sector_num;
>> +            uint32_t            nb_sectors;
>> +        };
>> +    };
>
> What's the deal with this union?
>
> It just adds an alias, and 'sector_num'/'nb_sectors' are actually longer
> to type than 'key.num' and 'key.size', so the only advantage that I
> could image doesn't really apply.
>
Yes, you're right. Initially this was done to reduce the code in those
places where the name change will not confuse. But later it turned out
that such places are few.

> But it brings problems: We always have to be careful to keep the two
> structs in sync, and grepping for the field name will only bring up half
> of the users because the other half uses the other alias.
>
Yep, relative to the rest of the Qemu code it looks unusual, and given
the small benefits it is need to remove :)

>> +    QTAILQ_ENTRY(BlockNode) entry;
>> +} BlockNode;
>> +
>> +typedef struct PCNode {
>> +    BlockNode cm;
>> +
>> +    uint8_t                  *data;
>> +} PCNode;
>
> What do 'PC' and 'cm' mean?
>
PC - PrefetchCache, cm - common.

But I think it would be better to fix it:

PrefetchCache - PrefCacheNode, common - common.

>> +typedef struct ReqStor {
>> +    struct {
>> +        struct RbRoot root;
>> +        CoMutex       lock;
>> +    } tree;
>> +
>> +    uint32_t curr_size;
>> +} ReqStor;
>
> Same question for ReqStor. For an identifier that is used only three
> times, it could be a bit more descriptive.
>
Storage requests

> What unit has curr_size or what does it count? The nodes in the tree?

This is the current size of the storage requests.

> Also, cur_ seems to be more common as a prefix than curr_.
>
OK

>> +typedef struct BDRVPCacheState {
>> +    BlockDriverState **bs;
>
> This is unused. (Good thing, it looks weird.)
>
Apparently I forgot to remove it.

>> +    ReqStor pcache;
>> +
>> +    struct {
>> +        QTAILQ_HEAD(pcache_head, BlockNode) head;
>> +        CoMutex lock;
>> +    } list;
>> +} BDRVPCacheState;
>> +
>>   typedef struct PrefCacheAIOCB {
>>       BlockAIOCB common;
>>
>> +    BDRVPCacheState *s;
>
> Not really needed, you already have acb->common.bs.
>
>>       QEMUIOVector *qiov;
>>       uint64_t sector_num;
>>       uint32_t nb_sectors;
>> @@ -64,6 +109,124 @@ static QemuOptsList runtime_opts = {
>>       },
>>   };
>>
>> +#define PCNODE(_n) ((PCNode *)(_n))
>
> container_of() would be preferable for type safety.
>
OK

>> +static int pcache_key_cmp(const RbNodeKey *key1, const RbNodeKey *key2)
>> +{
>> +    assert(key1 != NULL);
>> +    assert(key2 != NULL);
>> +
>> +    if (key1->num >= key2->num + key2->size) {
>> +        return 1;
>> +    }
>> +    if (key1->num + key1->size <= key2->num) {
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static void *node_insert(struct RbRoot *root, BlockNode *node)
>> +{
>> +    struct RbNode **new = &(root->rb_node), *parent = NULL;
>> +
>> +    /* Figure out where to put new node */
>> +    while (*new) {
>> +        BlockNode *this = container_of(*new, BlockNode, rb_node);
>> +        int result = pcache_key_cmp(&node->key, &this->key);
>> +        if (result == 0) {
>> +            return this;
>> +        }
>> +        parent = *new;
>> +        new = result < 0 ? &((*new)->rb_left) : &((*new)->rb_right);
>> +    }
>> +    /* Add new node and rebalance tree. */
>> +    rb_link_node(&node->rb_node, parent, new);
>> +    rb_insert_color(&node->rb_node, root);
>> +
>> +    return node;
>> +}
>> +
>> +static inline PCNode *pcache_node_insert(struct RbRoot *root, PCNode *node)
>> +{
>> +    return node_insert(root, &node->cm);
>> +}
>> +
>> +static inline void pcache_node_free(PCNode *node)
>> +{
>> +    g_free(node->data);
>> +    g_slice_free1(sizeof(*node), node);
>> +}
>
> We moved away from g_slice_* because it turned out that it hurt more
> than it helped.
>
Somewhere I saw it, thanks.

>> +static inline void *pcache_node_alloc(RbNodeKey* key)
>> +{
>> +    PCNode *node = g_slice_alloc(sizeof(*node));
>> +
>> +    node->cm.sector_num = key->num;
>> +    node->cm.nb_sectors = key->size;
>
> In other words, node->cm.key = *key;
>
Yep :)

>> +    node->data = g_malloc(node->cm.nb_sectors << BDRV_SECTOR_BITS);
>> +
>> +    return node;
>> +}
>> +
>> +static bool pcache_node_find_and_create(PrefCacheAIOCB *acb, RbNodeKey *key,
>> +                                        PCNode **out_node)
>> +{
>> +    BDRVPCacheState *s = acb->s;
>> +    PCNode *new_node = pcache_node_alloc(key);
>> +    PCNode *found;
>> +
>> +    qemu_co_mutex_lock(&s->pcache.tree.lock);
>> +    found = pcache_node_insert(&s->pcache.tree.root, new_node);
>> +    qemu_co_mutex_unlock(&s->pcache.tree.lock);
>
> pcache_node_insert() doesn't yield, so the CoMutex is unnecessary.
>
>> +    if (found != new_node) {
>> +        pcache_node_free(new_node);
>
> Isn't it a bit wasteful to allocate a new node just in case and then
> immediately free it again if it turns out that we don't need it?
>
Yes, I know about this problem. I wrote about this in TODO list, but
not very detailed.

>> +        *out_node = found;
>> +        return false;
>> +    }
>> +    atomic_add(&s->pcache.curr_size, new_node->cm.nb_sectors);
>
> atomic_add() implies that you have concurrent threads. I don't see any.
>
Yeah, but what about iothread? Doesn't presence of iothread lead to
parallel handling of requests in multiple threads?

>> +    qemu_co_mutex_lock(&s->list.lock);
>> +    QTAILQ_INSERT_HEAD(&s->list.head, &new_node->cm, entry);
>> +    qemu_co_mutex_unlock(&s->list.lock);
>
> Same here as above, QTAILQ_INSERT_HEAD doesn't yield.
>
>> +    *out_node = new_node;
>> +    return true;
>> +}
>> +
>> +static inline void prefetch_init_key(PrefCacheAIOCB *acb, RbNodeKey* key)
>> +{
>> +    key->num = acb->sector_num;
>> +    key->size = acb->nb_sectors;
>> +}
>> +
>> +enum {
>> +    PREFETCH_NEW_NODE  = 0,
>> +    PREFETCH_FULL_UP   = 1,
>> +    PREFETCH_PART_UP   = 2
>> +};
>> +
>> +static int32_t pcache_prefetch(PrefCacheAIOCB *acb)
>> +{
>> +    RbNodeKey key;
>> +    PCNode *node = NULL;
>> +
>> +    prefetch_init_key(acb, &key);
>> +    if (pcache_node_find_and_create(acb, &key, &node)) {
>> +        return PREFETCH_NEW_NODE;
>> +    }
>> +
>> +    /* Node covers the whole request */
>> +    if (node->cm.sector_num <= acb->sector_num &&
>> +        node->cm.sector_num + node->cm.nb_sectors >= acb->sector_num +
>> +                                                     acb->nb_sectors)
>> +    {
>> +        return PREFETCH_FULL_UP;
>> +    }
>> +
>> +    return PREFETCH_PART_UP;
>> +}
>> +
>>   static void pcache_aio_cb(void *opaque, int ret)
>>   {
>>       PrefCacheAIOCB *acb = opaque;
>> @@ -80,6 +243,7 @@ static PrefCacheAIOCB *pcache_aio_get(BlockDriverState *bs, int64_t sector_num,
>>   {
>>       PrefCacheAIOCB *acb = qemu_aio_get(&pcache_aiocb_info, bs, cb, opaque);
>>
>> +    acb->s = bs->opaque;
>>       acb->sector_num = sector_num;
>>       acb->nb_sectors = nb_sectors;
>>       acb->qiov = qiov;
>> @@ -99,6 +263,8 @@ static BlockAIOCB *pcache_aio_readv(BlockDriverState *bs,
>>       PrefCacheAIOCB *acb = pcache_aio_get(bs, sector_num, qiov, nb_sectors, cb,
>>                                            opaque, QEMU_AIO_READ);
>>
>> +    pcache_prefetch(acb);
>> +
>>       bdrv_aio_readv(bs->file, sector_num, qiov, nb_sectors,
>>                      pcache_aio_cb, acb);
>>       return &acb->common;
>> @@ -119,6 +285,17 @@ static BlockAIOCB *pcache_aio_writev(BlockDriverState *bs,
>>       return &acb->common;
>>   }
>>
>> +static void pcache_state_init(QemuOpts *opts, BDRVPCacheState *s)
>> +{
>> +    DPRINTF("pcache configure:\n");
>> +
>> +    s->pcache.tree.root = RB_ROOT;
>> +    qemu_co_mutex_init(&s->pcache.tree.lock);
>> +    QTAILQ_INIT(&s->list.head);
>> +    qemu_co_mutex_init(&s->list.lock);
>
> QTAILQ_INIT() doesn't yield.
>
>> +    s->pcache.curr_size = 0;
>> +}
>> +
>
> Kevin
>

  reply	other threads:[~2016-09-06 21:40 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-29 17:09 [Qemu-devel] [PATCH RFC v2 00/22] I/O prefetch cache Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 01/22] block/pcache: empty pcache driver filter Pavel Butsykin
2016-09-01 14:31   ` Kevin Wolf
2016-09-06 15:20     ` Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 02/22] block/pcache: add own AIOCB block Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 03/22] util/rbtree: add rbtree from linux kernel Pavel Butsykin
2016-09-01 14:37   ` Kevin Wolf
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 04/22] block/pcache: add pcache debug build Pavel Butsykin
2016-09-08 15:11   ` Eric Blake
2016-09-08 15:49     ` Pavel Butsykin
2016-09-08 16:05       ` Pavel Butsykin
2016-09-08 18:42         ` Eric Blake
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 05/22] block/pcache: add aio requests into cache Pavel Butsykin
2016-09-01 15:28   ` Kevin Wolf
2016-09-06 16:54     ` Pavel Butsykin [this message]
2016-09-06 17:07       ` Kevin Wolf
2016-09-07 16:21         ` Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 06/22] block/pcache: restrict cache size Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 07/22] block/pcache: introduce LRU as method of memory Pavel Butsykin
2016-09-02  8:49   ` Kevin Wolf
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 08/22] block/pcache: implement pickup parts of the cache Pavel Butsykin
2016-09-02  8:59   ` Kevin Wolf
2016-09-08 12:29     ` Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 09/22] block/pcache: separation AIOCB on requests Pavel Butsykin
2016-09-02  9:10   ` Kevin Wolf
2016-09-08 15:47     ` Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 10/22] block/pcache: add check node leak Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 11/22] add QEMU style defines for __sync_add_and_fetch Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 12/22] block/pcache: implement read cache to qiov and drop node during aio write Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 13/22] block/pcache: add generic request complete Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 14/22] block/pcache: add support for rescheduling requests Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 15/22] block/pcache: simple readahead one chunk forward Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 16/22] block/pcache: pcache readahead node around Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 17/22] block/pcache: skip readahead for non-sequential requests Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 18/22] block/pcache: add pcache skip large aio read Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 19/22] block/pcache: add pcache node assert Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 20/22] block/pcache: implement pcache error handling of aio cb Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 21/22] block/pcache: add write through node Pavel Butsykin
2016-08-29 17:10 ` [Qemu-devel] [PATCH RFC v2 22/22] block/pcache: drop used pcache node Pavel Butsykin
2016-09-01 14:17 ` [Qemu-devel] [PATCH RFC v2 00/22] I/O prefetch cache Kevin Wolf
2016-09-06 12:36   ` Pavel Butsykin
2016-09-01 15:26 ` Avi Kivity
2016-09-06 12:40   ` Pavel Butsykin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57CEF4DE.9030901@virtuozzo.com \
    --to=pbutsykin@virtuozzo.com \
    --cc=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.