[Qemu-devel] Re: [RFC][PATCH 1/2] qcow2: Add QcowCache

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Qemu-devel@nongnu.org
Subject: [Qemu-devel] Re: [RFC][PATCH 1/2] qcow2: Add QcowCache
Date: Fri, 14 Jan 2011 16:22:48 +0100	[thread overview]
Message-ID: <4D306A48.4070009@redhat.com> (raw)
In-Reply-To: <AANLkTinbr_OtCa35UVNYf2z3bYmpY=v3DaBZX42PoZ0h@mail.gmail.com>

Am 14.01.2011 15:36, schrieb Stefan Hajnoczi:
> On Mon, Jan 10, 2011 at 4:53 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> +typedef struct QcowCachedTable {
> 
> How about using Qcow2* naming?  Just recently the old qcow_* functions
> in qcow2.c were renamed and it would be nice to continue that.

Will change it.

>> +    void*   table;
>> +    int64_t offset;
>> +    bool    dirty;
>> +    int     cache_hits;
>> +    int     ref;
>> +} QcowCachedTable;
>> +
>> +struct QcowCache {
>> +    int                     size;
>> +    QcowCachedTable*        entries;
>> +    struct QcowCache*       depends;
>> +    bool                    writethrough;
>> +};
>> +
>> +QcowCache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
>> +    bool writethrough)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    QcowCache *c;
>> +    int i;
>> +
>> +    c = qemu_mallocz(sizeof(*c));
>> +    c->size = num_tables;
>> +    c->entries = qemu_mallocz(sizeof(*c->entries) * num_tables);
>> +    c->writethrough = writethrough;
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
>> +    }
> 
> These could be allocated lazily.  For a single cache it doesn't
> matter, but we will have n QcowCaches where n is the number of
> dependencies?

There is one L2 cache and one refcount block cache, both initialized
only once during bdrv_open.

Also, the only dependency we have is that L2 depends on refcounts being
flushed first or vice versa, i.e. the two caches (not tables!) that
exist may depend on each other but new caches are never created.

>> +
>> +    return c;
>> +}
>> +
>> +int qcow2_cache_destroy(BlockDriverState* bs, QcowCache *c)
>> +{
>> +    int i;
>> +    int ret;
>> +
>> +    ret = qcow2_cache_flush(bs, c);
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        assert(c->entries[i].ref == 0);
>> +        qemu_vfree(c->entries[i].table);
>> +    }
>> +    qemu_free(c->entries);
> 
> And qemu_free(c)?

Right...

> 
>> +
>> +    bdrv_flush(bs->file);
>> +
>> +    return ret;
>> +}
>> +
>> +static int qcow2_cache_flush_dependency(BlockDriverState *bs, QcowCache *c)
>> +{
>> +    int ret;
>> +
>> +    ret = qcow2_cache_flush(bs, c->depends);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    c->depends = NULL;
>> +    return 0;
>> +}
>> +
>> +static int qcow2_cache_entry_flush(BlockDriverState *bs, QcowCache *c, int i)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    if (!c->entries[i].dirty || !c->entries[i].offset) {
>> +        return 0;
>> +    }
>> +
>> +    if (c->depends) {
>> +        ret = qcow2_cache_flush_dependency(bs, c);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
>> +        s->cluster_size);
>> +
>> +    c->entries[i].dirty = false;
> 
> In the error case we should leave this entry marked dirty.

Will fix.

> 
>> +
>> +    return ret;
>> +}
>> +
>> +int qcow2_cache_flush(BlockDriverState *bs, QcowCache *c)
>> +{
>> +    int result = 0;
>> +    int ret;
>> +    int i;
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        ret = qcow2_cache_entry_flush(bs, c, i);
>> +        if (ret < 0 && result != -ENOSPC) {
>> +            result = ret;
>> +        }
>> +    }
>> +
>> +    if (result == 0) {
>> +        ret = bdrv_flush(bs->file);
>> +        if (ret < 0) {
>> +            result = ret;
>> +        }
>> +    }
>> +
>> +    return result;
>> +}
>> +
>> +int qcow2_cache_set_dependency(BlockDriverState *bs, QcowCache *c,
>> +    QcowCache *dependency)
>> +{
>> +    int ret;
>> +
>> +    if (dependency->depends) {
>> +        ret = qcow2_cache_flush_dependency(bs, dependency);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    if (c->depends && (c->depends != dependency)) {
>> +        ret = qcow2_cache_flush_dependency(bs, c);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    c->depends = dependency;
>> +    return 0;
>> +}
>> +
>> +static int qcow2_cache_find_entry_to_replace(QcowCache *c)
>> +{
>> +    int i;
>> +    int min_count = INT_MAX;
>> +    int min_index = 0;
>> +
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        if (c->entries[i].ref) {
>> +            continue;
>> +        }
>> +
>> +        if (c->entries[i].cache_hits < min_count) {
>> +            min_index = i;
>> +            min_count = c->entries[i].cache_hits;
>> +        }
>> +
>> +        /* Give newer hits priority */
>> +        /* TODO Check how to optimize the replacement strategy */
>> +        c->entries[i].cache_hits /= 2;
>> +    }
>> +
>> +    if (c->entries[min_index].ref) {
>> +        abort(); /* TODO */
>> +    }
> 
> Initialize min_index to -1, then you don't need this check.  The
> caller could return ENOSPC or ENOBUFS if this function returns < 0.

I'll implement the -1 initialization.

The check itself is more of a reminder once we make things asynchronous
and multiple requests can use the cache at the same time. In the current
code it's more like an assert, it can never happen that there is no free
cache entry.

>> +    return min_index;
>> +}
>> +
>> +int qcow2_cache_get(BlockDriverState *bs, QcowCache *c, uint64_t offset,
>> +    void **table)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int i;
>> +    int ret;
>> +
>> +    /* Check if the table is already cached */
>> +    for (i = 0; i < c->size; i++) {
>> +        if (c->entries[i].offset == offset) {
>> +            goto found;
>> +        }
>> +    }
>> +
>> +    /* If not, write a table back and replace it */
>> +    i = qcow2_cache_find_entry_to_replace(c);
>> +    if (i < 0) {
>> +        return i;
>> +    }
>> +
>> +    ret = qcow2_cache_entry_flush(bs, c, i);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    c->entries[i].offset = 0;
>> +    ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    c->entries[i].cache_hits = 32;
> 
> 32?

It should be 42, "an arbitrary but carefully chosen number" ;-)

The point is that we don't want a new entry to be freed immediately
again, so it gets some hits for the start. I'll add a comment for that.

Kevin

next prev parent reply	other threads:[~2011-01-14 15:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-10 16:53 [Qemu-devel] [RFC][PATCH 0/2] qcow2 metadata cache Kevin Wolf
2011-01-10 16:53 ` [Qemu-devel] [RFC][PATCH 1/2] qcow2: Add QcowCache Kevin Wolf
     [not found]   ` <AANLkTinbr_OtCa35UVNYf2z3bYmpY=v3DaBZX42PoZ0h@mail.gmail.com>
2011-01-14 15:22     ` Kevin Wolf [this message]
2011-01-14 16:59       ` [Qemu-devel] " Stefan Hajnoczi
2011-01-10 16:53 ` [Qemu-devel] [RFC][PATCH 2/2] qcow2: Use QcowCache Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D306A48.4070009@redhat.com \
    --to=kwolf@redhat.com \
    --cc=Qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).