From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=46010 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PdlSy-0005qK-RO
	for Qemu-devel@nongnu.org; Fri, 14 Jan 2011 10:21:33 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1PdlSu-0007bZ-AW
	for Qemu-devel@nongnu.org; Fri, 14 Jan 2011 10:21:28 -0500
Received: from mx1.redhat.com ([209.132.183.28]:13703)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1PdlSu-0007bD-3L
	for Qemu-devel@nongnu.org; Fri, 14 Jan 2011 10:21:24 -0500
Message-ID: <4D306A48.4070009@redhat.com>
Date: Fri, 14 Jan 2011 16:22:48 +0100
From: Kevin Wolf <kwolf@redhat.com>
MIME-Version: 1.0
References: <1294678428-7266-1-git-send-email-kwolf@redhat.com>	<1294678428-7266-2-git-send-email-kwolf@redhat.com>
	<AANLkTinbr_OtCa35UVNYf2z3bYmpY=v3DaBZX42PoZ0h@mail.gmail.com>
In-Reply-To: <AANLkTinbr_OtCa35UVNYf2z3bYmpY=v3DaBZX42PoZ0h@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [RFC][PATCH 1/2] qcow2: Add QcowCache
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Qemu-devel@nongnu.org

Am 14.01.2011 15:36, schrieb Stefan Hajnoczi:
> On Mon, Jan 10, 2011 at 4:53 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> +typedef struct QcowCachedTable {
> 
> How about using Qcow2* naming?  Just recently the old qcow_* functions
> in qcow2.c were renamed and it would be nice to continue that.

Will change it.

>> +    void*   table;
>> +    int64_t offset;
>> +    bool    dirty;
>> +    int     cache_hits;
>> +    int     ref;
>> +} QcowCachedTable;
>> +
>> +struct QcowCache {
>> +    int                     size;
>> +    QcowCachedTable*        entries;
>> +    struct QcowCache*       depends;
>> +    bool                    writethrough;
>> +};
>> +
>> +QcowCache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
>> +    bool writethrough)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    QcowCache *c;
>> +    int i;
>> +
>> +    c = qemu_mallocz(sizeof(*c));
>> +    c->size = num_tables;
>> +    c->entries = qemu_mallocz(sizeof(*c->entries) * num_tables);
>> +    c->writethrough = writethrough;
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
>> +    }
> 
> These could be allocated lazily.  For a single cache it doesn't
> matter, but we will have n QcowCaches where n is the number of
> dependencies?

There is one L2 cache and one refcount block cache, both initialized
only once during bdrv_open.

Also, the only dependency we have is that L2 depends on refcounts being
flushed first or vice versa, i.e. the two caches (not tables!) that
exist may depend on each other but new caches are never created.

>> +
>> +    return c;
>> +}
>> +
>> +int qcow2_cache_destroy(BlockDriverState* bs, QcowCache *c)
>> +{
>> +    int i;
>> +    int ret;
>> +
>> +    ret = qcow2_cache_flush(bs, c);
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        assert(c->entries[i].ref == 0);
>> +        qemu_vfree(c->entries[i].table);
>> +    }
>> +    qemu_free(c->entries);
> 
> And qemu_free(c)?

Right...

> 
>> +
>> +    bdrv_flush(bs->file);
>> +
>> +    return ret;
>> +}
>> +
>> +static int qcow2_cache_flush_dependency(BlockDriverState *bs, QcowCache *c)
>> +{
>> +    int ret;
>> +
>> +    ret = qcow2_cache_flush(bs, c->depends);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    c->depends = NULL;
>> +    return 0;
>> +}
>> +
>> +static int qcow2_cache_entry_flush(BlockDriverState *bs, QcowCache *c, int i)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    if (!c->entries[i].dirty || !c->entries[i].offset) {
>> +        return 0;
>> +    }
>> +
>> +    if (c->depends) {
>> +        ret = qcow2_cache_flush_dependency(bs, c);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
>> +        s->cluster_size);
>> +
>> +    c->entries[i].dirty = false;
> 
> In the error case we should leave this entry marked dirty.

Will fix.

> 
>> +
>> +    return ret;
>> +}
>> +
>> +int qcow2_cache_flush(BlockDriverState *bs, QcowCache *c)
>> +{
>> +    int result = 0;
>> +    int ret;
>> +    int i;
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        ret = qcow2_cache_entry_flush(bs, c, i);
>> +        if (ret < 0 && result != -ENOSPC) {
>> +            result = ret;
>> +        }
>> +    }
>> +
>> +    if (result == 0) {
>> +        ret = bdrv_flush(bs->file);
>> +        if (ret < 0) {
>> +            result = ret;
>> +        }
>> +    }
>> +
>> +    return result;
>> +}
>> +
>> +int qcow2_cache_set_dependency(BlockDriverState *bs, QcowCache *c,
>> +    QcowCache *dependency)
>> +{
>> +    int ret;
>> +
>> +    if (dependency->depends) {
>> +        ret = qcow2_cache_flush_dependency(bs, dependency);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    if (c->depends && (c->depends != dependency)) {
>> +        ret = qcow2_cache_flush_dependency(bs, c);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    c->depends = dependency;
>> +    return 0;
>> +}
>> +
>> +static int qcow2_cache_find_entry_to_replace(QcowCache *c)
>> +{
>> +    int i;
>> +    int min_count = INT_MAX;
>> +    int min_index = 0;
>> +
>> +
>> +    for (i = 0; i < c->size; i++) {
>> +        if (c->entries[i].ref) {
>> +            continue;
>> +        }
>> +
>> +        if (c->entries[i].cache_hits < min_count) {
>> +            min_index = i;
>> +            min_count = c->entries[i].cache_hits;
>> +        }
>> +
>> +        /* Give newer hits priority */
>> +        /* TODO Check how to optimize the replacement strategy */
>> +        c->entries[i].cache_hits /= 2;
>> +    }
>> +
>> +    if (c->entries[min_index].ref) {
>> +        abort(); /* TODO */
>> +    }
> 
> Initialize min_index to -1, then you don't need this check.  The
> caller could return ENOSPC or ENOBUFS if this function returns < 0.

I'll implement the -1 initialization.

The check itself is more of a reminder once we make things asynchronous
and multiple requests can use the cache at the same time. In the current
code it's more like an assert, it can never happen that there is no free
cache entry.

>> +    return min_index;
>> +}
>> +
>> +int qcow2_cache_get(BlockDriverState *bs, QcowCache *c, uint64_t offset,
>> +    void **table)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int i;
>> +    int ret;
>> +
>> +    /* Check if the table is already cached */
>> +    for (i = 0; i < c->size; i++) {
>> +        if (c->entries[i].offset == offset) {
>> +            goto found;
>> +        }
>> +    }
>> +
>> +    /* If not, write a table back and replace it */
>> +    i = qcow2_cache_find_entry_to_replace(c);
>> +    if (i < 0) {
>> +        return i;
>> +    }
>> +
>> +    ret = qcow2_cache_entry_flush(bs, c, i);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    c->entries[i].offset = 0;
>> +    ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    c->entries[i].cache_hits = 32;
> 
> 32?

It should be 42, "an arbitrary but carefully chosen number" ;-)

The point is that we don't want a new entry to be freed immediately
again, so it gets some hits for the start. I'll add a comment for that.

Kevin