From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=46010 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PdlSy-0005qK-RO for Qemu-devel@nongnu.org; Fri, 14 Jan 2011 10:21:33 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PdlSu-0007bZ-AW for Qemu-devel@nongnu.org; Fri, 14 Jan 2011 10:21:28 -0500 Received: from mx1.redhat.com ([209.132.183.28]:13703) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PdlSu-0007bD-3L for Qemu-devel@nongnu.org; Fri, 14 Jan 2011 10:21:24 -0500 Message-ID: <4D306A48.4070009@redhat.com> Date: Fri, 14 Jan 2011 16:22:48 +0100 From: Kevin Wolf MIME-Version: 1.0 References: <1294678428-7266-1-git-send-email-kwolf@redhat.com> <1294678428-7266-2-git-send-email-kwolf@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: [Qemu-devel] Re: [RFC][PATCH 1/2] qcow2: Add QcowCache List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: Qemu-devel@nongnu.org Am 14.01.2011 15:36, schrieb Stefan Hajnoczi: > On Mon, Jan 10, 2011 at 4:53 PM, Kevin Wolf wrote: >> +typedef struct QcowCachedTable { > > How about using Qcow2* naming? Just recently the old qcow_* functions > in qcow2.c were renamed and it would be nice to continue that. Will change it. >> + void* table; >> + int64_t offset; >> + bool dirty; >> + int cache_hits; >> + int ref; >> +} QcowCachedTable; >> + >> +struct QcowCache { >> + int size; >> + QcowCachedTable* entries; >> + struct QcowCache* depends; >> + bool writethrough; >> +}; >> + >> +QcowCache *qcow2_cache_create(BlockDriverState *bs, int num_tables, >> + bool writethrough) >> +{ >> + BDRVQcowState *s = bs->opaque; >> + QcowCache *c; >> + int i; >> + >> + c = qemu_mallocz(sizeof(*c)); >> + c->size = num_tables; >> + c->entries = qemu_mallocz(sizeof(*c->entries) * num_tables); >> + c->writethrough = writethrough; >> + >> + for (i = 0; i < c->size; i++) { >> + c->entries[i].table = qemu_blockalign(bs, s->cluster_size); >> + } > > These could be allocated lazily. For a single cache it doesn't > matter, but we will have n QcowCaches where n is the number of > dependencies? There is one L2 cache and one refcount block cache, both initialized only once during bdrv_open. Also, the only dependency we have is that L2 depends on refcounts being flushed first or vice versa, i.e. the two caches (not tables!) that exist may depend on each other but new caches are never created. >> + >> + return c; >> +} >> + >> +int qcow2_cache_destroy(BlockDriverState* bs, QcowCache *c) >> +{ >> + int i; >> + int ret; >> + >> + ret = qcow2_cache_flush(bs, c); >> + >> + for (i = 0; i < c->size; i++) { >> + assert(c->entries[i].ref == 0); >> + qemu_vfree(c->entries[i].table); >> + } >> + qemu_free(c->entries); > > And qemu_free(c)? Right... > >> + >> + bdrv_flush(bs->file); >> + >> + return ret; >> +} >> + >> +static int qcow2_cache_flush_dependency(BlockDriverState *bs, QcowCache *c) >> +{ >> + int ret; >> + >> + ret = qcow2_cache_flush(bs, c->depends); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + c->depends = NULL; >> + return 0; >> +} >> + >> +static int qcow2_cache_entry_flush(BlockDriverState *bs, QcowCache *c, int i) >> +{ >> + BDRVQcowState *s = bs->opaque; >> + int ret; >> + >> + if (!c->entries[i].dirty || !c->entries[i].offset) { >> + return 0; >> + } >> + >> + if (c->depends) { >> + ret = qcow2_cache_flush_dependency(bs, c); >> + if (ret < 0) { >> + return ret; >> + } >> + } >> + >> + ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table, >> + s->cluster_size); >> + >> + c->entries[i].dirty = false; > > In the error case we should leave this entry marked dirty. Will fix. > >> + >> + return ret; >> +} >> + >> +int qcow2_cache_flush(BlockDriverState *bs, QcowCache *c) >> +{ >> + int result = 0; >> + int ret; >> + int i; >> + >> + for (i = 0; i < c->size; i++) { >> + ret = qcow2_cache_entry_flush(bs, c, i); >> + if (ret < 0 && result != -ENOSPC) { >> + result = ret; >> + } >> + } >> + >> + if (result == 0) { >> + ret = bdrv_flush(bs->file); >> + if (ret < 0) { >> + result = ret; >> + } >> + } >> + >> + return result; >> +} >> + >> +int qcow2_cache_set_dependency(BlockDriverState *bs, QcowCache *c, >> + QcowCache *dependency) >> +{ >> + int ret; >> + >> + if (dependency->depends) { >> + ret = qcow2_cache_flush_dependency(bs, dependency); >> + if (ret < 0) { >> + return ret; >> + } >> + } >> + >> + if (c->depends && (c->depends != dependency)) { >> + ret = qcow2_cache_flush_dependency(bs, c); >> + if (ret < 0) { >> + return ret; >> + } >> + } >> + >> + c->depends = dependency; >> + return 0; >> +} >> + >> +static int qcow2_cache_find_entry_to_replace(QcowCache *c) >> +{ >> + int i; >> + int min_count = INT_MAX; >> + int min_index = 0; >> + >> + >> + for (i = 0; i < c->size; i++) { >> + if (c->entries[i].ref) { >> + continue; >> + } >> + >> + if (c->entries[i].cache_hits < min_count) { >> + min_index = i; >> + min_count = c->entries[i].cache_hits; >> + } >> + >> + /* Give newer hits priority */ >> + /* TODO Check how to optimize the replacement strategy */ >> + c->entries[i].cache_hits /= 2; >> + } >> + >> + if (c->entries[min_index].ref) { >> + abort(); /* TODO */ >> + } > > Initialize min_index to -1, then you don't need this check. The > caller could return ENOSPC or ENOBUFS if this function returns < 0. I'll implement the -1 initialization. The check itself is more of a reminder once we make things asynchronous and multiple requests can use the cache at the same time. In the current code it's more like an assert, it can never happen that there is no free cache entry. >> + return min_index; >> +} >> + >> +int qcow2_cache_get(BlockDriverState *bs, QcowCache *c, uint64_t offset, >> + void **table) >> +{ >> + BDRVQcowState *s = bs->opaque; >> + int i; >> + int ret; >> + >> + /* Check if the table is already cached */ >> + for (i = 0; i < c->size; i++) { >> + if (c->entries[i].offset == offset) { >> + goto found; >> + } >> + } >> + >> + /* If not, write a table back and replace it */ >> + i = qcow2_cache_find_entry_to_replace(c); >> + if (i < 0) { >> + return i; >> + } >> + >> + ret = qcow2_cache_entry_flush(bs, c, i); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + c->entries[i].offset = 0; >> + ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + c->entries[i].cache_hits = 32; > > 32? It should be 42, "an arbitrary but carefully chosen number" ;-) The point is that we don't want a new entry to be freed immediately again, so it gets some hits for the start. I'll add a comment for that. Kevin