* [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements
@ 2015-05-11 12:54 Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 1/8] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
` (8 more replies)
0 siblings, 9 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
New version of the qcow2 cache patches:
v3:
- Removed a dead comment in patch #3
- New document explaining how to configure the cache sizes
v2: https://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg00833.html
- Don't do pointer arithmetic on void *
- Rename table_addr() to qcow2_cache_get_table_addr()
- Add qcow2_cache_get_table_idx()
- Cast cache size to size_t to prevent overflows
- Make qcow2_cache_put() a void function
- Don't store the cluster size in the cache, get it from the BDS instead
v1: https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg04355.html
Regards,
Berto
Alberto Garcia (8):
qcow2: use one single memory block for the L2/refcount cache tables
qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
qcow2: use an LRU algorithm to replace entries from the L2 cache
qcow2: remove qcow2_cache_find_entry_to_replace()
qcow2: use a hash to look for entries in the L2 cache
qcow2: make qcow2_cache_put() a void function
qcow2: style fixes in qcow2-cache.c
docs: document how to configure the qcow2 L2/refcount caches
block/qcow2-cache.c | 171 ++++++++++++++++++++++---------------------------
block/qcow2-cluster.c | 62 +++++-------------
block/qcow2-refcount.c | 37 +++--------
block/qcow2.h | 5 +-
docs/qcow2-cache.txt | 117 +++++++++++++++++++++++++++++++++
5 files changed, 221 insertions(+), 171 deletions(-)
create mode 100644 docs/qcow2-cache.txt
--
2.1.4
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 1/8] qcow2: use one single memory block for the L2/refcount cache tables
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
@ 2015-05-11 12:54 ` Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 2/8] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
` (7 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
The qcow2 L2/refcount cache contains one separate table for each cache
entry. Doing one allocation per table adds unnecessary overhead and it
also requires us to store the address of each table separately.
Since the size of the cache is constant during its lifetime, it's
better to have an array that contains all the tables using one single
allocation.
In my tests measuring freshly created caches with sizes 128MB (L2) and
32MB (refcount) this uses around 10MB of RAM less.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-cache.c | 55 ++++++++++++++++++++++++--------------------------
block/qcow2-cluster.c | 12 +++++------
block/qcow2-refcount.c | 8 +++++---
block/qcow2.h | 3 ++-
4 files changed, 39 insertions(+), 39 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index b115549..f0dfb69 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -28,7 +28,6 @@
#include "trace.h"
typedef struct Qcow2CachedTable {
- void* table;
int64_t offset;
bool dirty;
int cache_hits;
@@ -40,39 +39,35 @@ struct Qcow2Cache {
struct Qcow2Cache* depends;
int size;
bool depends_on_flush;
+ void *table_array;
};
+static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
+ Qcow2Cache *c, int table)
+{
+ BDRVQcowState *s = bs->opaque;
+ return (uint8_t *) c->table_array + (size_t) table * s->cluster_size;
+}
+
Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
{
BDRVQcowState *s = bs->opaque;
Qcow2Cache *c;
- int i;
c = g_new0(Qcow2Cache, 1);
c->size = num_tables;
c->entries = g_try_new0(Qcow2CachedTable, num_tables);
- if (!c->entries) {
- goto fail;
- }
-
- for (i = 0; i < c->size; i++) {
- c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
- if (c->entries[i].table == NULL) {
- goto fail;
- }
+ c->table_array = qemu_try_blockalign(bs->file,
+ (size_t) num_tables * s->cluster_size);
+
+ if (!c->entries || !c->table_array) {
+ qemu_vfree(c->table_array);
+ g_free(c->entries);
+ g_free(c);
+ c = NULL;
}
return c;
-
-fail:
- if (c->entries) {
- for (i = 0; i < c->size; i++) {
- qemu_vfree(c->entries[i].table);
- }
- }
- g_free(c->entries);
- g_free(c);
- return NULL;
}
int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
@@ -81,9 +76,9 @@ int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
for (i = 0; i < c->size; i++) {
assert(c->entries[i].ref == 0);
- qemu_vfree(c->entries[i].table);
}
+ qemu_vfree(c->table_array);
g_free(c->entries);
g_free(c);
@@ -151,8 +146,8 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
}
- ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
- s->cluster_size);
+ ret = bdrv_pwrite(bs->file, c->entries[i].offset,
+ qcow2_cache_get_table_addr(bs, c, i), s->cluster_size);
if (ret < 0) {
return ret;
}
@@ -304,7 +299,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
}
- ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+ ret = bdrv_pread(bs->file, offset, qcow2_cache_get_table_addr(bs, c, i),
+ s->cluster_size);
if (ret < 0) {
return ret;
}
@@ -319,7 +315,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
found:
c->entries[i].cache_hits++;
c->entries[i].ref++;
- *table = c->entries[i].table;
+ *table = qcow2_cache_get_table_addr(bs, c, i);
trace_qcow2_cache_get_done(qemu_coroutine_self(),
c == s->l2_table_cache, i);
@@ -344,7 +340,7 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
int i;
for (i = 0; i < c->size; i++) {
- if (c->entries[i].table == *table) {
+ if (qcow2_cache_get_table_addr(bs, c, i) == *table) {
goto found;
}
}
@@ -358,12 +354,13 @@ found:
return 0;
}
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
+void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
+ void *table)
{
int i;
for (i = 0; i < c->size; i++) {
- if (c->entries[i].table == table) {
+ if (qcow2_cache_get_table_addr(bs, c, i) == table) {
goto found;
}
}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ed2b44d..5cd418a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -263,7 +263,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
trace_qcow2_l2_allocate_write_l2(bs, l1_index);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
ret = qcow2_cache_flush(bs, s->l2_table_cache);
if (ret < 0) {
goto fail;
@@ -692,7 +692,7 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
/* compressed clusters never have the copied flag */
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
l2_table[l2_index] = cpu_to_be64(cluster_offset);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (ret < 0) {
@@ -771,7 +771,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
if (ret < 0) {
goto err;
}
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
assert(l2_index + m->nb_clusters <= s->l2_size);
for (i = 0; i < m->nb_clusters; i++) {
@@ -1470,7 +1470,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
}
/* First remove L2 entries */
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
if (!full_discard && s->qcow_version >= 3) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
} else {
@@ -1558,7 +1558,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
old_offset = be64_to_cpu(l2_table[l2_index + i]);
/* Update L2 entries */
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
if (old_offset & QCOW_OFLAG_COMPRESSED) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
@@ -1760,7 +1760,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
if (is_active_l1) {
if (l2_dirty) {
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_depends_on_flush(s->l2_table_cache);
}
ret = qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f47260b..35a6a35 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -424,7 +424,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
/* Now the new refcount block needs to be written to disk */
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
- qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
+ qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache, *refcount_block);
ret = qcow2_cache_flush(bs, s->refcount_block_cache);
if (ret < 0) {
goto fail_block;
@@ -737,7 +737,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
}
old_table_index = table_index;
- qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+ qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache,
+ refcount_block);
/* we can update the count and save it */
block_index = cluster_index & (s->refcount_block_size - 1);
@@ -1177,7 +1178,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
s->refcount_block_cache);
}
l2_table[j] = cpu_to_be64(offset);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache,
+ l2_table);
}
}
diff --git a/block/qcow2.h b/block/qcow2.h
index 422b825..5d0995f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -574,7 +574,8 @@ int qcow2_read_snapshots(BlockDriverState *bs);
Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
+void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
+ void *table);
int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
Qcow2Cache *dependency);
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 2/8] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 1/8] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
@ 2015-05-11 12:54 ` Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 3/8] qcow2: use an LRU algorithm to replace entries from the L2 cache Alberto Garcia
` (6 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
Since all tables are now stored together, it is possible to obtain
the position of a particular table directly from its address, so the
operation becomes O(1).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-cache.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index f0dfb69..6e78c8f 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -49,6 +49,16 @@ static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
return (uint8_t *) c->table_array + (size_t) table * s->cluster_size;
}
+static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
+ Qcow2Cache *c, void *table)
+{
+ BDRVQcowState *s = bs->opaque;
+ ptrdiff_t table_offset = (uint8_t *) table - (uint8_t *) c->table_array;
+ int idx = table_offset / s->cluster_size;
+ assert(idx >= 0 && idx < c->size && table_offset % s->cluster_size == 0);
+ return idx;
+}
+
Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
{
BDRVQcowState *s = bs->opaque;
@@ -337,16 +347,12 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
{
- int i;
+ int i = qcow2_cache_get_table_idx(bs, c, *table);
- for (i = 0; i < c->size; i++) {
- if (qcow2_cache_get_table_addr(bs, c, i) == *table) {
- goto found;
- }
+ if (c->entries[i].offset == 0) {
+ return -ENOENT;
}
- return -ENOENT;
-found:
c->entries[i].ref--;
*table = NULL;
@@ -357,15 +363,7 @@ found:
void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
void *table)
{
- int i;
-
- for (i = 0; i < c->size; i++) {
- if (qcow2_cache_get_table_addr(bs, c, i) == table) {
- goto found;
- }
- }
- abort();
-
-found:
+ int i = qcow2_cache_get_table_idx(bs, c, table);
+ assert(c->entries[i].offset != 0);
c->entries[i].dirty = true;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 3/8] qcow2: use an LRU algorithm to replace entries from the L2 cache
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 1/8] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 2/8] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
@ 2015-05-11 12:54 ` Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 4/8] qcow2: remove qcow2_cache_find_entry_to_replace() Alberto Garcia
` (5 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
The current algorithm to evict entries from the cache gives always
preference to those in the lowest positions. As the size of the cache
increases, the chances of the later elements of being removed decrease
exponentially.
In a scenario with random I/O and lots of cache misses, entries in
positions 8 and higher are rarely (if ever) evicted. This can be seen
even with the default cache size, but with larger caches the problem
becomes more obvious.
Using an LRU algorithm makes the chances of being removed from the
cache independent from the position.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-cache.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 6e78c8f..3a31895 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -28,10 +28,10 @@
#include "trace.h"
typedef struct Qcow2CachedTable {
- int64_t offset;
- bool dirty;
- int cache_hits;
- int ref;
+ int64_t offset;
+ bool dirty;
+ uint64_t lru_counter;
+ int ref;
} Qcow2CachedTable;
struct Qcow2Cache {
@@ -40,6 +40,7 @@ struct Qcow2Cache {
int size;
bool depends_on_flush;
void *table_array;
+ uint64_t lru_counter;
};
static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
@@ -233,16 +234,18 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
for (i = 0; i < c->size; i++) {
assert(c->entries[i].ref == 0);
c->entries[i].offset = 0;
- c->entries[i].cache_hits = 0;
+ c->entries[i].lru_counter = 0;
}
+ c->lru_counter = 0;
+
return 0;
}
static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
{
int i;
- int min_count = INT_MAX;
+ uint64_t min_lru_counter = UINT64_MAX;
int min_index = -1;
@@ -251,15 +254,9 @@ static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
continue;
}
- if (c->entries[i].cache_hits < min_count) {
+ if (c->entries[i].lru_counter < min_lru_counter) {
min_index = i;
- min_count = c->entries[i].cache_hits;
- }
-
- /* Give newer hits priority */
- /* TODO Check how to optimize the replacement strategy */
- if (c->entries[i].cache_hits > 1) {
- c->entries[i].cache_hits /= 2;
+ min_lru_counter = c->entries[i].lru_counter;
}
}
@@ -316,14 +313,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
}
}
- /* Give the table some hits for the start so that it won't be replaced
- * immediately. The number 32 is completely arbitrary. */
- c->entries[i].cache_hits = 32;
c->entries[i].offset = offset;
/* And return the right table */
found:
- c->entries[i].cache_hits++;
c->entries[i].ref++;
*table = qcow2_cache_get_table_addr(bs, c, i);
@@ -356,6 +349,10 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
c->entries[i].ref--;
*table = NULL;
+ if (c->entries[i].ref == 0) {
+ c->entries[i].lru_counter = ++c->lru_counter;
+ }
+
assert(c->entries[i].ref >= 0);
return 0;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 4/8] qcow2: remove qcow2_cache_find_entry_to_replace()
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
` (2 preceding siblings ...)
2015-05-11 12:54 ` [Qemu-devel] [PATCH 3/8] qcow2: use an LRU algorithm to replace entries from the L2 cache Alberto Garcia
@ 2015-05-11 12:54 ` Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 5/8] qcow2: use a hash to look for entries in the L2 cache Alberto Garcia
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
A cache miss means that the whole array was traversed and the entry
we were looking for was not found, so there's no need to traverse it
again in order to select an entry to replace.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-cache.c | 45 ++++++++++++++++-----------------------------
1 file changed, 16 insertions(+), 29 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 3a31895..2035cd8 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -242,51 +242,38 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
return 0;
}
-static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
-{
- int i;
- uint64_t min_lru_counter = UINT64_MAX;
- int min_index = -1;
-
-
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].ref) {
- continue;
- }
-
- if (c->entries[i].lru_counter < min_lru_counter) {
- min_index = i;
- min_lru_counter = c->entries[i].lru_counter;
- }
- }
-
- if (min_index == -1) {
- /* This can't happen in current synchronous code, but leave the check
- * here as a reminder for whoever starts using AIO with the cache */
- abort();
- }
- return min_index;
-}
-
static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
uint64_t offset, void **table, bool read_from_disk)
{
BDRVQcowState *s = bs->opaque;
int i;
int ret;
+ uint64_t min_lru_counter = UINT64_MAX;
+ int min_lru_index = -1;
trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
offset, read_from_disk);
/* Check if the table is already cached */
for (i = 0; i < c->size; i++) {
- if (c->entries[i].offset == offset) {
+ const Qcow2CachedTable *t = &c->entries[i];
+ if (t->offset == offset) {
goto found;
}
+ if (t->ref == 0 && t->lru_counter < min_lru_counter) {
+ min_lru_counter = t->lru_counter;
+ min_lru_index = i;
+ }
+ }
+
+ if (min_lru_index == -1) {
+ /* This can't happen in current synchronous code, but leave the check
+ * here as a reminder for whoever starts using AIO with the cache */
+ abort();
}
- /* If not, write a table back and replace it */
- i = qcow2_cache_find_entry_to_replace(c);
+ /* Cache miss: write a table back and replace it */
+ i = min_lru_index;
trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
c == s->l2_table_cache, i);
if (i < 0) {
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 5/8] qcow2: use a hash to look for entries in the L2 cache
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
` (3 preceding siblings ...)
2015-05-11 12:54 ` [Qemu-devel] [PATCH 4/8] qcow2: remove qcow2_cache_find_entry_to_replace() Alberto Garcia
@ 2015-05-11 12:54 ` Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 6/8] qcow2: make qcow2_cache_put() a void function Alberto Garcia
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
The current cache algorithm traverses the array starting always from
the beginning, so the average number of comparisons needed to perform
a lookup is proportional to the size of the array.
By using a hash of the offset as the starting point, lookups are
faster and independent from the array size.
The hash is computed using the cluster number of the table, multiplied
by 4 to make it perform better when there are collisions.
In my tests, using a cache with 2048 entries, this reduces the average
number of comparisons per lookup from 430 to 2.5.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/qcow2-cache.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 2035cd8..121e6e9 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -248,6 +248,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
BDRVQcowState *s = bs->opaque;
int i;
int ret;
+ int lookup_index;
uint64_t min_lru_counter = UINT64_MAX;
int min_lru_index = -1;
@@ -255,7 +256,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
offset, read_from_disk);
/* Check if the table is already cached */
- for (i = 0; i < c->size; i++) {
+ i = lookup_index = (offset / s->cluster_size * 4) % c->size;
+ do {
const Qcow2CachedTable *t = &c->entries[i];
if (t->offset == offset) {
goto found;
@@ -264,7 +266,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
min_lru_counter = t->lru_counter;
min_lru_index = i;
}
- }
+ if (++i == c->size) {
+ i = 0;
+ }
+ } while (i != lookup_index);
if (min_lru_index == -1) {
/* This can't happen in current synchronous code, but leave the check
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 6/8] qcow2: make qcow2_cache_put() a void function
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
` (4 preceding siblings ...)
2015-05-11 12:54 ` [Qemu-devel] [PATCH 5/8] qcow2: use a hash to look for entries in the L2 cache Alberto Garcia
@ 2015-05-11 12:54 ` Alberto Garcia
2015-05-11 13:12 ` Max Reitz
2015-05-11 12:54 ` [Qemu-devel] [PATCH 7/8] qcow2: style fixes in qcow2-cache.c Alberto Garcia
` (2 subsequent siblings)
8 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
This function never receives an invalid table pointer, so we can make
it void and remove all the error checking code.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/qcow2-cache.c | 7 +------
block/qcow2-cluster.c | 50 ++++++++++----------------------------------------
block/qcow2-refcount.c | 29 +++++------------------------
block/qcow2.h | 2 +-
4 files changed, 17 insertions(+), 71 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 121e6e9..bde3c4f 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -330,14 +330,10 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
return qcow2_cache_do_get(bs, c, offset, table, false);
}
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
+void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
{
int i = qcow2_cache_get_table_idx(bs, c, *table);
- if (c->entries[i].offset == 0) {
- return -ENOENT;
- }
-
c->entries[i].ref--;
*table = NULL;
@@ -346,7 +342,6 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
}
assert(c->entries[i].ref >= 0);
- return 0;
}
void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5cd418a..d74426c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -253,10 +253,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
memcpy(l2_table, old_table, s->cluster_size);
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
- if (ret < 0) {
- goto fail;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &old_table);
}
/* write the l2 table to the file */
@@ -694,10 +691,7 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
l2_table[l2_index] = cpu_to_be64(cluster_offset);
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
- if (ret < 0) {
- return 0;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
return cluster_offset;
}
@@ -789,10 +783,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
- if (ret < 0) {
- goto err;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
/*
* If this was a COW, we need to decrease the refcount of the old cluster.
@@ -944,7 +935,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
uint64_t *l2_table;
unsigned int nb_clusters;
unsigned int keep_clusters;
- int ret, pret;
+ int ret;
trace_qcow2_handle_copied(qemu_coroutine_self(), guest_offset, *host_offset,
*bytes);
@@ -1011,10 +1002,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
/* Cleanup */
out:
- pret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
- if (pret < 0) {
- return pret;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
/* Only return a host offset if we actually made progress. Otherwise we
* would make requirements for handle_alloc() that it can't fulfill */
@@ -1139,10 +1127,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
* wrong with our code. */
assert(nb_clusters > 0);
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
- if (ret < 0) {
- return ret;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
/* Allocate, if necessary at a given offset in the image file */
alloc_cluster_offset = start_of_cluster(s, *host_offset);
@@ -1481,10 +1466,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
- if (ret < 0) {
- return ret;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
return nb_clusters;
}
@@ -1567,10 +1549,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
}
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
- if (ret < 0) {
- return ret;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
return nb_clusters;
}
@@ -1763,11 +1742,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_depends_on_flush(s->l2_table_cache);
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
- if (ret < 0) {
- l2_table = NULL;
- goto fail;
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
} else {
if (l2_dirty) {
ret = qcow2_pre_write_overlap_check(bs,
@@ -1798,12 +1773,7 @@ fail:
if (!is_active_l1) {
qemu_vfree(l2_table);
} else {
- if (ret < 0) {
- qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
- } else {
- ret = qcow2_cache_put(bs, s->l2_table_cache,
- (void **)&l2_table);
- }
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
}
}
return ret;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 35a6a35..e3e7f3a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -265,10 +265,7 @@ int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index,
block_index = cluster_index & (s->refcount_block_size - 1);
*refcount = s->get_refcount(refcount_block, block_index);
- ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
- if (ret < 0) {
- return ret;
- }
+ qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
return 0;
}
@@ -448,10 +445,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
return -EAGAIN;
}
- ret = qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
- if (ret < 0) {
- goto fail_block;
- }
+ qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
/*
* If we come here, we need to grow the refcount table. Again, a new
@@ -723,13 +717,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
/* Load the refcount block and allocate it if needed */
if (table_index != old_table_index) {
if (refcount_block) {
- ret = qcow2_cache_put(bs, s->refcount_block_cache,
- &refcount_block);
- if (ret < 0) {
- goto fail;
- }
+ qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
}
-
ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
if (ret < 0) {
goto fail;
@@ -774,11 +763,7 @@ fail:
/* Write last changed block to disk */
if (refcount_block) {
- int wret;
- wret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
- if (wret < 0) {
- return ret < 0 ? ret : wret;
- }
+ qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
}
/*
@@ -1183,11 +1168,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
}
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
- if (ret < 0) {
- goto fail;
- }
-
+ qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (addend != 0) {
ret = qcow2_update_cluster_refcount(bs, l2_offset >>
diff --git a/block/qcow2.h b/block/qcow2.h
index 5d0995f..0076512 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -587,6 +587,6 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
void **table);
int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
void **table);
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
#endif
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 7/8] qcow2: style fixes in qcow2-cache.c
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
` (5 preceding siblings ...)
2015-05-11 12:54 ` [Qemu-devel] [PATCH 6/8] qcow2: make qcow2_cache_put() a void function Alberto Garcia
@ 2015-05-11 12:54 ` Alberto Garcia
2015-05-11 12:55 ` [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches Alberto Garcia
2015-05-12 9:57 ` [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Kevin Wolf
8 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:54 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
Fix pointer declaration to make it consistent with the rest of the
code.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-cache.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index bde3c4f..ed92a09 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -35,8 +35,8 @@ typedef struct Qcow2CachedTable {
} Qcow2CachedTable;
struct Qcow2Cache {
- Qcow2CachedTable* entries;
- struct Qcow2Cache* depends;
+ Qcow2CachedTable *entries;
+ struct Qcow2Cache *depends;
int size;
bool depends_on_flush;
void *table_array;
@@ -81,7 +81,7 @@ Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
return c;
}
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
+int qcow2_cache_destroy(BlockDriverState *bs, Qcow2Cache *c)
{
int i;
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
` (6 preceding siblings ...)
2015-05-11 12:54 ` [Qemu-devel] [PATCH 7/8] qcow2: style fixes in qcow2-cache.c Alberto Garcia
@ 2015-05-11 12:55 ` Alberto Garcia
2015-05-11 13:23 ` Max Reitz
2015-05-11 15:12 ` Eric Blake
2015-05-12 9:57 ` [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Kevin Wolf
8 siblings, 2 replies; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 12:55 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz,
Stefan Hajnoczi
QEMU has options to configure the size of the L2 and refcount caches
for the qcow2 format. However, choosing the right sizes for a
particular disk image is not a straightforward operation since the
ratio between the cache size and the allocated disk space is not
obvious and depends on the size of the cluster.
This document attempts to give an overview of both caches and how to
configure their sizes.
Signed-off-by: Alberto Garcia <berto@igalia.com>
---
docs/qcow2-cache.txt | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 117 insertions(+)
create mode 100644 docs/qcow2-cache.txt
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
new file mode 100644
index 0000000..2570172
--- /dev/null
+++ b/docs/qcow2-cache.txt
@@ -0,0 +1,117 @@
+qcow2 L2/refcount cache configuration
+=====================================
+The QEMU qcow2 driver has two caches that can improve the I/O
+performance significantly. However, setting the right cache sizes is
+not a straightforward operation.
+
+This document attempts to give an overview of the L2 and refcount
+caches, and how to configure them.
+
+Please refer to the docs/specs/qcow2.txt file for an in-depth
+technical description of the qcow2 file format.
+
+
+Clusters
+--------
+A qcow2 file is organized in units of constant size called clusters.
+
+The cluster size is configurable, but it must be a power of two and
+its value 512 bytes or higher. QEMU currently defaults to 64 KB
+clusters, and it does not support sizes larger than 2MB.
+
+The 'qemu-img create' command supports specifying the size using the
+cluster_size option:
+
+ qemu-img create -f qcow2 -o cluster_size=128K hd.img 4G
+
+
+The L2 tables
+-------------
+The qcow2 format uses a two-level structure to map the virtual disk as
+seen by the guest to the disk image in the host. These structures are
+called the L1 and L2 tables.
+
+There is one single L1 table per disk image. The table is small and is
+always kept in memory.
+
+There can be many L2 tables, depending on how much space has been
+allocated in the image. Each table is one cluster in size. In order to
+read or write data from the virtual disk, QEMU needs to read its
+corresponding L2 table to find out where that data is located. Since
+reading the table for each I/O operation can be expensive, QEMU keeps
+an L2 cache in memory to speed up disk access.
+
+The size of the L2 cache can be configured, and setting the right
+value can improve the I/O performance significantly.
+
+
+The refcount blocks
+-------------------
+The qcow2 format also mantains a reference count for each cluster. The
+data is stored in a two-level structure similar to the L1/L2 tables
+described above.
+
+The second level structures are called refcount blocks, are also one
+cluster in size and the number is also variable and dependent on the
+amount of allocated space.
+
+QEMU keeps a refcount cache to speed up I/O much like the
+aforementioned L2 cache, and its size can also be configured.
+
+
+Choosing the right cache sizes
+------------------------------
+In order to choose the cache sizes we need to know how they relate to
+the amount of allocated space.
+
+The amount of virtual disk that can be mapped by the L2 and refcount
+caches (in bytes) is:
+
+ disk_size = l2_cache_size * cluster_size / 8
+ disk_size = refcount_cache_size * cluster_size / 2
+
+With the default cluster size (64KB), that is
+
+ disk_size = l2_cache_size * 8192
+ disk_size = refcount_cache_size * 32768
+
+So in order to cover n GB of disk space with the default cluster size
+we need:
+
+ l2_cache_size = disk_size_GB * 131072
+ refcount_cache_size = disk_size_GB * 32768
+
+QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
+cache of 256KB (262144 bytes), so using the formulas we've just seen
+we have
+
+ 1048576 / 131072 = 8 GB of virtual disk covered by that cache
+ 262144 / 32768 = 8 GB
+
+
+How to configure the cache sizes
+--------------------------------
+Cache sizes can be configured using the -drive option in the
+command-line, or the 'blockdev-add' QMP command.
+
+There are three options available, and all of them take bytes:
+
+"l2-cache-size": maximum size of the L2 table cache
+"refcount-cache-size": maximum size of the refcount block cache
+"cache-size": maximum size of both caches combined
+
+There are two things that need to be taken into account:
+
+ - Both caches must have a size that is a multiple of the cluster
+ size.
+
+ - The L2 cache should be 4 times bigger than the refcount cache in
+ order to cover the same amount of disk space.
+
+However, if you only set one of the options above, QEMU will
+automatically adjust the others so that the 1/4 ratio is maintained.
+This means that these options are equivalent:
+
+-drive file=hd.img,l2-cache-size=2097152
+-drive file=hd.img,refcount-cache-size=524288
+-drive file=hd.img,cache-size=2621440
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 6/8] qcow2: make qcow2_cache_put() a void function
2015-05-11 12:54 ` [Qemu-devel] [PATCH 6/8] qcow2: make qcow2_cache_put() a void function Alberto Garcia
@ 2015-05-11 13:12 ` Max Reitz
0 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2015-05-11 13:12 UTC (permalink / raw)
To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi
On 11.05.2015 14:54, Alberto Garcia wrote:
> This function never receives an invalid table pointer, so we can make
> it void and remove all the error checking code.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> block/qcow2-cache.c | 7 +------
> block/qcow2-cluster.c | 50 ++++++++++----------------------------------------
> block/qcow2-refcount.c | 29 +++++------------------------
> block/qcow2.h | 2 +-
> 4 files changed, 17 insertions(+), 71 deletions(-)
Reviewed-by: Max Reitz <mreitz@redhat.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches
2015-05-11 12:55 ` [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches Alberto Garcia
@ 2015-05-11 13:23 ` Max Reitz
2015-05-11 13:30 ` Alberto Garcia
2015-05-11 15:12 ` Eric Blake
1 sibling, 1 reply; 15+ messages in thread
From: Max Reitz @ 2015-05-11 13:23 UTC (permalink / raw)
To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi
On 11.05.2015 14:55, Alberto Garcia wrote:
> QEMU has options to configure the size of the L2 and refcount caches
> for the qcow2 format. However, choosing the right sizes for a
> particular disk image is not a straightforward operation since the
> ratio between the cache size and the allocated disk space is not
> obvious and depends on the size of the cluster.
>
> This document attempts to give an overview of both caches and how to
> configure their sizes.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
> docs/qcow2-cache.txt | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 117 insertions(+)
> create mode 100644 docs/qcow2-cache.txt
>
> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> new file mode 100644
> index 0000000..2570172
> --- /dev/null
> +++ b/docs/qcow2-cache.txt
> @@ -0,0 +1,117 @@
> +qcow2 L2/refcount cache configuration
> +=====================================
> +The QEMU qcow2 driver has two caches that can improve the I/O
> +performance significantly. However, setting the right cache sizes is
> +not a straightforward operation.
> +
> +This document attempts to give an overview of the L2 and refcount
> +caches, and how to configure them.
> +
> +Please refer to the docs/specs/qcow2.txt file for an in-depth
> +technical description of the qcow2 file format.
> +
> +
> +Clusters
> +--------
> +A qcow2 file is organized in units of constant size called clusters.
> +
> +The cluster size is configurable, but it must be a power of two and
> +its value 512 bytes or higher. QEMU currently defaults to 64 KB
> +clusters, and it does not support sizes larger than 2MB.
> +
> +The 'qemu-img create' command supports specifying the size using the
> +cluster_size option:
> +
> + qemu-img create -f qcow2 -o cluster_size=128K hd.img 4G
> +
> +
> +The L2 tables
> +-------------
> +The qcow2 format uses a two-level structure to map the virtual disk as
> +seen by the guest to the disk image in the host. These structures are
> +called the L1 and L2 tables.
> +
> +There is one single L1 table per disk image. The table is small and is
> +always kept in memory.
> +
> +There can be many L2 tables, depending on how much space has been
> +allocated in the image. Each table is one cluster in size. In order to
> +read or write data from the virtual disk, QEMU needs to read its
> +corresponding L2 table to find out where that data is located. Since
> +reading the table for each I/O operation can be expensive, QEMU keeps
> +an L2 cache in memory to speed up disk access.
> +
> +The size of the L2 cache can be configured, and setting the right
> +value can improve the I/O performance significantly.
> +
> +
> +The refcount blocks
> +-------------------
> +The qcow2 format also mantains a reference count for each cluster. The
> +data is stored in a two-level structure similar to the L1/L2 tables
> +described above.
> +
> +The second level structures are called refcount blocks, are also one
> +cluster in size and the number is also variable and dependent on the
> +amount of allocated space.
> +
> +QEMU keeps a refcount cache to speed up I/O much like the
> +aforementioned L2 cache, and its size can also be configured.
> +
> +
> +Choosing the right cache sizes
> +------------------------------
> +In order to choose the cache sizes we need to know how they relate to
> +the amount of allocated space.
> +
> +The amount of virtual disk that can be mapped by the L2 and refcount
> +caches (in bytes) is:
> +
> + disk_size = l2_cache_size * cluster_size / 8
> + disk_size = refcount_cache_size * cluster_size / 2
Only with the default of refcount_bits=16. In the general case, it's
refcount_cache_size * cluster_size * 8 / refcount_bits.
> +
> +With the default cluster size (64KB), that is
> +
> + disk_size = l2_cache_size * 8192
> + disk_size = refcount_cache_size * 32768
> +
> +So in order to cover n GB of disk space with the default cluster size
> +we need:
> +
> + l2_cache_size = disk_size_GB * 131072
> + refcount_cache_size = disk_size_GB * 32768
> +
> +QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
> +cache of 256KB (262144 bytes), so using the formulas we've just seen
> +we have
> +
> + 1048576 / 131072 = 8 GB of virtual disk covered by that cache
> + 262144 / 32768 = 8 GB
> +
> +
> +How to configure the cache sizes
> +--------------------------------
> +Cache sizes can be configured using the -drive option in the
> +command-line, or the 'blockdev-add' QMP command.
> +
> +There are three options available, and all of them take bytes:
> +
> +"l2-cache-size": maximum size of the L2 table cache
> +"refcount-cache-size": maximum size of the refcount block cache
> +"cache-size": maximum size of both caches combined
> +
> +There are two things that need to be taken into account:
> +
> + - Both caches must have a size that is a multiple of the cluster
> + size.
> +
> + - The L2 cache should be 4 times bigger than the refcount cache in
> + order to cover the same amount of disk space.
Again, only for refcount_bits=16. For the general case, it's "8 * 8 /
refcount_bits = 64 / refcount_bits times as big".
> +
> +However, if you only set one of the options above, QEMU will
> +automatically adjust the others so that the 1/4 ratio is maintained.
> +This means that these options are equivalent:
> +
> +-drive file=hd.img,l2-cache-size=2097152
> +-drive file=hd.img,refcount-cache-size=524288
> +-drive file=hd.img,cache-size=2621440
Apart from this minor thing, looks good. Well, except that maybe it
should be preferred to use .qcow2 as the extension for qcow2 files
(instead of .img).
Max
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches
2015-05-11 13:23 ` Max Reitz
@ 2015-05-11 13:30 ` Alberto Garcia
2015-05-11 14:08 ` Max Reitz
0 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2015-05-11 13:30 UTC (permalink / raw)
To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi
On Mon 11 May 2015 03:23:25 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>> + disk_size = l2_cache_size * cluster_size / 8
>> + disk_size = refcount_cache_size * cluster_size / 2
>
> Only with the default of refcount_bits=16. In the general case, it's
> refcount_cache_size * cluster_size * 8 / refcount_bits.
I actually omitted that on purpose because I didn't want to go into too
many details and I hadn't realized that it's now possible to configure
refcount_bits. I will correct the document.
I wonder what happens then to this DEFAULT_L2_REFCOUNT_SIZE_RATIO of 4,
because that's only valid if refcount_bits == 16.
Is the user supposed to calculate that ratio manually and configure both
cache sizes?
> Apart from this minor thing, looks good. Well, except that maybe it
> should be preferred to use .qcow2 as the extension for qcow2 files
> (instead of .img).
Sure, I can change that too.
Berto
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches
2015-05-11 13:30 ` Alberto Garcia
@ 2015-05-11 14:08 ` Max Reitz
0 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2015-05-11 14:08 UTC (permalink / raw)
To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi
On 11.05.2015 15:30, Alberto Garcia wrote:
> On Mon 11 May 2015 03:23:25 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>
>>> + disk_size = l2_cache_size * cluster_size / 8
>>> + disk_size = refcount_cache_size * cluster_size / 2
>> Only with the default of refcount_bits=16. In the general case, it's
>> refcount_cache_size * cluster_size * 8 / refcount_bits.
> I actually omitted that on purpose because I didn't want to go into too
> many details and I hadn't realized that it's now possible to configure
> refcount_bits. I will correct the document.
>
> I wonder what happens then to this DEFAULT_L2_REFCOUNT_SIZE_RATIO of 4,
> because that's only valid if refcount_bits == 16.
>
> Is the user supposed to calculate that ratio manually and configure both
> cache sizes?
Well... Considering nobody will probably ever use refcount_bits > 16, we
don't need to worry about 4 being too big of a ratio. And I think, for
cases of refcount_bits < 16, we wouldn't really save as much to justify
adjusting the ratio. So while it would be possible to make qcow2 choose
a variable ratio based on refcount_bits, I don't think we really need to
care about it.
But as for the user, I do think they should care about it, or at least
know about it. If they are already adjusting the cache sizes manually,
they should know exactly what the ratio should be. For justifying why
qemu does not adjust its default ratio, I think it's enough to write
that that default ratio (4) is geared towards the default value of
refcount_bits (16) and that should be enough.
If we do decide to implement a variable ratio later on, we can still
change this documentation.
Oh, and also note that the "range covered" by the metadata caches are
different things: For the L2 tables, it's a guest offset range, while
for the refblocks it's a host offset range. Also, you basically only
need the refcounts for allocating clusters and doing internal snapshot
operations (you don't need them for reads, and you don't need them for
non-allocating writes, thanks to the O_COPIED flag), so it would
actually make sense for the user to choose a ratio larger than 64 /
refcount_bits because refcounts are (1) rarely ever needed, and (2) when
they are needed, they are most likely for allocation, which most likely
will occur at the image's end (unless you're using internal snapshots or
you're manually discarding things), so you actually only need to cache
the refblock covering the image end.
Max
>
>> Apart from this minor thing, looks good. Well, except that maybe it
>> should be preferred to use .qcow2 as the extension for qcow2 files
>> (instead of .img).
> Sure, I can change that too.
>
> Berto
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches
2015-05-11 12:55 ` [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches Alberto Garcia
2015-05-11 13:23 ` Max Reitz
@ 2015-05-11 15:12 ` Eric Blake
1 sibling, 0 replies; 15+ messages in thread
From: Eric Blake @ 2015-05-11 15:12 UTC (permalink / raw)
To: Alberto Garcia, qemu-devel
Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi, Max Reitz
[-- Attachment #1: Type: text/plain, Size: 1322 bytes --]
On 05/11/2015 06:55 AM, Alberto Garcia wrote:
> QEMU has options to configure the size of the L2 and refcount caches
> for the qcow2 format. However, choosing the right sizes for a
> particular disk image is not a straightforward operation since the
> ratio between the cache size and the allocated disk space is not
> obvious and depends on the size of the cluster.
>
> This document attempts to give an overview of both caches and how to
> configure their sizes.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
> docs/qcow2-cache.txt | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 117 insertions(+)
> create mode 100644 docs/qcow2-cache.txt
>
> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> new file mode 100644
> index 0000000..2570172
> --- /dev/null
> +++ b/docs/qcow2-cache.txt
> @@ -0,0 +1,117 @@
> +qcow2 L2/refcount cache configuration
> +=====================================
Not a show-stopper, but you might want to consider an explicit copyright
notice and license (as several recent docs/ additions have done),
particularly if you want the documentation to be under a looser license
than the default GPLv2+.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
` (7 preceding siblings ...)
2015-05-11 12:55 ` [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches Alberto Garcia
@ 2015-05-12 9:57 ` Kevin Wolf
8 siblings, 0 replies; 15+ messages in thread
From: Kevin Wolf @ 2015-05-12 9:57 UTC (permalink / raw)
To: Alberto Garcia; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi, Max Reitz
Am 11.05.2015 um 14:54 hat Alberto Garcia geschrieben:
> New version of the qcow2 cache patches:
>
> v3:
> - Removed a dead comment in patch #3
> - New document explaining how to configure the cache sizes
>
> v2: https://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg00833.html
> - Don't do pointer arithmetic on void *
> - Rename table_addr() to qcow2_cache_get_table_addr()
> - Add qcow2_cache_get_table_idx()
> - Cast cache size to size_t to prevent overflows
> - Make qcow2_cache_put() a void function
> - Don't store the cluster size in the cache, get it from the BDS instead
>
> v1: https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg04355.html
Thanks, applied patches 1-7. As discussed on IRC, I'm waiting for a new
version for patch 8.
Kevin
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-05-12 9:57 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-11 12:54 [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 1/8] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 2/8] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 3/8] qcow2: use an LRU algorithm to replace entries from the L2 cache Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 4/8] qcow2: remove qcow2_cache_find_entry_to_replace() Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 5/8] qcow2: use a hash to look for entries in the L2 cache Alberto Garcia
2015-05-11 12:54 ` [Qemu-devel] [PATCH 6/8] qcow2: make qcow2_cache_put() a void function Alberto Garcia
2015-05-11 13:12 ` Max Reitz
2015-05-11 12:54 ` [Qemu-devel] [PATCH 7/8] qcow2: style fixes in qcow2-cache.c Alberto Garcia
2015-05-11 12:55 ` [Qemu-devel] [PATCH 8/8] docs: document how to configure the qcow2 L2/refcount caches Alberto Garcia
2015-05-11 13:23 ` Max Reitz
2015-05-11 13:30 ` Alberto Garcia
2015-05-11 14:08 ` Max Reitz
2015-05-11 15:12 ` Eric Blake
2015-05-12 9:57 ` [Qemu-devel] [PATCH v3 0/8] qcow2 L2/refcount cache improvements Kevin Wolf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).