* [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
@ 2013-01-02 16:16 Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification Benoît Canet
` (31 more replies)
0 siblings, 32 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
This patchset is a cleanup of the previous QCOW2 deduplication RFC.
One can compile and install https://github.com/wernerd/Skein3Fish and use the
--enable-skein-dedup configure option in order to use the faster Skein hash.
Images must be created with "-o dedup=[skein|sha256]" in order to activate
deduplication in the image.
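For example, a deduplicated image could be created with an invocation like
this (illustrative command line, assuming a dedup-enabled build):

    qemu-img create -f qcow2 -o dedup=sha256 test.qcow2 10G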
Deduplication is now fast enough to be usable.
v4: Fix and complete qcow2 spec [Stefan]
Hash the hash_algo field in the header extension [Stefan]
Fix qcow2 spec [Eric]
Remove pointer to hash and simplify hash memory management [Stefan]
Rename and move qcow2_read_cluster_data to qcow2.c [Stefan]
Document lock dropping behaviour of the previous function [Stefan]
cleanup qcow2_dedup_read_missing_cluster_data [Stefan]
rename *_offset to *_sect [Stefan]
add a ./configure check for ssl [Stefan]
Replace openssl by gnutls [Stefan]
Implement Skein hashes
Rewrite pretty much every qcow2-dedup.c commit after "Add
qcow2_dedup_read_missing_and_concatenate" to simplify the code
Use 64KB deduplication hash block to reduce allocation flushes
Use 64KB l2 tables to reduce allocation flushes [breaks compatibility]
Use lazy refcounts to avoid qcow2_cache_set_dependency loops resulting
in frequent cache flushes
Do not create and load dedup RAM structures when bs->read_only is true
v3: Make it barely work
Replace the kernel red-black trees with GTree.
Benoît Canet (30):
qcow2: Add deduplication to the qcow2 specification.
qcow2: Add deduplication structures and fields.
qcow2: Add qcow2_dedup_read_missing_and_concatenate
qcow2: Make update_refcount public.
qcow2: Create a way to link to l2 tables when deduplicating.
qcow2: Add qcow2_dedup and related functions
qcow2: Add qcow2_dedup_store_new_hashes.
qcow2: Implement qcow2_compute_cluster_hash.
qcow2: Extract qcow2_dedup_grow_table
qcow2: Add qcow2_dedup_grow_table and use it.
qcow2: create function to load deduplication hashes at startup.
qcow2: Load and save deduplication table header extension.
qcow2: Extract qcow2_do_table_init.
qcow2-cache: Allow to choose table size at creation.
qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
block: Add qemu-img dedup create option.
qcow2: Behave correctly when refcount reach 0 or 2^16.
qcow2: Integrate deduplication in qcow2_co_writev loop.
qcow2: Serialize write requests when deduplication is activated.
qcow2: Add verification of dedup table.
qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
qcow2: Integrate SKEIN hash algorithm in deduplication.
qcow2: Add lazy refcounts to deduplication to prevent
qcow2_cache_set_dependency loops
qcow2: Use large L2 table for deduplication.
qcow: Set dedup cluster block size to 64KB.
qcow2: init and cleanup deduplication.
qemu-iotests: Filter dedup=on/off so existing tests don't break.
block/Makefile.objs | 1 +
block/qcow2-cache.c | 12 +-
block/qcow2-cluster.c | 116 +++--
block/qcow2-dedup.c | 1157 ++++++++++++++++++++++++++++++++++++++++++
block/qcow2-refcount.c | 157 ++++--
block/qcow2.c | 357 +++++++++++--
block/qcow2.h | 120 ++++-
configure | 55 ++
docs/specs/qcow2.txt | 100 +++-
include/block/block_int.h | 1 +
tests/qemu-iotests/common.rc | 3 +-
11 files changed, 1955 insertions(+), 124 deletions(-)
create mode 100644 block/qcow2-dedup.c
--
1.7.10.4
^ permalink raw reply [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-03 18:18 ` Eric Blake
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 02/30] qcow2: Add deduplication structures and fields Benoît Canet
` (30 subsequent siblings)
31 siblings, 1 reply; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
docs/specs/qcow2.txt | 100 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 99 insertions(+), 1 deletion(-)
diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..c9c0d47 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@ in the description of a field.
tables to repair refcounts before accessing the
image.
- Bits 1-63: Reserved (set to 0)
+ Bit 1: Deduplication bit. If this bit is set then
+ deduplication is used on this image.
+ The L2 table size (64KB) then differs from
+ the cluster size (4KB).
+
+ Bits 2-63: Reserved (set to 0)
80 - 87: compatible_features
Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@ be stored. Each extension has a structure like the following:
0x00000000 - End of the header extension area
0xE2792ACA - Backing file format name
0x6803f857 - Feature name table
+ 0xCD8E819B - Deduplication
other - Unknown header extension, can be safely
ignored
@@ -159,6 +165,98 @@ the header extension data. Each entry look like this:
terminated if it has full length)
+== Deduplication ==
+
+The deduplication extension contains the information concerning the
+deduplication.
+
+ Byte 0 - 7: Offset of the RAM deduplication table
+
+ 8 - 11: Size of the RAM deduplication table = number of L1 64-bit
+ pointers
+
+ 12: Hash algo enum field
+ 0: SHA-256
+ 1: SHA3
+ 2: SKEIN-256
+
+ 13: Dedup strategies bitmap
+ 0: RAM based hash lookup
+ 1: Disk based hash lookup
+
+Disk based lookup structure will be described in a future QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and a
+logical offset. It is used to permanently store the information required to
+do the deduplication. It is loaded at startup into a RAM-based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
+0 l1_dedup_index Size
+ |
+|--------------------------------------------------------------------|
+| | |
+| | L1 Deduplication table |
+| | |
+|--------------------------------------------------------------------|
+ |
+ |
+ |
+0 | l2_dedup_block_entries
+ |
+|---------------------------------|
+| |
+| L2 deduplication block |
+| |
+| l2_dedup_index |
+|---------------------------------|
+ |
+ 0 | 40
+ |
+ |-------------------------------|
+ | |
+ | Deduplication table entry |
+ | |
+ |-------------------------------|
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+ Byte 0 - 31: hash of data cluster
+
+ 32 - 39: Logical offset of first encountered block having
+ this hash
+
+== Deduplication table arithmetic (RAM method) ==
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an L2 deduplication table block is:
+l2_dedup_block_entries = dedup_block_size / (32 + 8)
+
+The index in the level 1 deduplication table is:
+l1_dedup_index = physical_cluster_index / l2_dedup_block_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_dedup_block_entries
+
+cluster_size = 4096
+dedup_block_size = 65536
+l2_size = 65536
+
+The 16 remaining bytes in each L2 deduplication block are set to zero and
+reserved for future usage.
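+
+As a worked example using the constants above: each L2 deduplication block
+holds l2_dedup_block_entries = 65536 / 40 = 1638 entries, which accounts for
+the 16 leftover bytes (65536 - 1638 * 40 = 16). For instance, a physical
+cluster index of 5000 maps to l1_dedup_index = 5000 / 1638 = 3 and
+l2_dedup_index = 5000 % 1638 = 86.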
+
== Host cluster management ==
qcow2 manages the allocation of host clusters by maintaining a reference count
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 02/30] qcow2: Add deduplication structures and fields.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 03/30] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
` (29 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/block/qcow2.h b/block/qcow2.h
index 718b52b..637c86a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -58,6 +58,50 @@
#define DEFAULT_CLUSTER_SIZE 65536
+#define HASH_LENGTH 32
+
+typedef enum {
+ QCOW_HASH_SHA256 = 0,
+ QCOW_HASH_SHA3 = 1,
+ QCOW_HASH_SKEIN = 2,
+} QCowHashAlgo;
+
+typedef struct {
+ uint8_t data[HASH_LENGTH]; /* 32 bytes hash of a given cluster */
+} QCowHash;
+
+/* Used to keep a single precomputed hash between the calls of the dedup
+ * function
+ */
+typedef struct {
+ QCowHash hash;
+ bool reuse; /* The hash is precomputed reuse it */
+} QcowPersistantHash;
+
+/* deduplication node */
+typedef struct {
+ QCowHash hash;
+ uint64_t physical_sect; /* where the cluster is stored on disk */
+ uint64_t first_logical_sect; /* logical sector of the first occurrence of
+ * this cluster
+ */
+} QCowHashNode;
+
+/* Undedupable hashes that must be written later to disk */
+typedef struct QCowHashElement {
+ QCowHash hash;
+ QTAILQ_ENTRY(QCowHashElement) next;
+} QCowHashElement;
+
+typedef struct {
+ QcowPersistantHash phash; /* contains a hash persisting between calls of
+ * qcow2_dedup()
+ */
+ QTAILQ_HEAD(, QCowHashElement) undedupables;
+ int nb_clusters_processed;
+ int nb_undedupable_sectors;
+} QCowDedupState;
+
typedef struct QCowHeader {
uint32_t magic;
uint32_t version;
@@ -114,8 +158,10 @@ enum {
enum {
QCOW2_INCOMPAT_DIRTY_BITNR = 0,
QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
+ QCOW2_INCOMPAT_DEDUP_BITNR = 1,
+ QCOW2_INCOMPAT_DEDUP = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
- QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY,
+ QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
};
/* Compatible feature bits */
@@ -138,6 +184,7 @@ typedef struct BDRVQcowState {
int cluster_sectors;
int l2_bits;
int l2_size;
+ int hash_block_size;
int l1_size;
int l1_vm_state_index;
int csize_shift;
@@ -148,6 +195,7 @@ typedef struct BDRVQcowState {
Qcow2Cache* l2_table_cache;
Qcow2Cache* refcount_block_cache;
+ Qcow2Cache *dedup_cluster_cache;
uint8_t *cluster_cache;
uint8_t *cluster_data;
@@ -160,6 +208,13 @@ typedef struct BDRVQcowState {
int64_t free_cluster_index;
int64_t free_byte_offset;
+ bool has_dedup;
+ QCowHashAlgo dedup_hash_algo;
+ uint64_t *dedup_table;
+ uint64_t dedup_table_offset;
+ int32_t dedup_table_size;
+ GTree *dedup_tree_by_hash;
+ GTree *dedup_tree_by_sect;
CoMutex lock;
uint32_t crypt_method; /* current crypt method, 0 if no key yet */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 03/30] qcow2: Add qcow2_dedup_read_missing_and_concatenate
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 02/30] qcow2: Add deduplication structures and fields Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 04/30] qcow2: Make update_refcount public Benoît Canet
` (28 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
This function is used to read missing data when unaligned writes are
done. It also concatenates the missing data with the given qiov data in
order to prepare a buffer used to look for duplicated clusters.
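For instance (illustrative numbers, not taken from the patch): with 8-sector
(4KB) clusters, a write covering sectors 3 to 12 causes sectors 0-2 to be read
before the qiov and sectors 13-15 after it, so the concatenated buffer spans
two whole clusters (16 sectors) ready for hashing.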
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/Makefile.objs | 1 +
block/qcow2-dedup.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++
block/qcow2.c | 36 +++++++++++++++-
block/qcow2.h | 12 ++++++
4 files changed, 167 insertions(+), 1 deletion(-)
create mode 100644 block/qcow2-dedup.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index c067f38..21afc85 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2-dedup.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
block-obj-y += parallels.o blkdebug.o blkverify.o
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
new file mode 100644
index 0000000..4e99eb1
--- /dev/null
+++ b/block/qcow2-dedup.c
@@ -0,0 +1,119 @@
+/*
+ * Deduplication for the QCOW2 format
+ *
+ * Copyright (C) Nodalink, SARL. 2012-2013
+ *
+ * Author:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block/block_int.h"
+#include "qemu-common.h"
+#include "qcow2.h"
+
+/*
+ * Prepare a buffer containing all the data required to compute cluster
+ * sized deduplication hashes.
+ * If sector_num or nb_sectors are not cluster-aligned, missing data
+ * before/after the qiov will be read.
+ *
+ * @qiov: the qiov for which missing data must be read
+ * @sector_num: the first sectors that must be read into the qiov
+ * @nb_sectors: the number of sectors to read into the qiov
+ * @data: the place where the data will be concatenated and stored
+ * @nb_data_sectors: the resulting size of the concatenated data (in sectors)
+ * @ret: negative on error
+ */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+ QEMUIOVector *qiov,
+ uint64_t sector_num,
+ int nb_sectors,
+ uint8_t **data,
+ int *nb_data_sectors)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret = 0;
+ uint64_t cluster_beginning_sector;
+ uint64_t first_sector_after_qiov;
+ int cluster_beginning_nr;
+ int cluster_ending_nr;
+ int unaligned_ending_nr;
+ uint64_t max_cluster_ending_nr;
+
+ /* compute how much and where to read at the beginning */
+ cluster_beginning_nr = sector_num & (s->cluster_sectors - 1);
+ cluster_beginning_sector = sector_num - cluster_beginning_nr;
+
+ /* for the ending */
+ first_sector_after_qiov = sector_num + nb_sectors;
+ unaligned_ending_nr = first_sector_after_qiov & (s->cluster_sectors - 1);
+ cluster_ending_nr = unaligned_ending_nr ?
+ s->cluster_sectors - unaligned_ending_nr : 0;
+
+ /* compute total size in sectors and allocate memory */
+ *nb_data_sectors = cluster_beginning_nr + nb_sectors + cluster_ending_nr;
+ *data = qemu_blockalign(bs, *nb_data_sectors * BDRV_SECTOR_SIZE);
+
+ /* read beginning */
+ if (cluster_beginning_nr) {
+ ret = qcow2_read_cluster_data(bs,
+ *data,
+ cluster_beginning_sector,
+ cluster_beginning_nr);
+ }
+
+ if (ret < 0) {
+ goto fail;
+ }
+
+ /* append qiov content */
+ qemu_iovec_to_buf(qiov, 0, *data + cluster_beginning_nr * BDRV_SECTOR_SIZE,
+ qiov->size);
+
+ /* Fix cluster_ending_nr if we are at risk of reading outside the image
+ * (Cluster unaligned image size)
+ */
+ max_cluster_ending_nr = bs->total_sectors - first_sector_after_qiov;
+ cluster_ending_nr = max_cluster_ending_nr < (uint64_t) cluster_ending_nr ?
+ (int) max_cluster_ending_nr : cluster_ending_nr;
+
+ /* read and add ending */
+ if (cluster_ending_nr) {
+ ret = qcow2_read_cluster_data(bs,
+ *data +
+ (cluster_beginning_nr +
+ nb_sectors) *
+ BDRV_SECTOR_SIZE,
+ first_sector_after_qiov,
+ cluster_ending_nr);
+ }
+
+ if (ret < 0) {
+ goto fail;
+ }
+
+ return 0;
+
+fail:
+ qemu_vfree(*data);
+ *data = NULL;
+ return ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index d603f98..410d3c1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -69,7 +69,6 @@ static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
return 0;
}
-
/*
* read qcow2 extension and fill bs
* start reading from start_offset
@@ -1110,6 +1109,41 @@ fail:
return ret;
}
+/**
+ * Read some data from the QCOW2 file
+ *
+ * Important: s->lock is dropped. Things can change before the function returns
+ * to the caller.
+ *
+ * @data: the buffer where the data must be stored
+ * @sector_num: the sector number to read in the QCOW2 file
+ * @nb_sectors: the number of sectors to read
+ * @ret: negative on error
+ */
+int qcow2_read_cluster_data(BlockDriverState *bs,
+ uint8_t *data,
+ uint64_t sector_num,
+ int nb_sectors)
+{
+ BDRVQcowState *s = bs->opaque;
+ QEMUIOVector qiov;
+ struct iovec iov;
+ int ret;
+
+ iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE;
+ iov.iov_base = data;
+ qemu_iovec_init_external(&qiov, &iov, 1);
+ qemu_co_mutex_unlock(&s->lock);
+ ret = bdrv_co_readv(bs, sector_num, nb_sectors, &qiov);
+ qemu_co_mutex_lock(&s->lock);
+ if (ret < 0) {
+ error_report("failed to read %d sectors at offset %" PRIu64 "\n",
+ nb_sectors, sector_num);
+ }
+
+ return ret;
+}
+
static int qcow2_change_backing_file(BlockDriverState *bs,
const char *backing_file, const char *backing_fmt)
{
diff --git a/block/qcow2.h b/block/qcow2.h
index 637c86a..730c9be 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -361,6 +361,10 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
int qcow2_mark_dirty(BlockDriverState *bs);
int qcow2_update_header(BlockDriverState *bs);
+int qcow2_read_cluster_data(BlockDriverState *bs,
+ uint8_t *data,
+ uint64_t sector_num,
+ int nb_sectors);
/* qcow2-refcount.c functions */
int qcow2_refcount_init(BlockDriverState *bs);
@@ -429,4 +433,12 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
void **table);
int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
+/* qcow2-dedup.c functions */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+ QEMUIOVector *qiov,
+ uint64_t sector,
+ int sectors_nr,
+ uint8_t **dedup_cluster_data,
+ int *dedup_cluster_data_nr);
+
#endif
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 04/30] qcow2: Make update_refcount public.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (2 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 03/30] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 05/30] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
` (27 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-refcount.c | 6 +-----
block/qcow2.h | 2 ++
2 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6a95aa6..e014b0e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -27,10 +27,6 @@
#include "block/qcow2.h"
static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
- int64_t offset, int64_t length,
- int addend);
-
/*********************************************************/
/* refcount handling */
@@ -413,7 +409,7 @@ fail_block:
}
/* XXX: cache several refcount block clusters ? */
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
int64_t offset, int64_t length, int addend)
{
BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 730c9be..3307481 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -384,6 +384,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
BdrvCheckMode fix);
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+ int64_t offset, int64_t length, int addend);
/* qcow2-cluster.c functions */
int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 05/30] qcow2: Create a way to link to l2 tables when deduplicating.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (3 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 04/30] qcow2: Make update_refcount public Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 06/30] qcow2: Add qcow2_dedup and related functions Benoît Canet
` (26 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-cluster.c | 8 ++++++--
block/qcow2.h | 9 +++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 56fccf9..63a7241 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -693,7 +693,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
old_cluster[j++] = l2_table[l2_index + i];
l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
- (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+ (i << s->cluster_bits)) |
+ (m->oflag_copied ? QCOW_OFLAG_COPIED : 0));
}
@@ -706,7 +707,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
* If this was a COW, we need to decrease the refcount of the old cluster.
* Also flush bs->file to get the right order for L2 and refcount update.
*/
- if (j != 0) {
+ if (!m->overwrite && j != 0) {
for (i = 0; i < j; i++) {
qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1);
}
@@ -1006,6 +1007,9 @@ again:
.offset = nb_sectors * BDRV_SECTOR_SIZE,
.nb_sectors = avail_sectors - nb_sectors,
},
+
+ .oflag_copied = true,
+ .overwrite = false,
};
qemu_co_queue_init(&(*m)->dependent_requests);
QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2.h b/block/qcow2.h
index 3307481..9403431 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -59,6 +59,10 @@
#define DEFAULT_CLUSTER_SIZE 65536
#define HASH_LENGTH 32
+/* indicate that the hash structure is empty and miss offset */
+#define QCOW_FLAG_EMPTY (1LL << 62)
+/* indicate that the cluster for this hash has QCOW_OFLAG_COPIED on disk */
+#define QCOW_FLAG_FIRST (1LL << 63)
typedef enum {
QCOW_HASH_SHA256 = 0,
@@ -289,6 +293,11 @@ typedef struct QCowL2Meta
*/
CoQueue dependent_requests;
+ /* set to true if QCOW_OFLAG_COPIED must be set in the L2 table entry */
+ bool oflag_copied;
+ /* set to true if we are overwriting an L2 table entry */
+ bool overwrite;
+
/**
* The COW Region between the start of the first allocated cluster and the
* area the guest actually writes to.
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 06/30] qcow2: Add qcow2_dedup and related functions
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (4 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 05/30] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 07/30] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
` (25 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 436 +++++++++++++++++++++++++++++++++++++++++++++++++++
block/qcow2.h | 5 +
2 files changed, 441 insertions(+)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4e99eb1..5901749 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -117,3 +117,439 @@ fail:
*data = NULL;
return ret;
}
+
+/*
+ * Build a QCowHashNode structure
+ *
+ * @hash: the given hash
+ * @physical_sect: the cluster offset in the QCOW2 file
+ * @first_logical_sect: the first logical cluster offset written
+ * @ret: the newly built QCowHashNode
+ */
+static QCowHashNode *qcow2_dedup_build_qcow_hash_node(QCowHash *hash,
+ uint64_t physical_sect,
+ uint64_t first_logical_sect)
+{
+ QCowHashNode *hash_node;
+
+ hash_node = g_new0(QCowHashNode, 1);
+ memcpy(hash_node->hash.data, hash->data, HASH_LENGTH);
+ hash_node->physical_sect = physical_sect;
+ hash_node->first_logical_sect = first_logical_sect;
+
+ return hash_node;
+}
+
+/*
+ * Compute the hash of a given cluster
+ *
+ * @data: a buffer containing the cluster data
+ * @hash: a QCowHash where to store the computed hash
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_compute_cluster_hash(BlockDriverState *bs,
+ QCowHash *hash,
+ uint8_t *data)
+{
+ return 0;
+}
+
+/*
+ * Get a QCowHashNode corresponding to a cluster data
+ *
+ * @phash: if phash can be used no hash is computed
+ * @data: a buffer containing the cluster
+ * @nb_clusters_processed: the number of cluster to skip in the buffer
+ * @err: Error code if any
+ * @ret: QCowHashNode of the duplicated cluster or NULL if not found
+ */
+static QCowHashNode *qcow2_get_hash_node_for_cluster(BlockDriverState *bs,
+ QcowPersistantHash *phash,
+ uint8_t *data,
+ int nb_clusters_processed,
+ int *err)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret = 0;
+ *err = 0;
+
+ /* no hash has been provided compute it and store it for later usage */
+ if (!phash->reuse) {
+ ret = qcow2_compute_cluster_hash(bs,
+ &phash->hash,
+ data +
+ nb_clusters_processed *
+ s->cluster_size);
+ }
+
+ /* do not reuse the hash anymore if it was precomputed */
+ phash->reuse = false;
+
+ if (ret < 0) {
+ *err = ret;
+ return NULL;
+ }
+
+ return g_tree_lookup(s->dedup_tree_by_hash, &phash->hash);
+}
+
+/*
+ * Build a QCowHashNode from a given QCowHash and insert it into the tree
+ *
+ * @hash: the given QCowHash
+ */
+static void qcow2_build_and_insert_hash_node(BlockDriverState *bs,
+ QCowHash *hash)
+{
+ BDRVQcowState *s = bs->opaque;
+ QCowHashNode *hash_node;
+
+ /* build the hash node with QCOW_FLAG_EMPTY as offsets so we will remember
+ * to fill these field later with real values.
+ */
+ hash_node = qcow2_dedup_build_qcow_hash_node(hash,
+ QCOW_FLAG_EMPTY,
+ QCOW_FLAG_EMPTY);
+ g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+}
+
+/*
+ * Helper used to build a QCowHashElement
+ *
+ * @hash: the QCowHash to use
+ * @ret: a newly allocated QCowHashElement containing the given hash
+ */
+static QCowHashElement *qcow2_build_dedup_hash(QCowHash *hash)
+{
+ QCowHashElement *dedup_hash;
+ dedup_hash = g_new0(QCowHashElement, 1);
+ memcpy(dedup_hash->hash.data, hash->data, HASH_LENGTH);
+ return dedup_hash;
+}
+
+/*
+ * Helper used to link a deduplicated cluster in the l2
+ *
+ * @logical_sect: the cluster sector seen by the guest
+ * @physical_sect: the cluster sector in the QCOW2 file
+ * @overwrite: true if we must overwrite the L2 table entry
+ * @ret:
+ */
+static int qcow2_dedup_link_l2(BlockDriverState *bs,
+ uint64_t logical_sect,
+ uint64_t physical_sect,
+ bool overwrite)
+{
+ QCowL2Meta m = {
+ .alloc_offset = physical_sect << 9,
+ .offset = logical_sect << 9,
+ .nb_clusters = 1,
+ .nb_available = 0,
+ .cow_start = {
+ .offset = 0,
+ .nb_sectors = 0,
+ },
+ .cow_end = {
+ .offset = 0,
+ .nb_sectors = 0,
+ },
+ .oflag_copied = false,
+ .overwrite = overwrite,
+ };
+ return qcow2_alloc_cluster_link_l2(bs, &m);
+}
+
+/* Clear the QCOW_OFLAG_COPIED from the first L2 entry written for a physical
+ * cluster.
+ *
+ * @hash_node: the duplicated hash node
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
+ QCowHashNode *hash_node)
+{
+ int ret = 0;
+ uint64_t first_logical_sect = hash_node->first_logical_sect;
+
+ /* QCOW_OFLAG_COPIED already cleared -> do nothing */
+ if (!(first_logical_sect & QCOW_FLAG_FIRST)) {
+ return 0;
+ }
+
+ /* note : QCOW_FLAG_FIRST == QCOW_OFLAG_COPIED */
+ first_logical_sect &= ~QCOW_FLAG_FIRST;
+
+ /* overwrite first L2 entry to clear QCOW_FLAG_COPIED */
+ ret = qcow2_dedup_link_l2(bs, first_logical_sect,
+ hash_node->physical_sect,
+ true);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
+ hash_node->first_logical_sect &= first_logical_sect;
+
+ return 0;
+}
+
+/* This function deduplicate a cluster
+ *
+ * @logical_sect: The logical sector of the write
+ * @hash_node: The duplicated cluster hash node
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_deduplicate_cluster(BlockDriverState *bs,
+ uint64_t logical_sect,
+ QCowHashNode *hash_node)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret = 0;
+
+ /* create new L2 entry */
+ ret = qcow2_dedup_link_l2(bs, logical_sect,
+ hash_node->physical_sect,
+ false);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* Increment the refcount of the cluster */
+ return update_refcount(bs,
+ (hash_node->physical_sect /
+ s->cluster_sectors) << s->cluster_bits,
+ 1, 1);
+}
+
+/* This function tries to deduplicate a given cluster.
+ *
+ * @sector_num: the logical sector number we are trying to deduplicate
+ * @phash: Used instead of computing the hash if provided
+ * @data: the buffer in which to look for a duplicated cluster
+ * @nb_clusters_processed: the number of cluster that must be skipped in data
+ * @ret: ret < 0 on error, 1 on deduplication else 0
+ */
+static int qcow2_try_dedup_cluster(BlockDriverState *bs,
+ QcowPersistantHash *phash,
+ uint64_t sector_num,
+ uint8_t *data,
+ int nb_clusters_processed)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret = 0;
+ QCowHashNode *hash_node;
+ uint64_t logical_sect;
+ uint64_t existing_physical_offset;
+ int pnum = s->cluster_sectors;
+
+ /* search the tree for duplicated cluster */
+ hash_node = qcow2_get_hash_node_for_cluster(bs,
+ phash,
+ data,
+ nb_clusters_processed,
+ &ret);
+
+ /* we won't reuse the hash on error */
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* if cluster is not duplicated store hash for later usage */
+ if (!hash_node) {
+ qcow2_build_and_insert_hash_node(bs, &phash->hash);
+ return 0;
+ }
+
+ logical_sect = sector_num & ~(s->cluster_sectors - 1);
+ ret = qcow2_get_cluster_offset(bs, logical_sect << 9,
+ &pnum, &existing_physical_offset);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* if we are rewriting the same cluster at the same place do nothing */
+ if (existing_physical_offset == hash_node->physical_sect << 9) {
+ return 1;
+ }
+
+ /* take care of not having refcount > 1 and QCOW_OFLAG_COPIED at once */
+ ret = qcow2_clear_l2_copied_flag_if_needed(bs, hash_node);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* do the deduplication */
+ ret = qcow2_deduplicate_cluster(bs, logical_sect,
+ hash_node);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ return 1;
+}
+
+
+static void add_hash_to_undedupable_list(BlockDriverState *bs,
+ QCowDedupState *ds)
+{
+ /* memorise hash for later storage in gtree and disk */
+ QCowHashElement *dedup_hash = qcow2_build_dedup_hash(&ds->phash.hash);
+ QTAILQ_INSERT_TAIL(&ds->undedupables, dedup_hash, next);
+}
+
+static int qcow2_dedup_starting_from_begining(BlockDriverState *bs,
+ QCowDedupState *ds,
+ uint64_t sector_num,
+ uint8_t *data,
+ int left_to_process)
+{
+ BDRVQcowState *s = bs->opaque;
+ int i;
+ int ret = 0;
+
+ for (i = 0; i < left_to_process; i++) {
+ ret = qcow2_try_dedup_cluster(bs,
+ &ds->phash,
+ sector_num + i * s->cluster_sectors,
+ data,
+ ds->nb_clusters_processed + i);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* stop if a cluster has not been deduplicated */
+ if (ret != 1) {
+ break;
+ }
+ }
+
+ return i;
+}
+
+static int qcow2_count_next_non_dedupable_clusters(BlockDriverState *bs,
+ QCowDedupState *ds,
+ uint8_t *data,
+ int left_to_process)
+{
+ int i;
+ int ret = 0;
+ QCowHashNode *hash_node;
+
+ for (i = 0; i < left_to_process; i++) {
+ hash_node = qcow2_get_hash_node_for_cluster(bs,
+ &ds->phash,
+ data,
+ ds->nb_clusters_processed + i,
+ &ret);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* found a duplicated cluster : stop here */
+ if (hash_node) {
+ break;
+ }
+
+ qcow2_build_and_insert_hash_node(bs, &ds->phash.hash);
+ add_hash_to_undedupable_list(bs, ds);
+ }
+
+ return i;
+}
+
+
+/* Deduplicate all the clusters that can be deduplicated.
+ *
+ * Next it computes the number of non-deduplicable sectors to come while
+ * storing the hashes of these sectors in a linked list for later usage.
+ * Then it computes the hash of the first duplicated cluster that comes after
+ * the non-deduplicable clusters; this hash will be used at the next call of
+ * the function.
+ *
+ * @ds: a structure containing the state of the deduplication
+ * for this write request
+ * @sector_num: The logical sector
+ * @data: the buffer containing the data to deduplicate
+ * @data_nr: the size of the buffer in sectors
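+ * @ret: negative on error; otherwise the number of sectors of the request
+ * already handled by deduplication (deduplicated clusters expressed in
+ * sectors, minus the in-cluster offset of sector_num). As an
+ * illustrative example: with 8-sector clusters, a request starting
+ * 3 sectors into a cluster whose first two clusters deduplicate
+ * returns 2 * 8 - 3 = 13.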
+ *
+ */
+int qcow2_dedup(BlockDriverState *bs,
+ QCowDedupState *ds,
+ uint64_t sector_num,
+ uint8_t *data,
+ int data_nr)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret = 0;
+ int deduped_clusters_nr = 0;
+ int left_to_process;
+ int begining_index;
+
+ begining_index = sector_num & (s->cluster_sectors - 1);
+
+ left_to_process = (data_nr / s->cluster_sectors) -
+ ds->nb_clusters_processed;
+
+ /* start deduplicating all that can be cluster after cluster */
+ ret = qcow2_dedup_starting_from_begining(bs,
+ ds,
+ sector_num,
+ data,
+ left_to_process);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ deduped_clusters_nr = ret;
+
+ left_to_process -= ret;
+ ds->nb_clusters_processed += ret;
+
+ /* We deduped everything till the end */
+ if (!left_to_process) {
+ ds->nb_undedupable_sectors = 0;
+ goto exit;
+ }
+
+ /* skip and account the first undedupable cluster found */
+ left_to_process--;
+ ds->nb_clusters_processed++;
+ ds->nb_undedupable_sectors += s->cluster_sectors;
+
+ add_hash_to_undedupable_list(bs, ds);
+
+ /* Count how many non duplicated sector can be written and memorize hashes
+ * to write them after data has reached disk.
+ */
+ ret = qcow2_count_next_non_dedupable_clusters(bs,
+ ds,
+ data,
+ left_to_process);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ left_to_process -= ret;
+ ds->nb_clusters_processed += ret;
+ ds->nb_undedupable_sectors += ret * s->cluster_sectors;
+
+ /* remember to reuse the last hash computed at new qcow2_dedup call */
+ if (left_to_process) {
+ ds->phash.reuse = true;
+ }
+
+exit:
+ if (!deduped_clusters_nr) {
+ return 0;
+ }
+
+ return deduped_clusters_nr * s->cluster_sectors - begining_index;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 9403431..a61e004 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -451,5 +451,10 @@ int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
int sectors_nr,
uint8_t **dedup_cluster_data,
int *dedup_cluster_data_nr);
+int qcow2_dedup(BlockDriverState *bs,
+ QCowDedupState *ds,
+ uint64_t sector_num,
+ uint8_t *data,
+ int data_nr);
#endif
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 07/30] qcow2: Add qcow2_dedup_store_new_hashes.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (5 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 06/30] qcow2: Add qcow2_dedup and related functions Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 08/30] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
` (24 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 315 ++++++++++++++++++++++++++++++++++++++++++++++++++-
block/qcow2.h | 5 +
2 files changed, 319 insertions(+), 1 deletion(-)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 5901749..2a444f5 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -29,6 +29,12 @@
#include "qemu-common.h"
#include "qcow2.h"
+static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
+ QCowHash *hash,
+ uint64_t *first_logical_sect,
+ uint64_t physical_sect,
+ bool write);
+
/*
* Prepare a buffer containing all the data required to compute cluster
* sized deduplication hashes.
@@ -291,7 +297,11 @@ static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
/* remember that we dont't need to clear QCOW_OFLAG_COPIED again */
hash_node->first_logical_sect &= first_logical_sect;
- return 0;
+ /* clear the QCOW_FLAG_FIRST flag from disk */
+ return qcow2_dedup_read_write_hash(bs, &hash_node->hash,
+ &hash_node->first_logical_sect,
+ hash_node->physical_sect,
+ true);
}
/* This function deduplicate a cluster
@@ -553,3 +563,306 @@ exit:
return deduped_clusters_nr * s->cluster_sectors - begining_index;
}
+
+
+/* Create a deduplication table hash block, write its offset to disk and
+ * reference it in the RAM deduplication table
+ *
+ * sync this to disk and get the dedup cluster cache entry
+ *
+ * @index: index in the RAM deduplication table
+ * @ret: offset on success, negative on error
+ */
+static uint64_t qcow2_create_block(BlockDriverState *bs,
+ int32_t index)
+{
+ BDRVQcowState *s = bs->opaque;
+ int64_t offset;
+ uint64_t data64;
+ int ret = 0;
+
+ /* allocate a new dedup table hash block */
+ offset = qcow2_alloc_clusters(bs, s->hash_block_size);
+
+ if (offset < 0) {
+ return offset;
+ }
+
+ ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ if (ret < 0) {
+ goto free_fail;
+ }
+
+ /* write the new block offset in the dedup table L1 */
+ data64 = cpu_to_be64(offset);
+ ret = bdrv_pwrite_sync(bs->file,
+ s->dedup_table_offset +
+ index * sizeof(uint64_t),
+ &data64, sizeof(data64));
+
+ if (ret < 0) {
+ goto free_fail;
+ }
+
+ s->dedup_table[index] = offset;
+
+ return offset;
+
+free_fail:
+ qcow2_free_clusters(bs, offset, s->hash_block_size);
+ return ret;
+}
+
+static int qcow2_create_and_get_block(BlockDriverState *bs,
+ uint32_t index,
+ uint8_t **block)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret = 0;
+ int64_t offset;
+
+ offset = qcow2_create_block(bs, index);
+
+ if (offset < 0) {
+ return offset;
+ }
+
+
+ /* get an empty cluster from the dedup cache */
+ ret = qcow2_cache_get_empty(bs, s->dedup_cluster_cache,
+ offset,
+ (void **) block);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* clear it */
+ memset(*block, 0, s->hash_block_size);
+
+ return 0;
+}
+
+static inline bool qcow2_has_dedup_block(BlockDriverState *bs,
+ uint32_t index)
+{
+ BDRVQcowState *s = bs->opaque;
+ return s->dedup_table[index] == 0 ? false : true;
+}
+
+static inline void qcow2_write_hash_to_block_and_dirty(BlockDriverState *bs,
+ uint8_t *block,
+ QCowHash *hash,
+ int offset,
+ uint64_t *logical_sect)
+{
+ BDRVQcowState *s = bs->opaque;
+ uint64_t first;
+ first = cpu_to_be64(*logical_sect);
+ memcpy(block + offset, hash->data, HASH_LENGTH);
+ memcpy(block + offset + HASH_LENGTH, &first, 8);
+ qcow2_cache_entry_mark_dirty(s->dedup_cluster_cache, block);
+}
+
+static inline uint64_t qcow2_read_hash_from_block(uint8_t *block,
+ QCowHash *hash,
+ int offset)
+{
+ uint64_t first;
+ memcpy(hash->data, block + offset, HASH_LENGTH);
+ memcpy(&first, block + offset + HASH_LENGTH, 8);
+ return be64_to_cpu(first);
+}
+
+/* Read/write a given hash and cluster_sect from/to the dedup table
+ *
+ * This function doesn't flush the dedup cache to disk
+ *
+ * @hash: the hash to read or store
+ * @first_logical_sect: logical sector of the QCOW_OFLAG_COPIED cluster
+ * @physical_sect: sector of the cluster in the QCOW2 file (in sectors)
+ * @write: true to write, false to read
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
+ QCowHash *hash,
+ uint64_t *first_logical_sect,
+ uint64_t physical_sect,
+ bool write)
+{
+ BDRVQcowState *s = bs->opaque;
+ uint8_t *block = NULL;
+ int ret = 0;
+ int64_t cluster_number;
+ uint32_t index_in_dedup_table;
+ int offset_in_block;
+ int nb_hash_in_block = s->hash_block_size / (HASH_LENGTH + 8);
+
+ cluster_number = physical_sect / s->cluster_sectors;
+ index_in_dedup_table = cluster_number / nb_hash_in_block;
+
+ if (s->dedup_table_size <= index_in_dedup_table) {
+ return -ENOSPC;
+ }
+
+ /* if we must read and there is nothing to read return a null hash */
+ if (!qcow2_has_dedup_block(bs, index_in_dedup_table) && !write) {
+ memset(hash->data, 0, HASH_LENGTH);
+ *first_logical_sect = 0;
+ return 0;
+ }
+
+ if (qcow2_has_dedup_block(bs, index_in_dedup_table)) {
+ ret = qcow2_cache_get(bs,
+ s->dedup_cluster_cache,
+ s->dedup_table[index_in_dedup_table],
+ (void **) &block);
+ } else {
+ ret = qcow2_create_and_get_block(bs,
+ index_in_dedup_table,
+ &block);
+ }
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ offset_in_block = (cluster_number % nb_hash_in_block) *
+ (HASH_LENGTH + 8);
+
+ if (write) {
+ qcow2_write_hash_to_block_and_dirty(bs,
+ block,
+ hash,
+ offset_in_block,
+ first_logical_sect);
+ } else {
+ *first_logical_sect = qcow2_read_hash_from_block(block,
+ hash,
+ offset_in_block);
+ }
+
+ qcow2_cache_put(bs, s->dedup_cluster_cache, (void **) &block);
+
+ return 0;
+}
+
+static inline bool is_hash_node_empty(QCowHashNode *hash_node)
+{
+ return hash_node->physical_sect & QCOW_FLAG_EMPTY;
+}
+
+/* This function removes a hash_node from the trees given a physical sector
+ *
+ * @physical_sect: The physical sector of the cluster corresponding to the hash
+ */
+static void qcow_remove_hash_node_by_sector(BlockDriverState *bs,
+ uint64_t physical_sect)
+{
+ BDRVQcowState *s = bs->opaque;
+ QCowHashNode *hash_node;
+
+ hash_node = g_tree_lookup(s->dedup_tree_by_sect, &physical_sect);
+
+ if (!hash_node) {
+ return;
+ }
+
+ g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
+ g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+}
+
+/* This function store a dedup hash information to disk and RAM
+ *
+ * @dedup_hash: the QCowHashElement to process
+ * @logical_sect: the logical sector of the cluster seen by the guest
+ * @physical_sect: the physical sector of the stored cluster
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_store_dedup_hash(BlockDriverState *bs,
+ QCowHashElement *dedup_hash,
+ uint64_t logical_sect,
+ uint64_t physical_sect)
+{
+ BDRVQcowState *s = bs->opaque;
+ QCowHashNode *hash_node;
+
+ hash_node = g_tree_lookup(s->dedup_tree_by_hash, &dedup_hash->hash);
+
+ /* no hash node found for this hash */
+ if (!hash_node) {
+ return 0;
+ }
+
+ /* the hash node information are already completed */
+ if (!is_hash_node_empty(hash_node)) {
+ return 0;
+ }
+
+ /* Remember that this QCowHashNode represents the first occurrence of the
+ * cluster so we will be able to clear QCOW_OFLAG_COPIED from the L2 table
+ * entry when the refcount goes > 1.
+ */
+ logical_sect = logical_sect | QCOW_FLAG_FIRST;
+
+ /* remove stale hash node pointing to this physical sector from the trees */
+ qcow_remove_hash_node_by_sector(bs, physical_sect);
+
+ /* fill the missing fields of the hash node */
+ hash_node->physical_sect = physical_sect;
+ hash_node->first_logical_sect = logical_sect;
+
+ /* insert the hash node in the second tree: it's already in the first one */
+ g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect, hash_node);
+
+ /* write the hash to disk */
+ return qcow2_dedup_read_write_hash(bs,
+ &dedup_hash->hash,
+ &logical_sect,
+ physical_sect,
+ true);
+}
+
+/* This function store the hashes of the clusters which are not duplicated
+ *
+ * @ds: The deduplication state
+ * @count: the number of dedup hash to process
+ * @logical_sect: logical offset of the first cluster (in sectors)
+ * @physical_sect: offset of the first cluster (in sectors)
+ * @ret: 0 on success, negative on error
+ */
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+ QCowDedupState *ds,
+ int count,
+ uint64_t logical_sect,
+ uint64_t physical_sect)
+{
+ int ret = 0;
+ int i = 0;
+ BDRVQcowState *s = bs->opaque;
+ QCowHashElement *dedup_hash, *next_dedup_hash;
+
+
+ QTAILQ_FOREACH_SAFE(dedup_hash, &ds->undedupables, next, next_dedup_hash) {
+
+ ret = qcow2_store_dedup_hash(bs,
+ dedup_hash,
+ logical_sect + i * s->cluster_sectors,
+ physical_sect + i * s->cluster_sectors);
+
+ QTAILQ_REMOVE(&ds->undedupables, dedup_hash, next);
+ g_free(dedup_hash);
+
+ if (ret < 0) {
+ break;
+ }
+
+ i++;
+
+ if (i == count) {
+ break;
+ }
+ }
+
+ return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index a61e004..2b23dc3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -456,5 +456,10 @@ int qcow2_dedup(BlockDriverState *bs,
uint64_t sector_num,
uint8_t *data,
int data_nr);
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+ QCowDedupState *ds,
+ int count,
+ uint64_t logical_sect,
+ uint64_t physical_sect);
#endif
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 08/30] qcow2: Implement qcow2_compute_cluster_hash.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (6 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 07/30] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 09/30] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
` (23 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Add detection of libgnutls used to compute SHA256 hashes
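For reference, here is a minimal standalone sketch of the hashing call this
patch relies on (error handling and QEMU plumbing omitted; gnutls >= 2.10
assumed; hash_cluster is an illustrative name, not part of the patch):

#include <gnutls/gnutls.h>
#include <gnutls/crypto.h>
#include <stdint.h>
#include <stddef.h>

/* Hash one cluster with SHA-256; the 32-byte digest matches HASH_LENGTH. */
static int hash_cluster(const uint8_t *data, size_t cluster_size,
                        uint8_t digest[32])
{
    return gnutls_hash_fast(GNUTLS_DIG_SHA256, data, cluster_size, digest);
}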
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 13 ++++++++++++-
configure | 22 ++++++++++++++++++++++
2 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 2a444f5..0914267 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -25,6 +25,8 @@
* THE SOFTWARE.
*/
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
#include "block/block_int.h"
#include "qemu-common.h"
#include "qcow2.h"
@@ -157,7 +159,16 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
QCowHash *hash,
uint8_t *data)
{
- return 0;
+ BDRVQcowState *s = bs->opaque;
+ switch (s->dedup_hash_algo) {
+ case QCOW_HASH_SHA256:
+ return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
+ s->cluster_size, hash->data);
+ default:
+ error_report("Invalid deduplication hash algorithm %i",
+ s->dedup_hash_algo);
+ abort();
+ }
}
/*
diff --git a/configure b/configure
index 99c1ec3..390326e 100755
--- a/configure
+++ b/configure
@@ -1724,6 +1724,28 @@ EOF
fi
##########################################
+# QCOW Deduplication gnutls detection
+cat > $TMPC <<EOF
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
+EOF
+qcow_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+qcow_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$qcow_tls_cflags" "$qcow_tls_libs" ; then
+ qcow_tls=yes
+ libs_softmmu="$qcow_tls_libs $libs_softmmu"
+ libs_tools="$qcow_tls_libs $libs_softmmu"
+ QEMU_CFLAGS="$QEMU_CFLAGS $qcow_tls_cflags"
+else
+ echo "gnutls > 2.10.0 required to compile QEMU"
+ exit 1
+fi
+
+##########################################
# VNC SASL detection
if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
cat > $TMPC <<EOF
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 09/30] qcow2: Extract qcow2_dedup_grow_table
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (7 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 08/30] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 10/30] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
` (22 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-cluster.c | 102 +++++++++++++++++++++++++++++++------------------
block/qcow2-dedup.c | 3 +-
block/qcow2.h | 6 +++
3 files changed, 71 insertions(+), 40 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 63a7241..dbcb6d2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -29,44 +29,48 @@
#include "block/qcow2.h"
#include "trace.h"
-int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+int qcow2_do_grow_table(BlockDriverState *bs, int min_size, bool exact_size,
+ uint64_t **table, uint64_t *table_offset,
+ int *table_size, qcow2_save_table save_table,
+ const char *table_name)
{
BDRVQcowState *s = bs->opaque;
- int new_l1_size, new_l1_size2, ret, i;
- uint64_t *new_l1_table;
- int64_t new_l1_table_offset;
- uint8_t data[12];
+ int new_size, new_size2, ret, i;
+ uint64_t *new_table;
+ int64_t new_table_offset;
- if (min_size <= s->l1_size)
+ if (min_size <= *table_size) {
return 0;
+ }
if (exact_size) {
- new_l1_size = min_size;
+ new_size = min_size;
} else {
/* Bump size up to reduce the number of times we have to grow */
- new_l1_size = s->l1_size;
- if (new_l1_size == 0) {
- new_l1_size = 1;
+ new_size = *table_size;
+ if (new_size == 0) {
+ new_size = 1;
}
- while (min_size > new_l1_size) {
- new_l1_size = (new_l1_size * 3 + 1) / 2;
+ while (min_size > new_size) {
+ new_size = (new_size * 3 + 1) / 2;
}
}
#ifdef DEBUG_ALLOC2
- fprintf(stderr, "grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
+ fprintf(stderr, "grow %s_table from %d to %d\n",
+ table_name, *table_size, new_size);
#endif
- new_l1_size2 = sizeof(uint64_t) * new_l1_size;
- new_l1_table = g_malloc0(align_offset(new_l1_size2, 512));
- memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
+ new_size2 = sizeof(uint64_t) * new_size;
+ new_table = g_malloc0(align_offset(new_size2, 512));
+ memcpy(new_table, *table, *table_size * sizeof(uint64_t));
/* write new table (align to cluster) */
BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
- new_l1_table_offset = qcow2_alloc_clusters(bs, new_l1_size2);
- if (new_l1_table_offset < 0) {
- g_free(new_l1_table);
- return new_l1_table_offset;
+ new_table_offset = qcow2_alloc_clusters(bs, new_size2);
+ if (new_table_offset < 0) {
+ g_free(new_table);
+ return new_table_offset;
}
ret = qcow2_cache_flush(bs, s->refcount_block_cache);
@@ -75,34 +79,56 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
}
BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
- for(i = 0; i < s->l1_size; i++)
- new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
- ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset, new_l1_table, new_l1_size2);
+ for (i = 0; i < *table_size; i++) {
+ new_table[i] = cpu_to_be64(new_table[i]);
+ }
+ ret = bdrv_pwrite_sync(bs->file, new_table_offset, new_table, new_size2);
if (ret < 0)
goto fail;
- for(i = 0; i < s->l1_size; i++)
- new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
+ for (i = 0; i < *table_size; i++) {
+ new_table[i] = be64_to_cpu(new_table[i]);
+ }
+
+ g_free(*table);
+ qcow2_free_clusters(bs, *table_offset, *table_size * sizeof(uint64_t));
+ *table_offset = new_table_offset;
+ *table = new_table;
+ *table_size = new_size;
/* set new table */
BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
- cpu_to_be32w((uint32_t*)data, new_l1_size);
- cpu_to_be64wu((uint64_t*)(data + 4), new_l1_table_offset);
- ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size), data,sizeof(data));
- if (ret < 0) {
- goto fail;
- }
- g_free(s->l1_table);
- qcow2_free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
- s->l1_table_offset = new_l1_table_offset;
- s->l1_table = new_l1_table;
- s->l1_size = new_l1_size;
+ save_table(bs, *table_offset, *table_size);
+
return 0;
fail:
- g_free(new_l1_table);
- qcow2_free_clusters(bs, new_l1_table_offset, new_l1_size2);
+ g_free(new_table);
+ qcow2_free_clusters(bs, new_table_offset, new_size2);
return ret;
}
+static int qcow2_l1_save_table(BlockDriverState *bs,
+ int64_t table_offset, int size)
+{
+ uint8_t data[12];
+ cpu_to_be32w((uint32_t *)data, size);
+ cpu_to_be64wu((uint64_t *)(data + 4), table_offset);
+ return bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
+ data, sizeof(data));
+}
+
+int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+{
+ BDRVQcowState *s = bs->opaque;
+ return qcow2_do_grow_table(bs,
+ min_size,
+ exact_size,
+ &s->l1_table,
+ &s->l1_table_offset,
+ &s->l1_size,
+ qcow2_l1_save_table,
+ "l1");
+}
+
/*
* l2_load
*
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 0914267..7adaaba 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -575,7 +575,6 @@ exit:
return deduped_clusters_nr * s->cluster_sectors - begining_index;
}
-
/* Create a deduplication table hash block, write it's offset to disk and
* reference it in the RAM deduplication table
*
@@ -592,7 +591,7 @@ static uint64_t qcow2_create_block(BlockDriverState *bs,
uint64_t data64;
int ret = 0;
- /* allocate a new dedup table hash block */
+ /* allocate a new dedup table cluster */
offset = qcow2_alloc_clusters(bs, s->hash_block_size);
if (offset < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 2b23dc3..afa730e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -397,6 +397,12 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
int64_t offset, int64_t length, int addend);
/* qcow2-cluster.c functions */
+typedef int (*qcow2_save_table)(BlockDriverState *bs,
+ int64_t table_offset, int size);
+int qcow2_do_grow_table(BlockDriverState *bs, int min_size, bool exact_size,
+ uint64_t **table, uint64_t *table_offset,
+ int *table_size, qcow2_save_table save_table,
+ const char *table_name);
int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
void qcow2_l2_cache_reset(BlockDriverState *bs);
int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 10/30] qcow2: Add qcow2_dedup_grow_table and use it.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (8 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 09/30] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 11/30] qcow2: create function to load deduplication hashes at startup Benoît Canet
` (21 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 7adaaba..b998a2d 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -38,6 +38,44 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
bool write);
/*
+ * Save the dedup table information into the header extensions
+ *
+ * @table_offset: the dedup table offset in the QCOW2 file
+ * @size: the size of the dedup table
+ * @ret: 0 on success, -errno on error
+ */
+static int qcow2_dedup_save_table_info(BlockDriverState *bs,
+ int64_t table_offset, int size)
+{
+ BDRVQcowState *s = bs->opaque;
+ s->dedup_table_offset = table_offset;
+ s->dedup_table_size = size;
+ return qcow2_update_header(bs);
+}
+
+/*
+ * Grow the deduplication table
+ *
+ * @min_size: minimal size
+ * @exact_size: if true force to grow to the exact size
+ * @ret: 0 on success, -errno on error
+ */
+static int qcow2_dedup_grow_table(BlockDriverState *bs,
+ int min_size,
+ bool exact_size)
+{
+ BDRVQcowState *s = bs->opaque;
+ return qcow2_do_grow_table(bs,
+ min_size,
+ exact_size,
+ &s->dedup_table,
+ &s->dedup_table_offset,
+ &s->dedup_table_size,
+ qcow2_dedup_save_table_info,
+ "dedup");
+}
+
+/*
* Prepare a buffer containing all the required data required to compute cluster
* sized deduplication hashes.
* If sector_num or nb_sectors are not cluster-aligned, missing data
@@ -712,7 +750,11 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
index_in_dedup_table = cluster_number / nb_hash_in_block;
if (s->dedup_table_size <= index_in_dedup_table) {
- return -ENOSPC;
+ ret = qcow2_dedup_grow_table(bs, index_in_dedup_table + 1, false);
+ }
+
+ if (ret < 0) {
+ return ret;
}
/* if we must read and there is nothing to read return a null hash */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
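The hunk above replaces the hard -ENOSPC failure with an on-demand qcow2_dedup_grow_table() call when a cluster's slot falls past the end of the dedup table. The slot arithmetic can be reproduced standalone; HASH_LENGTH is 32 in this series, while the 64 KB hash block size is an assumption here (patch 28 of the series sets the dedup block size to 64 KB).

#include <stdint.h>
#include <stdio.h>

#define HASH_LENGTH      32       /* bytes per hash, as in qcow2.h */
#define HASH_BLOCK_SIZE  65536    /* assumed 64 KB hash block (see patch 28) */

int main(void)
{
    /* each on-disk record is a hash plus a 64-bit first_logical_sect field */
    int nb_hash_in_block = HASH_BLOCK_SIZE / (HASH_LENGTH + 8);
    uint64_t cluster_number = 1000000;             /* arbitrary cluster */
    uint64_t index_in_dedup_table = cluster_number / nb_hash_in_block;

    printf("%d hashes per block, cluster %llu -> dedup table entry %llu\n",
           nb_hash_in_block,
           (unsigned long long)cluster_number,
           (unsigned long long)index_in_dedup_table);

    /* with the patch above, an out-of-range entry now triggers
     * qcow2_dedup_grow_table(bs, index_in_dedup_table + 1, false)
     * instead of failing with -ENOSPC */
    return 0;
}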
* [Qemu-devel] [RFC V4 11/30] qcow2: create function to load deduplication hashes at startup.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (9 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 10/30] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 12/30] qcow2: Load and save deduplication table header extension Benoît Canet
` (20 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++
block/qcow2.h | 1 +
2 files changed, 69 insertions(+)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index b998a2d..4c391e5 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -918,3 +918,71 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
return ret;
}
+
+static void qcow2_dedup_insert_hash_and_preserve_newer(BlockDriverState *bs,
+ QCowHashNode *hash_node)
+{
+ BDRVQcowState *s = bs->opaque;
+ QCowHashNode *newer_hash_node;
+
+ newer_hash_node = g_tree_lookup(s->dedup_tree_by_sect,
+ &hash_node->physical_sect);
+
+ if (!newer_hash_node) {
+ g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+ g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect,
+ hash_node);
+ } else {
+ g_free(hash_node);
+ }
+}
+
+/*
+ * This coroutine loads the deduplication hashes into the tree
+ *
+ * @data: the given BlockDriverState
+ * @ret: NULL
+ */
+void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
+{
+ BlockDriverState *bs = opaque;
+ BDRVQcowState *s = bs->opaque;
+ int ret;
+ QCowHash hash, null_hash;
+ uint64_t max_clusters, i;
+ uint64_t first_logical_sect;
+ int nb_hash_in_hash_block = s->hash_block_size / (HASH_LENGTH + 8);
+ QCowHashNode *hash_node;
+
+ /* prepare the null hash */
+ memset(&null_hash, 0, sizeof(null_hash));
+
+ max_clusters = s->dedup_table_size * nb_hash_in_hash_block;
+
+ for (i = 0; i < max_clusters; i++) {
+ /* get the hash */
+ qemu_co_mutex_lock(&s->lock);
+ ret = qcow2_dedup_read_write_hash(bs, &hash,
+ &first_logical_sect,
+ i * s->cluster_sectors,
+ false);
+
+ if (ret < 0) {
+ qemu_co_mutex_unlock(&s->lock);
+ error_report("Failed to load deduplication hash.");
+ continue;
+ }
+
+ /* if the hash is null don't load it */
+ if (!memcmp(hash.data, null_hash.data, HASH_LENGTH)) {
+ qemu_co_mutex_unlock(&s->lock);
+ continue;
+ }
+
+ hash_node = qcow2_dedup_build_qcow_hash_node(&hash,
+ i * s->cluster_sectors,
+ first_logical_sect);
+ qcow2_dedup_insert_hash_and_preserve_newer(bs, hash_node);
+ qemu_co_mutex_unlock(&s->lock);
+ }
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index afa730e..5cbfc82 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -467,5 +467,6 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
int count,
uint64_t logical_sect,
uint64_t physical_sect);
+void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque);
#endif
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
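The loader above walks every possible cluster hash, skips null ones, and indexes each surviving QCowHashNode in two GTrees: one keyed by hash (for dedup lookups) and one keyed by physical sector (so a stale entry for a rewritten sector is not re-inserted). A small GLib sketch of that dual-index, keep-the-newer idea, with simplified types and no tree cleanup:

#include <glib.h>
#include <stdint.h>
#include <string.h>

#define HASH_LENGTH 32

typedef struct {
    uint8_t  hash[HASH_LENGTH];
    uint64_t physical_sect;
    uint64_t first_logical_sect;
} HashNode;

/* compare 64-bit sector keys */
static gint sect_cmp(gconstpointer a, gconstpointer b)
{
    const uint64_t *x = a, *y = b;
    return (*x > *y) - (*x < *y);
}

/* compare raw hashes */
static gint hash_cmp(gconstpointer a, gconstpointer b)
{
    return memcmp(a, b, HASH_LENGTH);
}

/* insert a node in both indexes unless the sector is already present
 * (a newer hash for the same physical sector wins, as in the patch) */
static void insert_preserve_newer(GTree *by_hash, GTree *by_sect, HashNode *node)
{
    if (g_tree_lookup(by_sect, &node->physical_sect)) {
        g_free(node);          /* an up-to-date entry already covers it */
        return;
    }
    g_tree_insert(by_hash, node->hash, node);
    g_tree_insert(by_sect, &node->physical_sect, node);
}

int main(void)
{
    GTree *by_hash = g_tree_new(hash_cmp);
    GTree *by_sect = g_tree_new(sect_cmp);
    HashNode *n = g_new0(HashNode, 1);

    n->physical_sect = 128;
    insert_preserve_newer(by_hash, by_sect, n);
    return 0;
}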
* [Qemu-devel] [RFC V4 12/30] qcow2: Load and save deduplication table header extension.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (10 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 11/30] qcow2: create function to load deduplication hashes at startup Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-05 0:02 ` Eric Blake
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 13/30] qcow2: Extract qcow2_do_table_init Benoît Canet
` (19 subsequent siblings)
31 siblings, 1 reply; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/block/qcow2.c b/block/qcow2.c
index 410d3c1..9a7177b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -53,9 +53,16 @@ typedef struct {
uint32_t len;
} QCowExtension;
+typedef struct {
+ uint64_t offset;
+ int32_t size;
+ uint8_t hash_algo;
+} QCowDedupTableExtension;
+
#define QCOW2_EXT_MAGIC_END 0
#define QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
#define QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define QCOW2_EXT_MAGIC_DEDUP_TABLE 0xCD8E819B
static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
{
@@ -83,6 +90,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
QCowExtension ext;
uint64_t offset;
int ret;
+ QCowDedupTableExtension dedup_table_extension;
#ifdef DEBUG_EXT
printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
@@ -147,6 +155,19 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
}
break;
+ case QCOW2_EXT_MAGIC_DEDUP_TABLE:
+ ret = bdrv_pread(bs->file, offset,
+ &dedup_table_extension, ext.len);
+ if (ret < 0) {
+ return ret;
+ }
+ s->dedup_table_offset =
+ be64_to_cpu(dedup_table_extension.offset);
+ s->dedup_table_size =
+ be32_to_cpu(dedup_table_extension.size);
+ s->dedup_hash_algo = dedup_table_extension.hash_algo;
+ break;
+
default:
/* unknown magic - save it in case we need to rewrite the header */
{
@@ -958,6 +979,7 @@ int qcow2_update_header(BlockDriverState *bs)
uint32_t refcount_table_clusters;
size_t header_length;
Qcow2UnknownHeaderExtension *uext;
+ QCowDedupTableExtension dedup_table_extension;
buf = qemu_blockalign(bs, buflen);
@@ -1061,6 +1083,22 @@ int qcow2_update_header(BlockDriverState *bs)
buf += ret;
buflen -= ret;
+ if (s->has_dedup) {
+ dedup_table_extension.offset = cpu_to_be64(s->dedup_table_offset);
+ dedup_table_extension.size = cpu_to_be32(s->dedup_table_size);
+ dedup_table_extension.hash_algo = s->dedup_hash_algo;
+ ret = header_ext_add(buf,
+ QCOW2_EXT_MAGIC_DEDUP_TABLE,
+ &dedup_table_extension,
+ sizeof(dedup_table_extension),
+ buflen);
+ if (ret < 0) {
+ goto fail;
+ }
+ buf += ret;
+ buflen -= ret;
+ }
+
/* Keep unknown header extensions */
QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
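The new header extension (magic 0xCD8E819B) carries three fields: the dedup table offset as a big-endian 64-bit value, its size as a big-endian 32-bit value, and one byte for the hash algorithm. The patch writes the C struct directly, so the on-disk length is whatever sizeof() yields for it; the sketch below packs the same three fields byte by byte purely as an illustration of the layout, not as what the patch does.

#include <stdint.h>
#include <stdio.h>

#define QCOW2_EXT_MAGIC_DEDUP_TABLE 0xCD8E819B

/* pack the three dedup extension fields big-endian into buf,
 * returning the number of payload bytes written */
static size_t pack_dedup_ext(uint8_t *buf, uint64_t offset,
                             uint32_t size, uint8_t hash_algo)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = offset >> (8 * (7 - i));
    }
    for (int i = 0; i < 4; i++) {
        buf[8 + i] = size >> (8 * (3 - i));
    }
    buf[12] = hash_algo;
    return 13;
}

int main(void)
{
    uint8_t buf[16];
    size_t len = pack_dedup_ext(buf, 2 * 65536, 8192, 0);

    printf("magic 0x%08X, payload %zu bytes\n",
           QCOW2_EXT_MAGIC_DEDUP_TABLE, len);
    return 0;
}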
* [Qemu-devel] [RFC V4 13/30] qcow2: Extract qcow2_do_table_init.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (11 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 12/30] qcow2: Load and save deduplication table header extension Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 14/30] qcow2-cache: Allow to choose table size at creation Benoît Canet
` (18 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-refcount.c | 43 ++++++++++++++++++++++++++++++-------------
block/qcow2.h | 5 +++++
2 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e014b0e..75c2bde 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -31,27 +31,44 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
/*********************************************************/
/* refcount handling */
-int qcow2_refcount_init(BlockDriverState *bs)
+int qcow2_do_table_init(BlockDriverState *bs,
+ uint64_t **table,
+ int64_t offset,
+ int size,
+ bool is_refcount)
{
- BDRVQcowState *s = bs->opaque;
- int ret, refcount_table_size2, i;
-
- refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
- s->refcount_table = g_malloc(refcount_table_size2);
- if (s->refcount_table_size > 0) {
- BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
- ret = bdrv_pread(bs->file, s->refcount_table_offset,
- s->refcount_table, refcount_table_size2);
- if (ret != refcount_table_size2)
+ int ret, size2, i;
+
+ size2 = size * sizeof(uint64_t);
+ *table = g_malloc(size2);
+ if (size > 0) {
+ if (is_refcount) {
+ BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
+ }
+ ret = bdrv_pread(bs->file, offset,
+ *table, size2);
+ if (ret != size2) {
goto fail;
- for(i = 0; i < s->refcount_table_size; i++)
- be64_to_cpus(&s->refcount_table[i]);
+ }
+ for (i = 0; i < size; i++) {
+ be64_to_cpus(&(*table)[i]);
+ }
}
return 0;
fail:
return -ENOMEM;
}
+int qcow2_refcount_init(BlockDriverState *bs)
+{
+ BDRVQcowState *s = bs->opaque;
+ return qcow2_do_table_init(bs,
+ &s->refcount_table,
+ s->refcount_table_offset,
+ s->refcount_table_size,
+ true);
+}
+
void qcow2_refcount_close(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 5cbfc82..9add0f1 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -376,6 +376,11 @@ int qcow2_read_cluster_data(BlockDriverState *bs,
int nb_sectors);
/* qcow2-refcount.c functions */
+int qcow2_do_table_init(BlockDriverState *bs,
+ uint64_t **table,
+ int64_t offset,
+ int size,
+ bool is_refcount);
int qcow2_refcount_init(BlockDriverState *bs);
void qcow2_refcount_close(BlockDriverState *bs);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
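qcow2_do_table_init() folds the common "allocate, read, byteswap" steps of loading an on-disk table of big-endian 64-bit entries into one helper that both the refcount table and, later, the dedup table can use. A self-contained stdio sketch of those same steps, for illustration only:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* convert a big-endian 64-bit value (as read from disk) to host order */
static uint64_t be64_to_host(uint64_t v)
{
    const uint8_t *b = (const uint8_t *)&v;
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | b[i];
    }
    return r;
}

/* load 'size' 64-bit entries stored big-endian at 'offset' in 'f' */
static int do_table_init(FILE *f, uint64_t **table, long offset, int size)
{
    int64_t bytes = (int64_t)size * sizeof(uint64_t);

    *table = malloc(bytes);
    if (!*table) {
        return -1;
    }
    if (size > 0) {
        if (fseek(f, offset, SEEK_SET) != 0 ||
            fread(*table, 1, bytes, f) != (size_t)bytes) {
            free(*table);
            return -1;
        }
        for (int i = 0; i < size; i++) {
            (*table)[i] = be64_to_host((*table)[i]);
        }
    }
    return 0;
}

int main(void)
{
    FILE *f = tmpfile();
    uint8_t be[8] = { 0, 0, 0, 0, 0, 0, 0x12, 0x34 };   /* 0x1234 big-endian */
    uint64_t *table = NULL;

    if (!f) {
        return 1;
    }
    fwrite(be, 1, sizeof(be), f);
    do_table_init(f, &table, 0, 1);
    printf("entry 0 = 0x%llx\n", (unsigned long long)table[0]);
    free(table);
    fclose(f);
    return 0;
}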
* [Qemu-devel] [RFC V4 14/30] qcow2-cache: Allow to choose table size at creation.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (12 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 13/30] qcow2: Extract qcow2_do_table_init Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 15/30] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
` (17 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-cache.c | 12 +++++++-----
block/qcow2.c | 5 +++--
block/qcow2.h | 3 ++-
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 2f3114e..83f2814 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -40,20 +40,22 @@ struct Qcow2Cache {
struct Qcow2Cache* depends;
int size;
bool depends_on_flush;
+ int table_size;
};
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+ int table_size)
{
- BDRVQcowState *s = bs->opaque;
Qcow2Cache *c;
int i;
c = g_malloc0(sizeof(*c));
c->size = num_tables;
c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
+ c->table_size = table_size;
for (i = 0; i < c->size; i++) {
- c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
+ c->entries[i].table = qemu_blockalign(bs, c->table_size);
}
return c;
@@ -121,7 +123,7 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
}
ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
- s->cluster_size);
+ c->table_size);
if (ret < 0) {
return ret;
}
@@ -253,7 +255,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
}
- ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+ ret = bdrv_pread(bs->file, offset, c->entries[i].table, c->table_size);
if (ret < 0) {
return ret;
}
diff --git a/block/qcow2.c b/block/qcow2.c
index 9a7177b..499e939 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -450,8 +450,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
}
/* alloc L2 table/refcount block cache */
- s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
- s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+ s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+ s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
+ s->cluster_size);
s->cluster_cache = g_malloc(s->cluster_size);
/* one more sector for decompressed data alignment */
diff --git a/block/qcow2.h b/block/qcow2.h
index 9add0f1..4932750 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -440,7 +440,8 @@ void qcow2_free_snapshots(BlockDriverState *bs);
int qcow2_read_snapshots(BlockDriverState *bs);
/* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+ int table_size);
int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 15/30] qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (13 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 14/30] qcow2-cache: Allow to choose table size at creation Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 16/30] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
` (16 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 16 ++++++++++++++++
block/qcow2.h | 2 ++
2 files changed, 18 insertions(+)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4c391e5..12a2dad 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -986,3 +986,19 @@ void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
qemu_co_mutex_unlock(&s->lock);
}
}
+
+int qcow2_dedup_init(BlockDriverState *bs)
+{
+ BDRVQcowState *s = bs->opaque;
+ return qcow2_do_table_init(bs,
+ &s->dedup_table,
+ s->dedup_table_offset,
+ s->dedup_table_size,
+ false);
+}
+
+void qcow2_dedup_close(BlockDriverState *bs)
+{
+ BDRVQcowState *s = bs->opaque;
+ g_free(s->dedup_table);
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 4932750..43586f2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -474,5 +474,7 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
uint64_t logical_sect,
uint64_t physical_sect);
void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque);
+int qcow2_dedup_init(BlockDriverState *bs);
+void qcow2_dedup_close(BlockDriverState *bs);
#endif
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 16/30] qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (14 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 15/30] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 17/30] block: Add qemu-img dedup create option Benoît Canet
` (15 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2.c | 49 ++++++++++++++++++++++++++++++-------------------
block/qcow2.h | 4 ++--
2 files changed, 32 insertions(+), 21 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 499e939..ad399c8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -236,61 +236,72 @@ static void report_unsupported_feature(BlockDriverState *bs,
}
/*
- * Sets the dirty bit and flushes afterwards if necessary.
+ * Sets an incompatible feature bit and flushes afterwards if necessary.
*
* The incompatible_features bit is only set if the image file header was
* updated successfully. Therefore it is not required to check the return
* value of this function.
*/
-int qcow2_mark_dirty(BlockDriverState *bs)
+static int qcow2_add_feature(BlockDriverState *bs,
+ QCow2IncompatibleFeature feature)
{
BDRVQcowState *s = bs->opaque;
uint64_t val;
- int ret;
+ int ret = 0;
assert(s->qcow_version >= 3);
- if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
- return 0; /* already dirty */
+ if (s->incompatible_features & feature) {
+ return 0; /* already added */
}
- val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
+ val = cpu_to_be64(s->incompatible_features | feature);
ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
&val, sizeof(val));
if (ret < 0) {
return ret;
}
- ret = bdrv_flush(bs->file);
- if (ret < 0) {
- return ret;
- }
- /* Only treat image as dirty if the header was updated successfully */
- s->incompatible_features |= QCOW2_INCOMPAT_DIRTY;
+ /* Only treat image as having the feature if the header was updated
+ * successfully
+ */
+ s->incompatible_features |= feature;
return 0;
}
+int qcow2_mark_dirty(BlockDriverState *bs)
+{
+ return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
/*
- * Clears the dirty bit and flushes before if necessary. Only call this
- * function when there are no pending requests, it does not guard against
- * concurrent requests dirtying the image.
+ * Clears an incompatible feature bit and flushes before if necessary.
+ * Only call this function when there are no pending requests, it does not
+ * guard against concurrent requests adding a feature to the image.
*/
-static int qcow2_mark_clean(BlockDriverState *bs)
+static int qcow2_remove_feature(BlockDriverState *bs,
+ QCow2IncompatibleFeature feature)
{
BDRVQcowState *s = bs->opaque;
+ int ret = 0;
- if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
- int ret = bdrv_flush(bs);
+ if (s->incompatible_features & feature) {
+ ret = bdrv_flush(bs);
if (ret < 0) {
return ret;
}
- s->incompatible_features &= ~QCOW2_INCOMPAT_DIRTY;
+ s->incompatible_features &= ~feature;
return qcow2_update_header(bs);
}
return 0;
}
+static int qcow2_mark_clean(BlockDriverState *bs)
+{
+ return qcow2_remove_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
BdrvCheckMode fix)
{
diff --git a/block/qcow2.h b/block/qcow2.h
index 43586f2..7813c4c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -159,14 +159,14 @@ enum {
};
/* Incompatible feature bits */
-enum {
+typedef enum {
QCOW2_INCOMPAT_DIRTY_BITNR = 0,
QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
QCOW2_INCOMPAT_DEDUP_BITNR = 1,
QCOW2_INCOMPAT_DEDUP = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
-};
+} QCow2IncompatibleFeature;
/* Compatible feature bits */
enum {
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
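This patch generalises the dirty-bit handling: qcow2_add_feature()/qcow2_remove_feature() set or clear any incompatible feature bit and only update the in-memory copy once the header write succeeded, with the old mark_dirty/mark_clean becoming thin wrappers. A standalone sketch of that logic, with the header write reduced to a stub and flushing omitted:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* incompatible feature bits, as in the qcow2 v3 header */
typedef enum {
    INCOMPAT_DIRTY = 1 << 0,
    INCOMPAT_DEDUP = 1 << 1,
} IncompatFeature;

static uint64_t incompatible_features;

/* stand-in for rewriting the image header; always succeeds here */
static int write_header(uint64_t features)
{
    printf("header now 0x%" PRIx64 "\n", features);
    return 0;
}

static int add_feature(IncompatFeature feature)
{
    int ret;

    if (incompatible_features & feature) {
        return 0;                       /* already set */
    }
    ret = write_header(incompatible_features | feature);
    if (ret < 0) {
        return ret;                     /* only set the bit on success */
    }
    incompatible_features |= feature;
    return 0;
}

static int remove_feature(IncompatFeature feature)
{
    if (incompatible_features & feature) {
        /* a real implementation flushes pending data first */
        incompatible_features &= ~feature;
        return write_header(incompatible_features);
    }
    return 0;
}

int main(void)
{
    add_feature(INCOMPAT_DIRTY);        /* old qcow2_mark_dirty() */
    add_feature(INCOMPAT_DEDUP);        /* new qcow2_activate_dedup() */
    remove_feature(INCOMPAT_DIRTY);     /* old qcow2_mark_clean() */
    return 0;
}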
* [Qemu-devel] [RFC V4 17/30] block: Add qemu-img dedup create option.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (15 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 16/30] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 18/30] qcow2: Behave correctly when refcount reach 0 or 2^16 Benoît Canet
` (14 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2.c | 113 +++++++++++++++++++++++++++++++++++++++------
block/qcow2.h | 2 +
include/block/block_int.h | 1 +
3 files changed, 103 insertions(+), 13 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index ad399c8..9130638 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -274,6 +274,11 @@ int qcow2_mark_dirty(BlockDriverState *bs)
return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
}
+static int qcow2_activate_dedup(BlockDriverState *bs)
+{
+ return qcow2_add_feature(bs, QCOW2_INCOMPAT_DEDUP);
+}
+
/*
* Clears an incompatible feature bit and flushes before if necessary.
* Only call this function when there are no pending requests, it does not
@@ -905,6 +910,11 @@ static void qcow2_close(BlockDriverState *bs)
BDRVQcowState *s = bs->opaque;
g_free(s->l1_table);
+ if (s->has_dedup) {
+ qcow2_cache_flush(bs, s->dedup_cluster_cache);
+ qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+ }
+
qcow2_cache_flush(bs, s->l2_table_cache);
qcow2_cache_flush(bs, s->refcount_block_cache);
@@ -1261,7 +1271,8 @@ static int preallocate(BlockDriverState *bs)
static int qcow2_create2(const char *filename, int64_t total_size,
const char *backing_file, const char *backing_format,
int flags, size_t cluster_size, int prealloc,
- QEMUOptionParameter *options, int version)
+ QEMUOptionParameter *options, int version,
+ bool dedup, uint8_t hash_algo)
{
/* Calculate cluster_bits */
int cluster_bits;
@@ -1288,8 +1299,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
* size for any qcow2 image.
*/
BlockDriverState* bs;
+ BDRVQcowState *s;
QCowHeader header;
- uint8_t* refcount_table;
+ uint8_t *tables;
+ int size;
int ret;
ret = bdrv_create_file(filename, options);
@@ -1331,10 +1344,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
goto out;
}
- /* Write an empty refcount table */
- refcount_table = g_malloc0(cluster_size);
- ret = bdrv_pwrite(bs, cluster_size, refcount_table, cluster_size);
- g_free(refcount_table);
+ /* Write an empty refcount table + extra space for dedup table if needed */
+ size = dedup ? 2 : 1;
+ tables = g_malloc0(size * cluster_size);
+ ret = bdrv_pwrite(bs, cluster_size, tables, size * cluster_size);
+ g_free(tables);
if (ret < 0) {
goto out;
@@ -1345,7 +1359,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
/*
* And now open the image and make it consistent first (i.e. increase the
* refcount of the cluster that is occupied by the header and the refcount
- * table)
+ * table and the eventual dedup table)
*/
BlockDriver* drv = bdrv_find_format("qcow2");
assert(drv != NULL);
@@ -1355,7 +1369,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
goto out;
}
- ret = qcow2_alloc_clusters(bs, 2 * cluster_size);
+ size++; /* Add a cluster for the header */
+ ret = qcow2_alloc_clusters(bs, size * cluster_size);
if (ret < 0) {
goto out;
@@ -1365,11 +1380,33 @@ static int qcow2_create2(const char *filename, int64_t total_size,
}
/* Okay, now that we have a valid image, let's give it the right size */
+ s = bs->opaque;
ret = bdrv_truncate(bs, total_size * BDRV_SECTOR_SIZE);
if (ret < 0) {
goto out;
}
+ if (dedup) {
+ s->has_dedup = true;
+ s->dedup_table_offset = cluster_size * 2;
+ s->dedup_table_size = cluster_size / sizeof(uint64_t);
+ s->dedup_hash_algo = hash_algo;
+
+ ret = qcow2_activate_dedup(bs);
+ if (ret < 0) {
+ goto out;
+ }
+
+ ret = qcow2_update_header(bs);
+ if (ret < 0) {
+ goto out;
+ }
+
+ /* minimal init */
+ s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+ s->hash_block_size);
+ }
+
/* Want a backing file? There you go.*/
if (backing_file) {
ret = bdrv_change_backing_file(bs, backing_file, backing_format);
@@ -1395,15 +1432,41 @@ out:
return ret;
}
+static int qcow2_warn_if_version_3_is_needed(int version,
+ bool has_feature,
+ const char *feature)
+{
+ if (version < 3 && has_feature) {
+ fprintf(stderr, "%s only supported with compatibility "
+ "level 1.1 and above (use compat=1.1 or greater)\n",
+ feature);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int8_t qcow2_get_dedup_hash_algo(char *value)
+{
+ if (!strcmp(value, "sha256")) {
+ return QCOW_HASH_SHA256;
+ }
+
+ error_printf("Unsupported deduplication hash algorithm.\n");
+ return -EINVAL;
+}
+
static int qcow2_create(const char *filename, QEMUOptionParameter *options)
{
const char *backing_file = NULL;
const char *backing_fmt = NULL;
uint64_t sectors = 0;
int flags = 0;
+ int ret;
size_t cluster_size = DEFAULT_CLUSTER_SIZE;
int prealloc = 0;
int version = 2;
+ bool dedup = false;
+ int8_t hash_algo = 0;
/* Read out options */
while (options && options->name) {
@@ -1441,24 +1504,43 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
}
} else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
+ } else if (!strcmp(options->name, BLOCK_OPT_DEDUP) &&
+ options->value.s) {
+ hash_algo = qcow2_get_dedup_hash_algo(options->value.s);
+ if (hash_algo < 0) {
+ return hash_algo;
+ }
+ dedup = true;
}
options++;
}
+ if (dedup) {
+ cluster_size = 4096;
+ }
+
if (backing_file && prealloc) {
fprintf(stderr, "Backing file and preallocation cannot be used at "
"the same time\n");
return -EINVAL;
}
- if (version < 3 && (flags & BLOCK_FLAG_LAZY_REFCOUNTS)) {
- fprintf(stderr, "Lazy refcounts only supported with compatibility "
- "level 1.1 and above (use compat=1.1 or greater)\n");
- return -EINVAL;
+ ret = qcow2_warn_if_version_3_is_needed(version,
+ flags & BLOCK_FLAG_LAZY_REFCOUNTS,
+ "Lazy refcounts");
+ if (ret < 0) {
+ return ret;
+ }
+ ret = qcow2_warn_if_version_3_is_needed(version,
+ dedup,
+ "Deduplication");
+ if (ret < 0) {
+ return ret;
}
return qcow2_create2(filename, sectors, backing_file, backing_fmt, flags,
- cluster_size, prealloc, options, version);
+ cluster_size, prealloc, options, version,
+ dedup, hash_algo);
}
static int qcow2_make_empty(BlockDriverState *bs)
@@ -1761,6 +1843,11 @@ static QEMUOptionParameter qcow2_create_options[] = {
.type = OPT_FLAG,
.help = "Postpone refcount updates",
},
+ {
+ .name = BLOCK_OPT_DEDUP,
+ .type = OPT_STRING,
+ .help = "Deduplication",
+ },
{ NULL }
};
diff --git a/block/qcow2.h b/block/qcow2.h
index 7813c4c..63353d9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -56,6 +56,8 @@
/* Must be at least 4 to cover all cases of refcount table growth */
#define REFCOUNT_CACHE_SIZE 4
+#define DEDUP_CACHE_SIZE 4
+
#define DEFAULT_CLUSTER_SIZE 65536
#define HASH_LENGTH 32
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f83ffb8..b7ed3e6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -55,6 +55,7 @@
#define BLOCK_OPT_SUBFMT "subformat"
#define BLOCK_OPT_COMPAT_LEVEL "compat"
#define BLOCK_OPT_LAZY_REFCOUNTS "lazy_refcounts"
+#define BLOCK_OPT_DEDUP "dedup"
typedef struct BdrvTrackedRequest BdrvTrackedRequest;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 18/30] qcow2: Behave correctly when refcount reach 0 or 2^16.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (16 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 17/30] block: Add qemu-img dedup create option Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 19/30] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
` (13 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
When the refcount reaches zero we destroy the hash on disk and remove it from the GTree.
When the refcount is at its maximum value we mark the hash so it won't be loaded
at the next startup and remove it from the GTree.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 79 +++++++++++++++++++++++++++++++++++++++++++++---
block/qcow2-refcount.c | 6 ++++
block/qcow2.h | 6 ++++
3 files changed, 87 insertions(+), 4 deletions(-)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 12a2dad..28001c6 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -804,11 +804,19 @@ static inline bool is_hash_node_empty(QCowHashNode *hash_node)
return hash_node->physical_sect & QCOW_FLAG_EMPTY;
}
+static void qcow2_remove_hash_node(BlockDriverState *bs,
+ QCowHashNode *hash_node)
+{
+ BDRVQcowState *s = bs->opaque;
+ g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
+ g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+}
+
/* This function removes a hash_node from the trees given a physical sector
*
* @physical_sect: The physical sector of the cluster corresponding to the hash
*/
-static void qcow_remove_hash_node_by_sector(BlockDriverState *bs,
+static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
uint64_t physical_sect)
{
BDRVQcowState *s = bs->opaque;
@@ -820,8 +828,7 @@ static void qcow_remove_hash_node_by_sector(BlockDriverState *bs,
return;
}
- g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
- g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+ qcow2_remove_hash_node(bs, hash_node);
}
/* This function store a dedup hash information to disk and RAM
@@ -858,7 +865,7 @@ static int qcow2_store_dedup_hash(BlockDriverState *bs,
logical_sect = logical_sect | QCOW_FLAG_FIRST;
/* remove stale hash node pointing to this physical sector from the trees */
- qcow_remove_hash_node_by_sector(bs, physical_sect);
+ qcow2_remove_hash_node_by_sector(bs, physical_sect);
/* fill the missing fields of the hash node */
hash_node->physical_sect = physical_sect;
@@ -979,6 +986,12 @@ void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
continue;
}
+ /* if this cluster has reached max refcount don't load it */
+ if (first_logical_sect & QCOW_FLAG_MAX_REFCOUNT) {
+ qemu_co_mutex_unlock(&s->lock);
+ continue;
+ }
+
hash_node = qcow2_dedup_build_qcow_hash_node(&hash,
i * s->cluster_sectors,
first_logical_sect);
@@ -1002,3 +1015,61 @@ void qcow2_dedup_close(BlockDriverState *bs)
BDRVQcowState *s = bs->opaque;
g_free(s->dedup_table);
}
+
+/* Clean the last reference to a given cluster when its refcount is zero
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
+ uint64_t cluster_index)
+{
+ BDRVQcowState *s = bs->opaque;
+ QCowHash null_hash;
+ uint64_t logical_sect = 0;
+ uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+ /* prepare null hash */
+ memset(&null_hash, 0, sizeof(null_hash));
+
+ /* clear from disk */
+ qcow2_dedup_read_write_hash(bs,
+ &null_hash,
+ &logical_sect,
+ physical_sect,
+ true);
+
+ /* remove from ram if present so we won't dedup with it anymore */
+ qcow2_remove_hash_node_by_sector(bs, physical_sect);
+}
+
+/* Force the use of a new physical cluster and QCowHashNode once the refcount
+ * limit of 2^16 is reached.
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_max_reached(BlockDriverState *bs,
+ uint64_t cluster_index)
+{
+ BDRVQcowState *s = bs->opaque;
+ QCowHashNode *hash_node;
+ uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+ hash_node = g_tree_lookup(s->dedup_tree_by_sect, &physical_sect);
+
+ if (!hash_node) {
+ return;
+ }
+
+ /* mark this hash so we won't load it anymore at startup after writing it */
+ hash_node->first_logical_sect |= QCOW_FLAG_MAX_REFCOUNT;
+
+ /* write to disk */
+ qcow2_dedup_read_write_hash(bs,
+ &hash_node->hash,
+ &hash_node->first_logical_sect,
+ hash_node->physical_sect,
+ true);
+
+ /* remove the QCowHashNode from ram so we won't use it anymore for dedup */
+ qcow2_remove_hash_node(bs, hash_node);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 75c2bde..aef280d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -489,6 +489,12 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
ret = -EINVAL;
goto fail;
}
+ if (s->has_dedup && refcount == 0) {
+ qcow2_dedup_refcount_zero_reached(bs, cluster_index);
+ }
+ if (s->has_dedup && refcount == 0xffff) {
+ qcow2_dedup_refcount_max_reached(bs, cluster_index);
+ }
if (refcount == 0 && cluster_index < s->free_cluster_index) {
s->free_cluster_index = cluster_index;
}
diff --git a/block/qcow2.h b/block/qcow2.h
index 63353d9..f5576be 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -61,6 +61,8 @@
#define DEFAULT_CLUSTER_SIZE 65536
#define HASH_LENGTH 32
+/* indicate that this cluster refcount has reached its maximum value */
+#define QCOW_FLAG_MAX_REFCOUNT (1LL << 61)
/* indicate that the hash structure is empty and miss offset */
#define QCOW_FLAG_EMPTY (1LL << 62)
/* indicate that the cluster for this hash has QCOW_OFLAG_COPIED on disk */
@@ -478,5 +480,9 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque);
int qcow2_dedup_init(BlockDriverState *bs);
void qcow2_dedup_close(BlockDriverState *bs);
+void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
+ uint64_t cluster_index);
+void qcow2_dedup_refcount_max_reached(BlockDriverState *bs,
+ uint64_t cluster_index);
#endif
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
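The two hooks rely on flag bits stored in the high bits of the hash record's first_logical_sect field: QCOW_FLAG_MAX_REFCOUNT (bit 61) tags a cluster whose refcount hit 2^16 so its hash is skipped at the next startup, and QCOW_FLAG_EMPTY (bit 62) marks an empty record. A tiny sketch of how such flag bits are set, tested and masked off; QCOW_FLAG_FIRST also exists in the series but its value is not shown in this excerpt, so it is left out.

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* flags stored in the high bits of the on-disk first_logical_sect field,
 * as defined in qcow2.h by this series */
#define QCOW_FLAG_MAX_REFCOUNT (1ULL << 61)
#define QCOW_FLAG_EMPTY        (1ULL << 62)

int main(void)
{
    uint64_t first_logical_sect = 0x1000;

    /* refcount hit 2^16: tag the hash so it is skipped at next startup */
    first_logical_sect |= QCOW_FLAG_MAX_REFCOUNT;

    bool skip_at_startup = first_logical_sect & QCOW_FLAG_MAX_REFCOUNT;
    uint64_t plain_sector = first_logical_sect &
                            ~(QCOW_FLAG_MAX_REFCOUNT | QCOW_FLAG_EMPTY);

    printf("skip=%d sector=0x%" PRIx64 "\n", skip_at_startup, plain_sector);
    return 0;
}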
* [Qemu-devel] [RFC V4 19/30] qcow2: Integrate deduplication in qcow2_co_writev loop.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (17 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 18/30] qcow2: Behave correctly when refcount reach 0 or 2^16 Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 20/30] qcow2: Serialize write requests when deduplication is activated Benoît Canet
` (12 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 83 insertions(+), 2 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 9130638..54c8847 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -328,6 +328,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
QCowHeader header;
uint64_t ext_end;
+ s->has_dedup = false;
ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
if (ret < 0) {
goto fail;
@@ -790,13 +791,17 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
BDRVQcowState *s = bs->opaque;
int index_in_cluster;
int n_end;
- int ret;
+ int ret = 0;
int cur_nr_sectors; /* number of sectors in current iteration */
uint64_t cluster_offset;
QEMUIOVector hd_qiov;
uint64_t bytes_done = 0;
uint8_t *cluster_data = NULL;
QCowL2Meta *l2meta;
+ uint8_t *dedup_cluster_data = NULL;
+ int dedup_cluster_data_nr;
+ int deduped_sectors_nr;
+ QCowDedupState ds;
trace_qcow2_writev_start_req(qemu_coroutine_self(), sector_num,
remaining_sectors);
@@ -807,13 +812,69 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
qemu_co_mutex_lock(&s->lock);
+ if (s->has_dedup) {
+ QTAILQ_INIT(&ds.undedupables);
+ ds.phash.reuse = false;
+ ds.nb_undedupable_sectors = 0;
+ ds.nb_clusters_processed = 0;
+
+ /* if deduplication is on, make sure dedup_cluster_data contains a
+ * whole number of clusters worth of data so that cluster-sized
+ * hashes can be computed
+ */
+ ret = qcow2_dedup_read_missing_and_concatenate(bs,
+ qiov,
+ sector_num,
+ remaining_sectors,
+ &dedup_cluster_data,
+ &dedup_cluster_data_nr);
+
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
while (remaining_sectors != 0) {
l2meta = NULL;
trace_qcow2_writev_start_part(qemu_coroutine_self());
+
+ if (s->has_dedup && ds.nb_undedupable_sectors == 0) {
+ /* Try to deduplicate as much clusters as possible */
+ deduped_sectors_nr = qcow2_dedup(bs,
+ &ds,
+ sector_num,
+ dedup_cluster_data,
+ dedup_cluster_data_nr);
+
+ if (deduped_sectors_nr < 0) {
+ goto fail;
+ }
+
+ remaining_sectors -= deduped_sectors_nr;
+ sector_num += deduped_sectors_nr;
+ bytes_done += deduped_sectors_nr * 512;
+
+ /* no more data to write -> exit */
+ if (remaining_sectors <= 0) {
+ goto fail;
+ }
+
+ /* if we deduped something trace it */
+ if (deduped_sectors_nr) {
+ trace_qcow2_writev_done_part(qemu_coroutine_self(),
+ deduped_sectors_nr);
+ trace_qcow2_writev_start_part(qemu_coroutine_self());
+ }
+ }
+
index_in_cluster = sector_num & (s->cluster_sectors - 1);
- n_end = index_in_cluster + remaining_sectors;
+ n_end = s->has_dedup &&
+ ds.nb_undedupable_sectors < remaining_sectors ?
+ index_in_cluster + ds.nb_undedupable_sectors :
+ index_in_cluster + remaining_sectors;
+
if (s->crypt_method &&
n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors) {
n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
@@ -849,6 +910,24 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
cur_nr_sectors * 512);
}
+ /* Write the non duplicated clusters hashes to disk */
+ if (s->has_dedup) {
+ int count = cur_nr_sectors / s->cluster_sectors;
+ int has_ending = ((cluster_offset >> 9) + index_in_cluster +
+ cur_nr_sectors) & (s->cluster_sectors - 1);
+ count = index_in_cluster ? count + 1 : count;
+ count = has_ending ? count + 1 : count;
+ ret = qcow2_dedup_store_new_hashes(bs,
+ &ds,
+ count,
+ sector_num,
+ (cluster_offset >> 9));
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
qemu_co_mutex_unlock(&s->lock);
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
trace_qcow2_writev_data(qemu_coroutine_self(),
@@ -880,6 +959,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
l2meta = NULL;
}
+ ds.nb_undedupable_sectors -= cur_nr_sectors;
remaining_sectors -= cur_nr_sectors;
sector_num += cur_nr_sectors;
bytes_done += cur_nr_sectors * 512;
@@ -900,6 +980,7 @@ fail:
qemu_iovec_destroy(&hd_qiov);
qemu_vfree(cluster_data);
+ qemu_vfree(dedup_cluster_data);
trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
return ret;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
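Inside the write loop, the count passed to qcow2_dedup_store_new_hashes() is the number of full clusters in the current chunk plus one for a partial head and one for a partial tail; since cluster_offset is cluster aligned, its sector term does not change the end-of-chunk modulo. A standalone sketch of that arithmetic with a worked example:

#include <stdio.h>

/* number of cluster hashes touched by a write of nr_sectors starting at
 * index_in_cluster within its first cluster (cluster_sectors per cluster) */
static int hashes_to_store(int index_in_cluster, int nr_sectors,
                           int cluster_sectors)
{
    int count = nr_sectors / cluster_sectors;
    int has_ending = (index_in_cluster + nr_sectors) % cluster_sectors;

    if (index_in_cluster) {
        count++;                 /* partial cluster at the start */
    }
    if (has_ending) {
        count++;                 /* partial cluster at the end */
    }
    return count;
}

int main(void)
{
    /* a 13-sector write starting 3 sectors into an 8-sector cluster touches
     * a partial head plus one full cluster: 2 hashes */
    printf("%d\n", hashes_to_store(3, 13, 8));
    return 0;
}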
* [Qemu-devel] [RFC V4 20/30] qcow2: Serialize write requests when deduplication is activated.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (18 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 19/30] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 21/30] qcow2: Add verification of dedup table Benoît Canet
` (11 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
This fixes the race conditions on sub-cluster-sized writes while waiting
for a faster solution.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2.c | 9 +++++++++
block/qcow2.h | 1 +
2 files changed, 10 insertions(+)
diff --git a/block/qcow2.c b/block/qcow2.c
index 54c8847..13f6a5c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -521,6 +521,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
/* Initialise locks */
qemu_co_mutex_init(&s->lock);
+ qemu_co_mutex_init(&s->dedup_lock);
/* Repair image if dirty */
if (!(flags & BDRV_O_CHECK) && !bs->read_only &&
@@ -810,6 +811,10 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
s->cluster_cache_offset = -1; /* disable compressed cache */
+ if (s->has_dedup) {
+ qemu_co_mutex_lock(&s->dedup_lock);
+ }
+
qemu_co_mutex_lock(&s->lock);
if (s->has_dedup) {
@@ -978,6 +983,10 @@ fail:
g_free(l2meta);
}
+ if (s->has_dedup) {
+ qemu_co_mutex_unlock(&s->dedup_lock);
+ }
+
qemu_iovec_destroy(&hd_qiov);
qemu_vfree(cluster_data);
qemu_vfree(dedup_cluster_data);
diff --git a/block/qcow2.h b/block/qcow2.h
index f5576be..fd31f4f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -224,6 +224,7 @@ typedef struct BDRVQcowState {
GTree *dedup_tree_by_hash;
GTree *dedup_tree_by_sect;
CoMutex lock;
+ CoMutex dedup_lock;
uint32_t crypt_method; /* current crypt method, 0 if no key yet */
uint32_t crypt_method_header;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
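The new dedup_lock is a coroutine mutex taken before s->lock at the start of qcow2_co_writev() and released on the way out, so at most one write request runs the hash/dedup/store sequence at a time. As a rough analogue only (QEMU uses CoMutex, not pthreads), the serialisation pattern looks like this:

#include <pthread.h>
#include <stdio.h>

/* analogue of the dedup_lock: one big lock held for the whole write request,
 * so concurrent sub-cluster writes cannot interleave their dedup updates */
static pthread_mutex_t dedup_lock = PTHREAD_MUTEX_INITIALIZER;

static void write_request(int id)
{
    pthread_mutex_lock(&dedup_lock);
    printf("request %d: hash, dedup and write run without interleaving\n", id);
    pthread_mutex_unlock(&dedup_lock);
}

int main(void)
{
    write_request(1);
    write_request(2);
    return 0;
}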
* [Qemu-devel] [RFC V4 21/30] qcow2: Add verification of dedup table.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (19 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 20/30] qcow2: Serialize write requests when deduplication is activated Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 22/30] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
` (10 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-refcount.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index aef280d..7e6d02f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1156,6 +1156,14 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
goto fail;
}
+ if (s->has_dedup) {
+ ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+ s->dedup_table_offset, s->dedup_table_size, 0);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
/* snapshots */
for(i = 0; i < s->nb_snapshots; i++) {
sn = s->snapshots + i;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 22/30] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (20 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 21/30] qcow2: Add verification of dedup table Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 23/30] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
` (9 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-refcount.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7e6d02f..9aef608 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1001,7 +1001,14 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
PRIx64 ": %s\n", l2_entry, strerror(-refcount));
goto fail;
}
- if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+ if (!s->has_dedup &&
+ (refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+ fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
+ PRIx64 " refcount=%d\n", l2_entry, refcount);
+ res->corruptions++;
+ }
+ if (s->has_dedup && refcount > 1 &&
+ ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
PRIx64 " refcount=%d\n", l2_entry, refcount);
res->corruptions++;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 23/30] qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (21 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 22/30] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 24/30] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
` (8 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-refcount.c | 65 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 56 insertions(+), 9 deletions(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9aef608..0c6e75a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1045,6 +1045,43 @@ fail:
return -EIO;
}
+static int check_dedup_l2(BlockDriverState *bs, BdrvCheckResult *res,
+ int64_t l2_offset)
+{
+ BDRVQcowState *s = bs->opaque;
+ uint64_t *l2_table;
+ int i, l2_size;
+
+ /* Read L2 table from disk */
+ l2_size = s->cluster_size;
+ l2_table = g_malloc(l2_size);
+
+ if (bdrv_pread(bs->file, l2_offset, l2_table, l2_size) != l2_size) {
+ goto fail;
+ }
+
+ /* Do the actual checks */
+ for (i = 0; i < (s->l2_size - 5); i += 5) {
+ uint64_t first_logical_offset = be64_to_cpu(l2_table[i + 4]) &
+ ~QCOW_FLAG_FIRST;
+ if (first_logical_offset > (bs->total_sectors * BDRV_SECTOR_SIZE)) {
+ fprintf(stderr, "ERROR: l2 deduplication first_logical_offset"
+ "=%" PRIi64 " outside of deduplicated volume in l2 table "
+ "with offset %" PRIi64 ".\n", first_logical_offset,
+ l2_offset);
+ res->corruptions++;
+ }
+ }
+
+ g_free(l2_table);
+ return 0;
+
+fail:
+ fprintf(stderr, "ERROR: I/O error in check_dedup_l2\n");
+ g_free(l2_table);
+ return -EIO;
+}
+
/*
* Increases the refcount for the L1 table, its L2 tables and all referenced
* clusters in the given refcount table. While doing so, performs some checks
@@ -1058,7 +1095,8 @@ static int check_refcounts_l1(BlockDriverState *bs,
uint16_t *refcount_table,
int refcount_table_size,
int64_t l1_table_offset, int l1_size,
- int check_copied)
+ int check_copied,
+ bool dedup)
{
BDRVQcowState *s = bs->opaque;
uint64_t *l1_table, l2_offset, l1_size2;
@@ -1114,11 +1152,19 @@ static int check_refcounts_l1(BlockDriverState *bs,
res->corruptions++;
}
- /* Process and check L2 entries */
- ret = check_refcounts_l2(bs, res, refcount_table,
- refcount_table_size, l2_offset, check_copied);
- if (ret < 0) {
- goto fail;
+ if (dedup) {
+ /* Process and check dedup l2 entries */
+ ret = check_dedup_l2(bs, res, l2_offset);
+ if (ret < 0) {
+ goto fail;
+ }
+ } else {
+ /* Process and check L2 entries */
+ ret = check_refcounts_l2(bs, res, refcount_table,
+ refcount_table_size, l2_offset, check_copied);
+ if (ret < 0) {
+ goto fail;
+ }
}
}
}
@@ -1158,14 +1204,15 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
/* current L1 table */
ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
- s->l1_table_offset, s->l1_size, 1);
+ s->l1_table_offset, s->l1_size, 1, false);
if (ret < 0) {
goto fail;
}
if (s->has_dedup) {
ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
- s->dedup_table_offset, s->dedup_table_size, 0);
+ s->dedup_table_offset, s->dedup_table_size,
+ 0, true);
if (ret < 0) {
goto fail;
}
@@ -1175,7 +1222,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
for(i = 0; i < s->nb_snapshots; i++) {
sn = s->snapshots + i;
ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
- sn->l1_table_offset, sn->l1_size, 0);
+ sn->l1_table_offset, sn->l1_size, 0, false);
if (ret < 0) {
goto fail;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
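check_dedup_l2() treats the dedup L2 table as an array of five-word records: four 64-bit words for the 32-byte hash and a fifth word holding the first logical offset plus its flags, which must not point past the end of the volume. A simplified standalone version of that walk, with endian conversion and flag masking left out:

#include <stdint.h>
#include <stdio.h>

#define WORDS_PER_RECORD 5   /* 32-byte hash (4 words) + 1 word of metadata */

/* validate the metadata word of each dedup L2 record against the volume size */
static int check_dedup_l2(const uint64_t *l2_table, int nb_words,
                          uint64_t volume_bytes)
{
    int corruptions = 0;

    for (int i = 0; i + WORDS_PER_RECORD <= nb_words; i += WORDS_PER_RECORD) {
        uint64_t first_logical = l2_table[i + 4];
        if (first_logical > volume_bytes) {
            fprintf(stderr, "record %d points outside the volume\n",
                    i / WORDS_PER_RECORD);
            corruptions++;
        }
    }
    return corruptions;
}

int main(void)
{
    uint64_t l2[10] = { 0 };

    l2[4] = 1ULL << 40;                  /* bogus offset, far past 1 GB */
    printf("corruptions: %d\n", check_dedup_l2(l2, 10, 1 << 30));
    return 0;
}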
* [Qemu-devel] [RFC V4 24/30] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (22 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 23/30] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 25/30] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
` (7 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
In the case of a race condition between two writes an l2 entry can be written
without QCOW_OFLAG_COPIED before the first write fills it.
This patch simply checks whether the l2 entry already contains the correct offset
without QCOW_OFLAG_COPIED and, if so, does nothing.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-cluster.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dbcb6d2..07037a0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -709,6 +709,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
for (i = 0; i < m->nb_clusters; i++) {
+ if (be64_to_cpu(l2_table[l2_index + i]) ==
+ (cluster_offset + (i << s->cluster_bits))) {
+ continue;
+ }
/* if two concurrent writes happen to the same unallocated cluster
* each write allocates separate cluster and writes data concurrently.
* The first one to complete updates l2 table with pointer to its
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 25/30] qcow2: Integrate SKEIN hash algorithm in deduplication.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (23 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 24/30] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 26/30] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
` (6 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 14 ++++++++++++++
block/qcow2.c | 5 +++++
configure | 33 +++++++++++++++++++++++++++++++++
3 files changed, 52 insertions(+)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 28001c6..bd8397e 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -30,6 +30,9 @@
#include "block/block_int.h"
#include "qemu-common.h"
#include "qcow2.h"
+#ifdef CONFIG_SKEIN_DEDUP
+#include <skeinApi.h>
+#endif
static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
QCowHash *hash,
@@ -202,6 +205,17 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
case QCOW_HASH_SHA256:
return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
s->cluster_size, hash->data);
+#if defined(CONFIG_SKEIN_DEDUP)
+ case QCOW_HASH_SKEIN:
+ {
+ SkeinCtx_t ctx;
+ skeinCtxPrepare(&ctx, Skein256);
+ skeinInit(&ctx, Skein256);
+ skeinUpdate(&ctx, data, s->cluster_size);
+ skeinFinal(&ctx, hash->data);
+ }
+ return 0;
+#endif
default:
error_report("Invalid deduplication hash algorithm %i",
s->dedup_hash_algo);
diff --git a/block/qcow2.c b/block/qcow2.c
index 13f6a5c..0154d50 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1540,6 +1540,11 @@ static int8_t qcow2_get_dedup_hash_algo(char *value)
if (!strcmp(value, "sha256")) {
return QCOW_HASH_SHA256;
}
+#if defined(CONFIG_SKEIN_DEDUP)
+ if (!strcmp(value, "skein")) {
+ return QCOW_HASH_SKEIN;
+ }
+#endif
error_printf("Unsupported deduplication hash algorithm.\n");
return -EINVAL;
diff --git a/configure b/configure
index 390326e..97497af 100755
--- a/configure
+++ b/configure
@@ -223,6 +223,7 @@ libiscsi=""
coroutine=""
seccomp=""
glusterfs=""
+skein_dedup="no"
# parse CC options first
for opt do
@@ -882,6 +883,8 @@ for opt do
;;
--enable-glusterfs) glusterfs="yes"
;;
+ --enable-skein-dedup) skein_dedup="yes"
+ ;;
*) echo "ERROR: unknown option $opt"; show_help="yes"
;;
esac
@@ -1130,6 +1133,7 @@ echo " --with-coroutine=BACKEND coroutine backend. Supported options:"
echo " gthread, ucontext, sigaltstack, windows"
echo " --enable-glusterfs enable GlusterFS backend"
echo " --disable-glusterfs disable GlusterFS backend"
+echo " --enable-skein-dedup enable computing dedup hashes with SKEIN"
echo ""
echo "NOTE: The object files are built at the place where configure is launched"
exit 1
@@ -2412,6 +2416,30 @@ EOF
fi
fi
+##########################################
+# SKEIN dedup hash function probe
+if test "$skein_dedup" != "no" ; then
+ cat > $TMPC <<EOF
+#include <skeinApi.h>
+int main(void) {
+ SkeinCtx_t ctx;
+ skeinCtxPrepare(&ctx, 512);
+ return 0;
+}
+EOF
+ skein_libs="-lskein3fish"
+ if compile_prog "" "$skein_libs" ; then
+ skein_dedup=yes
+ libs_tools="$skein_libs $libs_tools"
+ libs_softmmu="$skein_libs $libs_softmmu"
+ else
+ if test "$skein_dedup" = "yes" ; then
+ feature_not_found "libskein3fish not found"
+ fi
+ skein_dedup=no
+ fi
+fi
+
#
# Check for xxxat() functions when we are building linux-user
# emulator. This is done because older glibc versions don't
@@ -3296,6 +3324,7 @@ echo "build guest agent $guest_agent"
echo "seccomp support $seccomp"
echo "coroutine backend $coroutine_backend"
echo "GlusterFS support $glusterfs"
+echo "SKEIN support $skein_dedup"
if test "$sdl_too_old" = "yes"; then
echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3637,6 +3666,10 @@ if test "$glusterfs" = "yes" ; then
echo "CONFIG_GLUSTERFS=y" >> $config_host_mak
fi
+if test "$skein_dedup" = "yes" ; then
+ echo "CONFIG_SKEIN_DEDUP=y" >> $config_host_mak
+fi
+
# USB host support
case "$usb" in
linux)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 26/30] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (24 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 25/30] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 27/30] qcow2: Use large L2 table for deduplication Benoît Canet
` (5 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/qcow2.c b/block/qcow2.c
index 0154d50..f66e67d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1606,6 +1606,7 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
return hash_algo;
}
dedup = true;
+ flags |= BLOCK_FLAG_LAZY_REFCOUNTS;
}
options++;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [RFC V4 27/30] qcow2: Use large L2 table for deduplication.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (25 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 26/30] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 28/30] qcow: Set dedup cluster block size to 64KB Benoît Canet
` (4 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-cluster.c | 2 +-
block/qcow2-refcount.c | 22 +++++++++++++++-------
block/qcow2.c | 8 ++++++--
3 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 07037a0..d69af17 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -236,7 +236,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
goto fail;
}
- memcpy(l2_table, old_table, s->cluster_size);
+ memcpy(l2_table, old_table, s->l2_size << 3);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
if (ret < 0) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0c6e75a..092546d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -535,12 +535,15 @@ fail:
*/
static int update_cluster_refcount(BlockDriverState *bs,
int64_t cluster_index,
- int addend)
+ int addend,
+ bool is_l2)
{
BDRVQcowState *s = bs->opaque;
int ret;
- ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
+ int size = is_l2 ? s->l2_size << 3 : 1;
+
+ ret = update_refcount(bs, cluster_index << s->cluster_bits, size, addend);
if (ret < 0) {
return ret;
}
@@ -664,7 +667,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
if (free_in_cluster == 0)
s->free_byte_offset = 0;
if ((offset & (s->cluster_size - 1)) != 0)
- update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+ update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
} else {
offset = qcow2_alloc_clusters(bs, s->cluster_size);
if (offset < 0) {
@@ -674,7 +677,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
if ((cluster_offset + s->cluster_size) == offset) {
/* we are lucky: contiguous data */
offset = s->free_byte_offset;
- update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+ update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
s->free_byte_offset += size;
} else {
s->free_byte_offset = offset;
@@ -815,7 +818,10 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
} else {
uint64_t cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
if (addend != 0) {
- refcount = update_cluster_refcount(bs, cluster_index, addend);
+ refcount = update_cluster_refcount(bs,
+ cluster_index,
+ addend,
+ false);
} else {
refcount = get_refcount(bs, cluster_index);
}
@@ -847,7 +853,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
if (addend != 0) {
- refcount = update_cluster_refcount(bs, l2_offset >> s->cluster_bits, addend);
+ refcount = update_cluster_refcount(bs,
+ l2_offset >> s->cluster_bits,
+ addend, true);
} else {
refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
}
@@ -1143,7 +1151,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
/* Mark L2 table as used */
l2_offset &= L1E_OFFSET_MASK;
inc_refcounts(bs, res, refcount_table, refcount_table_size,
- l2_offset, s->cluster_size);
+ l2_offset, s->l2_size << 3);
/* L2 tables are cluster aligned */
if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index f66e67d..16038db 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -430,7 +430,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
s->cluster_bits = header.cluster_bits;
s->cluster_size = 1 << s->cluster_bits;
s->cluster_sectors = 1 << (s->cluster_bits - 9);
- s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+ if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+ s->l2_bits = 16 - 3; /* 64 KB L2 */
+ } else {
+ s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+ }
s->l2_size = 1 << s->l2_bits;
bs->total_sectors = header.size / 512;
s->csize_shift = (62 - (s->cluster_bits - 8));
@@ -467,7 +471,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
}
/* alloc L2 table/refcount block cache */
- s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+ s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->l2_size << 3);
s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
s->cluster_size);
--
1.7.10.4
* [Qemu-devel] [RFC V4 28/30] qcow: Set dedup cluster block size to 64KB.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (26 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 27/30] qcow2: Use large L2 table for deduplication Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 29/30] qcow2: init and cleanup deduplication Benoît Canet
` (3 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-refcount.c | 4 ++--
block/qcow2.c | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 092546d..3f3efd8 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1061,7 +1061,7 @@ static int check_dedup_l2(BlockDriverState *bs, BdrvCheckResult *res,
int i, l2_size;
/* Read L2 table from disk */
- l2_size = s->cluster_size;
+ l2_size = s->hash_block_size;
l2_table = g_malloc(l2_size);
if (bdrv_pread(bs->file, l2_offset, l2_table, l2_size) != l2_size) {
@@ -1151,7 +1151,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
/* Mark L2 table as used */
l2_offset &= L1E_OFFSET_MASK;
inc_refcounts(bs, res, refcount_table, refcount_table_size,
- l2_offset, s->l2_size << 3);
+ l2_offset, dedup ? s->hash_block_size : s->l2_size << 3);
/* L2 tables are cluster aligned */
if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 16038db..f1e0f5f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -432,6 +432,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
s->cluster_sectors = 1 << (s->cluster_bits - 9);
if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
s->l2_bits = 16 - 3; /* 64 KB L2 */
+ s->hash_block_size = DEFAULT_CLUSTER_SIZE;
} else {
s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
}
--
1.7.10.4
* [Qemu-devel] [RFC V4 29/30] qcow2: init and cleanup deduplication.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (27 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 28/30] qcow: Set dedup cluster block size to 64KB Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
` (2 subsequent siblings)
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qcow2-dedup.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++----
block/qcow2.c | 17 ++++++++---
2 files changed, 86 insertions(+), 9 deletions(-)
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index bd8397e..da1a668 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1014,20 +1014,88 @@ void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
}
}
+static gint qcow2_dedup_compare_by_hash(gconstpointer a,
+ gconstpointer b,
+ gpointer data)
+{
+ QCowHash *hash_a = (QCowHash *) a;
+ QCowHash *hash_b = (QCowHash *) b;
+ return memcmp(hash_a->data, hash_b->data, HASH_LENGTH);
+}
+
+static void qcow2_dedup_destroy_qcow_hash_node(gpointer p)
+{
+ QCowHashNode *hash_node = (QCowHashNode *) p;
+ g_free(hash_node);
+}
+
+static gint qcow2_dedup_compare_by_offset(gconstpointer a,
+ gconstpointer b,
+ gpointer data)
+{
+ uint64_t offset_a = *((uint64_t *) a);
+ uint64_t offset_b = *((uint64_t *) b);
+
+ if (offset_a > offset_b) {
+ return 1;
+ }
+ if (offset_a < offset_b) {
+ return -1;
+ }
+ return 0;
+}
+
int qcow2_dedup_init(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
- return qcow2_do_table_init(bs,
- &s->dedup_table,
- s->dedup_table_offset,
- s->dedup_table_size,
- false);
+ Coroutine *co;
+ int ret;
+
+ s->has_dedup = true;
+
+ ret = qcow2_do_table_init(bs,
+ &s->dedup_table,
+ s->dedup_table_offset,
+ s->dedup_table_size,
+ false);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* if we are read-only we don't deduplicate anything */
+ if (bs->read_only) {
+ return 0;
+ }
+
+ s->dedup_tree_by_hash = g_tree_new_full(qcow2_dedup_compare_by_hash, NULL,
+ NULL,
+ qcow2_dedup_destroy_qcow_hash_node);
+ s->dedup_tree_by_sect = g_tree_new_full(qcow2_dedup_compare_by_offset,
+ NULL, NULL, NULL);
+
+ s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+ s->hash_block_size);
+
+ /* load asynchronously the hashes */
+ co = qemu_coroutine_create(qcow2_co_load_dedup_hashes);
+ qemu_coroutine_enter(co, bs);
+ return 0;
}
void qcow2_dedup_close(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
g_free(s->dedup_table);
+
+ if (bs->read_only) {
+ return;
+ }
+
+ qcow2_cache_flush(bs, s->dedup_cluster_cache);
+ qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+ g_tree_destroy(s->dedup_tree_by_sect);
+ g_tree_destroy(s->dedup_tree_by_hash);
}
/* Clean the last reference to a given cluster when it's refcount is zero
diff --git a/block/qcow2.c b/block/qcow2.c
index f1e0f5f..d534077 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -539,6 +539,13 @@ static int qcow2_open(BlockDriverState *bs, int flags)
}
}
+ if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+ ret = qcow2_dedup_init(bs);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
#ifdef DEBUG_ALLOC
{
BdrvCheckResult result = {0};
@@ -1003,11 +1010,11 @@ fail:
static void qcow2_close(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
+
g_free(s->l1_table);
if (s->has_dedup) {
- qcow2_cache_flush(bs, s->dedup_cluster_cache);
- qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+ qcow2_dedup_close(bs);
}
qcow2_cache_flush(bs, s->l2_table_cache);
@@ -1498,8 +1505,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
}
/* minimal init */
- s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
- s->hash_block_size);
+ ret = qcow2_dedup_init(bs);
+ if (ret < 0) {
+ goto out;
+ }
}
/* Want a backing file? There you go.*/
--
1.7.10.4
* [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break.
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (28 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 29/30] qcow2: init and cleanup deduplication Benoît Canet
@ 2013-01-02 16:16 ` Benoît Canet
2013-01-02 16:42 ` Eric Blake
2013-01-02 17:10 ` [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Troy Benjegerdes
2013-01-03 17:18 ` Benoît Canet
31 siblings, 1 reply; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
tests/qemu-iotests/common.rc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index aef5f52..72e746d 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -124,7 +124,8 @@ _make_test_img()
-e "s# compat='[^']*'##g" \
-e "s# compat6=\\(on\\|off\\)##g" \
-e "s# static=\\(on\\|off\\)##g" \
- -e "s# lazy_refcounts=\\(on\\|off\\)##g"
+ -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+ -e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
# Start an NBD server on the image file, which is what we'll be talking to
if [ $IMGPROTO = "nbd" ]; then
--
1.7.10.4
* Re: [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break.
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
@ 2013-01-02 16:42 ` Eric Blake
2013-01-02 16:50 ` Benoît Canet
0 siblings, 1 reply; 53+ messages in thread
From: Eric Blake @ 2013-01-02 16:42 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On 01/02/2013 09:16 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> tests/qemu-iotests/common.rc | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
> index aef5f52..72e746d 100644
> --- a/tests/qemu-iotests/common.rc
> +++ b/tests/qemu-iotests/common.rc
> @@ -124,7 +124,8 @@ _make_test_img()
> -e "s# compat='[^']*'##g" \
> -e "s# compat6=\\(on\\|off\\)##g" \
> -e "s# static=\\(on\\|off\\)##g" \
> - -e "s# lazy_refcounts=\\(on\\|off\\)##g"
> + -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
> + -e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
Shouldn't this patch be hoisted earlier into the series, or even
squashed in with the patch that introduced the temporary test failures?
That is, you want 'git bisect' to pass on every patch in the series,
rather than introducing problems in one patch that only get cleaned up
in a later patch.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break.
2013-01-02 16:42 ` Eric Blake
@ 2013-01-02 16:50 ` Benoît Canet
0 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 16:50 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, pbonzini, qemu-devel, stefanha
Ack.
There is more than one patch to move.
I'll do it for the next RFC.
Regards
Benoît
Le Wednesday 02 Jan 2013 à 09:42:06 (-0700), Eric Blake a écrit :
> On 01/02/2013 09:16 AM, Benoît Canet wrote:
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> > tests/qemu-iotests/common.rc | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
> > index aef5f52..72e746d 100644
> > --- a/tests/qemu-iotests/common.rc
> > +++ b/tests/qemu-iotests/common.rc
> > @@ -124,7 +124,8 @@ _make_test_img()
> > -e "s# compat='[^']*'##g" \
> > -e "s# compat6=\\(on\\|off\\)##g" \
> > -e "s# static=\\(on\\|off\\)##g" \
> > - -e "s# lazy_refcounts=\\(on\\|off\\)##g"
> > + -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
> > + -e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
>
> Shouldn't this patch be hoisted earlier into the series, or even
> squashed in with the patch that introduced the temporary test failures?
> That is, you want 'git bisect' to pass on every patch in the series,
> rather than introducing problems in one patch that only get cleaned up
> in a later patch.
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (29 preceding siblings ...)
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
@ 2013-01-02 17:10 ` Troy Benjegerdes
2013-01-02 17:33 ` Benoît Canet
2013-01-03 17:18 ` Benoît Canet
31 siblings, 1 reply; 53+ messages in thread
From: Troy Benjegerdes @ 2013-01-02 17:10 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Wed, Jan 02, 2013 at 05:16:03PM +0100, Benoît Canet wrote:
> This patchset is a cleanup of the previous QCOW2 deduplication rfc.
>
> One can compile and install https://github.com/wernerd/Skein3Fish and use the
> --enable-skein-dedup configure option in order to use the faster skein HASH.
>
> Images must be created with "-o dedup=[skein|sha256]" in order to activate the
> deduplication in the image.
>
> Deduplication is now fast enough to be usable.
How does this code handle hash collisions, and do you have some regression
tests that purposefully create a dedup hash collision, and verify that the
'right thing' happens?
The next question is: what's the right thing?
It's great that this almost works, but it seems rather dangerous to put
something like this into the mainline code without some regression tests.
(I also suspect the regression test will be a great way to find
flaky hardware.)
--------------------------------------------------------------------------
Troy Benjegerdes 'da hozer' hozer@hozed.org
Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
software & hardware (http://q3u.be) stuff and not get a real job.
Charles Shultz had the best answer:
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Shultz
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 17:10 ` [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Troy Benjegerdes
@ 2013-01-02 17:33 ` Benoît Canet
2013-01-02 18:01 ` Eric Blake
2013-01-02 18:26 ` Troy Benjegerdes
0 siblings, 2 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 17:33 UTC (permalink / raw)
To: Troy Benjegerdes; +Cc: kwolf, pbonzini, qemu-devel, stefanha
> How does this code handle hash collisions, and do you have some regression
> tests that purposefully create a dedup hash collision, and verify that the
> 'right thing' happens?
The two hash functions that can be used are cryptographic and not broken yet.
So nobody knows how to generate a collision.
You can do the math to calculate the probability of a collision when using a
256-bit hash while processing 1 EiB of data: the result is so low that you can
consider it won't happen.
The sha256 ZFS deduplication works the same way regarding collisions.
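As a rough back-of-the-envelope estimate (assuming 4KB clusters, the default
discussed later in this thread), the birthday bound gives:

    n            = 2^60 / 2^12 = 2^48 clusters in 1 EiB
    P(collision) ~= n^2 / (2 * 2^256) = 2^95 / 2^256 = 2^-161, about 3e-49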
I currently use qemu-iotests for testing purposes and iozone with the -w flag in
the guest.
I would like to find a good deduplication stress test to run in a guest.
Regards
Benoît
> It's great that this almost works, but it seems rather dangerous to put
> something like this into the mainline code without some regression tests.
>
> (I'm also suspecting the regression test will be a great way to find
> flakey hardware)
>
> --------------------------------------------------------------------------
> Troy Benjegerdes 'da hozer' hozer@hozed.org
>
> Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> software & hardware (http://q3u.be) stuff and not get a real job.
> Charles Shultz had the best answer:
>
> "Why do musicians compose symphonies and poets write poems? They do it
> because life wouldn't have any meaning for them if they didn't. That's why
> I draw cartoons. It's my life." -- Charles Shultz
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 17:33 ` Benoît Canet
@ 2013-01-02 18:01 ` Eric Blake
2013-01-02 18:16 ` Benoît Canet
2013-01-02 18:26 ` Troy Benjegerdes
1 sibling, 1 reply; 53+ messages in thread
From: Eric Blake @ 2013-01-02 18:01 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha, pbonzini
On 01/02/2013 10:33 AM, Benoît Canet wrote:
>> How does this code handle hash collisions, and do you have some regression
>> tests that purposefully create a dedup hash collision, and verify that the
>> 'right thing' happens?
>
> The two hash function that can be used are cryptographics and not broken yet.
> So nobody knows how to generate a collision.
I can understand that it is hard to write a test for two distinct data
sectors hashing to the same value, but perhaps it's worth including a
debug-only hash algorithm that intentionally generates collisions, just
to prove that you handle them correctly. De-duplicating collided data,
while unlikely, is still a case of data loss that not everyone is happy
to risk.
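Something as crude as the following would be enough to exercise that path (a
purely illustrative sketch, not something proposed for this series):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HASH_LENGTH 32

/* Debug-only "hash": every cluster maps to one of only 256 possible digests,
 * so collisions show up almost immediately. */
static void debug_collision_hash(const uint8_t *data, size_t len,
                                 uint8_t out[HASH_LENGTH])
{
    uint8_t acc = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        acc ^= data[i];
    }
    memset(out, 0, HASH_LENGTH);
    out[0] = acc;
}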
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 18:01 ` Eric Blake
@ 2013-01-02 18:16 ` Benoît Canet
0 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 18:16 UTC (permalink / raw)
To: Eric Blake; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha, pbonzini
I think I can easily add a "verify" option at image creation.
This way the code would read the cluster already on disk and compare it with
the cluster to write.
If they are different, it would print some debug message and return -EIO to the
upper layers.
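A minimal sketch of that check (hypothetical helper names, reusing bdrv_pread()
the same way the patches in this series do) could look like:

/* Re-read the cluster the matching hash points to and compare it with the
 * data about to be written; refuse to deduplicate on a real collision. */
static int dedup_verify_cluster(BlockDriverState *bs, uint64_t offset,
                                const uint8_t *new_data, int cluster_size,
                                uint8_t *scratch)
{
    int ret = bdrv_pread(bs->file, offset, scratch, cluster_size);
    if (ret != cluster_size) {
        return ret < 0 ? ret : -EIO;
    }
    if (memcmp(scratch, new_data, cluster_size)) {
        return -EIO; /* same hash, different data: real collision */
    }
    return 0; /* identical data, safe to deduplicate */
}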
> Le Wednesday 02 Jan 2013 à 11:01:04 (-0700), Eric Blake a écrit :
> On 01/02/2013 10:33 AM, Benoît Canet wrote:
> >> How does this code handle hash collisions, and do you have some regression
> >> tests that purposefully create a dedup hash collision, and verify that the
> >> 'right thing' happens?
> >
> > The two hash function that can be used are cryptographics and not broken yet.
> > So nobody knows how to generate a collision.
>
> I can understand that it is hard to write a test for two distinct data
> sectors hashing to the same value, but perhaps it's worth including a
> debug-only hash algorithm that intentionally generates collisions, just
> to prove that you handle them correctly. De-duplicating collided data,
> while unlikely, is still a case of data loss that not everyone is happy
> to risk.
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 17:33 ` Benoît Canet
2013-01-02 18:01 ` Eric Blake
@ 2013-01-02 18:26 ` Troy Benjegerdes
2013-01-02 18:40 ` Benoît Canet
2013-01-03 12:39 ` Stefan Hajnoczi
1 sibling, 2 replies; 53+ messages in thread
From: Troy Benjegerdes @ 2013-01-02 18:26 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
The probability may be 'low' but it is not zero. Just because it's
hard to calculate the hash doesn't mean you can't do it. If your
input data is not random, the probability of a hash collision is
going to get skewed.
Read about how Bitcoin uses hashes.
I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
and I can make a regression test that will create deduplication hash
collisions on purpose.
On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
> > How does this code handle hash collisions, and do you have some regression
> > tests that purposefully create a dedup hash collision, and verify that the
> > 'right thing' happens?
>
> The two hash function that can be used are cryptographics and not broken yet.
> So nobody knows how to generate a collision.
>
> You can do the math to calculate the probability of collision using a 256 bit
> hash while processing 1EiB of data the result is so low you can consider it
> won't happen.
> The sha256 ZFS deduplication works the same way regarding collisions.
>
> I currently use qemu-io-test for testing purpose and iozone with the -w flag in
> the guest.
> I would like to find a good deduplication stress test to run in a guest.
>
> Regards
>
> Benoît
>
> > It's great that this almost works, but it seems rather dangerous to put
> > something like this into the mainline code without some regression tests.
> >
> > (I'm also suspecting the regression test will be a great way to find
> > flakey hardware)
> >
> > --------------------------------------------------------------------------
> > Troy Benjegerdes 'da hozer' hozer@hozed.org
> >
> > Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> > software & hardware (http://q3u.be) stuff and not get a real job.
> > Charles Shultz had the best answer:
> >
> > "Why do musicians compose symphonies and poets write poems? They do it
> > because life wouldn't have any meaning for them if they didn't. That's why
> > I draw cartoons. It's my life." -- Charles Shultz
--
--------------------------------------------------------------------------
Troy Benjegerdes 'da hozer' hozer@hozed.org
Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
software & hardware (http://q3u.be) stuff and not get a real job.
Charles Shultz had the best answer:
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Shultz
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 18:26 ` Troy Benjegerdes
@ 2013-01-02 18:40 ` Benoît Canet
2013-01-02 18:47 ` ronnie sahlberg
2013-01-03 12:39 ` Stefan Hajnoczi
1 sibling, 1 reply; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 18:40 UTC (permalink / raw)
To: Troy Benjegerdes; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha, pbonzini
Le Wednesday 02 Jan 2013 à 12:26:37 (-0600), Troy Benjegerdes a écrit :
> The probability may be 'low' but it is not zero. Just because it's
> hard to calculate the hash doesn't mean you can't do it. If your
> input data is not random the probability of a hash collision is
> going to get scewed.
>
> Read about how Bitcoin uses hashes.
>
> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
> and I can make a regression test that will create deduplication hash
> collisions on purpose.
It's not a problem: as Eric pointed out while reviewing the previous patchset,
there is a small space left filled with zeroes in each deduplication block.
A bit could be set in it when a collision is detected, and an offset could point
to a cluster used to resolve collisions.
>
>
> > On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
> > > How does this code handle hash collisions, and do you have some regression
> > > tests that purposefully create a dedup hash collision, and verify that the
> > > 'right thing' happens?
> >
> > The two hash function that can be used are cryptographics and not broken yet.
> > So nobody knows how to generate a collision.
> >
> > You can do the math to calculate the probability of collision using a 256 bit
> > hash while processing 1EiB of data the result is so low you can consider it
> > won't happen.
> > The sha256 ZFS deduplication works the same way regarding collisions.
> >
> > I currently use qemu-io-test for testing purpose and iozone with the -w flag in
> > the guest.
> > I would like to find a good deduplication stress test to run in a guest.
> >
> > Regards
> >
> > Benoît
> >
> > > It's great that this almost works, but it seems rather dangerous to put
> > > something like this into the mainline code without some regression tests.
> > >
> > > (I'm also suspecting the regression test will be a great way to find
> > > flakey hardware)
> > >
> > > --------------------------------------------------------------------------
> > > Troy Benjegerdes 'da hozer' hozer@hozed.org
> > >
> > > Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> > > software & hardware (http://q3u.be) stuff and not get a real job.
> > > Charles Shultz had the best answer:
> > >
> > > "Why do musicians compose symphonies and poets write poems? They do it
> > > because life wouldn't have any meaning for them if they didn't. That's why
> > > I draw cartoons. It's my life." -- Charles Shultz
>
> --
> --------------------------------------------------------------------------
> Troy Benjegerdes 'da hozer' hozer@hozed.org
>
> Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> software & hardware (http://q3u.be) stuff and not get a real job.
> Charles Shultz had the best answer:
>
> "Why do musicians compose symphonies and poets write poems? They do it
> because life wouldn't have any meaning for them if they didn't. That's why
> I draw cartoons. It's my life." -- Charles Shultz
>
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 18:40 ` Benoît Canet
@ 2013-01-02 18:47 ` ronnie sahlberg
2013-01-02 18:55 ` Benoît Canet
2013-01-02 19:18 ` Troy Benjegerdes
0 siblings, 2 replies; 53+ messages in thread
From: ronnie sahlberg @ 2013-01-02 18:47 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha, pbonzini
Do you really need to resolve the conflicts?
It might be easier and sufficient to just flag those hashes where a
conflict has been detected as: "don't dedup this hash anymore,
collisions have been seen."
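Something along these lines, for instance (the structure and field below are
hypothetical, only meant to illustrate the idea):

typedef struct {
    QCowHash hash;           /* as used elsewhere in the series */
    uint64_t first_logical_sect;
    bool collision_seen;     /* set once differing data matched this hash */
} DedupHashEntry;

static bool can_dedup(const DedupHashEntry *entry)
{
    /* once a collision has been seen, always take the normal write path */
    return !entry->collision_seen;
}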
On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> Le Wednesday 02 Jan 2013 à 12:26:37 (-0600), Troy Benjegerdes a écrit :
>> The probability may be 'low' but it is not zero. Just because it's
>> hard to calculate the hash doesn't mean you can't do it. If your
>> input data is not random the probability of a hash collision is
>> going to get scewed.
>>
>> Read about how Bitcoin uses hashes.
>>
>> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
>> and I can make a regression test that will create deduplication hash
>> collisions on purpose.
>
> It's not a problem as Eric pointed out while reviewing the previous patchset
> there is a small place left with zeroes on the deduplication block.
> A bit could be set on it when a collision is detected and an offset could point
> to a cluster used to resolve collisions.
>
>>
>>
>> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
>> > > How does this code handle hash collisions, and do you have some regression
>> > > tests that purposefully create a dedup hash collision, and verify that the
>> > > 'right thing' happens?
>> >
>> > The two hash function that can be used are cryptographics and not broken yet.
>> > So nobody knows how to generate a collision.
>> >
>> > You can do the math to calculate the probability of collision using a 256 bit
>> > hash while processing 1EiB of data the result is so low you can consider it
>> > won't happen.
>> > The sha256 ZFS deduplication works the same way regarding collisions.
>> >
>> > I currently use qemu-io-test for testing purpose and iozone with the -w flag in
>> > the guest.
>> > I would like to find a good deduplication stress test to run in a guest.
>> >
>> > Regards
>> >
>> > Benoît
>> >
>> > > It's great that this almost works, but it seems rather dangerous to put
>> > > something like this into the mainline code without some regression tests.
>> > >
>> > > (I'm also suspecting the regression test will be a great way to find
>> > > flakey hardware)
>> > >
>> > > --------------------------------------------------------------------------
>> > > Troy Benjegerdes 'da hozer' hozer@hozed.org
>> > >
>> > > Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
>> > > software & hardware (http://q3u.be) stuff and not get a real job.
>> > > Charles Shultz had the best answer:
>> > >
>> > > "Why do musicians compose symphonies and poets write poems? They do it
>> > > because life wouldn't have any meaning for them if they didn't. That's why
>> > > I draw cartoons. It's my life." -- Charles Shultz
>>
>> --
>> --------------------------------------------------------------------------
>> Troy Benjegerdes 'da hozer' hozer@hozed.org
>>
>> Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
>> software & hardware (http://q3u.be) stuff and not get a real job.
>> Charles Shultz had the best answer:
>>
>> "Why do musicians compose symphonies and poets write poems? They do it
>> because life wouldn't have any meaning for them if they didn't. That's why
>> I draw cartoons. It's my life." -- Charles Shultz
>>
>
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 18:47 ` ronnie sahlberg
@ 2013-01-02 18:55 ` Benoît Canet
2013-01-02 19:18 ` Troy Benjegerdes
1 sibling, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-02 18:55 UTC (permalink / raw)
To: ronnie sahlberg; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha, pbonzini
Le Wednesday 02 Jan 2013 à 10:47:48 (-0800), ronnie sahlberg a écrit :
> Do you really need to resolve the conflicts?
> It might be easier and sufficient to just flag those hashes where a
> conflict has been detected as : "dont dedup this hash anymore,
> collissions have been seen."
True, that's more elegant.
The user would still need to specify the verify option at creation,
and it would require doing a read before verifying, but it would not make
the qcow2 format uglier.
>
>
> On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> > Le Wednesday 02 Jan 2013 à 12:26:37 (-0600), Troy Benjegerdes a écrit :
> >> The probability may be 'low' but it is not zero. Just because it's
> >> hard to calculate the hash doesn't mean you can't do it. If your
> >> input data is not random the probability of a hash collision is
> >> going to get scewed.
> >>
> >> Read about how Bitcoin uses hashes.
> >>
> >> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
> >> and I can make a regression test that will create deduplication hash
> >> collisions on purpose.
> >
> > It's not a problem as Eric pointed out while reviewing the previous patchset
> > there is a small place left with zeroes on the deduplication block.
> > A bit could be set on it when a collision is detected and an offset could point
> > to a cluster used to resolve collisions.
> >
> >>
> >>
> >> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
> >> > > How does this code handle hash collisions, and do you have some regression
> >> > > tests that purposefully create a dedup hash collision, and verify that the
> >> > > 'right thing' happens?
> >> >
> >> > The two hash function that can be used are cryptographics and not broken yet.
> >> > So nobody knows how to generate a collision.
> >> >
> >> > You can do the math to calculate the probability of collision using a 256 bit
> >> > hash while processing 1EiB of data the result is so low you can consider it
> >> > won't happen.
> >> > The sha256 ZFS deduplication works the same way regarding collisions.
> >> >
> >> > I currently use qemu-io-test for testing purpose and iozone with the -w flag in
> >> > the guest.
> >> > I would like to find a good deduplication stress test to run in a guest.
> >> >
> >> > Regards
> >> >
> >> > Benoît
> >> >
> >> > > It's great that this almost works, but it seems rather dangerous to put
> >> > > something like this into the mainline code without some regression tests.
> >> > >
> >> > > (I'm also suspecting the regression test will be a great way to find
> >> > > flakey hardware)
> >> > >
> >> > > --------------------------------------------------------------------------
> >> > > Troy Benjegerdes 'da hozer' hozer@hozed.org
> >> > >
> >> > > Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> >> > > software & hardware (http://q3u.be) stuff and not get a real job.
> >> > > Charles Shultz had the best answer:
> >> > >
> >> > > "Why do musicians compose symphonies and poets write poems? They do it
> >> > > because life wouldn't have any meaning for them if they didn't. That's why
> >> > > I draw cartoons. It's my life." -- Charles Shultz
> >>
> >> --
> >> --------------------------------------------------------------------------
> >> Troy Benjegerdes 'da hozer' hozer@hozed.org
> >>
> >> Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> >> software & hardware (http://q3u.be) stuff and not get a real job.
> >> Charles Shultz had the best answer:
> >>
> >> "Why do musicians compose symphonies and poets write poems? They do it
> >> because life wouldn't have any meaning for them if they didn't. That's why
> >> I draw cartoons. It's my life." -- Charles Shultz
> >>
> >
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 18:47 ` ronnie sahlberg
2013-01-02 18:55 ` Benoît Canet
@ 2013-01-02 19:18 ` Troy Benjegerdes
2013-01-03 2:16 ` ronnie sahlberg
1 sibling, 1 reply; 53+ messages in thread
From: Troy Benjegerdes @ 2013-01-02 19:18 UTC (permalink / raw)
To: ronnie sahlberg; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha, pbonzini
If you do get a hash collision, it's a rather exceptional event, so I'd
say every effort should be made to log the event and the data that created
it in multiple places.
There are three questions I'd ask on a hash collision:
1) was it the data?
2) was it the hardware?
3) was it a software bug?
On Wed, Jan 02, 2013 at 10:47:48AM -0800, ronnie sahlberg wrote:
> Do you really need to resolve the conflicts?
> It might be easier and sufficient to just flag those hashes where a
> conflict has been detected as : "dont dedup this hash anymore,
> collissions have been seen."
>
>
> On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> > Le Wednesday 02 Jan 2013 à 12:26:37 (-0600), Troy Benjegerdes a écrit :
> >> The probability may be 'low' but it is not zero. Just because it's
> >> hard to calculate the hash doesn't mean you can't do it. If your
> >> input data is not random the probability of a hash collision is
> >> going to get scewed.
> >>
> >> Read about how Bitcoin uses hashes.
> >>
> >> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
> >> and I can make a regression test that will create deduplication hash
> >> collisions on purpose.
> >
> > It's not a problem as Eric pointed out while reviewing the previous patchset
> > there is a small place left with zeroes on the deduplication block.
> > A bit could be set on it when a collision is detected and an offset could point
> > to a cluster used to resolve collisions.
> >
> >>
> >>
> >> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
> >> > > How does this code handle hash collisions, and do you have some regression
> >> > > tests that purposefully create a dedup hash collision, and verify that the
> >> > > 'right thing' happens?
> >> >
> >> > The two hash function that can be used are cryptographics and not broken yet.
> >> > So nobody knows how to generate a collision.
> >> >
> >> > You can do the math to calculate the probability of collision using a 256 bit
> >> > hash while processing 1EiB of data the result is so low you can consider it
> >> > won't happen.
> >> > The sha256 ZFS deduplication works the same way regarding collisions.
> >> >
> >> > I currently use qemu-io-test for testing purpose and iozone with the -w flag in
> >> > the guest.
> >> > I would like to find a good deduplication stress test to run in a guest.
> >> >
> >> > Regards
> >> >
> >> > Benoît
> >> >
> >> > > It's great that this almost works, but it seems rather dangerous to put
> >> > > something like this into the mainline code without some regression tests.
> >> > >
> >> > > (I'm also suspecting the regression test will be a great way to find
> >> > > flakey hardware)
> >> > >
> >> > > --------------------------------------------------------------------------
> >> > > Troy Benjegerdes 'da hozer' hozer@hozed.org
> >> > >
> >> > > Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> >> > > software & hardware (http://q3u.be) stuff and not get a real job.
> >> > > Charles Shultz had the best answer:
> >> > >
> >> > > "Why do musicians compose symphonies and poets write poems? They do it
> >> > > because life wouldn't have any meaning for them if they didn't. That's why
> >> > > I draw cartoons. It's my life." -- Charles Shultz
> >>
> >> --
> >> --------------------------------------------------------------------------
> >> Troy Benjegerdes 'da hozer' hozer@hozed.org
> >>
> >> Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> >> software & hardware (http://q3u.be) stuff and not get a real job.
> >> Charles Shultz had the best answer:
> >>
> >> "Why do musicians compose symphonies and poets write poems? They do it
> >> because life wouldn't have any meaning for them if they didn't. That's why
> >> I draw cartoons. It's my life." -- Charles Shultz
> >>
> >
--
--------------------------------------------------------------------------
Troy Benjegerdes 'da hozer' hozer@hozed.org
Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
software & hardware (http://q3u.be) stuff and not get a real job.
Charles Shultz had the best answer:
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Shultz
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 19:18 ` Troy Benjegerdes
@ 2013-01-03 2:16 ` ronnie sahlberg
0 siblings, 0 replies; 53+ messages in thread
From: ronnie sahlberg @ 2013-01-03 2:16 UTC (permalink / raw)
To: Troy Benjegerdes; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha, pbonzini
On Wed, Jan 2, 2013 at 11:18 AM, Troy Benjegerdes <hozer@hozed.org> wrote:
> If you do get a hash collision, it's a rather exceptional event, so I'd
> say every effort should be made to log the event and the data that created
> it in multiple places.
>
> There are three questions I'd ask on a hash collision:
>
> 1) was it the data?
> 2) was it the hardware?
> 3) was it a software bug?
Yes, that is probably good too, along with saving off the old and new block
content that collided.
Unless you are checksumming the blocks, I suspect that the most common
reason for "collisions" would just be cases where the original block
was corrupted/changed on disk without being detected; then, when you
re-write an identical one, the blocks no longer match and you get
a false collision.
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 18:26 ` Troy Benjegerdes
2013-01-02 18:40 ` Benoît Canet
@ 2013-01-03 12:39 ` Stefan Hajnoczi
2013-01-03 19:51 ` Troy Benjegerdes
1 sibling, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2013-01-03 12:39 UTC (permalink / raw)
To: Troy Benjegerdes; +Cc: Benoît Canet, kwolf, qemu-devel, pbonzini
On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> The probability may be 'low' but it is not zero. Just because it's
> hard to calculate the hash doesn't mean you can't do it. If your
> input data is not random the probability of a hash collision is
> going to get scewed.
The cost of catching hash collisions is an extra read for every write.
It's possible to reduce this with a 2nd hash function and/or caching.
I'm not sure it's worth it given the extremely low probability of a hash
collision.
Venti is an example of an existing system where hash collisions were
ignored because the probability is so low. See 3.1. Choice of Hash
Function section:
http://plan9.bell-labs.com/sys/doc/venti/venti.html
Stefan
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-02 16:16 [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Benoît Canet
` (30 preceding siblings ...)
2013-01-02 17:10 ` [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication Troy Benjegerdes
@ 2013-01-03 17:18 ` Benoît Canet
31 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-03 17:18 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha
Hello,
I started to write the deduplication metrics code in order to be able
to design asynchronous deduplication.
I am looking for a way to create a metric allowing deduplication to be paused
or resumed on a given threshold.
Does anyone have a suggestion regarding the metric that could be used for this?
Best regards
Benoît
> Le Wednesday 02 Jan 2013 à 17:16:03 (+0100), Benoît Canet a écrit :
> This patchset is a cleanup of the previous QCOW2 deduplication rfc.
>
> One can compile and install https://github.com/wernerd/Skein3Fish and use the
> --enable-skein-dedup configure option in order to use the faster skein HASH.
>
> Images must be created with "-o dedup=[skein|sha256]" in order to activate the
> deduplication in the image.
>
> Deduplication is now fast enough to be usable.
>
> v4: Fix and complete qcow2 spec [Stefan]
> Hash the hash_algo field in the header extension [Stefan]
> Fix qcow2 spec [Eric]
> Remove pointer to hash and simplify hash memory management [Stefan]
> Rename and move qcow2_read_cluster_data to qcow2.c [Stefan]
> Document lock dropping behaviour of the previous function [Stefan]
> cleanup qcow2_dedup_read_missing_cluster_data [Stefan]
> rename *_offset to *_sect [Stefan]
> add a ./configure check for ssl [Stefan]
> Replace openssl by gnutls [Stefan]
> Implement Skein hashes
> Rewrite pretty every qcow2-dedup.c commits after Add
> qcow2_dedup_read_missing_and_concatenate to simplify the code
> Use 64KB deduplication hash block to reduce allocation flushes
> Use 64KB l2 tables to reduce allocation flushes [breaks compatibility]
> Use lazy refcounts to avoid qcow2_cache_set_dependency loops resultings
> in frequent caches flushes
> Do not create and load dedup RAM structures when bdrs->read_only is true
>
> v3: make it work barely
> replace kernel red black trees by gtree.
>
> *** BLURB HERE ***
>
> Benoît Canet (30):
> qcow2: Add deduplication to the qcow2 specification.
> qcow2: Add deduplication structures and fields.
> qcow2: Add qcow2_dedup_read_missing_and_concatenate
> qcow2: Make update_refcount public.
> qcow2: Create a way to link to l2 tables when deduplicating.
> qcow2: Add qcow2_dedup and related functions
> qcow2: Add qcow2_dedup_store_new_hashes.
> qcow2: Implement qcow2_compute_cluster_hash.
> qcow2: Extract qcow2_dedup_grow_table
> qcow2: Add qcow2_dedup_grow_table and use it.
> qcow2: create function to load deduplication hashes at startup.
> qcow2: Load and save deduplication table header extension.
> qcow2: Extract qcow2_do_table_init.
> qcow2-cache: Allow to choose table size at creation.
> qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
> qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
> block: Add qemu-img dedup create option.
> qcow2: Behave correctly when refcount reach 0 or 2^16.
> qcow2: Integrate deduplication in qcow2_co_writev loop.
> qcow2: Serialize write requests when deduplication is activated.
> qcow2: Add verification of dedup table.
> qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
> qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
> qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
> qcow2: Integrate SKEIN hash algorithm in deduplication.
> qcow2: Add lazy refcounts to deduplication to prevent
> qcow2_cache_set_dependency loops
> qcow2: Use large L2 table for deduplication.
> qcow: Set dedup cluster block size to 64KB.
> qcow2: init and cleanup deduplication.
> qemu-iotests: Filter dedup=on/off so existing tests don't break.
>
> block/Makefile.objs | 1 +
> block/qcow2-cache.c | 12 +-
> block/qcow2-cluster.c | 116 +++--
> block/qcow2-dedup.c | 1157 ++++++++++++++++++++++++++++++++++++++++++
> block/qcow2-refcount.c | 157 ++++--
> block/qcow2.c | 357 +++++++++++--
> block/qcow2.h | 120 ++++-
> configure | 55 ++
> docs/specs/qcow2.txt | 100 +++-
> include/block/block_int.h | 1 +
> tests/qemu-iotests/common.rc | 3 +-
> 11 files changed, 1955 insertions(+), 124 deletions(-)
> create mode 100644 block/qcow2-dedup.c
>
> --
> 1.7.10.4
>
* Re: [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification.
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification Benoît Canet
@ 2013-01-03 18:18 ` Eric Blake
2013-01-04 14:49 ` Benoît Canet
2013-01-16 14:50 ` Benoît Canet
0 siblings, 2 replies; 53+ messages in thread
From: Eric Blake @ 2013-01-03 18:18 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On 01/02/2013 09:16 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> docs/specs/qcow2.txt | 100 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 99 insertions(+), 1 deletion(-)
>
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..c9c0d47 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -80,7 +80,12 @@ in the description of a field.
> tables to repair refcounts before accessing the
> image.
>
> - Bits 1-63: Reserved (set to 0)
> + Bit 1: Deduplication bit. If this bit is set then
> + deduplication is used on this image.
This part seems fine; and I agree with making this an incompatible
feature (as an older qemu that does not understand dedup would not keep
the dedup table up-to-date).
> + L2 tables size 64KB is different from
> + cluster size 4KB.
Umm, doesn't the cluster_bits (bytes 20-23 of the header) determine the
size of a cluster, rather than assuming a cluster is always 4KB? And
later on, the spec says that "L2 tables are exactly one cluster in
size.", so I'm not sure what this comment is doing here. Or are you
stating that deduplication _also_ has an L2 table, which is fixed in
size (unlike the normal L2 table for actual data)?
> +== Deduplication ==
> +
> +The deduplication extension contains the informations concerning the
s/informations concerning the/information concerning/
> +deduplication.
> +
> + Byte 0 - 7: Offset of the RAM deduplication table
> +
> + 8 - 11: Size of the RAM deduplication table = number of L1 64-bit
> + pointers
> +
> + 12: Hash algo enum field
> + 0: SHA-256
> + 1: SHA3
> + 2: SKEIN-256
> +
> + 13: Dedup stategies bitmap
s/stategies/strategies/
> + 0: RAM based hash lookup
> + 1: Disk based hash lookup
> +
> +Disk based lookup structure will be described in a future QCOW2 specification.
Does that mean that strategy must be 0 for now?
> +
> +== Deduplication table (RAM method) ==
> +
> +The deduplication table maps a physical offset to a data hash and
> +logical offset. It is used to store permanently the informations required to
s/store permanently the informations/permanently store the information/
> +do the deduplication. It is loaded at startup into a RAM based representation
> +used to do the lookups.
> +
> +The deduplication table contains 64-bit offsets to the level 2 deduplication
> +table blocks.
> +Each entry of these blocks contains a 32-byte SHA256 hash followed by the
> +64-bit logical offset of the first encountered cluster having this hash.
> +
> +== Deduplication table schematic (RAM method) ==
> +
> +0 l1_dedup_index Size
> + |
> +|--------------------------------------------------------------------|
> +| | |
> +| | L1 Deduplication table |
> +| | |
> +|--------------------------------------------------------------------|
> + |
> + |
> + |
> +0 | l2_dedup_block_entries
> + |
> +|---------------------------------|
> +| |
> +| L2 deduplication block |
> +| |
> +| l2_dedup_index |
> +|---------------------------------|
> + |
> + 0 | 40
> + |
> + |-------------------------------|
> + | |
> + | Deduplication table entry |
> + | |
> + |-------------------------------|
> +
> +
> +== Deduplication table entry description (RAM method) ==
> +
> +Each L2 deduplication table entry has the following structure:
> +
> + Byte 0 - 31: hash of data cluster
> +
> + 32 - 39: Logical offset of first encountered block having
> + this hash
> +
> +== Deduplication table arithmetics (RAM method) ==
> +
> +Entries in the deduplication table are ordered by physical cluster index.
> +
> +The number of entries in an l2 deduplication table block is :
> +l2_dedup_block_entries = dedup_block_size / (32 + 8)
I'd write this as CEIL(dedup_block_size / (32 + 8)) to make it clear
that it rounds up...
> +
> +The index in the level 1 deduplication table is :
> +l1_dedup_index = physical_cluster_index / l2_block_cluster_entries
> +
> +The index in the level 2 deduplication table is:
> +l2_dedup_index = physical_cluster_index % l2_block_cluster_entries
> +
> +cluster_size = 4096
> +dedup_block_size = 65536
> +l2_size = 65536
> +
> +The 16 remaining bytes in each l2 deduplication blocks are set to zero and
> +reserved for a future usage.
...and move this paragraph closer to the point where you mention the
rounding up in size.
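(For the sample values quoted above, the arithmetic gives 65536 / 40 = 1638
entries per 64KB block, and 65536 - 1638 * 40 = 16, which is exactly where the
16 reserved zero bytes mentioned above come from.)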
> +
> == Host cluster management ==
>
> qcow2 manages the allocation of host clusters by maintaining a reference count
>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-03 12:39 ` Stefan Hajnoczi
@ 2013-01-03 19:51 ` Troy Benjegerdes
2013-01-04 7:09 ` Dietmar Maurer
2013-01-04 9:49 ` Stefan Hajnoczi
0 siblings, 2 replies; 53+ messages in thread
From: Troy Benjegerdes @ 2013-01-03 19:51 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Benoît Canet, kwolf, qemu-devel, pbonzini
On Thu, Jan 03, 2013 at 01:39:48PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> > The probability may be 'low' but it is not zero. Just because it's
> > hard to calculate the hash doesn't mean you can't do it. If your
> > input data is not random the probability of a hash collision is
> > going to get scewed.
>
> The cost of catching hash collisions is an extra read for every write.
> It's possible to reduce this with a 2nd hash function and/or caching.
>
> I'm not sure it's worth it given the extremely low probability of a hash
> collision.
>
> Venti is an example of an existing system where hash collisions were
> ignored because the probability is so low. See 3.1. Choice of Hash
> Function section:
>
> http://plan9.bell-labs.com/sys/doc/venti/venti.html
If you believe that it's 'extremely low', then please provide either:
* experimental evidence to prove your claim
* an insurance underwriter who will pay-out if data is lost due to
a hash collision.
What I have heard so far is a lot of theoretical posturing and no
experimental evidence.
Please google for "when TCP checksums and CRC disagree" for experimental
evidence of problems assuming that probability is low. This is the
abstract:
"Traces of Internet packets from the past two years show that between 1 packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on links where link-level CRCs should catch all but 1 in 4 billion errors. For certain situations, the rate of checksum failures can be even higher: in one hour-long test we observed a checksum failure of 1 packet in 400. We investigate why so many errors are observed, when link-level CRCs should catch nearly all of them.We have collected nearly 500,000 packets which failed the TCP or UDP or IP checksum. This dataset shows the Internet has a wide variety of error sources which can not be detected by link-level checks. We describe analysis tools that have identified nearly 100 different error patterns. Categorizing packet errors, we can infer likely causes which explain roughly half the observed errors. The causes span the entire spectrum of a network stack, from memory errors to bugs in TCP.After an analysis we conclude that the checksum will fail to detect errors for roughly 1 in 16 million to 10 billion packets. From our analysis of the cause of errors, we propose simple changes to several protocols which will decrease the rate of undetected error. Even so, the highly non-random distribution of errors strongly suggests some applications should employ application-level checksums or equivalents."
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-03 19:51 ` Troy Benjegerdes
@ 2013-01-04 7:09 ` Dietmar Maurer
2013-01-04 9:49 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Dietmar Maurer @ 2013-01-04 7:09 UTC (permalink / raw)
To: Troy Benjegerdes, Stefan Hajnoczi
Cc: Benoît Canet, kwolf@redhat.com, qemu-devel@nongnu.org,
pbonzini@redhat.com
> > Venti is an example of an existing system where hash collisions were
> > ignored because the probability is so low. See 3.1. Choice of Hash
> > Function section:
> >
> > http://plan9.bell-labs.com/sys/doc/venti/venti.html
>
>
> If you believe that it's 'extremely low', then please provide either:
>
> * experimental evidence to prove your claim
> * an insurance underwriter who will pay-out if data is lost due to a hash
> collision.
>
> What I have heard so far is a lot of theoretical posturing and no experimental
> evidence.
Venti is a well-known system, in use for more than 10 years - isn't that enough experimental evidence?
- Dietmar
* Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
2013-01-03 19:51 ` Troy Benjegerdes
2013-01-04 7:09 ` Dietmar Maurer
@ 2013-01-04 9:49 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2013-01-04 9:49 UTC (permalink / raw)
To: Troy Benjegerdes; +Cc: Benoît Canet, kwolf, qemu-devel, pbonzini
On Thu, Jan 03, 2013 at 01:51:02PM -0600, Troy Benjegerdes wrote:
> On Thu, Jan 03, 2013 at 01:39:48PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> > > The probability may be 'low' but it is not zero. Just because it's
> > > hard to calculate the hash doesn't mean you can't do it. If your
> > > input data is not random, the probability of a hash collision is
> > > going to get skewed.
> >
> > The cost of catching hash collisions is an extra read for every write.
> > It's possible to reduce this with a 2nd hash function and/or caching.
> >
> > I'm not sure it's worth it given the extremely low probability of a hash
> > collision.
> >
> > Venti is an example of an existing system where hash collisions were
> > ignored because the probability is so low. See 3.1. Choice of Hash
> > Function section:
> >
> > http://plan9.bell-labs.com/sys/doc/venti/venti.html
>
>
> If you believe that it's 'extremely low', then please provide either:
>
> * experimental evidence to prove your claim
> * an insurance underwriter who will pay-out if data is lost due to
> a hash collision.
Read the paper; the point is that if the probability of a collision is so
extremely low, then it's not worth worrying about, since other effects
(e.g. cosmic rays) are much more likely.
The TCP/IP checksums are weak and not comparable to what Benoit is
using.
Stefan
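For a sense of the scale involved, back-of-the-envelope birthday-bound
arithmetic (not from the thread; it assumes the 256-bit SHA-256 option and,
purely for scale, a 1 PiB image made of 4 KiB clusters):

\[
P(\text{collision}) \approx \frac{n^2}{2 \cdot 2^{256}}, \qquad
n = \frac{2^{50}}{2^{12}} = 2^{38}
\;\Rightarrow\; P \approx \frac{2^{76}}{2^{257}} = 2^{-181}.
\]

That is dozens of orders of magnitude below the 1-in-10-billion undetected-error
rates quoted from the TCP checksum paper, which is the asymmetry being pointed
at: a 16-bit checksum and a 256-bit cryptographic hash are not comparable.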
* Re: [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification.
2013-01-03 18:18 ` Eric Blake
@ 2013-01-04 14:49 ` Benoît Canet
2013-01-16 14:50 ` Benoît Canet
1 sibling, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2013-01-04 14:49 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, pbonzini, qemu-devel, stefanha
> > + L2 tables size 64KB is different from
> > + cluster size 4KB.
>
> Umm, doesn't the cluster_bits (bytes 20-23 of the header) determine the
> size of a cluster, rather than assuming a cluster is always 4KB? And
> later on, the spec says that "L2 tables are exactly one cluster in
> size.", so I'm not sure what this comment is doing here. Or are you
> stating that deduplication _also_ has an L2 table, which is fixed in
> size (unlike the normal L2 table for actual data)?
Since most filesystems (ntfs, ext2/3/4, xfs) use 4KB blocks, deduplication
works very well with 4KB clusters.
The problem with 4KB clusters is that L2 table allocations happen very often,
and each one requires a flush to disk, which kills performance.
So my patchset breaks compatibility with regular qcow2 images by using 64KB L2
tables, which are a different size than the 4KB clusters.
I'll let the user choose the cluster size, but will default
to 4KB when creating a deduplicated image.
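To make the trade-off concrete, here is a rough calculation of how much guest
data one L2 table maps before the next table must be allocated and flushed
(back-of-the-envelope arithmetic assuming the usual 8-byte qcow2 L2 entries;
the snippet is not part of the patch):

    #include <stdio.h>

    int main(void)
    {
        const unsigned cluster_size  = 4 * 1024;   /* dedup-friendly cluster */
        const unsigned l2_entry_size = 8;          /* one 64-bit entry per cluster */
        const unsigned small_l2      = 4 * 1024;   /* cluster-sized L2 table */
        const unsigned big_l2        = 64 * 1024;  /* L2 table used by this series */

        /* guest bytes mapped before the next L2 table allocation + flush */
        printf("4KB L2 table maps %u MiB\n",
               small_l2 / l2_entry_size * cluster_size >> 20);
        printf("64KB L2 table maps %u MiB\n",
               big_l2 / l2_entry_size * cluster_size >> 20);
        return 0;
    }

This prints 2 MiB for a cluster-sized table versus 32 MiB for a 64KB one, i.e.
sixteen times fewer allocation flushes for the same amount of newly written
data, which is the point of the compatibility break described above.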
* Re: [Qemu-devel] [RFC V4 12/30] qcow2: Load and save deduplication table header extension.
2013-01-02 16:16 ` [Qemu-devel] [RFC V4 12/30] qcow2: Load and save deduplication table header extension Benoît Canet
@ 2013-01-05 0:02 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2013-01-05 0:02 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On 01/02/2013 09:16 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/qcow2.c | 38 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 410d3c1..9a7177b 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -53,9 +53,16 @@ typedef struct {
> uint32_t len;
> } QCowExtension;
>
> +typedef struct {
> + uint64_t offset;
> + int32_t size;
> + uint8_t hash_algo;
> +} QCowDedupTableExtension;
This struct has a hole at the end (that is, you only specify 13 bytes,
but sizeof(QCowDedupTableExtension) is 16)...
> + if (s->has_dedup) {
> + dedup_table_extension.offset = cpu_to_be64(s->dedup_table_offset);
> + dedup_table_extension.size = cpu_to_be32(s->dedup_table_size);
> + dedup_table_extension.hash_algo = s->dedup_hash_algo;
> + ret = header_ext_add(buf,
> + QCOW2_EXT_MAGIC_DEDUP_TABLE,
> + &dedup_table_extension,
> + sizeof(dedup_table_extension),
> + buflen);
...but here you are writing out that hole. It would be better to account
explicitly for all bytes being written, so that the reserved fields are
guaranteed to be 0 instead of random data and future extensions can make
use of those reserved bytes.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
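One way to address the hole, sketched with the field names from the patch hunk
above (this is not the actual fix from the series, just an illustration): make
the padding explicit and zero the structure before filling it, so all 16 bytes
that reach the disk are well defined.

    typedef struct {
        uint64_t offset;
        int32_t  size;
        uint8_t  hash_algo;
        uint8_t  reserved[3];   /* explicit padding, always written as zero */
    } QCowDedupTableExtension;

    /* when building the header extension: */
    QCowDedupTableExtension dedup_table_extension;
    memset(&dedup_table_extension, 0, sizeof(dedup_table_extension));
    dedup_table_extension.offset    = cpu_to_be64(s->dedup_table_offset);
    dedup_table_extension.size      = cpu_to_be32(s->dedup_table_size);
    dedup_table_extension.hash_algo = s->dedup_hash_algo;

Alternatively the structure could be declared QEMU_PACKED and exactly 13 bytes
written out; either way nothing uninitialised ends up on disk.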
* Re: [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification.
2013-01-03 18:18 ` Eric Blake
2013-01-04 14:49 ` Benoît Canet
@ 2013-01-16 14:50 ` Benoît Canet
2013-01-16 15:58 ` Eric Blake
1 sibling, 1 reply; 53+ messages in thread
From: Benoît Canet @ 2013-01-16 14:50 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, pbonzini, qemu-devel, stefanha
> I'd write this as CEIL(dedup_block_size / (32 + 8)) to make it clear
> that it rounds up...
Isn't it FLOOR instead of CEIL (an off-by-one error)?
Benoît
* Re: [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification.
2013-01-16 14:50 ` Benoît Canet
@ 2013-01-16 15:58 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2013-01-16 15:58 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On 01/16/2013 07:50 AM, Benoît Canet wrote:
>> I'd write this as CEIL(dedup_block_size / (32 + 8)) to make it clear
>> that it rounds up...
>
> Isn't it FLOOR instead of CEIL (an off-by-one error)?
Indeed, my reply was a bit too hasty, and I mixed terminology. The
number of entries that fits in a page is determined by rounding down
(floor); the amount of memory consumed by that many entries is
determined by rounding up to page size (ceil). But at least you got
what I meant :)
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
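Worked numbers for the formula under discussion, assuming the 64KB
deduplication block used by the series and one 32-byte hash plus an 8-byte
offset per entry (back-of-the-envelope, not from the spec text itself):

\[
\left\lfloor \frac{65536}{32 + 8} \right\rfloor = 1638 \ \text{entries}, \qquad
1638 \times 40 = 65520 \ \text{bytes} \le 65536 ,
\]

so rounding down (floor) gives the number of entries that fit in one block,
while the 16 leftover bytes are why the memory actually consumed rounds up
(ceil) to a whole block, matching Eric's clarification.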