qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication
@ 2013-01-16 15:47 Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification Benoît Canet
                   ` (62 more replies)
  0 siblings, 63 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This three-part patchset implements deduplication in QCOW2.

The first part creates the core infrastructure for deduplication and enables
it in QCOW2 images.
It ends at "qcow2: Enable the deduplication feature."

The second part implements deduplication metrics in QMP.
It ends at "qapi: Return virtual block device deduplication metrics in QMP"

The third part implements asynchronous deduplication.
It is a work in progress, included in this post so reviewers can see where
the feature is heading.

To use the faster Skein hash, compile and install
https://github.com/wernerd/Skein3Fish and pass the --enable-skein-dedup option
to configure.

Images must be created with "-o dedup=[skein|sha256]" to activate
deduplication in the image.

Deduplication is now fast enough to be usable.
A nice side effect is that duplicated writes are faster than native QCOW2.

v5:
    Move qemu-io-test dedup patch [Eric]
    Reserve some room at the end of the QCOW header extensions. [Eric]
    Fix the specification. [Eric]
    Now overflow deduplication refcount at 2^16/2 [Stefan]
    Implement metrics.
    Implement asynchronous deduplication.
    Increase L2 table size and deduplication block hash size.
    Random cleanups

v4: Fix and complete qcow2 spec [Stefan]
    Hash the hash_algo field in the header extension [Stefan]
    Fix qcow2 spec [Eric]
    Remove pointer to hash and simplify hash memory management [Stefan]
    Rename and move qcow2_read_cluster_data to qcow2.c [Stefan]
    Document lock dropping behaviour of the previous function [Stefan]
    cleanup qcow2_dedup_read_missing_cluster_data [Stefan]
    rename *_offset to *_sect [Stefan]
    add a ./configure check for ssl [Stefan]
    Replace openssl by gnutls [Stefan]
    Implement Skein hashes
    Rewrite pretty much every qcow2-dedup.c commit after "Add
       qcow2_dedup_read_missing_and_concatenate" to simplify the code
    Use 64KB deduplication hash block to reduce allocation flushes
    Use 64KB l2 tables to reduce allocation flushes [breaks compatibility]
    Use lazy refcounts to avoid qcow2_cache_set_dependency loops resulting
       in frequent cache flushes
    Do not create and load dedup RAM structures when bdrs->read_only is true

v3: make it barely work
    replace kernel red-black trees with GTree.

Benoît Canet (62):
  qcow2: Add deduplication to the qcow2 specification.
  qcow2: Add deduplication structures and fields.
  qcow2: Add qcow2_dedup_read_missing_and_concatenate
  qcow2: Make update_refcount public.
  qcow2: Create a way to link to l2 tables when deduplicating.
  qcow2: Add qcow2_dedup and related functions
  qcow2: Add qcow2_dedup_store_new_hashes.
  qcow2: Implement qcow2_compute_cluster_hash.
  qcow2: Extract qcow2_dedup_grow_table
  qcow2: Add qcow2_dedup_grow_table and use it.
  qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate
    clusters.
  qcow2: make the deduplication forget a cluster hash when a cluster is
    to dedupe
  qcow2: Create qcow2_is_cluster_to_dedup.
  qcow2: Load and save deduplication table header extension.
  qcow2: Extract qcow2_do_table_init.
  qcow2-cache: Allow to choose table size at creation.
  qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
  block: Add qemu-img dedup create option.
  qcow2: Add a deduplication boolean to update_refcount.
  qcow2: Drop hash for a given cluster when dedup makes refcount >
    2^16/2.
  qcow2: Remove hash when cluster is deleted.
  qcow2: Add qcow2_dedup_is_running to probe if dedup is running.
  qcow2: Integrate deduplication in qcow2_co_writev loop.
  qcow2: Serialize write requests when deduplication is activated.
  qcow2: Add verification of dedup table.
  qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
  qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
  qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
  qcow2: Integrate SKEIN hash algorithm in deduplication.
  qcow2: Add lazy refcounts to deduplication to prevent
    qcow2_cache_set_dependency loops
  qcow2: Use large L2 table for deduplication.
  qcow: Set large dedup hash block size.
  qemu-iotests: Filter dedup=on/off so existing tests don't break.
  qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  qcow2: Add qcow2_co_dedup_resume to restart deduplication.
  qcow2: Enable the deduplication feature.

  qcow2: Add deduplication metrics structures.
  qcow2: Initialize deduplication metrics.
  qcow2: Collect unaligned writes missing data reads metric.
  qcow2: Collect deduplicated cluster metric.
  qcow2: Collect undeduplicated cluster metric.
  qcow2: Count QCowHashNode creation metrics.
  qcow2: Count QCowHashNode removal from tree for metrics.
  qcow2: Count cluster deleted metric
  qcow2: Count deduplication refcount overflow metric.
  qapi: Add support for deduplication infos in qapi-schema.json.
  block: Add deduplication metrics to BlockDriverInfo.
  qcow2: Add qcow2_dedup_update_metrics to compute dedup RAM usage.
  qcow2: returns deduplication metrics and status via bdrv_get_info()
  qapi: Return virtual block device deduplication metrics in QMP

  block: Add BlockDriver function prototype to pause and resume
    deduplication.
  qcow2: Add code to deduplicate cluster flagged with
    QCOW_OFLAG_TO_DEDUP.
  block: Add bdrv_has_dedup.
  block: Add bdrv_is_dedup_running.
  block: Add bdrv_resume_dedup.
  block: Add bdrv_pause_dedup.
  qcow2: Add qcow2_pause_dedup.
  qcow2: Add qcow2_resume_dedup.
  qcow2: Make dedup status persists.
  qerror: Add QERR_DEVICE_NOT_DEDUPLICATED.
  qmp: Add block-pause-dedup.
  qmp: Add block_resume_dedup.

 block.c                      |  108 +++
 block/Makefile.objs          |    1 +
 block/qcow2-cache.c          |   12 +-
 block/qcow2-cluster.c        |  182 ++++--
 block/qcow2-dedup.c          | 1492 ++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c       |  175 +++--
 block/qcow2.c                |  378 +++++++++--
 block/qcow2.h                |  149 ++++-
 blockdev.c                   |   36 +
 configure                    |   55 ++
 docs/specs/qcow2.txt         |  104 ++-
 include/block/block.h        |   18 +
 include/block/block_int.h    |    5 +
 include/qapi/qmp/qerror.h    |    3 +
 qapi-schema.json             |   76 ++-
 qmp-commands.hx              |   46 ++
 tests/qemu-iotests/common.rc |    3 +-
 17 files changed, 2708 insertions(+), 135 deletions(-)
 create mode 100644 block/qcow2-dedup.c

-- 
1.7.10.4


* [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 16:43   ` Eric Blake
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 02/62] qcow2: Add deduplication structures and fields Benoît Canet
                   ` (61 subsequent siblings)
  62 siblings, 1 reply; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 docs/specs/qcow2.txt |  104 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 2 deletions(-)
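
Reader's note: the header extension layout added by the diff below can be read back with a few lines of C. This is only a sketch under stated assumptions, not code from the series. The names DedupExtSketch, sk_load_be and sk_dedup_ext_decode are hypothetical. It assumes the fields are big-endian like the rest of qcow2, and that the values 0, 1 and 2 listed under "Dedup strategies bitmap" are bit positions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: none of these identifiers come from the patch. */
enum { SK_DEDUP_EXT_SIZE = 70 };   /* bytes 0 - 69 of the extension payload */

typedef struct {
    uint64_t table_offset;  /* bytes 0 - 7: offset of the RAM dedup table */
    uint32_t table_size;    /* bytes 8 - 11: number of 64-bit L1 pointers */
    uint8_t  hash_algo;     /* byte 12: 0 SHA-256, 1 SHA3, 2 SKEIN-256 */
    bool     ram_lookup;    /* byte 13, bit 0 (assumed bit position) */
    bool     disk_lookup;   /* byte 13, bit 1 (assumed bit position) */
    bool     running;       /* byte 13, bit 2 (assumed bit position) */
} DedupExtSketch;

/* Load an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t sk_load_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Returns 0 on success, -1 on a short payload or an unknown hash algorithm.
 * Bytes 14 - 69 are reserved and are ignored here. */
static int sk_dedup_ext_decode(const uint8_t *buf, size_t len,
                               DedupExtSketch *out)
{
    if (len < SK_DEDUP_EXT_SIZE) {
        return -1;
    }
    out->table_offset = sk_load_be(buf, 8);
    out->table_size   = (uint32_t)sk_load_be(buf + 8, 4);
    out->hash_algo    = buf[12];
    out->ram_lookup   = (buf[13] & (1u << 0)) != 0;
    out->disk_lookup  = (buf[13] & (1u << 1)) != 0;
    out->running      = (buf[13] & (1u << 2)) != 0;
    return out->hash_algo <= 2 ? 0 : -1;
}
```

Rejecting unknown hash algorithm values is a choice of this sketch; the specification text does not say how a reader should treat them.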

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..d5f8072 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@ in the description of a field.
                                 tables to repair refcounts before accessing the
                                 image.
 
-                    Bits 1-63:  Reserved (set to 0)
+                    Bit 1:      Deduplication bit.  If this bit is set then
+                                deduplication is used on this image.
+                                The L2 table size (64KB) then differs from the
+                                cluster size (4KB).
+
+                    Bits 2-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@ be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0xCD8E819B - Deduplication
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -159,6 +165,100 @@ the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains information concerning deduplication.
+
+    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
+
+          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
+                    pointers
+
+              12:   Hash algo enum field
+                        0: SHA-256
+                        1: SHA3
+                        2: SKEIN-256
+
+              13:   Dedup strategies bitmap
+                        0: RAM based hash lookup (always set to 1 for now)
+                        1: Disk based hash lookup
+                        2: Deduplication running if set to 1
+
+        14 - 69:    Set to zero and reserved for future use
+
+Disk based lookup structure will be described in a future QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and
+logical offset. It is used to permanently store the information to
+do the deduplication. It is loaded at startup into a RAM based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
+0       l1_dedup_index                                              Size
+              |
+|--------------------------------------------------------------------|
+|             |                                                      |
+|             |        L1 Deduplication table                        |
+|             |                                                      |
+|--------------------------------------------------------------------|
+              |
+              |
+              |
+0             |           l2_dedup_block_entries
+              |
+|---------------------------------|
+|                                 |
+|    L2 deduplication block       |
+|                                 |
+|                 l2_dedup_index  |
+|---------------------------------|
+                         |
+         0               |              40
+                         |
+         |-------------------------------|
+         |                               |
+         |    Deduplication table entry  |
+         |                               |
+         |-------------------------------|
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+    Byte  0 - 31:   hash of data cluster
+
+         32 - 39:   Logical offset of first encountered block having
+                    this hash
+
+== Deduplication table arithmetic (RAM method) ==
+
+cluster_size = 4096
+dedup_block_size = 65536 * 5
+l2_size = 65536 * 16 (16 factor is from the smaller cluster_size)
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an L2 deduplication table block is:
+l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8))
+
+The index in the level 1 deduplication table is:
+l1_dedup_index = physical_cluster_index / l2_dedup_block_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_dedup_block_entries
+
+The 16 remaining bytes in each L2 deduplication block are set to zero and
+reserved for future use.
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -211,7 +311,7 @@ guest clusters to host clusters. They are called L1 and L2 table.
 
 The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
-exactly one cluster in size.
+exactly one cluster in size, except in the deduplication case.
 
 Given a offset into the virtual disk, the offset into the image file can be
 obtained as follows:
-- 
1.7.10.4
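
Reader's note: the deduplication table arithmetic in the specification hunk above can be sketched in C as follows. The names DedupIndexSketch, sk_dedup_entries_per_block, sk_dedup_block_spare_bytes and sk_dedup_locate are hypothetical, and the sketch assumes that the divisor in both index formulas is the per-block entry count l2_dedup_block_entries. The block size is a parameter rather than a constant because the result depends on it: a 65536-byte block holds 1638 entries with 16 bytes left over, while a 65536 * 5 byte block holds 8192 entries with nothing left over.

```c
#include <stdint.h>

/* Hypothetical names: none of these identifiers come from the patch. */
enum { SK_DEDUP_ENTRY_SIZE = 32 + 8 };  /* 32-byte hash + 64-bit offset */

typedef struct {
    uint64_t l1_dedup_index;  /* index in the level 1 deduplication table */
    uint64_t l2_dedup_index;  /* index inside the L2 deduplication block */
} DedupIndexSketch;

/* FLOOR(dedup_block_size / (32 + 8)); unsigned division already floors. */
static uint64_t sk_dedup_entries_per_block(uint64_t dedup_block_size)
{
    return dedup_block_size / SK_DEDUP_ENTRY_SIZE;
}

/* Bytes left at the end of a block once all whole entries are placed. */
static uint64_t sk_dedup_block_spare_bytes(uint64_t dedup_block_size)
{
    return dedup_block_size % SK_DEDUP_ENTRY_SIZE;
}

/* Map a physical cluster index to its L1 and L2 deduplication indexes.
 * dedup_block_size must be at least SK_DEDUP_ENTRY_SIZE. */
static DedupIndexSketch sk_dedup_locate(uint64_t physical_cluster_index,
                                        uint64_t dedup_block_size)
{
    uint64_t entries = sk_dedup_entries_per_block(dedup_block_size);
    DedupIndexSketch idx = {
        physical_cluster_index / entries,
        physical_cluster_index % entries,
    };
    return idx;
}
```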


* [Qemu-devel] [RFC V5 02/62] qcow2: Add deduplication structures and fields.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 16:30   ` Eric Blake
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 03/62] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
                   ` (60 subsequent siblings)
  62 siblings, 1 reply; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.h |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)
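
Reader's note: the diff below adds an incompatible feature bit and an L2 entry flag. A minimal sketch of how such bits are typically tested follows. The SK_* macros and sk_* functions are hypothetical names that only mirror the values in the patch, and the constants are written with UINT64_C so that shifting into bit 63 stays well defined in standalone C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SK_* names mirroring the values added to block/qcow2.h. */
#define SK_INCOMPAT_DIRTY  (UINT64_C(1) << 0)
#define SK_INCOMPAT_DEDUP  (UINT64_C(1) << 1)
#define SK_INCOMPAT_MASK   (SK_INCOMPAT_DIRTY | SK_INCOMPAT_DEDUP)
#define SK_OFLAG_TO_DEDUP  (UINT64_C(1) << 61)

/* Bits set in the header that this implementation does not understand.
 * A non-zero result means the image must not be opened. */
static uint64_t sk_unknown_incompat_bits(uint64_t incompatible_features)
{
    return incompatible_features & ~SK_INCOMPAT_MASK;
}

/* True if the image was created with deduplication enabled. */
static bool sk_image_has_dedup(uint64_t incompatible_features)
{
    return (incompatible_features & SK_INCOMPAT_DEDUP) != 0;
}

/* True if the L2 entry (in host byte order) is flagged for processing
 * when deduplication restarts. */
static bool sk_cluster_awaits_dedup(uint64_t l2_entry)
{
    return (l2_entry & SK_OFLAG_TO_DEDUP) != 0;
}
```

Because the dedup bit is an incompatible feature, an implementation that predates it sees the bit as unknown and refuses the image instead of misreading the larger L2 tables.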

diff --git a/block/qcow2.h b/block/qcow2.h
index 718b52b..b31b64e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -43,6 +43,10 @@
 #define QCOW_OFLAG_COPIED     (1LL << 63)
 /* indicate that the cluster is compressed (they never have the copied flag) */
 #define QCOW_OFLAG_COMPRESSED (1LL << 62)
+/* indicate that the cluster must be processed when deduplication restarts
+ * also indicate that the on-disk dedup hash must be ignored and discarded
+ */
+#define QCOW_OFLAG_TO_DEDUP (1LL << 61)
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1LL << 0)
 
@@ -58,6 +62,57 @@
 
 #define DEFAULT_CLUSTER_SIZE 65536
 
+#define HASH_LENGTH 32
+
+typedef enum {
+    QCOW_DEDUP_STOPPED,
+    QCOW_DEDUP_STARTING,
+    QCOW_DEDUP_STARTED,
+    QCOW_DEDUP_STOPPING,
+} QCowDedupStatus;
+
+typedef enum {
+    QCOW_HASH_SHA256 = 0,
+    QCOW_HASH_SHA3   = 1,
+    QCOW_HASH_SKEIN  = 2,
+} QCowHashAlgo;
+
+typedef struct {
+    uint8_t data[HASH_LENGTH]; /* 32 bytes hash of a given cluster */
+} QCowHash;
+
+/* Used to keep a single precomputed hash between the calls of the dedup
+ * function
+ */
+typedef struct {
+    QCowHash hash;
+    bool reuse;                  /* The hash is precomputed: reuse it */
+} QcowPersistantHash;
+
+/* deduplication node */
+typedef struct {
+    QCowHash hash;
+    uint64_t physical_sect;       /* where the cluster is stored on disk */
+    uint64_t first_logical_sect;  /* logical sector of the first occurrence of
+                                   * this cluster
+                                   */
+} QCowHashNode;
+
+/* Undedupable hashes that must be written later to disk */
+typedef struct QCowHashElement {
+    QCowHash hash;
+    QTAILQ_ENTRY(QCowHashElement) next;
+} QCowHashElement;
+
+typedef struct {
+    QcowPersistantHash phash;  /* contains a hash persisting between calls of
+                                * qcow2_dedup()
+                                */
+    QTAILQ_HEAD(, QCowHashElement) undedupables;
+    int nb_clusters_processed;
+    int nb_undedupable_sectors;
+} QCowDedupState;
+
 typedef struct QCowHeader {
     uint32_t magic;
     uint32_t version;
@@ -114,8 +169,10 @@ enum {
 enum {
     QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
     QCOW2_INCOMPAT_DIRTY         = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
+    QCOW2_INCOMPAT_DEDUP_BITNR   = 1,
+    QCOW2_INCOMPAT_DEDUP         = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
 
-    QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY,
+    QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
 };
 
 /* Compatible feature bits */
@@ -138,6 +195,7 @@ typedef struct BDRVQcowState {
     int cluster_sectors;
     int l2_bits;
     int l2_size;
+    int hash_block_size;
     int l1_size;
     int l1_vm_state_index;
     int csize_shift;
@@ -148,6 +206,7 @@ typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    Qcow2Cache *dedup_cluster_cache;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -160,6 +219,17 @@ typedef struct BDRVQcowState {
     int64_t free_cluster_index;
     int64_t free_byte_offset;
 
+    bool has_dedup;
+    QCowDedupStatus dedup_status;
+    QCowHashAlgo dedup_hash_algo;
+    Coroutine *dedup_resume_co;
+    int dedup_co_delay;
+    uint64_t *dedup_table;
+    uint64_t dedup_table_offset;
+    int32_t dedup_table_size;
+    GTree *dedup_tree_by_hash;
+    GTree *dedup_tree_by_sect;
+
     CoMutex lock;
 
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 03/62] qcow2: Add qcow2_dedup_read_missing_and_concatenate
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 02/62] qcow2: Add deduplication structures and fields Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 04/62] qcow2: Make update_refcount public Benoît Canet
                   ` (59 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This function reads the missing data when unaligned writes are done.
It also concatenates the missing data with the given qiov data in order
to prepare a buffer used to look for duplicated clusters.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs |    1 +
 block/qcow2-dedup.c |  119 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c       |   36 +++++++++++++++-
 block/qcow2.h       |   12 ++++++
 4 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-dedup.c
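
Reader's note: the alignment arithmetic of qcow2_dedup_read_missing_and_concatenate in the diff below can be isolated as a pure function. This is a sketch, not the patch's code: DedupSpanSketch and sk_dedup_span are hypothetical names, cluster_sectors is assumed to be a power of two, and the write is assumed to end at or before total_sectors. Unlike the patch, which computes the total before clamping the ending, the sketch computes it after the clamp.

```c
#include <stdint.h>

/* Hypothetical names: none of these identifiers come from the patch. */
typedef struct {
    uint64_t cluster_beginning_sector; /* first sector of the first cluster */
    int cluster_beginning_nr;          /* sectors to read before the qiov */
    int cluster_ending_nr;             /* sectors to read after the qiov */
    int nb_data_sectors;               /* size of the concatenated buffer */
} DedupSpanSketch;

static DedupSpanSketch sk_dedup_span(uint64_t sector_num, int nb_sectors,
                                     int cluster_sectors,
                                     uint64_t total_sectors)
{
    DedupSpanSketch sp;
    uint64_t mask = (uint64_t)cluster_sectors - 1;
    uint64_t first_sector_after_qiov = sector_num + (uint64_t)nb_sectors;
    int unaligned_ending_nr = (int)(first_sector_after_qiov & mask);
    uint64_t max_ending_nr = total_sectors - first_sector_after_qiov;

    /* how much to read at the beginning, and from where */
    sp.cluster_beginning_nr = (int)(sector_num & mask);
    sp.cluster_beginning_sector = sector_num - (uint64_t)sp.cluster_beginning_nr;

    /* how much to read at the end to complete the last cluster */
    sp.cluster_ending_nr = unaligned_ending_nr
                           ? cluster_sectors - unaligned_ending_nr : 0;

    /* never read past the end of an image whose size is not cluster-aligned */
    if (max_ending_nr < (uint64_t)sp.cluster_ending_nr) {
        sp.cluster_ending_nr = (int)max_ending_nr;
    }

    sp.nb_data_sectors = sp.cluster_beginning_nr + nb_sectors
                         + sp.cluster_ending_nr;
    return sp;
}
```

For example, with 8 sectors per cluster, a 5-sector write at sector 10 needs 2 sectors read before it and 1 after it, giving one full 8-sector cluster starting at sector 8.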

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c067f38..21afc85 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2-dedup.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
new file mode 100644
index 0000000..4e99eb1
--- /dev/null
+++ b/block/qcow2-dedup.c
@@ -0,0 +1,119 @@
+/*
+ * Deduplication for the QCOW2 format
+ *
+ * Copyright (C) Nodalink, SARL. 2012-2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block/block_int.h"
+#include "qemu-common.h"
+#include "qcow2.h"
+
+/*
+ * Prepare a buffer containing all the data required to compute cluster-sized
+ * deduplication hashes.
+ * If sector_num or nb_sectors are not cluster-aligned, missing data
+ * before/after the qiov will be read.
+ *
+ * @qiov:               the qiov for which missing data must be read
+ * @sector_num:         the first sector that must be read into the qiov
+ * @nb_sectors:         the number of sectors to read into the qiov
+ * @data:               the place where the data will be concatenated and stored
+ * @nb_data_sectors:    the resulting size of the concatenated data (in sectors)
+ * @ret:                negative on error
+ */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+                                             QEMUIOVector *qiov,
+                                             uint64_t sector_num,
+                                             int nb_sectors,
+                                             uint8_t **data,
+                                             int *nb_data_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    uint64_t cluster_beginning_sector;
+    uint64_t first_sector_after_qiov;
+    int cluster_beginning_nr;
+    int cluster_ending_nr;
+    int unaligned_ending_nr;
+    uint64_t max_cluster_ending_nr;
+
+    /* compute how much and where to read at the beginning */
+    cluster_beginning_nr = sector_num & (s->cluster_sectors - 1);
+    cluster_beginning_sector = sector_num - cluster_beginning_nr;
+
+    /* for the ending */
+    first_sector_after_qiov = sector_num + nb_sectors;
+    unaligned_ending_nr = first_sector_after_qiov & (s->cluster_sectors - 1);
+    cluster_ending_nr = unaligned_ending_nr ?
+                        s->cluster_sectors - unaligned_ending_nr : 0;
+
+    /* compute total size in sectors and allocate memory */
+    *nb_data_sectors = cluster_beginning_nr + nb_sectors + cluster_ending_nr;
+    *data = qemu_blockalign(bs, *nb_data_sectors * BDRV_SECTOR_SIZE);
+
+    /* read beginning */
+    if (cluster_beginning_nr) {
+        ret = qcow2_read_cluster_data(bs,
+                                      *data,
+                                      cluster_beginning_sector,
+                                      cluster_beginning_nr);
+    }
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* append qiov content */
+    qemu_iovec_to_buf(qiov, 0, *data + cluster_beginning_nr * BDRV_SECTOR_SIZE,
+                      qiov->size);
+
+    /* Fix cluster_ending_nr if we are at risk of reading outside the image
+     * (Cluster unaligned image size)
+     */
+    max_cluster_ending_nr = bs->total_sectors - first_sector_after_qiov;
+    cluster_ending_nr = max_cluster_ending_nr < (uint64_t) cluster_ending_nr ?
+                        (int) max_cluster_ending_nr : cluster_ending_nr;
+
+    /* read and add ending */
+    if (cluster_ending_nr) {
+        ret = qcow2_read_cluster_data(bs,
+                                      *data +
+                                      (cluster_beginning_nr +
+                                      nb_sectors) *
+                                      BDRV_SECTOR_SIZE,
+                                      first_sector_after_qiov,
+                                      cluster_ending_nr);
+    }
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    return 0;
+
+fail:
+    qemu_vfree(*data);
+    *data = NULL;
+    return ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index d603f98..410d3c1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -69,7 +69,6 @@ static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
         return 0;
 }
 
-
 /* 
  * read qcow2 extension and fill bs
  * start reading from start_offset
@@ -1110,6 +1109,41 @@ fail:
     return ret;
 }
 
+/**
+ * Read some data from the QCOW2 file
+ *
+ * Important: s->lock is dropped. Things can change before the function returns
+ *            to the caller.
+ *
+ * @data:       the buffer where the data must be stored
+ * @sector_num: the sector number to read in the QCOW2 file
+ * @nb_sectors: the number of sectors to read
+ * @ret:        negative on error
+ */
+int qcow2_read_cluster_data(BlockDriverState *bs,
+                            uint8_t *data,
+                            uint64_t sector_num,
+                            int nb_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int ret;
+
+    iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE;
+    iov.iov_base = data;
+    qemu_iovec_init_external(&qiov, &iov, 1);
+    qemu_co_mutex_unlock(&s->lock);
+    ret = bdrv_co_readv(bs, sector_num, nb_sectors, &qiov);
+    qemu_co_mutex_lock(&s->lock);
+    if (ret < 0) {
+        error_report("failed to read %d sectors at sector %" PRIu64,
+                     nb_sectors, sector_num);
+    }
+
+    return ret;
+}
+
 static int qcow2_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt)
 {
diff --git a/block/qcow2.h b/block/qcow2.h
index b31b64e..1fceb65 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -376,6 +376,10 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
 
 int qcow2_mark_dirty(BlockDriverState *bs);
 int qcow2_update_header(BlockDriverState *bs);
+int qcow2_read_cluster_data(BlockDriverState *bs,
+                            uint8_t *data,
+                            uint64_t sector_num,
+                            int nb_sectors);
 
 /* qcow2-refcount.c functions */
 int qcow2_refcount_init(BlockDriverState *bs);
@@ -444,4 +448,12 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
+/* qcow2-dedup.c functions */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+                                             QEMUIOVector *qiov,
+                                             uint64_t sector,
+                                             int sectors_nr,
+                                             uint8_t **dedup_cluster_data,
+                                             int *dedup_cluster_data_nr);
+
 #endif
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 04/62] qcow2: Make update_refcount public.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (2 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 03/62] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 05/62] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
                   ` (58 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    6 +-----
 block/qcow2.h          |    2 ++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6a95aa6..e014b0e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -27,10 +27,6 @@
 #include "block/qcow2.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
-                            int64_t offset, int64_t length,
-                            int addend);
-
 
 /*********************************************************/
 /* refcount handling */
@@ -413,7 +409,7 @@ fail_block:
 }
 
 /* XXX: cache several refcount block clusters ? */
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     int64_t offset, int64_t length, int addend)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 1fceb65..803aeda 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -399,6 +399,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                           BdrvCheckMode fix);
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+    int64_t offset, int64_t length, int addend);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 05/62] qcow2: Create a way to link to l2 tables when deduplicating.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (3 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 04/62] qcow2: Make update_refcount public Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 06/62] qcow2: Add qcow2_dedup and related functions Benoît Canet
                   ` (57 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |    8 ++++++--
 block/qcow2.h         |    9 +++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)
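
Reader's note: the two booleans added to QCowL2Meta in the diff below change how an L2 entry is built and whether the old clusters are released. A minimal sketch of both decisions follows. SK_OFLAG_COPIED, sk_l2_entry_for and sk_must_free_old_clusters are hypothetical names, and the entry is returned in host byte order, whereas the patch converts it with cpu_to_be64() before storing it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name mirroring QCOW_OFLAG_COPIED (bit 63). */
#define SK_OFLAG_COPIED (UINT64_C(1) << 63)

/* Build the L2 entry for the i-th cluster of an allocation, setting the
 * copied flag only when the caller asks for it. */
static uint64_t sk_l2_entry_for(uint64_t cluster_offset, int i,
                                int cluster_bits, bool oflag_copied)
{
    uint64_t host_offset = cluster_offset + ((uint64_t)i << cluster_bits);
    return host_offset | (oflag_copied ? SK_OFLAG_COPIED : 0);
}

/* The old clusters are released only when the request is not overwriting
 * an L2 entry and there is at least one old cluster to release. */
static bool sk_must_free_old_clusters(bool overwrite, int nb_old_clusters)
{
    return !overwrite && nb_old_clusters != 0;
}
```

The regular allocation path in the patch sets oflag_copied to true and overwrite to false, which reproduces the behaviour from before the change.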

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 56fccf9..63a7241 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -693,7 +693,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
             old_cluster[j++] = l2_table[l2_index + i];
 
         l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+                    (i << s->cluster_bits)) |
+                    (m->oflag_copied ? QCOW_OFLAG_COPIED : 0));
      }
 
 
@@ -706,7 +707,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * If this was a COW, we need to decrease the refcount of the old cluster.
      * Also flush bs->file to get the right order for L2 and refcount update.
      */
-    if (j != 0) {
+    if (!m->overwrite && j != 0) {
         for (i = 0; i < j; i++) {
             qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1);
         }
@@ -1006,6 +1007,9 @@ again:
                     .offset     = nb_sectors * BDRV_SECTOR_SIZE,
                     .nb_sectors = avail_sectors - nb_sectors,
                 },
+
+                .oflag_copied   = true,
+                .overwrite      = false,
             };
             qemu_co_queue_init(&(*m)->dependent_requests);
             QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2.h b/block/qcow2.h
index 803aeda..4273e7c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -63,6 +63,10 @@
 #define DEFAULT_CLUSTER_SIZE 65536
 
 #define HASH_LENGTH 32
+/* indicates that the hash structure is empty and has no offsets yet */
+#define QCOW_FLAG_EMPTY   (1ULL << 62)
+/* indicates that the cluster for this hash has QCOW_OFLAG_COPIED on disk */
+#define QCOW_FLAG_FIRST   (1ULL << 63)
 
 typedef enum {
     QCOW_DEDUP_STOPPED,
@@ -304,6 +308,11 @@ typedef struct QCowL2Meta
      */
     CoQueue dependent_requests;
 
+    /* set to true if QCOW_OFLAG_COPIED must be set in the L2 table entry */
+    bool oflag_copied;
+    /* set to true if we are overwriting an L2 table entry */
+    bool overwrite;
+
     /**
      * The COW Region between the start of the first allocated cluster and the
      * area the guest actually writes to.
-- 
1.7.10.4
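
For readers following the series, here is a minimal standalone sketch of the
flag handling this patch introduces. It is not QEMU code: make_l2_entry(),
mark_first() and sector_of() are hypothetical helpers, and the constants are
redefined locally with unsigned literals. It assumes, as in the qcow2 format,
that QCOW_OFLAG_COPIED is bit 63 of an L2 entry, and it leaves out the
cpu_to_be64() conversion done by the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values, for illustration only. */
#define QCOW_OFLAG_COPIED (1ULL << 63) /* L2 entry: cluster refcount is 1 */
#define QCOW_FLAG_EMPTY   (1ULL << 62) /* hash node has no offsets yet */
#define QCOW_FLAG_FIRST   (1ULL << 63) /* first L2 entry still has COPIED */

/* Hypothetical helper: host-endian L2 entry for cluster i of an allocation.
 * COPIED is only set when the caller asks for it, as with m->oflag_copied.
 */
static uint64_t make_l2_entry(uint64_t cluster_offset, unsigned i,
                              unsigned cluster_bits, bool oflag_copied)
{
    assert(cluster_bits < 64);
    return (cluster_offset + ((uint64_t)i << cluster_bits)) |
           (oflag_copied ? QCOW_OFLAG_COPIED : 0);
}

/* Hypothetical helper: tag a logical sector as the first occurrence. */
static uint64_t mark_first(uint64_t logical_sect)
{
    return logical_sect | QCOW_FLAG_FIRST;
}

/* Hypothetical helper: strip the tag to get the plain sector number back. */
static uint64_t sector_of(uint64_t tagged_sect)
{
    return tagged_sect & ~QCOW_FLAG_FIRST;
}
```

A deduplicated link would be built with oflag_copied set to false, since a
cluster shared by several L2 entries must not carry QCOW_OFLAG_COPIED.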


* [Qemu-devel] [RFC V5 06/62] qcow2: Add qcow2_dedup and related functions
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (4 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 05/62] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 07/62] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
                   ` (56 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |  436 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h       |    5 +
 2 files changed, 441 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4e99eb1..5901749 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -117,3 +117,439 @@ fail:
     *data = NULL;
     return ret;
 }
+
+/*
+ * Build a QCowHashNode structure
+ *
+ * @hash:               the given hash
+ * @physical_sect:      the cluster offset in the QCOW2 file (in sectors)
+ * @first_logical_sect: the first logical cluster offset written (in sectors)
+ * @ret:                the built QCowHashNode
+ */
+static QCowHashNode *qcow2_dedup_build_qcow_hash_node(QCowHash *hash,
+                                                  uint64_t physical_sect,
+                                                  uint64_t first_logical_sect)
+{
+    QCowHashNode *hash_node;
+
+    hash_node = g_new0(QCowHashNode, 1);
+    memcpy(hash_node->hash.data, hash->data, HASH_LENGTH);
+    hash_node->physical_sect = physical_sect;
+    hash_node->first_logical_sect = first_logical_sect;
+
+    return hash_node;
+}
+
+/*
+ * Compute the hash of a given cluster
+ *
+ * @data: a buffer containing the cluster data
+ * @hash: a QCowHash where to store the computed hash
+ * @ret:  0 on success, negative on error
+ */
+static int qcow2_compute_cluster_hash(BlockDriverState *bs,
+                                       QCowHash *hash,
+                                       uint8_t *data)
+{
+    return 0;
+}
+
+/*
+ * Get the QCowHashNode corresponding to a cluster's data
+ *
+ * @phash:           if phash->reuse is set, its hash is used instead of
+ *                   computing a new one
+ * @data:            a buffer containing the cluster
+ * @nb_clusters_processed: the number of clusters to skip in the buffer
+ * @err:             Error code if any
+ * @ret:             QCowHashNode of the duplicated cluster or NULL if not found
+ */
+static QCowHashNode *qcow2_get_hash_node_for_cluster(BlockDriverState *bs,
+                                                     QcowPersistantHash *phash,
+                                                     uint8_t *data,
+                                                     int nb_clusters_processed,
+                                                     int *err)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    *err = 0;
+
+    /* no hash has been provided: compute it and store it for later use */
+    if (!phash->reuse) {
+        ret = qcow2_compute_cluster_hash(bs,
+                                         &phash->hash,
+                                         data +
+                                         nb_clusters_processed *
+                                         s->cluster_size);
+    }
+
+    /* do not reuse the hash anymore if it was precomputed */
+    phash->reuse = false;
+
+    if (ret < 0) {
+        *err = ret;
+        return NULL;
+    }
+
+    return g_tree_lookup(s->dedup_tree_by_hash, &phash->hash);
+}
+
+/*
+ * Build a QCowHashNode from a given QCowHash and insert it into the tree
+ *
+ * @hash: the given QCowHash
+ */
+static void qcow2_build_and_insert_hash_node(BlockDriverState *bs,
+                                             QCowHash *hash)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+
+    /* build the hash node with QCOW_FLAG_EMPTY as offsets so we will remember
+     * to fill these fields later with real values.
+     */
+    hash_node = qcow2_dedup_build_qcow_hash_node(hash,
+                                                 QCOW_FLAG_EMPTY,
+                                                 QCOW_FLAG_EMPTY);
+    g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+}
+
+/*
+ * Helper used to build a QCowHashElement
+ *
+ * @hash: the QCowHash to use
+ * @ret:  a newly allocated QCowHashElement containing the given hash
+ */
+static QCowHashElement *qcow2_build_dedup_hash(QCowHash *hash)
+{
+    QCowHashElement *dedup_hash;
+    dedup_hash = g_new0(QCowHashElement, 1);
+    memcpy(dedup_hash->hash.data, hash->data, HASH_LENGTH);
+    return dedup_hash;
+}
+
+/*
+ * Helper used to link a deduplicated cluster in the L2 table
+ *
+ * @logical_sect:  the cluster sector seen by the guest
+ * @physical_sect: the cluster sector in the QCOW2 file
+ * @overwrite:     true if we must overwrite the L2 table entry
+ * @ret:           0 on success, negative on error
+ */
+static int qcow2_dedup_link_l2(BlockDriverState *bs,
+                               uint64_t logical_sect,
+                               uint64_t physical_sect,
+                               bool overwrite)
+{
+    QCowL2Meta m = {
+        .alloc_offset   = physical_sect << 9,
+        .offset         = logical_sect << 9,
+        .nb_clusters    = 1,
+        .nb_available   = 0,
+        .cow_start = {
+            .offset     = 0,
+            .nb_sectors = 0,
+        },
+        .cow_end = {
+            .offset     = 0,
+            .nb_sectors = 0,
+        },
+        .oflag_copied   = false,
+        .overwrite      = overwrite,
+    };
+    return qcow2_alloc_cluster_link_l2(bs, &m);
+}
+
+/* Clear the QCOW_OFLAG_COPIED from the first L2 entry written for a physical
+ * cluster.
+ *
+ * @hash_node: the duplicated hash node
+ * @ret:       0 on success, negative on error
+ */
+static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
+                                                QCowHashNode *hash_node)
+{
+    int ret = 0;
+    uint64_t first_logical_sect = hash_node->first_logical_sect;
+
+    /* QCOW_OFLAG_COPIED already cleared -> do nothing */
+    if (!(first_logical_sect & QCOW_FLAG_FIRST)) {
+        return 0;
+    }
+
+    /* note : QCOW_FLAG_FIRST == QCOW_OFLAG_COPIED */
+    first_logical_sect &= ~QCOW_FLAG_FIRST;
+
+    /* overwrite first L2 entry to clear QCOW_OFLAG_COPIED */
+    ret = qcow2_dedup_link_l2(bs, first_logical_sect,
+                              hash_node->physical_sect,
+                              true);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
+    hash_node->first_logical_sect &= first_logical_sect;
+
+    return 0;
+}
+
+/* This function deduplicates a cluster
+ *
+ * @logical_sect: The logical sector of the write
+ * @hash_node:    The duplicated cluster hash node
+ * @ret:          0 on success, negative on error
+ */
+static int qcow2_deduplicate_cluster(BlockDriverState *bs,
+                                     uint64_t logical_sect,
+                                     QCowHashNode *hash_node)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    /* create new L2 entry */
+    ret = qcow2_dedup_link_l2(bs, logical_sect,
+                              hash_node->physical_sect,
+                              false);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* Increment the refcount of the cluster */
+    return update_refcount(bs,
+                           (hash_node->physical_sect /
+                            s->cluster_sectors) << s->cluster_bits,
+                            1, 1);
+}
+
+/* This function tries to deduplicate a given cluster.
+ *
+ * @sector_num:           the logical sector number we are trying to deduplicate
+ * @phash:                Used instead of computing the hash if provided
+ * @data:                 the buffer in which to look for a duplicated cluster
+ * @nb_clusters_processed: the number of clusters that must be skipped in data
+ * @ret:                  ret < 0 on error, 1 on deduplication else 0
+ */
+static int qcow2_try_dedup_cluster(BlockDriverState *bs,
+                                   QcowPersistantHash *phash,
+                                   uint64_t sector_num,
+                                   uint8_t *data,
+                                   int nb_clusters_processed)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    QCowHashNode *hash_node;
+    uint64_t logical_sect;
+    uint64_t existing_physical_offset;
+    int pnum = s->cluster_sectors;
+
+    /* search the tree for duplicated cluster */
+    hash_node = qcow2_get_hash_node_for_cluster(bs,
+                                                phash,
+                                                data,
+                                                nb_clusters_processed,
+                                                &ret);
+
+    /* we won't reuse the hash on error */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if cluster is not duplicated store hash for later usage */
+    if (!hash_node) {
+        qcow2_build_and_insert_hash_node(bs, &phash->hash);
+        return 0;
+    }
+
+    logical_sect = sector_num & ~(s->cluster_sectors - 1);
+    ret = qcow2_get_cluster_offset(bs, logical_sect << 9,
+                                   &pnum, &existing_physical_offset);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if we are rewriting the same cluster at the same place do nothing */
+    if (existing_physical_offset == hash_node->physical_sect << 9) {
+        return 1;
+    }
+
+    /* take care of not having refcount > 1 and QCOW_OFLAG_COPIED at once */
+    ret = qcow2_clear_l2_copied_flag_if_needed(bs, hash_node);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* do the deduplication */
+    ret = qcow2_deduplicate_cluster(bs, logical_sect,
+                                    hash_node);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    return 1;
+}
+
+
+static void add_hash_to_undedupable_list(BlockDriverState *bs,
+                                                    QCowDedupState *ds)
+{
+    /* memorise hash for later storage in gtree and disk */
+    QCowHashElement *dedup_hash = qcow2_build_dedup_hash(&ds->phash.hash);
+    QTAILQ_INSERT_TAIL(&ds->undedupables, dedup_hash, next);
+}
+
+static int qcow2_dedup_starting_from_begining(BlockDriverState *bs,
+                                              QCowDedupState *ds,
+                                              uint64_t sector_num,
+                                              uint8_t *data,
+                                              int left_to_process)
+{
+    BDRVQcowState *s = bs->opaque;
+    int i;
+    int ret = 0;
+
+    for (i = 0; i < left_to_process; i++) {
+        ret = qcow2_try_dedup_cluster(bs,
+                                      &ds->phash,
+                                      sector_num + i * s->cluster_sectors,
+                                      data,
+                                      ds->nb_clusters_processed + i);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* stop if a cluster has not been deduplicated */
+        if (ret != 1) {
+            break;
+        }
+    }
+
+    return i;
+}
+
+static int qcow2_count_next_non_dedupable_clusters(BlockDriverState *bs,
+                                                   QCowDedupState *ds,
+                                                   uint8_t *data,
+                                                   int left_to_process)
+{
+    int i;
+    int ret = 0;
+    QCowHashNode *hash_node;
+
+    for (i = 0; i < left_to_process; i++) {
+        hash_node = qcow2_get_hash_node_for_cluster(bs,
+                                                  &ds->phash,
+                                                  data,
+                                                  ds->nb_clusters_processed + i,
+                                                  &ret);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* found a duplicated cluster : stop here */
+        if (hash_node) {
+            break;
+        }
+
+        qcow2_build_and_insert_hash_node(bs, &ds->phash.hash);
+        add_hash_to_undedupable_list(bs, ds);
+    }
+
+    return i;
+}
+
+
+/* Deduplicate all the clusters that can be deduplicated.
+ *
+ * Next, compute the number of non-deduplicable sectors that follow, storing
+ * the hashes of these sectors in a linked list for later use.
+ * Then compute the hash of the first duplicated cluster that comes after the
+ * non-deduplicable clusters; this hash will be reused at the next call.
+ *
+ * @ds:              a structure containing the state of the deduplication
+ *                   for this write request
+ * @sector_num:      The logical sector
+ * @data:            the buffer containing the data to deduplicate
+ * @data_nr:         the size of the buffer in sectors
+ * @ret:             the number of deduplicated sectors, negative on error
+ */
+int qcow2_dedup(BlockDriverState *bs,
+                QCowDedupState *ds,
+                uint64_t sector_num,
+                uint8_t *data,
+                int data_nr)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    int deduped_clusters_nr = 0;
+    int left_to_process;
+    int begining_index;
+
+    begining_index = sector_num & (s->cluster_sectors - 1);
+
+    left_to_process = (data_nr / s->cluster_sectors) -
+                      ds->nb_clusters_processed;
+
+    /* start deduplicating all that can be cluster after cluster */
+    ret = qcow2_dedup_starting_from_begining(bs,
+                                             ds,
+                                             sector_num,
+                                             data,
+                                             left_to_process);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    deduped_clusters_nr = ret;
+
+    left_to_process -= ret;
+    ds->nb_clusters_processed += ret;
+
+    /* We deduped everything till the end */
+    if (!left_to_process) {
+        ds->nb_undedupable_sectors = 0;
+        goto exit;
+    }
+
+    /* skip and account the first undedupable cluster found */
+    left_to_process--;
+    ds->nb_clusters_processed++;
+    ds->nb_undedupable_sectors += s->cluster_sectors;
+
+    add_hash_to_undedupable_list(bs, ds);
+
+    /* Count how many non-duplicated sectors can be written and memorize their
+     * hashes to write them after the data has reached disk.
+     */
+    ret = qcow2_count_next_non_dedupable_clusters(bs,
+                                                  ds,
+                                                  data,
+                                                  left_to_process);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    left_to_process -= ret;
+    ds->nb_clusters_processed += ret;
+    ds->nb_undedupable_sectors += ret * s->cluster_sectors;
+
+    /* remember to reuse the last hash computed at new qcow2_dedup call */
+    if (left_to_process) {
+        ds->phash.reuse = true;
+    }
+
+exit:
+    if (!deduped_clusters_nr) {
+        return 0;
+    }
+
+    return deduped_clusters_nr * s->cluster_sectors - begining_index;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 4273e7c..11c3002 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -466,5 +466,10 @@ int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
                                              int sectors_nr,
                                              uint8_t **dedup_cluster_data,
                                              int *dedup_cluster_data_nr);
+int qcow2_dedup(BlockDriverState *bs,
+                QCowDedupState *ds,
+                uint64_t sector_num,
+                uint8_t *data,
+                int data_nr);
 
 #endif
-- 
1.7.10.4
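
To make the control flow of qcow2_dedup() easier to review, here is a
simplified standalone model of a single call. It is only a sketch under
stated assumptions: the hash tree lookup is replaced by a precomputed is_dup[]
array, and DedupScan and scan_clusters() are hypothetical names that do not
exist in QEMU. The real function keeps these counters in QCowDedupState and
accumulates them across calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type standing in for the QCowDedupState counters. */
typedef struct {
    int deduped_sectors;     /* sectors handled by deduplication */
    int undedupable_sectors; /* sectors the caller must write normally */
    int clusters_processed;  /* clusters consumed by this call */
    bool reuse_hash;         /* a duplicate was seen but not yet processed */
} DedupScan;

/* Model one pass: is_dup[i] says whether cluster i is already in the tree. */
static DedupScan scan_clusters(const bool *is_dup, int nb_clusters,
                               int cluster_sectors, int begin_index)
{
    DedupScan r = { 0, 0, 0, false };
    int deduped;
    int i = 0;

    assert(nb_clusters >= 0 && cluster_sectors > 0);

    /* Phase 1: deduplicate the leading run of duplicated clusters. */
    while (i < nb_clusters && is_dup[i]) {
        i++;
    }
    deduped = i;

    /* Phase 2: count the run of non-duplicated clusters that follows. */
    while (i < nb_clusters && !is_dup[i]) {
        r.undedupable_sectors += cluster_sectors;
        i++;
    }

    r.clusters_processed = i;
    /* Phase 2 stopped on a duplicate: its hash is kept for the next call. */
    r.reuse_hash = i < nb_clusters;
    r.deduped_sectors = deduped ? deduped * cluster_sectors - begin_index : 0;
    return r;
}
```

With is_dup = {1, 1, 0, 0, 1} and 128-sector clusters, this model reports 256
deduplicated sectors, 256 undedupable sectors, and asks for the last hash to be
reused, which is the behaviour the patch aims for.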


* [Qemu-devel] [RFC V5 07/62] qcow2: Add qcow2_dedup_store_new_hashes.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (5 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 06/62] qcow2: Add qcow2_dedup and related functions Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 08/62] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
                   ` (55 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |  325 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/qcow2.h       |    5 +
 2 files changed, 329 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 5901749..a424af8 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -29,6 +29,12 @@
 #include "qemu-common.h"
 #include "qcow2.h"
 
+static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
+                                       QCowHash *hash,
+                                       uint64_t *first_logical_sect,
+                                       uint64_t physical_sect,
+                                       bool write);
+
 /*
  * Prepare a buffer containing all the required data required to compute cluster
  * sized deduplication hashes.
@@ -291,7 +297,11 @@ static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
     /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
     hash_node->first_logical_sect &= first_logical_sect;
 
-    return 0;
+    /* clear the QCOW_FLAG_FIRST flag from disk */
+    return qcow2_dedup_read_write_hash(bs, &hash_node->hash,
+                                       &hash_node->first_logical_sect,
+                                       hash_node->physical_sect,
+                                       true);
 }
 
 /* This function deduplicates a cluster
@@ -553,3 +563,316 @@ exit:
 
     return deduped_clusters_nr * s->cluster_sectors - begining_index;
 }
+
+
+/* Create a deduplication table hash block, write its offset to disk and
+ * reference it in the RAM deduplication table
+ *
+ * sync this to disk and get the dedup cluster cache entry
+ *
+ * @index: index in the RAM deduplication table
+ * @ret:   offset on success, negative on error
+ */
+static int64_t qcow2_create_block(BlockDriverState *bs,
+                                  int32_t index)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+    uint64_t data64;
+    int ret = 0;
+
+    /* allocate a new dedup table hash block */
+    offset = qcow2_alloc_clusters(bs, s->hash_block_size);
+
+    if (offset < 0) {
+        return offset;
+    }
+
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    if (ret < 0) {
+        goto free_fail;
+    }
+
+    /* write the new block offset in the dedup table L1 */
+    data64 = cpu_to_be64(offset);
+    ret = bdrv_pwrite_sync(bs->file,
+                           s->dedup_table_offset +
+                           index * sizeof(uint64_t),
+                           &data64, sizeof(data64));
+
+    if (ret < 0) {
+        goto free_fail;
+    }
+
+    s->dedup_table[index] = offset;
+
+    return offset;
+
+free_fail:
+    qcow2_free_clusters(bs, offset, s->hash_block_size);
+    return ret;
+}
+
+static int qcow2_create_and_get_block(BlockDriverState *bs,
+                                      uint32_t index,
+                                      uint8_t **block)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    int64_t offset;
+
+    offset = qcow2_create_block(bs, index);
+
+    if (offset < 0) {
+        return offset;
+    }
+
+
+    /* get an empty cluster from the dedup cache */
+    ret = qcow2_cache_get_empty(bs, s->dedup_cluster_cache,
+                                offset,
+                                (void **) block);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* clear it */
+    memset(*block, 0, s->hash_block_size);
+
+    return 0;
+}
+
+static inline bool qcow2_has_dedup_block(BlockDriverState *bs,
+                                         uint32_t index)
+{
+    BDRVQcowState *s = bs->opaque;
+    return s->dedup_table[index] != 0;
+}
+
+static inline void qcow2_write_hash_to_block_and_dirty(BlockDriverState *bs,
+                                                       uint8_t *block,
+                                                       QCowHash *hash,
+                                                       int offset,
+                                                       uint64_t *logical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t first;
+    first = cpu_to_be64(*logical_sect);
+    memcpy(block + offset, hash->data, HASH_LENGTH);
+    memcpy(block + offset + HASH_LENGTH, &first, 8);
+    qcow2_cache_entry_mark_dirty(s->dedup_cluster_cache, block);
+}
+
+static inline uint64_t qcow2_read_hash_from_block(uint8_t *block,
+                                                  QCowHash *hash,
+                                                  int offset)
+{
+    uint64_t first;
+    memcpy(hash->data, block + offset, HASH_LENGTH);
+    memcpy(&first, block + offset + HASH_LENGTH, 8);
+    return be64_to_cpu(first);
+}
+
+/* Read/write a given hash and cluster_sect from/to the dedup table
+ *
+ * This function doesn't flush the dedup cache to disk
+ *
+ * @hash:                     the hash to read or store
+ * @first_logical_sect:       logical sector of the QCOW_OFLAG_COPIED cluster
+ * @physical_sect:            sector of the cluster in QCOW2 file (in sectors)
+ * @write:                    true to write, false to read
+ * @ret:                      0 on success, negative errno on error
+ */
+static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
+                                       QCowHash *hash,
+                                       uint64_t *first_logical_sect,
+                                       uint64_t physical_sect,
+                                       bool write)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint8_t *block = NULL;
+    int ret = 0;
+    int64_t cluster_number;
+    uint32_t index_in_dedup_table;
+    int offset_in_block;
+    int nb_hash_in_block = s->hash_block_size / (HASH_LENGTH + 8);
+
+    cluster_number = physical_sect / s->cluster_sectors;
+    index_in_dedup_table = cluster_number / nb_hash_in_block;
+
+    if (s->dedup_table_size <= index_in_dedup_table) {
+        return -ENOSPC;
+    }
+
+    /* if we must read and there is nothing to read return a null hash */
+    if (!qcow2_has_dedup_block(bs, index_in_dedup_table) && !write) {
+        memset(hash->data, 0, HASH_LENGTH);
+        *first_logical_sect = 0;
+        return 0;
+    }
+
+    if (qcow2_has_dedup_block(bs, index_in_dedup_table)) {
+        ret = qcow2_cache_get(bs,
+                              s->dedup_cluster_cache,
+                              s->dedup_table[index_in_dedup_table],
+                              (void **) &block);
+    } else {
+        ret = qcow2_create_and_get_block(bs,
+                                         index_in_dedup_table,
+                                         &block);
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    offset_in_block = (cluster_number % nb_hash_in_block) *
+                      (HASH_LENGTH + 8);
+
+    if (write)  {
+        qcow2_write_hash_to_block_and_dirty(bs,
+                                            block,
+                                            hash,
+                                            offset_in_block,
+                                            first_logical_sect);
+    } else  {
+        *first_logical_sect = qcow2_read_hash_from_block(block,
+                                                         hash,
+                                                         offset_in_block);
+    }
+
+    qcow2_cache_put(bs, s->dedup_cluster_cache, (void **) &block);
+
+    return 0;
+}
+
+static inline bool is_hash_node_empty(QCowHashNode *hash_node)
+{
+    return hash_node->physical_sect & QCOW_FLAG_EMPTY;
+}
+
+static void qcow2_remove_hash_node(BlockDriverState *bs,
+                                   QCowHashNode *hash_node)
+{
+    BDRVQcowState *s = bs->opaque;
+    g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
+    g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+}
+
+/* This function removes a hash_node from the trees given a physical sector
+ *
+ * @physical_sect: The physical sector of the cluster corresponding to the hash
+ */
+static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
+                                            uint64_t physical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+
+    hash_node = g_tree_lookup(s->dedup_tree_by_sect, &physical_sect);
+
+    if (!hash_node) {
+        return;
+    }
+
+    qcow2_remove_hash_node(bs, hash_node);
+}
+
+/* This function stores hash information to disk and RAM
+ *
+ * @hash:           the QCowHash to process
+ * @logical_sect:   the logical sector of the cluster seen by the guest
+ * @physical_sect:  the physical sector of the stored cluster
+ * @ret:            0 on success, negative on error
+ */
+static int qcow2_store_hash(BlockDriverState *bs,
+                            QCowHash *hash,
+                            uint64_t logical_sect,
+                            uint64_t physical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+
+    hash_node = g_tree_lookup(s->dedup_tree_by_hash, hash);
+
+    /* no hash node found for this hash */
+    if (!hash_node) {
+        return 0;
+    }
+
+    /* the hash node information is already complete */
+    if (!is_hash_node_empty(hash_node)) {
+        return 0;
+    }
+
+    /* Remember that this QCowHashNode represents the first occurrence of the
+     * cluster so we will be able to clear QCOW_OFLAG_COPIED from the L2 table
+     * entry when the refcount goes above 1.
+     */
+    logical_sect = logical_sect | QCOW_FLAG_FIRST;
+
+    /* remove stale hash node pointing to this physical sector from the trees */
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+
+    /* fill the missing fields of the hash node */
+    hash_node->physical_sect = physical_sect;
+    hash_node->first_logical_sect = logical_sect;
+
+    /* insert the hash node in the second tree: it's already in the first one */
+    g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect, hash_node);
+
+    /* write the hash to disk */
+    return qcow2_dedup_read_write_hash(bs,
+                                       hash,
+                                       &logical_sect,
+                                       physical_sect,
+                                       true);
+}
+
+/* This function stores the hashes of the clusters which are not duplicated
+ *
+ * @ds:            The deduplication state
+ * @count:         the number of dedup hashes to process
+ * @logical_sect:  logical offset of the first cluster (in sectors)
+ * @physical_sect: offset of the first cluster (in sectors)
+ * @ret:           0 on success, negative errno on error
+ */
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+                                 QCowDedupState *ds,
+                                 int count,
+                                 uint64_t logical_sect,
+                                 uint64_t physical_sect)
+{
+    int ret = 0;
+    int i = 0;
+    BDRVQcowState *s = bs->opaque;
+    QCowHashElement *dedup_hash, *next_dedup_hash;
+
+    /* round values on cluster boundaries for easier cluster deletion */
+    logical_sect = logical_sect & ~(s->cluster_sectors - 1);
+    physical_sect = physical_sect & ~(s->cluster_sectors - 1);
+
+    QTAILQ_FOREACH_SAFE(dedup_hash, &ds->undedupables, next, next_dedup_hash) {
+
+        ret = qcow2_store_hash(bs,
+                               &dedup_hash->hash,
+                               logical_sect + i * s->cluster_sectors,
+                               physical_sect + i * s->cluster_sectors);
+
+        QTAILQ_REMOVE(&ds->undedupables, dedup_hash, next);
+        g_free(dedup_hash);
+
+        if (ret < 0) {
+            break;
+        }
+
+        i++;
+
+        if (i == count) {
+            break;
+        }
+    }
+
+    return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 11c3002..ea0c30e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -471,5 +471,10 @@ int qcow2_dedup(BlockDriverState *bs,
                 uint64_t sector_num,
                 uint8_t *data,
                 int data_nr);
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+                                 QCowDedupState *ds,
+                                 int count,
+                                 uint64_t logical_sect,
+                                 uint64_t physical_sect);
 
 #endif
-- 
1.7.10.4
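
As a reading aid, the slot arithmetic in qcow2_dedup_read_write_hash() can be
checked in isolation with the sketch below. It is not part of the patch:
HashSlot and locate_hash_slot() are hypothetical names, and the 65536-byte
hash block size used in the worked numbers is an assumption, since this patch
does not show where s->hash_block_size is set. Each entry is the 32-byte hash
followed by the 64-bit first logical sector, so 40 bytes in total.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_LENGTH 32
#define ENTRY_SIZE  (HASH_LENGTH + 8) /* hash + 64-bit first logical sector */

/* Hypothetical type: where the entry of a physical cluster is stored. */
typedef struct {
    uint64_t table_index;     /* index in the dedup table */
    uint64_t offset_in_block; /* byte offset of the entry in its hash block */
} HashSlot;

/* Same arithmetic as the patch, on plain integers. */
static HashSlot locate_hash_slot(uint64_t physical_sect,
                                 uint64_t cluster_sectors,
                                 uint64_t hash_block_size)
{
    uint64_t entries_per_block = hash_block_size / ENTRY_SIZE;
    uint64_t cluster_number;
    HashSlot slot;

    assert(cluster_sectors > 0 && entries_per_block > 0);

    cluster_number = physical_sect / cluster_sectors;
    slot.table_index = cluster_number / entries_per_block;
    slot.offset_in_block = (cluster_number % entries_per_block) * ENTRY_SIZE;
    return slot;
}
```

Under these assumptions a 65536-byte block holds 1638 entries, so with
128-sector clusters physical cluster 2000 (sector 256000) lands in dedup table
index 1 at byte offset 362 * 40 = 14480.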


* [Qemu-devel] [RFC V5 08/62] qcow2: Implement qcow2_compute_cluster_hash.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (6 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 07/62] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 09/62] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
                   ` (54 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Add detection of libgnutls used to compute SHA256 hashes

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   13 ++++++++++++-
 configure           |   22 ++++++++++++++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index a424af8..45b2326 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -25,6 +25,8 @@
  * THE SOFTWARE.
  */
 
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
 #include "qcow2.h"
@@ -157,7 +159,16 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
                                        QCowHash *hash,
                                        uint8_t *data)
 {
-    return 0;
+    BDRVQcowState *s = bs->opaque;
+    switch (s->dedup_hash_algo) {
+    case QCOW_HASH_SHA256:
+        return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
+                                s->cluster_size, hash->data);
+    default:
+        error_report("Invalid deduplication hash algorithm %i",
+                     s->dedup_hash_algo);
+        abort();
+    }
 }
 
 /*
diff --git a/configure b/configure
index 99c1ec3..390326e 100755
--- a/configure
+++ b/configure
@@ -1724,6 +1724,28 @@ EOF
 fi
 
 ##########################################
+# QCOW Deduplication gnutls detection
+cat > $TMPC <<EOF
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
+EOF
+qcow_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+qcow_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$qcow_tls_cflags" "$qcow_tls_libs" ; then
+  qcow_tls=yes
+  libs_softmmu="$qcow_tls_libs $libs_softmmu"
+  libs_tools="$qcow_tls_libs $libs_tools"
+  QEMU_CFLAGS="$QEMU_CFLAGS $qcow_tls_cflags"
+else
+  echo "gnutls > 2.10.0 required to compile QEMU"
+  exit 1
+fi
+
+##########################################
 # VNC SASL detection
 if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
   cat > $TMPC <<EOF
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 09/62] qcow2: Extract qcow2_dedup_grow_table
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (7 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 08/62] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 10/62] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
                   ` (53 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

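Move the table-growing logic of qcow2_grow_l1_table into a generic
qcow2_do_grow_table helper so that other tables can reuse it. The L1-specific
header update becomes the qcow2_l1_save_table callback.

Signed-off-by: Benoit Canet <benoit@irqsave.net>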
---
 block/qcow2-cluster.c |  102 +++++++++++++++++++++++++++++++------------------
 block/qcow2-dedup.c   |    3 +-
 block/qcow2.h         |    6 +++
 3 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 63a7241..dbcb6d2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -29,44 +29,48 @@
 #include "block/qcow2.h"
 #include "trace.h"
 
-int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+int qcow2_do_grow_table(BlockDriverState *bs, int min_size, bool exact_size,
+                        uint64_t **table, uint64_t *table_offset,
+                        int *table_size, qcow2_save_table save_table,
+                        const char *table_name)
 {
     BDRVQcowState *s = bs->opaque;
-    int new_l1_size, new_l1_size2, ret, i;
-    uint64_t *new_l1_table;
-    int64_t new_l1_table_offset;
-    uint8_t data[12];
+    int new_size, new_size2, ret, i;
+    uint64_t *new_table;
+    int64_t new_table_offset;
 
-    if (min_size <= s->l1_size)
+    if (min_size <= *table_size) {
         return 0;
+    }
 
     if (exact_size) {
-        new_l1_size = min_size;
+        new_size = min_size;
     } else {
         /* Bump size up to reduce the number of times we have to grow */
-        new_l1_size = s->l1_size;
-        if (new_l1_size == 0) {
-            new_l1_size = 1;
+        new_size = *table_size;
+        if (new_size == 0) {
+            new_size = 1;
         }
-        while (min_size > new_l1_size) {
-            new_l1_size = (new_l1_size * 3 + 1) / 2;
+        while (min_size > new_size) {
+            new_size = (new_size * 3 + 1) / 2;
         }
     }
 
 #ifdef DEBUG_ALLOC2
-    fprintf(stderr, "grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
+    fprintf(stderr, "grow %s_table from %d to %d\n",
+            table_name, *table_size, new_size);
 #endif
 
-    new_l1_size2 = sizeof(uint64_t) * new_l1_size;
-    new_l1_table = g_malloc0(align_offset(new_l1_size2, 512));
-    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
+    new_size2 = sizeof(uint64_t) * new_size;
+    new_table = g_malloc0(align_offset(new_size2, 512));
+    memcpy(new_table, *table, *table_size * sizeof(uint64_t));
 
     /* write new table (align to cluster) */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
-    new_l1_table_offset = qcow2_alloc_clusters(bs, new_l1_size2);
-    if (new_l1_table_offset < 0) {
-        g_free(new_l1_table);
-        return new_l1_table_offset;
+    new_table_offset = qcow2_alloc_clusters(bs, new_size2);
+    if (new_table_offset < 0) {
+        g_free(new_table);
+        return new_table_offset;
     }
 
     ret = qcow2_cache_flush(bs, s->refcount_block_cache);
@@ -75,34 +79,56 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
-    for(i = 0; i < s->l1_size; i++)
-        new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
-    ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset, new_l1_table, new_l1_size2);
+    for (i = 0; i < *table_size; i++) {
+        new_table[i] = cpu_to_be64(new_table[i]);
+    }
+    ret = bdrv_pwrite_sync(bs->file, new_table_offset, new_table, new_size2);
     if (ret < 0)
         goto fail;
-    for(i = 0; i < s->l1_size; i++)
-        new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
+    for (i = 0; i < *table_size; i++) {
+        new_table[i] = be64_to_cpu(new_table[i]);
+    }
+
+    g_free(*table);
+    qcow2_free_clusters(bs, *table_offset, *table_size * sizeof(uint64_t));
+    *table_offset = new_table_offset;
+    *table = new_table;
+    *table_size = new_size;
 
     /* set new table */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
-    cpu_to_be32w((uint32_t*)data, new_l1_size);
-    cpu_to_be64wu((uint64_t*)(data + 4), new_l1_table_offset);
-    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size), data,sizeof(data));
-    if (ret < 0) {
-        goto fail;
-    }
-    g_free(s->l1_table);
-    qcow2_free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
-    s->l1_table_offset = new_l1_table_offset;
-    s->l1_table = new_l1_table;
-    s->l1_size = new_l1_size;
+    save_table(bs, *table_offset, *table_size);
+
     return 0;
  fail:
-    g_free(new_l1_table);
-    qcow2_free_clusters(bs, new_l1_table_offset, new_l1_size2);
+    g_free(new_table);
+    qcow2_free_clusters(bs, new_table_offset, new_size2);
     return ret;
 }
 
+static int qcow2_l1_save_table(BlockDriverState *bs,
+                               int64_t table_offset, int size)
+{
+    uint8_t data[12];
+    cpu_to_be32w((uint32_t *)data, size);
+    cpu_to_be64wu((uint64_t *)(data + 4), table_offset);
+    return bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
+                            data, sizeof(data));
+}
+
+int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_grow_table(bs,
+                               min_size,
+                               exact_size,
+                               &s->l1_table,
+                               &s->l1_table_offset,
+                               &s->l1_size,
+                               qcow2_l1_save_table,
+                               "l1");
+}
+
 /*
  * l2_load
  *
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 45b2326..de1b366 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -575,7 +575,6 @@ exit:
     return deduped_clusters_nr * s->cluster_sectors - begining_index;
 }
 
-
 /* Create a deduplication table hash block, write it's offset to disk and
  * reference it in the RAM deduplication table
  *
@@ -592,7 +591,7 @@ static uint64_t qcow2_create_block(BlockDriverState *bs,
     uint64_t data64;
     int ret = 0;
 
-    /* allocate a new dedup table hash block */
+    /* allocate a new dedup table cluster */
     offset = qcow2_alloc_clusters(bs, s->hash_block_size);
 
     if (offset < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index ea0c30e..359a50f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -412,6 +412,12 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     int64_t offset, int64_t length, int addend);
 
 /* qcow2-cluster.c functions */
+typedef int (*qcow2_save_table)(BlockDriverState *bs,
+                                int64_t table_offset, int size);
+int qcow2_do_grow_table(BlockDriverState *bs, int min_size, bool exact_size,
+                        uint64_t **table, uint64_t *table_offset,
+                        int *table_size, qcow2_save_table save_table,
+                        const char *table_name);
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
 void qcow2_l2_cache_reset(BlockDriverState *bs);
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
-- 
1.7.10.4
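
The size computation in qcow2_do_grow_table can be looked at in isolation.
The sketch below is not part of the patch: table_grown_size is a hypothetical
helper name, and it only reproduces the arithmetic (grow by roughly 1.5x until
min_size fits, or jump straight to min_size when exact_size is set).

```c
#include <assert.h>

/* Hypothetical helper (not in the patch): returns the size the table would
 * have after a grow request, using the same policy as qcow2_do_grow_table. */
static int table_grown_size(int current_size, int min_size, int exact_size)
{
    int new_size;

    assert(current_size >= 0 && min_size >= 0);

    /* Nothing to do when the table is already large enough. */
    if (min_size <= current_size) {
        return current_size;
    }

    if (exact_size) {
        return min_size;
    }

    /* Bump the size up to reduce the number of times we have to grow. */
    new_size = current_size == 0 ? 1 : current_size;
    while (min_size > new_size) {
        new_size = (new_size * 3 + 1) / 2;
    }
    return new_size;
}
```

Starting from an empty table, a request for 10 entries walks through
1, 2, 3, 5, 8, 12 and therefore allocates 12 entries.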


* [Qemu-devel] [RFC V5 10/62] qcow2: Add qcow2_dedup_grow_table and use it.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (8 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 09/62] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 11/62] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters Benoît Canet
                   ` (52 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index de1b366..de6e3a3 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -38,6 +38,44 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        bool write);
 
 /*
+ * Save the dedup table information into the header extensions
+ *
+ * @table_offset: the dedup table offset in the QCOW2 file
+ * @size:         the size of the dedup table
+ * @ret:          0 on success, -errno on error
+ */
+static int qcow2_dedup_save_table_info(BlockDriverState *bs,
+                                       int64_t table_offset, int size)
+{
+    BDRVQcowState *s = bs->opaque;
+    s->dedup_table_offset = table_offset;
+    s->dedup_table_size = size;
+    return qcow2_update_header(bs);
+}
+
+/*
+ * Grow the deduplication table
+ *
+ * @min_size:   minimum size
+ * @exact_size: if true, grow to exactly min_size
+ * @ret:        0 on success, -errno on error
+ */
+static int qcow2_dedup_grow_table(BlockDriverState *bs,
+                                  int min_size,
+                                  bool exact_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_grow_table(bs,
+                               min_size,
+                               exact_size,
+                               &s->dedup_table,
+                               &s->dedup_table_offset,
+                               &s->dedup_table_size,
+                               qcow2_dedup_save_table_info,
+                               "dedup");
+}
+
+/*
  * Prepare a buffer containing all the required data required to compute cluster
  * sized deduplication hashes.
  * If sector_num or nb_sectors are not cluster-aligned, missing data
@@ -712,7 +750,11 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
     index_in_dedup_table = cluster_number / nb_hash_in_block;
 
     if (s->dedup_table_size <= index_in_dedup_table) {
-        return -ENOSPC;
+        ret = qcow2_dedup_grow_table(bs, index_in_dedup_table + 1, false);
+    }
+
+    if (ret < 0) {
+        return ret;
     }
 
     /* if we must read and there is nothing to read return a null hash */
-- 
1.7.10.4
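
The lookup in qcow2_dedup_read_write_hash now grows the table on demand
instead of failing with -ENOSPC. A minimal in-memory sketch of that pattern
follows. DemoTable, demo_grow and demo_slot_for_cluster are hypothetical names
used only for illustration, and plain calloc stands in for the on-disk
allocation done by qcow2_do_grow_table. Note that ret is initialized before
the conditional grow, so the later error check is well defined when no growth
is needed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the dedup table. */
typedef struct {
    uint64_t *entries;
    int size;
} DemoTable;

/* Grow the table to at least min_size entries, zero-filling new slots. */
static int demo_grow(DemoTable *t, int min_size)
{
    uint64_t *n;

    if (min_size <= t->size) {
        return 0;
    }
    n = calloc((size_t)min_size, sizeof(*n));
    if (!n) {
        return -ENOMEM;
    }
    if (t->size > 0) {
        memcpy(n, t->entries, (size_t)t->size * sizeof(*n));
    }
    free(t->entries);
    t->entries = n;
    t->size = min_size;
    return 0;
}

/* Map a cluster number to its table slot, growing the table if required. */
static int demo_slot_for_cluster(DemoTable *t, uint64_t cluster_number,
                                 uint64_t nb_hash_in_block, uint64_t **slot)
{
    uint64_t index;
    int ret = 0;

    assert(nb_hash_in_block > 0);
    index = cluster_number / nb_hash_in_block;
    if (index >= INT_MAX) {
        return -EFBIG;
    }
    if ((uint64_t)t->size <= index) {
        ret = demo_grow(t, (int)index + 1);
    }
    if (ret < 0) {
        return ret;
    }
    *slot = &t->entries[index];
    return 0;
}
```

With 256 hashes per block, cluster 1000 lands in slot 3, so an empty table is
grown to 4 entries.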


* [Qemu-devel] [RFC V5 11/62] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (9 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 10/62] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 12/62] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
                   ` (51 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-cluster.c |    8 ++++++--
 block/qcow2-dedup.c   |    7 +++++++
 block/qcow2.h         |    3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dbcb6d2..ef91216 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -709,6 +709,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
 
     for (i = 0; i < m->nb_clusters; i++) {
+        uint64_t flags = 0;
         /* if two concurrent writes happen to the same unallocated cluster
 	 * each write allocates separate cluster and writes data concurrently.
 	 * The first one to complete updates l2 table with pointer to its
@@ -718,9 +719,11 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         if(l2_table[l2_index + i] != 0)
             old_cluster[j++] = l2_table[l2_index + i];
 
+        flags = m->oflag_copied ? QCOW_OFLAG_COPIED : 0;
+        flags |= m->to_deduplicate ? QCOW_OFLAG_TO_DEDUP : 0;
+
         l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) |
-                    (m->oflag_copied ? QCOW_OFLAG_COPIED : 0));
+                    (i << s->cluster_bits)) | flags);
      }
 
 
@@ -1036,6 +1039,7 @@ again:
 
                 .oflag_copied   = true,
                 .overwrite      = false,
+                .to_deduplicate = qcow2_must_deduplicate(bs),
             };
             qemu_co_queue_init(&(*m)->dependent_requests);
             QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index de6e3a3..3d512e5 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -37,6 +37,12 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        uint64_t physical_sect,
                                        bool write);
 
+bool qcow2_must_deduplicate(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    return s->has_dedup && s->dedup_status != QCOW_DEDUP_STARTED;
+}
+
 /*
  * Save the dedup table information into the header extensions
  *
@@ -310,6 +316,7 @@ static int qcow2_dedup_link_l2(BlockDriverState *bs,
         },
         .oflag_copied   = false,
         .overwrite      = overwrite,
+        .to_deduplicate = false,
     };
     return qcow2_alloc_cluster_link_l2(bs, &m);
 }
diff --git a/block/qcow2.h b/block/qcow2.h
index 359a50f..da7e57e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -312,6 +312,8 @@ typedef struct QCowL2Meta
     bool oflag_copied;
     /* set to true if we are overwriting an L2 table entry */
     bool overwrite;
+    /* set to true if the cluster must be tagged with QCOW_OFLAG_TO_DEDUP */
+    bool to_deduplicate;
 
     /**
      * The COW Region between the start of the first allocated cluster and the
@@ -466,6 +468,7 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
 /* qcow2-dedup.c functions */
+bool qcow2_must_deduplicate(BlockDriverState *bs);
 int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
                                              QEMUIOVector *qiov,
                                              uint64_t sector,
-- 
1.7.10.4
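
How the flags end up in an L2 entry can be sketched as below. This is an
illustration only: the DEMO_ names are hypothetical, the COPIED bit and the
offset mask follow the qcow2 format, and the bit chosen for the TO_DEDUP flag
is an assumption since QCOW_OFLAG_TO_DEDUP is defined elsewhere in this
series. The real code additionally converts the entry with cpu_to_be64()
before storing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OFLAG_COPIED    (1ULL << 63)  /* same bit as QCOW_OFLAG_COPIED */
#define DEMO_OFLAG_TO_DEDUP  (1ULL << 61)  /* ASSUMPTION: illustrative bit */
#define DEMO_L2E_OFFSET_MASK 0x00ffffffffffff00ULL

/* Build a host-endian L2 entry from a cluster offset and the two flags. */
static uint64_t demo_l2_entry(uint64_t cluster_offset,
                              bool copied, bool to_dedup)
{
    uint64_t flags = 0;

    /* The offset must not overlap the flag or reserved bits. */
    assert((cluster_offset & ~DEMO_L2E_OFFSET_MASK) == 0);

    flags |= copied ? DEMO_OFLAG_COPIED : 0;
    flags |= to_dedup ? DEMO_OFLAG_TO_DEDUP : 0;
    return cluster_offset | flags;
}
```

Masking the entry with DEMO_L2E_OFFSET_MASK recovers the cluster offset, and
testing the individual bits recovers the flags.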


* [Qemu-devel] [RFC V5 12/62] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (10 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 11/62] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 13/62] qcow2: Create qcow2_is_cluster_to_dedup Benoît Canet
                   ` (50 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |   11 +++++++++--
 block/qcow2-dedup.c   |    8 +++++++-
 block/qcow2.h         |    2 ++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ef91216..5b1d20d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -710,6 +710,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
     for (i = 0; i < m->nb_clusters; i++) {
         uint64_t flags = 0;
+        uint64_t offset = cluster_offset + (i << s->cluster_bits);
         /* if two concurrent writes happen to the same unallocated cluster
 	 * each write allocates separate cluster and writes data concurrently.
 	 * The first one to complete updates l2 table with pointer to its
@@ -722,8 +723,14 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         flags = m->oflag_copied ? QCOW_OFLAG_COPIED : 0;
         flags |= m->to_deduplicate ? QCOW_OFLAG_TO_DEDUP : 0;
 
-        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | flags);
+        l2_table[l2_index + i] = cpu_to_be64(offset | flags);
+
+        /* Make the deduplication forget the cluster so that the dedup
+         * does not point to a cluster that has changed behind its back.
+         */
+        if (m->to_deduplicate) {
+            qcow2_dedup_forget_cluster_by_sector(bs, offset >> 9);
+        }
      }
 
 
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 3d512e5..7049bd8 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -824,7 +824,7 @@ static void qcow2_remove_hash_node(BlockDriverState *bs,
  * @physical_sect: The physical sector of the cluster corresponding to the hash
  */
 static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
-                                            uint64_t physical_sect)
+                                             uint64_t physical_sect)
 {
     BDRVQcowState *s = bs->opaque;
     QCowHashNode *hash_node;
@@ -838,6 +838,12 @@ static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
     qcow2_remove_hash_node(bs, hash_node);
 }
 
+void qcow2_dedup_forget_cluster_by_sector(BlockDriverState *bs,
+                                          uint64_t physical_sect)
+{
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+}
+
 /* This function store a hash information to disk and RAM
  *
  * @hash:           the QCowHash to process
diff --git a/block/qcow2.h b/block/qcow2.h
index da7e57e..bc1ba33 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -469,6 +469,8 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
 /* qcow2-dedup.c functions */
 bool qcow2_must_deduplicate(BlockDriverState *bs);
+void qcow2_dedup_forget_cluster_by_sector(BlockDriverState *bs,
+                                          uint64_t physical_sect);
 int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
                                              QEMUIOVector *qiov,
                                              uint64_t sector,
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 13/62] qcow2: Create qcow2_is_cluster_to_dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (11 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 12/62] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 14/62] qcow2: Load and save deduplication table header extension Benoît Canet
                   ` (49 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h         |    4 ++++
 2 files changed, 56 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5b1d20d..fedcf57 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -514,6 +514,58 @@ out:
     return ret;
 }
 
+/* Check if a cluster is to be deduplicated given its index
+ *
+ * @index:         The logical index of the cluster starting from 0
+ * @physical_sect: The physical sector of the cluster as return value
+ * @err:           0 on success, negative on error
+ * @ret:           True if the cluster is to be deduplicated, else false
+ */
+bool qcow2_is_cluster_to_dedup(BlockDriverState *bs,
+                               uint64_t index,
+                               uint64_t *physical_sect,
+                               int *err)
+{
+    BDRVQcowState *s = bs->opaque;
+    unsigned int l1_index, l2_index;
+    uint64_t offset;
+    uint64_t l2_offset;
+    uint64_t *l2_table = NULL;
+
+    *physical_sect = 0;
+    *err = 0;
+
+    l1_index = index >> s->l2_bits;
+
+    if (l1_index >= s->l1_size) {
+        return false;
+    }
+
+    /* no l1 entry */
+    if (!(s->l1_table[l1_index] & QCOW_OFLAG_COPIED)) {
+        return false;
+    }
+
+    l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+
+    *err = l2_load(bs, l2_offset, &l2_table);
+    if (*err < 0) {
+        return false;
+    }
+
+    l2_index = index & (s->l2_size - 1);
+
+    offset = be64_to_cpu(l2_table[l2_index]);
+    *physical_sect = (offset & L2E_OFFSET_MASK) >> 9;
+
+    *err = qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
+    if (*err < 0) {
+        return false;
+    }
+
+    return offset & QCOW_OFLAG_TO_DEDUP;
+}
+
 /*
  * get_cluster_table
  *
diff --git a/block/qcow2.h b/block/qcow2.h
index bc1ba33..0232088 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -440,6 +440,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
 int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
     int nb_sectors);
 int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
+bool qcow2_is_cluster_to_dedup(BlockDriverState *bs,
+                               uint64_t index,
+                               uint64_t *physical_sect,
+                               int *err);
 
 /* qcow2-snapshot.c functions */
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
-- 
1.7.10.4
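
The index arithmetic used by qcow2_is_cluster_to_dedup can be sketched on its
own. The helper names below are hypothetical. The sketch assumes l2_size is
1 << l2_bits, as it is in qcow2, and that sectors are 512 bytes, which is what
the ">> 9" in the patch implies.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_L2E_OFFSET_MASK 0x00ffffffffffff00ULL

/* Split a logical cluster index into its L1 and L2 table indexes. */
static void demo_split_index(uint64_t index, int l2_bits,
                             unsigned int *l1_index, unsigned int *l2_index)
{
    uint64_t l2_size;

    assert(l2_bits > 0 && l2_bits < 32);
    l2_size = 1ULL << l2_bits;

    *l1_index = (unsigned int)(index >> l2_bits);
    *l2_index = (unsigned int)(index & (l2_size - 1));
}

/* Extract the physical sector from a host-endian L2 entry. */
static uint64_t demo_entry_to_sector(uint64_t entry)
{
    return (entry & DEMO_L2E_OFFSET_MASK) >> 9;
}
```

With 64 KiB clusters an L2 table holds 8192 entries (l2_bits = 13), so cluster
20000 is entry 3616 of the L2 table referenced by L1 entry 2.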


* [Qemu-devel] [RFC V5 14/62] qcow2: Load and save deduplication table header extension.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (12 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 13/62] qcow2: Create qcow2_is_cluster_to_dedup Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 15/62] qcow2: Extract qcow2_do_table_init Benoît Canet
                   ` (48 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 410d3c1..acd3258 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -53,9 +53,18 @@ typedef struct {
     uint32_t len;
 } QCowExtension;
 
+typedef struct {
+    uint64_t offset;
+    int32_t  size;
+    uint8_t  hash_algo;
+    uint8_t  strategies;
+    char     reserved[56];
+} QCowDedupTableExtension;
+
 #define  QCOW2_EXT_MAGIC_END 0
 #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define  QCOW2_EXT_MAGIC_DEDUP_TABLE 0xCD8E819B
 
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
@@ -83,6 +92,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
     QCowExtension ext;
     uint64_t offset;
     int ret;
+    QCowDedupTableExtension dedup_table_extension;
 
 #ifdef DEBUG_EXT
     printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
@@ -147,6 +157,19 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             }
             break;
 
+        case QCOW2_EXT_MAGIC_DEDUP_TABLE:
+            ret = bdrv_pread(bs->file, offset,
+                             &dedup_table_extension, ext.len);
+            if (ret < 0) {
+                return ret;
+            }
+            s->dedup_table_offset =
+                be64_to_cpu(dedup_table_extension.offset);
+            s->dedup_table_size =
+                be32_to_cpu(dedup_table_extension.size);
+            s->dedup_hash_algo = dedup_table_extension.hash_algo;
+            break;
+
         default:
             /* unknown magic - save it in case we need to rewrite the header */
             {
@@ -958,6 +981,7 @@ int qcow2_update_header(BlockDriverState *bs)
     uint32_t refcount_table_clusters;
     size_t header_length;
     Qcow2UnknownHeaderExtension *uext;
+    QCowDedupTableExtension dedup_table_extension;
 
     buf = qemu_blockalign(bs, buflen);
 
@@ -1061,6 +1085,25 @@ int qcow2_update_header(BlockDriverState *bs)
     buf += ret;
     buflen -= ret;
 
+    if (s->has_dedup) {
+        memset(&dedup_table_extension, 0, sizeof(dedup_table_extension));
+        dedup_table_extension.offset = cpu_to_be64(s->dedup_table_offset);
+        dedup_table_extension.size = cpu_to_be32(s->dedup_table_size);
+        dedup_table_extension.hash_algo = s->dedup_hash_algo;
+        dedup_table_extension.strategies |= 1; /* RAM based lookup */
+        dedup_table_extension.strategies |= 1 << 2; /* deduplication running */
+        ret = header_ext_add(buf,
+                             QCOW2_EXT_MAGIC_DEDUP_TABLE,
+                             &dedup_table_extension,
+                             sizeof(dedup_table_extension),
+                             buflen);
+        if (ret < 0) {
+            goto fail;
+        }
+        buf += ret;
+        buflen -= ret;
+    }
+
     /* Keep unknown header extensions */
     QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
         ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
-- 
1.7.10.4
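
Since the extension is read with a length taken from the image (ext.len), a
reader may want to validate that length before touching a fixed-size
structure. The following is only a sketch of such a check, not what the patch
does. DemoDedupExt and the demo_ functions are hypothetical, and the byte
layout (8-byte offset, 4-byte size, then hash_algo and strategies) is assumed
from the field order of QCowDedupTableExtension.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for offset (8) + size (4) + hash_algo (1) + strategies (1). */
#define DEMO_DEDUP_EXT_MIN_LEN 14

/* Hypothetical host-side view of the dedup header extension. */
typedef struct {
    uint64_t offset;
    int32_t  size;
    uint8_t  hash_algo;
    uint8_t  strategies;
} DemoDedupExt;

/* Decode an n-byte big-endian integer. */
static uint64_t demo_get_be(const uint8_t *p, int n)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Parse the extension payload, rejecting payloads that are too short. */
static int demo_parse_dedup_ext(const uint8_t *buf, size_t len,
                                DemoDedupExt *out)
{
    assert(buf != NULL && out != NULL);

    if (len < DEMO_DEDUP_EXT_MIN_LEN) {
        return -EINVAL;
    }
    out->offset = demo_get_be(buf, 8);
    out->size = (int32_t)(uint32_t)demo_get_be(buf + 8, 4);
    out->hash_algo = buf[12];
    out->strategies = buf[13];
    return 0;
}
```

Decoding byte by byte avoids depending on the host's struct padding and
endianness, and the length check keeps a malformed ext.len from reading or
writing past the buffer.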


* [Qemu-devel] [RFC V5 15/62] qcow2: Extract qcow2_do_table_init.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (13 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 14/62] qcow2: Load and save deduplication table header extension Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 16/62] qcow2-cache: Allow to choose table size at creation Benoît Canet
                   ` (47 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |   43 ++++++++++++++++++++++++++++++-------------
 block/qcow2.h          |    5 +++++
 2 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e014b0e..75c2bde 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -31,27 +31,44 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
 /*********************************************************/
 /* refcount handling */
 
-int qcow2_refcount_init(BlockDriverState *bs)
+int qcow2_do_table_init(BlockDriverState *bs,
+                        uint64_t **table,
+                        int64_t offset,
+                        int size,
+                        bool is_refcount)
 {
-    BDRVQcowState *s = bs->opaque;
-    int ret, refcount_table_size2, i;
-
-    refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
-    s->refcount_table = g_malloc(refcount_table_size2);
-    if (s->refcount_table_size > 0) {
-        BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
-        ret = bdrv_pread(bs->file, s->refcount_table_offset,
-                         s->refcount_table, refcount_table_size2);
-        if (ret != refcount_table_size2)
+    int ret, size2, i;
+
+    size2 = size * sizeof(uint64_t);
+    *table = g_malloc(size2);
+    if (size > 0) {
+        if (is_refcount) {
+            BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
+        }
+        ret = bdrv_pread(bs->file, offset,
+                         *table, size2);
+        if (ret != size2) {
             goto fail;
-        for(i = 0; i < s->refcount_table_size; i++)
-            be64_to_cpus(&s->refcount_table[i]);
+        }
+        for (i = 0; i < size; i++) {
+            be64_to_cpus(&(*table)[i]);
+        }
     }
     return 0;
  fail:
     return -ENOMEM;
 }
 
+int qcow2_refcount_init(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_table_init(bs,
+                               &s->refcount_table,
+                               s->refcount_table_offset,
+                               s->refcount_table_size,
+                               true);
+}
+
 void qcow2_refcount_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 0232088..8eb2977 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -393,6 +393,11 @@ int qcow2_read_cluster_data(BlockDriverState *bs,
                             int nb_sectors);
 
 /* qcow2-refcount.c functions */
+int qcow2_do_table_init(BlockDriverState *bs,
+                        uint64_t **table,
+                        int64_t offset,
+                        int size,
+                        bool is_refcount);
 int qcow2_refcount_init(BlockDriverState *bs);
 void qcow2_refcount_close(BlockDriverState *bs);
 
-- 
1.7.10.4
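
What qcow2_do_table_init does, reading an array of big-endian 64-bit entries
and converting them to host order, can be sketched without the block layer.
demo_table_load is a hypothetical name and a memory buffer replaces
bdrv_pread. As a deliberate variation on the patch, the sketch reports a short
read as -EIO and does not leave a half-initialized table behind.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Load `size` big-endian 64-bit entries from raw into a new host-endian
 * table. On success the caller owns *table and must free() it. */
static int demo_table_load(const uint8_t *raw, size_t raw_len,
                           int size, uint64_t **table)
{
    size_t bytes;
    uint64_t *t;
    int i, j;

    assert(size >= 0 && table != NULL);

    bytes = (size_t)size * sizeof(uint64_t);
    if (raw_len < bytes) {
        return -EIO;            /* short read */
    }

    t = malloc(bytes ? bytes : 1);
    if (!t) {
        return -ENOMEM;
    }

    for (i = 0; i < size; i++) {
        uint64_t v = 0;
        for (j = 0; j < 8; j++) {
            v = (v << 8) | raw[(size_t)i * 8 + j];
        }
        t[i] = v;
    }

    *table = t;
    return 0;
}
```

The same function can serve any table of 64-bit offsets, which is the point
of extracting qcow2_do_table_init from the refcount-specific code.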


* [Qemu-devel] [RFC V5 16/62] qcow2-cache: Allow to choose table size at creation.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (14 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 15/62] qcow2: Extract qcow2_do_table_init Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 17/62] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
                   ` (46 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cache.c |   12 +++++++-----
 block/qcow2.c       |    5 +++--
 block/qcow2.h       |    3 ++-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 2f3114e..83f2814 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -40,20 +40,22 @@ struct Qcow2Cache {
     struct Qcow2Cache*      depends;
     int                     size;
     bool                    depends_on_flush;
+    int                     table_size;
 };
 
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+                               int table_size)
 {
-    BDRVQcowState *s = bs->opaque;
     Qcow2Cache *c;
     int i;
 
     c = g_malloc0(sizeof(*c));
     c->size = num_tables;
     c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
+    c->table_size = table_size;
 
     for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
+        c->entries[i].table = qemu_blockalign(bs, c->table_size);
     }
 
     return c;
@@ -121,7 +123,7 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
     }
 
     ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
+        c->table_size);
     if (ret < 0) {
         return ret;
     }
@@ -253,7 +255,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         }
 
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+        ret = bdrv_pread(bs->file, offset, c->entries[i].table, c->table_size);
         if (ret < 0) {
             return ret;
         }
diff --git a/block/qcow2.c b/block/qcow2.c
index acd3258..b8c4e31 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -452,8 +452,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
+                                                 s->cluster_size);
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
diff --git a/block/qcow2.h b/block/qcow2.h
index 8eb2977..b17977f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -461,7 +461,8 @@ void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
 
 /* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+                               int table_size);
 int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
 
 void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-- 
1.7.10.4
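
The idea of the patch above can be sketched outside QEMU as follows. This is
only an illustration under stated assumptions: ToyCache and the toy_cache_*
helpers are hypothetical names, and plain calloc() stands in for
qemu_blockalign(). The point it shows is that the per-table buffer size is
stored in the cache at creation time instead of being read from the image's
cluster size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for Qcow2Cache: every table buffer in one cache
 * has the same size, chosen by the caller at creation time. */
typedef struct ToyCache {
    void   **tables;     /* one buffer per cached table */
    int      num_tables; /* number of cached tables */
    size_t   table_size; /* bytes per table, fixed at creation */
} ToyCache;

static void toy_cache_destroy(ToyCache *c)
{
    if (!c) {
        return;
    }
    for (int i = 0; i < c->num_tables; i++) {
        free(c->tables[i]); /* free(NULL) is a no-op */
    }
    free(c->tables);
    free(c);
}

static ToyCache *toy_cache_create(int num_tables, size_t table_size)
{
    assert(num_tables > 0 && table_size > 0);

    ToyCache *c = calloc(1, sizeof(*c));
    if (!c) {
        return NULL;
    }
    c->tables = calloc((size_t)num_tables, sizeof(*c->tables));
    if (!c->tables) {
        free(c);
        return NULL;
    }
    c->num_tables = num_tables;
    c->table_size = table_size;

    for (int i = 0; i < num_tables; i++) {
        /* The buffer size comes from the cache, not from global state. */
        c->tables[i] = calloc(1, c->table_size);
        if (!c->tables[i]) {
            toy_cache_destroy(c);
            return NULL;
        }
    }
    return c;
}

/* Total bytes of table buffers held by the cache. */
static size_t toy_cache_bytes(const ToyCache *c)
{
    return (size_t)c->num_tables * c->table_size;
}
```

With this shape, two caches with different table sizes can coexist, which is
presumably what a later dedup cache with its own block size relies on.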

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 17/62] qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (15 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 16/62] qcow2-cache: Allow to choose table size at creation Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 18/62] block: Add qemu-img dedup create option Benoît Canet
                   ` (45 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   49 ++++++++++++++++++++++++++++++-------------------
 block/qcow2.h |    4 ++--
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b8c4e31..f046a77 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -238,61 +238,72 @@ static void report_unsupported_feature(BlockDriverState *bs,
 }
 
 /*
- * Sets the dirty bit and flushes afterwards if necessary.
+ * Sets an incompatible feature bit and flushes afterwards if necessary.
  *
  * The incompatible_features bit is only set if the image file header was
  * updated successfully.  Therefore it is not required to check the return
  * value of this function.
  */
-int qcow2_mark_dirty(BlockDriverState *bs)
+static int qcow2_add_feature(BlockDriverState *bs,
+                             QCow2IncompatibleFeature feature)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t val;
-    int ret;
+    int ret = 0;
 
     assert(s->qcow_version >= 3);
 
-    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-        return 0; /* already dirty */
+    if (s->incompatible_features & feature) {
+        return 0; /* already added */
     }
 
-    val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
+    val = cpu_to_be64(s->incompatible_features | feature);
     ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
                       &val, sizeof(val));
     if (ret < 0) {
         return ret;
     }
-    ret = bdrv_flush(bs->file);
-    if (ret < 0) {
-        return ret;
-    }
 
-    /* Only treat image as dirty if the header was updated successfully */
-    s->incompatible_features |= QCOW2_INCOMPAT_DIRTY;
+    /* Only treat image as having the feature if the header was updated
+     * successfully
+     */
+    s->incompatible_features |= feature;
     return 0;
 }
 
+int qcow2_mark_dirty(BlockDriverState *bs)
+{
+    return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
 /*
- * Clears the dirty bit and flushes before if necessary.  Only call this
- * function when there are no pending requests, it does not guard against
- * concurrent requests dirtying the image.
+ * Clears an incompatible feature bit and flushes before if necessary.
+ * Only call this function when there are no pending requests, it does not
+ * guard against concurrent requests adding a feature to the image.
  */
-static int qcow2_mark_clean(BlockDriverState *bs)
+static int qcow2_remove_feature(BlockDriverState *bs,
+                                QCow2IncompatibleFeature feature)
 {
     BDRVQcowState *s = bs->opaque;
+    int ret = 0;
 
-    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-        int ret = bdrv_flush(bs);
+    if (s->incompatible_features & feature) {
+        ret = bdrv_flush(bs);
         if (ret < 0) {
             return ret;
         }
 
-        s->incompatible_features &= ~QCOW2_INCOMPAT_DIRTY;
+        s->incompatible_features &= ~feature;
         return qcow2_update_header(bs);
     }
     return 0;
 }
 
+static int qcow2_mark_clean(BlockDriverState *bs)
+{
+    return qcow2_remove_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
 static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
                        BdrvCheckMode fix)
 {
diff --git a/block/qcow2.h b/block/qcow2.h
index b17977f..59432fd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -170,14 +170,14 @@ enum {
 };
 
 /* Incompatible feature bits */
-enum {
+typedef enum {
     QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
     QCOW2_INCOMPAT_DIRTY         = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_DEDUP_BITNR   = 1,
     QCOW2_INCOMPAT_DEDUP         = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
 
     QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
-};
+} QCow2IncompatibleFeature;
 
 /* Compatible feature bits */
 enum {
-- 
1.7.10.4
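
The add/remove pattern extracted by this patch can be sketched in isolation.
Everything below is a hypothetical model, not QEMU code: ToyImage,
toy_write_header() and the TOY_INCOMPAT_* values are made up for the example,
and an in-memory field stands in for the on-disk header. It shows the two
properties the comments describe: adding a bit that is already set is a no-op,
and the in-memory mask only changes once the header write succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits mirroring the DIRTY/DEDUP layout. */
typedef enum {
    TOY_INCOMPAT_DIRTY = 1 << 0,
    TOY_INCOMPAT_DEDUP = 1 << 1,
} ToyIncompatFeature;

typedef struct ToyImage {
    uint64_t incompatible_features; /* in-memory copy of the mask */
    uint64_t on_disk_features;      /* stands in for the header field */
    bool     fail_writes;           /* simulate an I/O error */
    int      header_writes;         /* number of successful header writes */
} ToyImage;

static int toy_write_header(ToyImage *img, uint64_t val)
{
    if (img->fail_writes) {
        return -EIO;
    }
    img->on_disk_features = val;
    img->header_writes++;
    return 0;
}

static int toy_add_feature(ToyImage *img, ToyIncompatFeature feature)
{
    if (img->incompatible_features & feature) {
        return 0; /* already added, nothing to write */
    }

    int ret = toy_write_header(img, img->incompatible_features | feature);
    if (ret < 0) {
        return ret; /* in-memory state is left untouched on failure */
    }

    img->incompatible_features |= feature;
    return 0;
}

static int toy_remove_feature(ToyImage *img, ToyIncompatFeature feature)
{
    if (!(img->incompatible_features & feature)) {
        return 0; /* not set, nothing to do */
    }

    /* Same order as the patch: clear in memory, then rewrite the header. */
    img->incompatible_features &= ~(uint64_t)feature;
    return toy_write_header(img, img->incompatible_features);
}
```

Thin wrappers such as a mark-dirty or activate-dedup helper then reduce to a
single call with the right bit.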

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 18/62] block: Add qemu-img dedup create option.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (16 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 17/62] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 19/62] qcow2: Add a deduplication boolean to update_refcount Benoît Canet
                   ` (44 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c             |  113 +++++++++++++++++++++++++++++++++++++++------
 block/qcow2.h             |    2 +
 include/block/block_int.h |    1 +
 3 files changed, 103 insertions(+), 13 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f046a77..835554d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -276,6 +276,11 @@ int qcow2_mark_dirty(BlockDriverState *bs)
     return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
 }
 
+static int qcow2_activate_dedup(BlockDriverState *bs)
+{
+    return qcow2_add_feature(bs, QCOW2_INCOMPAT_DEDUP);
+}
+
 /*
  * Clears an incompatible feature bit and flushes before if necessary.
  * Only call this function when there are no pending requests, it does not
@@ -907,6 +912,11 @@ static void qcow2_close(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     g_free(s->l1_table);
 
+    if (s->has_dedup) {
+        qcow2_cache_flush(bs, s->dedup_cluster_cache);
+        qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+    }
+
     qcow2_cache_flush(bs, s->l2_table_cache);
     qcow2_cache_flush(bs, s->refcount_block_cache);
 
@@ -1266,7 +1276,8 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
                          const char *backing_file, const char *backing_format,
                          int flags, size_t cluster_size, int prealloc,
-                         QEMUOptionParameter *options, int version)
+                         QEMUOptionParameter *options, int version,
+                         bool dedup, uint8_t hash_algo)
 {
     /* Calculate cluster_bits */
     int cluster_bits;
@@ -1293,8 +1304,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
      * size for any qcow2 image.
      */
     BlockDriverState* bs;
+    BDRVQcowState *s;
     QCowHeader header;
-    uint8_t* refcount_table;
+    uint8_t *tables;
+    int size;
     int ret;
 
     ret = bdrv_create_file(filename, options);
@@ -1336,10 +1349,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         goto out;
     }
 
-    /* Write an empty refcount table */
-    refcount_table = g_malloc0(cluster_size);
-    ret = bdrv_pwrite(bs, cluster_size, refcount_table, cluster_size);
-    g_free(refcount_table);
+    /* Write an empty refcount table + extra space for dedup table if needed */
+    size = dedup ? 2 : 1;
+    tables = g_malloc0(size * cluster_size);
+    ret = bdrv_pwrite(bs, cluster_size, tables, size * cluster_size);
+    g_free(tables);
 
     if (ret < 0) {
         goto out;
@@ -1350,7 +1364,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     /*
      * And now open the image and make it consistent first (i.e. increase the
      * refcount of the cluster that is occupied by the header and the refcount
-     * table)
+     * table and, if present, the dedup table)
      */
     BlockDriver* drv = bdrv_find_format("qcow2");
     assert(drv != NULL);
@@ -1360,7 +1374,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         goto out;
     }
 
-    ret = qcow2_alloc_clusters(bs, 2 * cluster_size);
+    size++; /* Add a cluster for the header */
+    ret = qcow2_alloc_clusters(bs, size * cluster_size);
     if (ret < 0) {
         goto out;
 
@@ -1370,11 +1385,33 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     }
 
     /* Okay, now that we have a valid image, let's give it the right size */
+    s = bs->opaque;
     ret = bdrv_truncate(bs, total_size * BDRV_SECTOR_SIZE);
     if (ret < 0) {
         goto out;
     }
 
+    if (dedup) {
+        s->has_dedup = true;
+        s->dedup_table_offset = cluster_size * 2;
+        s->dedup_table_size = cluster_size / sizeof(uint64_t);
+        s->dedup_hash_algo = hash_algo;
+
+        ret = qcow2_activate_dedup(bs);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = qcow2_update_header(bs);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* minimal init */
+        s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+                                                    s->hash_block_size);
+    }
+
     /* Want a backing file? There you go.*/
     if (backing_file) {
         ret = bdrv_change_backing_file(bs, backing_file, backing_format);
@@ -1400,15 +1437,41 @@ out:
     return ret;
 }
 
+static int qcow2_warn_if_version_3_is_needed(int version,
+                                             bool has_feature,
+                                             const char *feature)
+{
+    if (version < 3 && has_feature) {
+        fprintf(stderr, "%s only supported with compatibility "
+                "level 1.1 and above (use compat=1.1 or greater)\n",
+                feature);
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int8_t qcow2_get_dedup_hash_algo(char *value)
+{
+    if (!strcmp(value, "sha256")) {
+        return QCOW_HASH_SHA256;
+    }
+
+    error_printf("Unsupported deduplication hash algorithm.\n");
+    return -EINVAL;
+}
+
 static int qcow2_create(const char *filename, QEMUOptionParameter *options)
 {
     const char *backing_file = NULL;
     const char *backing_fmt = NULL;
     uint64_t sectors = 0;
     int flags = 0;
+    int ret;
     size_t cluster_size = DEFAULT_CLUSTER_SIZE;
     int prealloc = 0;
     int version = 2;
+    bool dedup = false;
+    int8_t hash_algo = 0;
 
     /* Read out options */
     while (options && options->name) {
@@ -1446,24 +1509,43 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
             }
         } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
             flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
+        } else if (!strcmp(options->name, BLOCK_OPT_DEDUP) &&
+                   options->value.s) {
+            hash_algo = qcow2_get_dedup_hash_algo(options->value.s);
+            if (hash_algo < 0) {
+                return hash_algo;
+            }
+            dedup = true;
         }
         options++;
     }
 
+    if (dedup) {
+        cluster_size = 4096;
+    }
+
     if (backing_file && prealloc) {
         fprintf(stderr, "Backing file and preallocation cannot be used at "
             "the same time\n");
         return -EINVAL;
     }
 
-    if (version < 3 && (flags & BLOCK_FLAG_LAZY_REFCOUNTS)) {
-        fprintf(stderr, "Lazy refcounts only supported with compatibility "
-                "level 1.1 and above (use compat=1.1 or greater)\n");
-        return -EINVAL;
+    ret = qcow2_warn_if_version_3_is_needed(version,
+                                            flags & BLOCK_FLAG_LAZY_REFCOUNTS,
+                                            "Lazy refcounts");
+    if (ret < 0) {
+        return ret;
+    }
+    ret = qcow2_warn_if_version_3_is_needed(version,
+                                            dedup,
+                                            "Deduplication");
+    if (ret < 0) {
+        return ret;
     }
 
     return qcow2_create2(filename, sectors, backing_file, backing_fmt, flags,
-                         cluster_size, prealloc, options, version);
+                         cluster_size, prealloc, options, version,
+                         dedup, hash_algo);
 }
 
 static int qcow2_make_empty(BlockDriverState *bs)
@@ -1766,6 +1848,11 @@ static QEMUOptionParameter qcow2_create_options[] = {
         .type = OPT_FLAG,
         .help = "Postpone refcount updates",
     },
+    {
+        .name = BLOCK_OPT_DEDUP,
+        .type = OPT_STRING,
+        .help = "Deduplication",
+    },
     { NULL }
 };
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 59432fd..f987328 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -60,6 +60,8 @@
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define REFCOUNT_CACHE_SIZE 4
 
+#define DEDUP_CACHE_SIZE 4
+
 #define DEFAULT_CLUSTER_SIZE 65536
 
 #define HASH_LENGTH 32
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f83ffb8..b7ed3e6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -55,6 +55,7 @@
 #define BLOCK_OPT_SUBFMT            "subformat"
 #define BLOCK_OPT_COMPAT_LEVEL      "compat"
 #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
+#define BLOCK_OPT_DEDUP             "dedup"
 
 typedef struct BdrvTrackedRequest BdrvTrackedRequest;
 
-- 
1.7.10.4
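
The initial image layout that qcow2_create2() sets up in this patch can be
worked through with a small sketch. ToyLayout and toy_initial_layout() are
hypothetical names used only here; the arithmetic is taken from the hunk above
(header in cluster 0, refcount table in cluster 1, dedup table at
cluster_size * 2 with cluster_size / sizeof(uint64_t) entries, and one extra
cluster counted for the header when allocating).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of where the first metadata clusters live. */
typedef struct ToyLayout {
    uint64_t refcount_table_offset; /* byte offset of the refcount table */
    uint64_t dedup_table_offset;    /* byte offset, 0 when dedup is off */
    uint64_t dedup_table_entries;   /* number of 64-bit entries, 0 if off */
    uint64_t clusters_to_allocate;  /* header + tables */
} ToyLayout;

static ToyLayout toy_initial_layout(uint64_t cluster_size, bool dedup)
{
    /* cluster sizes are powers of two of at least one sector */
    assert(cluster_size >= 512 &&
           (cluster_size & (cluster_size - 1)) == 0);

    ToyLayout l = {0};
    uint64_t tables = dedup ? 2 : 1; /* refcount table (+ dedup table) */

    l.refcount_table_offset = cluster_size;
    if (dedup) {
        l.dedup_table_offset  = cluster_size * 2;
        l.dedup_table_entries = cluster_size / sizeof(uint64_t);
    }
    l.clusters_to_allocate = tables + 1; /* add a cluster for the header */
    return l;
}
```

For the 4096-byte cluster size that the patch forces when dedup is requested,
this gives a dedup table at byte offset 8192 holding 512 entries, and three
clusters to account for in the refcounts.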

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 19/62] qcow2: Add a deduplication boolean to update_refcount.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (17 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 18/62] block: Add qemu-img dedup create option Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 20/62] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
                   ` (43 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This is needed for the next commit, which handles the deduplication refcount
overflow case.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c    |    2 +-
 block/qcow2-refcount.c |   20 +++++++++++---------
 block/qcow2.h          |    2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 7049bd8..25ecefa 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -386,7 +386,7 @@ static int qcow2_deduplicate_cluster(BlockDriverState *bs,
     return update_refcount(bs,
                            (hash_node->physical_sect /
                             s->cluster_sectors) << s->cluster_bits,
-                            1, 1);
+                            1, 1, true);
 }
 
 /* This function tries to deduplicate a given cluster.
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 75c2bde..b1ad112 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -245,7 +245,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     } else {
         /* Described somewhere else. This can recurse at most twice before we
          * arrive at a block that describes itself. */
-        ret = update_refcount(bs, new_block, s->cluster_size, 1);
+        ret = update_refcount(bs, new_block, s->cluster_size, 1, false);
         if (ret < 0) {
             goto fail_block;
         }
@@ -427,7 +427,7 @@ fail_block:
 
 /* XXX: cache several refcount block clusters ? */
 int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
-    int64_t offset, int64_t length, int addend)
+    int64_t offset, int64_t length, int addend, bool deduplication)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t start, last, cluster_offset;
@@ -513,7 +513,8 @@ fail:
      */
     if (ret < 0) {
         int dummy;
-        dummy = update_refcount(bs, offset, cluster_offset - offset, -addend);
+        dummy = update_refcount(bs, offset, cluster_offset - offset, -addend,
+                                deduplication);
         (void)dummy;
     }
 
@@ -534,7 +535,8 @@ static int update_cluster_refcount(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
+    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend,
+                          false);
     if (ret < 0) {
         return ret;
     }
@@ -588,7 +590,7 @@ int64_t qcow2_alloc_clusters(BlockDriverState *bs, int64_t size)
         return offset;
     }
 
-    ret = update_refcount(bs, offset, size, 1);
+    ret = update_refcount(bs, offset, size, 1, false);
     if (ret < 0) {
         return ret;
     }
@@ -620,7 +622,7 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
     old_free_cluster_index = s->free_cluster_index;
     s->free_cluster_index = cluster_index + i;
 
-    ret = update_refcount(bs, offset, i << s->cluster_bits, 1);
+    ret = update_refcount(bs, offset, i << s->cluster_bits, 1, false);
     if (ret < 0) {
         return ret;
     }
@@ -686,7 +688,7 @@ void qcow2_free_clusters(BlockDriverState *bs,
     int ret;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_FREE);
-    ret = update_refcount(bs, offset, size, -1);
+    ret = update_refcount(bs, offset, size, -1, false);
     if (ret < 0) {
         fprintf(stderr, "qcow2_free_clusters failed: %s\n", strerror(-ret));
         /* TODO Remember the clusters to free them later and avoid leaking */
@@ -795,7 +797,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                             int ret;
                             ret = update_refcount(bs,
                                 (offset & s->cluster_offset_mask) & ~511,
-                                nb_csectors * 512, addend);
+                                nb_csectors * 512, addend, false);
                             if (ret < 0) {
                                 goto fail;
                             }
@@ -1228,7 +1230,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 
             if (num_fixed) {
                 ret = update_refcount(bs, i << s->cluster_bits, 1,
-                                      refcount2 - refcount1);
+                                      refcount2 - refcount1, false);
                 if (ret >= 0) {
                     (*num_fixed)++;
                     continue;
diff --git a/block/qcow2.h b/block/qcow2.h
index f987328..5c126be 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -418,7 +418,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                           BdrvCheckMode fix);
 int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
-    int64_t offset, int64_t length, int addend);
+    int64_t offset, int64_t length, int addend, bool deduplication);
 
 /* qcow2-cluster.c functions */
 typedef int (*qcow2_save_table)(BlockDriverState *bs,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 20/62] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (18 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 19/62] qcow2: Add a deduplication boolean to update_refcount Benoît Canet
@ 2013-01-16 15:47 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 21/62] qcow2: Remove hash when cluster is deleted Benoît Canet
                   ` (42 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

A new physical cluster with the same hash value will be used for further
occurrences of this hash.
---
 block/qcow2-dedup.c    |   32 ++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c |    3 +++
 block/qcow2.h          |    4 ++++
 3 files changed, 39 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 25ecefa..9eba773 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -941,3 +941,35 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
 
     return ret;
 }
+
+/* Force the use of a new physical cluster and QCowHashNode when the refcount
+ * passes 2^16/2.
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
+                                           uint64_t cluster_index)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+    uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+    hash_node =  g_tree_lookup(s->dedup_tree_by_sect, &physical_sect);
+
+    if (!hash_node) {
+        return;
+    }
+
+    /* mark this hash so we won't load it anymore at startup after writing it */
+    hash_node->first_logical_sect |= QCOW_FLAG_HALF_MAX_REFCOUNT;
+
+    /* write to disk */
+    qcow2_dedup_read_write_hash(bs,
+                                &hash_node->hash,
+                                &hash_node->first_logical_sect,
+                                hash_node->physical_sect,
+                                true);
+
+    /* remove the QCowHashNode from ram so we won't use it anymore for dedup */
+    qcow2_remove_hash_node(bs, hash_node);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b1ad112..ac396c4 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -489,6 +489,9 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
             ret = -EINVAL;
             goto fail;
         }
+        if (s->has_dedup && deduplication && refcount >= 0xFFFF/2) {
+            qcow2_dedup_refcount_half_max_reached(bs, cluster_index);
+        }
         if (refcount == 0 && cluster_index < s->free_cluster_index) {
             s->free_cluster_index = cluster_index;
         }
diff --git a/block/qcow2.h b/block/qcow2.h
index 5c126be..ba10ed0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -65,6 +65,8 @@
 #define DEFAULT_CLUSTER_SIZE 65536
 
 #define HASH_LENGTH 32
+/* indicate that this cluster's refcount has reached half its maximum value */
+#define QCOW_FLAG_HALF_MAX_REFCOUNT (1LL << 61)
 /* indicate that the hash structure is empty and miss offset */
 #define QCOW_FLAG_EMPTY   (1LL << 62)
 /* indicate that the cluster for this hash has QCOW_OFLAG_COPIED on disk */
@@ -499,5 +501,7 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
                                  int count,
                                  uint64_t logical_sect,
                                  uint64_t physical_sect);
+void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
+                                           uint64_t cluster_index);
 
 #endif
-- 
1.7.10.4
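
The threshold and flag handling of this patch can be sketched in a few lines.
The names below (ToyHashNode, the toy_* helpers, the TOY_* macros) are
hypothetical; only the values come from the patch: the check is
refcount >= 0xFFFF/2, which is 32767 with integer division, and the marker is
bit 61 of first_logical_sect. A boolean stands in for membership in the
in-RAM tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FLAG_HALF_MAX_REFCOUNT (1ULL << 61)
#define TOY_REFCOUNT_HALF_MAX      (0xFFFF / 2) /* integer division: 32767 */

/* Hypothetical, reduced version of a hash node. */
typedef struct ToyHashNode {
    uint64_t first_logical_sect; /* sector number plus flag bits */
    bool     in_ram_tree;        /* still usable for deduplication? */
} ToyHashNode;

/* True once a deduplicated cluster should stop accepting new references. */
static bool toy_refcount_half_max_reached(unsigned refcount)
{
    return refcount >= TOY_REFCOUNT_HALF_MAX;
}

/* Mark the hash as retired and drop it from the lookup structure, so the
 * next write with the same hash allocates a fresh physical cluster. */
static void toy_retire_hash(ToyHashNode *node)
{
    node->first_logical_sect |= TOY_FLAG_HALF_MAX_REFCOUNT;
    node->in_ram_tree = false;
}

/* Sector number with the half-max marker stripped. */
static uint64_t toy_logical_sect(const ToyHashNode *node)
{
    return node->first_logical_sect & ~TOY_FLAG_HALF_MAX_REFCOUNT;
}
```

Stopping at half of the 16-bit range, rather than at the maximum, leaves the
upper half of the refcount space for references that do not come from
deduplication, such as snapshots; that reading is an interpretation of the
patch, not something the commit message states.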

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 21/62] qcow2: Remove hash when cluster is deleted.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (19 preceding siblings ...)
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 20/62] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 22/62] qcow2: Add qcow2_dedup_is_running to probe if dedup is running Benoît Canet
                   ` (41 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c    |   26 ++++++++++++++++++++++++++
 block/qcow2-refcount.c |    3 +++
 block/qcow2.h          |    2 ++
 3 files changed, 31 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 9eba773..8b51dda 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -942,6 +942,32 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
     return ret;
 }
 
+/* Clean the last reference to a given cluster when its refcount is zero
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
+                                       uint64_t cluster_index)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHash null_hash;
+    uint64_t logical_sect = 0;
+    uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+    /* prepare null hash */
+    memset(&null_hash, 0, sizeof(null_hash));
+
+    /* clear from disk */
+    qcow2_dedup_read_write_hash(bs,
+                                &null_hash,
+                                &logical_sect,
+                                physical_sect,
+                                true);
+
+    /* remove from ram if present so we won't dedup with it anymore */
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+}
+
 /* Force the use of a new physical cluster and QCowHashNode when the refcount
  * passes 2^16/2.
  *
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index ac396c4..6a6719f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -492,6 +492,9 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         if (s->has_dedup && deduplication && refcount >= 0xFFFF/2) {
             qcow2_dedup_refcount_half_max_reached(bs, cluster_index);
         }
+        if (s->has_dedup && refcount == 0) {
+            qcow2_dedup_refcount_zero_reached(bs, cluster_index);
+        }
         if (refcount == 0 && cluster_index < s->free_cluster_index) {
             s->free_cluster_index = cluster_index;
         }
diff --git a/block/qcow2.h b/block/qcow2.h
index ba10ed0..842c321 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -501,6 +501,8 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
                                  int count,
                                  uint64_t logical_sect,
                                  uint64_t physical_sect);
+void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
+                                       uint64_t cluster_index);
 void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
                                            uint64_t cluster_index);
 
-- 
1.7.10.4
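
A minimal model of what happens when a refcount drops to zero is given below.
It rests on an assumption that is not confirmed by this patch alone: that the
stored hashes can be viewed as one entry per physical cluster, so the entry
can be found from the cluster index. ToyHashEntry and the toy_* helpers are
hypothetical; the 32-byte hash length matches HASH_LENGTH in qcow2.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_LENGTH 32 /* same value as HASH_LENGTH in the patch set */

/* Hypothetical per-cluster record: the hash plus its first logical sector. */
typedef struct ToyHashEntry {
    uint8_t  hash[TOY_HASH_LENGTH];
    uint64_t first_logical_sect;
} ToyHashEntry;

/* Convert a physical cluster index to its first sector number. */
static uint64_t toy_cluster_to_sector(uint64_t cluster_index,
                                      uint64_t cluster_sectors)
{
    return cluster_index * cluster_sectors;
}

/* Overwrite the entry with a null hash and a zero logical sector, so a
 * freed cluster can no longer be matched by a later write. */
static void toy_refcount_zero_reached(ToyHashEntry *table,
                                      uint64_t cluster_index)
{
    memset(&table[cluster_index], 0, sizeof(table[cluster_index]));
}
```

With 4096-byte clusters and 512-byte sectors there are 8 sectors per cluster,
so cluster index 3 starts at sector 24.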

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 22/62] qcow2: Add qcow2_dedup_is_running to probe if dedup is running.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (20 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 21/62] qcow2: Remove hash when cluster is deleted Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 23/62] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
                   ` (40 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    6 ++++++
 block/qcow2.h       |    1 +
 2 files changed, 7 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 8b51dda..cc99e27 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -999,3 +999,9 @@ void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
     /* remove the QCowHashNode from ram so we won't use it anymore for dedup */
     qcow2_remove_hash_node(bs, hash_node);
 }
+
+bool qcow2_dedup_is_running(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    return s->has_dedup && s->dedup_status == QCOW_DEDUP_STARTED;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 842c321..dc9f519 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -505,5 +505,6 @@ void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
                                        uint64_t cluster_index);
 void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
                                            uint64_t cluster_index);
+bool qcow2_dedup_is_running(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 23/62] qcow2: Integrate deduplication in qcow2_co_writev loop.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (21 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 22/62] qcow2: Add qcow2_dedup_is_running to probe if dedup is running Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 24/62] qcow2: Serialize write requests when deduplication is activated Benoît Canet
                   ` (39 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 835554d..6b8f85f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -330,6 +330,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     QCowHeader header;
     uint64_t ext_end;
 
+    s->has_dedup = false;
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
         goto fail;
@@ -792,13 +793,18 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     int index_in_cluster;
     int n_end;
-    int ret;
+    int ret = 0;
     int cur_nr_sectors; /* number of sectors in current iteration */
     uint64_t cluster_offset;
     QEMUIOVector hd_qiov;
     uint64_t bytes_done = 0;
     uint8_t *cluster_data = NULL;
     QCowL2Meta *l2meta;
+    uint8_t *dedup_cluster_data = NULL;
+    int dedup_cluster_data_nr;
+    int deduped_sectors_nr;
+    QCowDedupState ds;
+    bool atomic_dedup_is_running;
 
     trace_qcow2_writev_start_req(qemu_coroutine_self(), sector_num,
                                  remaining_sectors);
@@ -809,13 +815,70 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
 
     qemu_co_mutex_lock(&s->lock);
 
+    atomic_dedup_is_running = qcow2_dedup_is_running(bs);
+    if (atomic_dedup_is_running) {
+        QTAILQ_INIT(&ds.undedupables);
+        ds.phash.reuse = false;
+        ds.nb_undedupable_sectors = 0;
+        ds.nb_clusters_processed = 0;
+
+        /* if deduplication is on, make sure dedup_cluster_data
+         * holds a whole number of clusters of data so that
+         * the hashes can be computed
+         */
+        ret = qcow2_dedup_read_missing_and_concatenate(bs,
+                                                       qiov,
+                                                       sector_num,
+                                                       remaining_sectors,
+                                                       &dedup_cluster_data,
+                                                       &dedup_cluster_data_nr);
+
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
     while (remaining_sectors != 0) {
 
         l2meta = NULL;
 
         trace_qcow2_writev_start_part(qemu_coroutine_self());
+
+        if (atomic_dedup_is_running && ds.nb_undedupable_sectors == 0) {
+            /* Try to deduplicate as many clusters as possible */
+            deduped_sectors_nr = qcow2_dedup(bs,
+                                             &ds,
+                                             sector_num,
+                                             dedup_cluster_data,
+                                             dedup_cluster_data_nr);
+
+            if (deduped_sectors_nr < 0) {
+                goto fail;
+            }
+
+            remaining_sectors -= deduped_sectors_nr;
+            sector_num += deduped_sectors_nr;
+            bytes_done += deduped_sectors_nr * 512;
+
+            /* no more data to write -> exit */
+            if (remaining_sectors <= 0) {
+                goto fail;
+            }
+
+            /* if we deduped something trace it */
+            if (deduped_sectors_nr) {
+                trace_qcow2_writev_done_part(qemu_coroutine_self(),
+                                             deduped_sectors_nr);
+                trace_qcow2_writev_start_part(qemu_coroutine_self());
+            }
+        }
+
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n_end = index_in_cluster + remaining_sectors;
+        n_end = atomic_dedup_is_running &&
+                ds.nb_undedupable_sectors < remaining_sectors ?
+                index_in_cluster + ds.nb_undedupable_sectors :
+                index_in_cluster + remaining_sectors;
+
         if (s->crypt_method &&
             n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors) {
             n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
@@ -851,6 +914,24 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
                 cur_nr_sectors * 512);
         }
 
+        /* Write the hashes of the non-duplicated clusters to disk */
+        if (atomic_dedup_is_running) {
+            int count = cur_nr_sectors / s->cluster_sectors;
+            int has_ending = ((cluster_offset >> 9) + index_in_cluster +
+                             cur_nr_sectors) & (s->cluster_sectors - 1);
+            count = index_in_cluster ? count + 1 : count;
+            count = has_ending ? count + 1 : count;
+            ret = qcow2_dedup_store_new_hashes(bs,
+                                               &ds,
+                                               count,
+                                               sector_num,
+                                               (cluster_offset >> 9));
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
         qemu_co_mutex_unlock(&s->lock);
         BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
         trace_qcow2_writev_data(qemu_coroutine_self(),
@@ -882,6 +963,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
             l2meta = NULL;
         }
 
+        ds.nb_undedupable_sectors -= cur_nr_sectors;
         remaining_sectors -= cur_nr_sectors;
         sector_num += cur_nr_sectors;
         bytes_done += cur_nr_sectors * 512;
@@ -902,6 +984,7 @@ fail:
 
     qemu_iovec_destroy(&hd_qiov);
     qemu_vfree(cluster_data);
+    qemu_vfree(dedup_cluster_data);
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
 
     return ret;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 24/62] qcow2: Serialize write requests when deduplication is activated.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (22 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 23/62] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 25/62] qcow2: Add verification of dedup table Benoît Canet
                   ` (38 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This fixes the race conditions on sub-cluster-sized writes until a
faster solution is available.
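
The locking order introduced by this patch can be sketched outside QEMU as
follows. This is only an illustration under stated assumptions: pthread
mutexes stand in for QEMU's CoMutex, and DemoState, demo_init and demo_write
are hypothetical names that do not exist in the tree. The flag is sampled
under the main lock, the main lock is dropped, the dedup lock is taken only
when dedup is running, and the main lock is then taken again for the write.

```c
/* Sketch only: pthread mutexes replace CoMutex; all names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoState {
    pthread_mutex_t lock;        /* plays the role of s->lock */
    pthread_mutex_t dedup_lock;  /* plays the role of s->dedup_lock */
    bool dedup_running;          /* what qcow2_dedup_is_running() would report */
    int writes_done;             /* counts completed demo writes */
} DemoState;

static void demo_init(DemoState *s, bool dedup_running)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_mutex_init(&s->dedup_lock, NULL);
    s->dedup_running = dedup_running;
    s->writes_done = 0;
}

static int demo_write(DemoState *s)
{
    bool dedup;

    assert(s != NULL);

    /* Sample the dedup state under the main lock, then release it. */
    pthread_mutex_lock(&s->lock);
    dedup = s->dedup_running;
    pthread_mutex_unlock(&s->lock);

    /* Serialize whole write requests only when dedup is running. */
    if (dedup) {
        pthread_mutex_lock(&s->dedup_lock);
    }

    /* The body of the write loop would run here, under the main lock. */
    pthread_mutex_lock(&s->lock);
    s->writes_done++;
    pthread_mutex_unlock(&s->lock);

    /* Release the serialization lock on the way out, as the fail path does. */
    if (dedup) {
        pthread_mutex_unlock(&s->dedup_lock);
    }
    return 0;
}
```

Because the sampled flag is kept in a local variable, the unlock at the end
matches the lock taken at the start even if the dedup state changes while the
request is in flight.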

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   14 +++++++++++++-
 block/qcow2.h |    1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6b8f85f..4f8cf68 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -523,6 +523,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 
     /* Initialise locks */
     qemu_co_mutex_init(&s->lock);
+    qemu_co_mutex_init(&s->dedup_lock);
 
     /* Repair image if dirty */
     if (!(flags & BDRV_O_CHECK) && !bs->read_only &&
@@ -814,8 +815,15 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
     s->cluster_cache_offset = -1; /* disable compressed cache */
 
     qemu_co_mutex_lock(&s->lock);
-
     atomic_dedup_is_running = qcow2_dedup_is_running(bs);
+    qemu_co_mutex_unlock(&s->lock);
+
+    if (atomic_dedup_is_running) {
+        qemu_co_mutex_lock(&s->dedup_lock);
+    }
+
+    qemu_co_mutex_lock(&s->lock);
+
     if (atomic_dedup_is_running) {
         QTAILQ_INIT(&ds.undedupables);
         ds.phash.reuse = false;
@@ -982,6 +990,10 @@ fail:
         g_free(l2meta);
     }
 
+    if (atomic_dedup_is_running) {
+        qemu_co_mutex_unlock(&s->dedup_lock);
+    }
+
     qemu_iovec_destroy(&hd_qiov);
     qemu_vfree(cluster_data);
     qemu_vfree(dedup_cluster_data);
diff --git a/block/qcow2.h b/block/qcow2.h
index dc9f519..9f5d0f0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -239,6 +239,7 @@ typedef struct BDRVQcowState {
     GTree *dedup_tree_by_sect;
 
     CoMutex lock;
+    CoMutex dedup_lock;
 
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
     uint32_t crypt_method_header;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 25/62] qcow2: Add verification of dedup table.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (23 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 24/62] qcow2: Serialize write requests when deduplication is activated Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 26/62] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
                   ` (37 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6a6719f..34a6a04 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1158,6 +1158,14 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         goto fail;
     }
 
+    if (s->has_dedup) {
+        ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+                                 s->dedup_table_offset, s->dedup_table_size, 0);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
     /* snapshots */
     for(i = 0; i < s->nb_snapshots; i++) {
         sn = s->snapshots + i;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 26/62] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (24 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 25/62] qcow2: Add verification of dedup table Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 27/62] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
                   ` (36 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 34a6a04..f7a283a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1003,7 +1003,14 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         PRIx64 ": %s\n", l2_entry, strerror(-refcount));
                     goto fail;
                 }
-                if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+                if (!s->has_dedup &&
+                    (refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+                    fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
+                        PRIx64 " refcount=%d\n", l2_entry, refcount);
+                    res->corruptions++;
+                }
+                if (s->has_dedup && refcount > 1 &&
+                    ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
                     fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
                         PRIx64 " refcount=%d\n", l2_entry, refcount);
                     res->corruptions++;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 27/62] qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (25 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 26/62] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 28/62] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
                   ` (35 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |   65 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 9 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f7a283a..3077a9f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1047,6 +1047,43 @@ fail:
     return -EIO;
 }
 
+static int check_dedup_l2(BlockDriverState *bs, BdrvCheckResult *res,
+                          int64_t l2_offset)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t *l2_table;
+    int i, l2_size;
+
+    /* Read L2 table from disk */
+    l2_size = s->cluster_size;
+    l2_table = g_malloc(l2_size);
+
+    if (bdrv_pread(bs->file, l2_offset, l2_table, l2_size) != l2_size) {
+        goto fail;
+    }
+
+    /* Do the actual checks */
+    for (i = 0; i < (s->l2_size - 5); i += 5) {
+        uint64_t first_logical_offset = be64_to_cpu(l2_table[i + 4]) &
+                                        ~QCOW_FLAG_FIRST;
+        if (first_logical_offset > (bs->total_sectors * BDRV_SECTOR_SIZE)) {
+            fprintf(stderr, "ERROR: l2 deduplication first_logical_offset"
+                    "=%" PRIi64 " outside of deduplicated volume in l2 table "
+                    "with offset %" PRIi64 ".\n", first_logical_offset,
+                    l2_offset);
+            res->corruptions++;
+        }
+    }
+
+    g_free(l2_table);
+    return 0;
+
+fail:
+    fprintf(stderr, "ERROR: I/O error in check_dedup_l2\n");
+    g_free(l2_table);
+    return -EIO;
+}
+
 /*
  * Increases the refcount for the L1 table, its L2 tables and all referenced
  * clusters in the given refcount table. While doing so, performs some checks
@@ -1060,7 +1097,8 @@ static int check_refcounts_l1(BlockDriverState *bs,
                               uint16_t *refcount_table,
                               int refcount_table_size,
                               int64_t l1_table_offset, int l1_size,
-                              int check_copied)
+                              int check_copied,
+                              bool dedup)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t *l1_table, l2_offset, l1_size2;
@@ -1116,11 +1154,19 @@ static int check_refcounts_l1(BlockDriverState *bs,
                 res->corruptions++;
             }
 
-            /* Process and check L2 entries */
-            ret = check_refcounts_l2(bs, res, refcount_table,
-                refcount_table_size, l2_offset, check_copied);
-            if (ret < 0) {
-                goto fail;
+            if (dedup) {
+                /* Process and check dedup l2 entries */
+                ret = check_dedup_l2(bs, res, l2_offset);
+                if (ret < 0) {
+                    goto fail;
+                }
+            } else {
+                /* Process and check L2 entries */
+                ret = check_refcounts_l2(bs, res, refcount_table,
+                    refcount_table_size, l2_offset, check_copied);
+                if (ret < 0) {
+                    goto fail;
+                }
             }
         }
     }
@@ -1160,14 +1206,15 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 
     /* current L1 table */
     ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
-                       s->l1_table_offset, s->l1_size, 1);
+                       s->l1_table_offset, s->l1_size, 1, false);
     if (ret < 0) {
         goto fail;
     }
 
     if (s->has_dedup) {
         ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
-                                 s->dedup_table_offset, s->dedup_table_size, 0);
+                                 s->dedup_table_offset, s->dedup_table_size,
+                                 0, true);
         if (ret < 0) {
             goto fail;
         }
@@ -1177,7 +1224,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     for(i = 0; i < s->nb_snapshots; i++) {
         sn = s->snapshots + i;
         ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
-            sn->l1_table_offset, sn->l1_size, 0);
+            sn->l1_table_offset, sn->l1_size, 0, false);
         if (ret < 0) {
             goto fail;
         }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 28/62] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (26 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 27/62] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 29/62] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
                   ` (34 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

When two writes race, an L2 entry can be written without
QCOW_OFLAG_COPIED before the first write fills it.
This patch checks whether the L2 entry already holds the correct offset
without QCOW_OFLAG_COPIED and, if so, leaves it untouched.
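
A minimal standalone sketch of that check, under stated assumptions:
demo_link_l2 and DEMO_OFLAG_COPIED are hypothetical names, the table is kept
in host byte order (the real code converts with be64_to_cpu), and the
assumption that the linking loop otherwise stores the offset with the COPIED
bit set is mine, not something this patch shows.

```c
/* Sketch only: a simplified linking loop, not the QEMU function. */
#include <assert.h>
#include <stdint.h>

/* Stand-in for QCOW_OFLAG_COPIED, which is bit 63 of an L2 entry. */
#define DEMO_OFLAG_COPIED (1ULL << 63)

/* Link nb_clusters entries starting at l2_index; returns how many entries
 * were actually written. */
static int demo_link_l2(uint64_t *l2_table, int l2_index, int nb_clusters,
                        uint64_t cluster_offset, int cluster_bits)
{
    int written = 0;

    for (int i = 0; i < nb_clusters; i++) {
        uint64_t offset = cluster_offset + ((uint64_t)i << cluster_bits);

        /* Entry already points at this cluster without the COPIED flag:
         * a racing write filled it first, so leave it alone. */
        if (l2_table[l2_index + i] == offset) {
            continue;
        }

        l2_table[l2_index + i] = offset | DEMO_OFLAG_COPIED;
        written++;
    }
    return written;
}
```

With 64 KB clusters, a table whose second entry already holds 0x20000 keeps
that entry unflagged, while the first entry is linked as usual.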

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fedcf57..c016e85 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -763,6 +763,11 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     for (i = 0; i < m->nb_clusters; i++) {
         uint64_t flags = 0;
         uint64_t offset = cluster_offset + (i << s->cluster_bits);
+
+        if (be64_to_cpu(l2_table[l2_index + i]) == offset) {
+            continue;
+        }
+
         /* if two concurrent writes happen to the same unallocated cluster
 	 * each write allocates separate cluster and writes data concurrently.
 	 * The first one to complete updates l2 table with pointer to its
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 29/62] qcow2: Integrate SKEIN hash algorithm in deduplication.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (27 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 28/62] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 30/62] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
                   ` (33 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   14 ++++++++++++++
 block/qcow2.c       |    5 +++++
 configure           |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index cc99e27..50ffa54 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -30,6 +30,9 @@
 #include "block/block_int.h"
 #include "qemu-common.h"
 #include "qcow2.h"
+#ifdef CONFIG_SKEIN_DEDUP
+#include <skeinApi.h>
+#endif
 
 static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        QCowHash *hash,
@@ -208,6 +211,17 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
     case QCOW_HASH_SHA256:
         return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
                                 s->cluster_size, hash->data);
+#if defined(CONFIG_SKEIN_DEDUP)
+    case QCOW_HASH_SKEIN:
+        {
+        SkeinCtx_t ctx;
+        skeinCtxPrepare(&ctx, Skein256);
+        skeinInit(&ctx, Skein256);
+        skeinUpdate(&ctx, data, s->cluster_size);
+        skeinFinal(&ctx, hash->data);
+        }
+        return 0;
+#endif
     default:
         error_report("Invalid deduplication hash algorithm %i",
                      s->dedup_hash_algo);
diff --git a/block/qcow2.c b/block/qcow2.c
index 4f8cf68..e742e02 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1550,6 +1550,11 @@ static int8_t qcow2_get_dedup_hash_algo(char *value)
     if (!strcmp(value, "sha256")) {
         return QCOW_HASH_SHA256;
     }
+#if defined(CONFIG_SKEIN_DEDUP)
+    if (!strcmp(value, "skein")) {
+        return QCOW_HASH_SKEIN;
+    }
+#endif
 
     error_printf("Unsupported deduplication hash algorithm.\n");
     return -EINVAL;
diff --git a/configure b/configure
index 390326e..97497af 100755
--- a/configure
+++ b/configure
@@ -223,6 +223,7 @@ libiscsi=""
 coroutine=""
 seccomp=""
 glusterfs=""
+skein_dedup="no"
 
 # parse CC options first
 for opt do
@@ -882,6 +883,8 @@ for opt do
   ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
+  --enable-skein-dedup) skein_dedup="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1130,6 +1133,7 @@ echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
 echo "  --enable-glusterfs       enable GlusterFS backend"
 echo "  --disable-glusterfs      disable GlusterFS backend"
+echo "  --enable-skein-dedup     enable computing dedup hashes with SKEIN"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2412,6 +2416,30 @@ EOF
   fi
 fi
 
+##########################################
+# SKEIN dedup hash function probe
+if test "$skein_dedup" != "no" ; then
+  cat > $TMPC <<EOF
+#include <skeinApi.h>
+int main(void) {
+    SkeinCtx_t ctx;
+    skeinCtxPrepare(&ctx, 512);
+    return 0;
+}
+EOF
+  skein_libs="-lskein3fish"
+  if compile_prog "" "$skein_libs" ; then
+    skein_dedup=yes
+    libs_tools="$skein_libs $libs_tools"
+    libs_softmmu="$skein_libs $libs_softmmu"
+  else
+    if test "$skein_dedup" = "yes" ; then
+      feature_not_found "libskein3fish not found"
+    fi
+    skein_dedup=no
+  fi
+fi
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -3296,6 +3324,7 @@ echo "build guest agent $guest_agent"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine_backend"
 echo "GlusterFS support $glusterfs"
+echo "SKEIN support     $skein_dedup"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3637,6 +3666,10 @@ if test "$glusterfs" = "yes" ; then
   echo "CONFIG_GLUSTERFS=y" >> $config_host_mak
 fi
 
+if test "$skein_dedup" = "yes" ; then
+  echo "CONFIG_SKEIN_DEDUP=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 30/62] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (28 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 29/62] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 31/62] qcow2: Use large L2 table for deduplication Benoît Canet
                   ` (32 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index e742e02..7ef9170 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1616,6 +1616,7 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
                 return hash_algo;
             }
             dedup = true;
+            flags |= BLOCK_FLAG_LAZY_REFCOUNTS;
         }
         options++;
     }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 31/62] qcow2: Use large L2 table for deduplication.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (29 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 30/62] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 32/62] qcow: Set large dedup hash block size Benoît Canet
                   ` (31 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c  |    2 +-
 block/qcow2-refcount.c |   22 +++++++++++++++-------
 block/qcow2.c          |    8 ++++++--
 3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c016e85..8ad4740 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -236,7 +236,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
             goto fail;
         }
 
-        memcpy(l2_table, old_table, s->cluster_size);
+        memcpy(l2_table, old_table, s->l2_size << 3);
 
         ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
         if (ret < 0) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3077a9f..f305510 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -536,12 +536,15 @@ fail:
  */
 static int update_cluster_refcount(BlockDriverState *bs,
                                    int64_t cluster_index,
-                                   int addend)
+                                   int addend,
+                                   bool is_l2)
 {
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend,
+    int size = is_l2 ? s->l2_size << 3 : 1;
+
+    ret = update_refcount(bs, cluster_index << s->cluster_bits, size, addend,
                           false);
     if (ret < 0) {
         return ret;
@@ -666,7 +669,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if (free_in_cluster == 0)
             s->free_byte_offset = 0;
         if ((offset & (s->cluster_size - 1)) != 0)
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
     } else {
         offset = qcow2_alloc_clusters(bs, s->cluster_size);
         if (offset < 0) {
@@ -676,7 +679,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if ((cluster_offset + s->cluster_size) == offset) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
             s->free_byte_offset += size;
         } else {
             s->free_byte_offset = offset;
@@ -817,7 +820,10 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     } else {
                         uint64_t cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
                         if (addend != 0) {
-                            refcount = update_cluster_refcount(bs, cluster_index, addend);
+                            refcount = update_cluster_refcount(bs,
+                                                               cluster_index,
+                                                               addend,
+                                                               false);
                         } else {
                             refcount = get_refcount(bs, cluster_index);
                         }
@@ -849,7 +855,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 
             if (addend != 0) {
-                refcount = update_cluster_refcount(bs, l2_offset >> s->cluster_bits, addend);
+                refcount = update_cluster_refcount(bs,
+                                                   l2_offset >> s->cluster_bits,
+                                                   addend, true);
             } else {
                 refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
             }
@@ -1145,7 +1153,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
             inc_refcounts(bs, res, refcount_table, refcount_table_size,
-                l2_offset, s->cluster_size);
+                l2_offset, s->l2_size << 3);
 
             /* L2 tables are cluster aligned */
             if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 7ef9170..f70c24b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -432,7 +432,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->cluster_bits = header.cluster_bits;
     s->cluster_size = 1 << s->cluster_bits;
     s->cluster_sectors = 1 << (s->cluster_bits - 9);
-    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+        s->l2_bits = 17; /* 64 * 16 KB L2 to compensate smaller cluster size */
+    } else {
+        s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    }
     s->l2_size = 1 << s->l2_bits;
     bs->total_sectors = header.size / 512;
     s->csize_shift = (62 - (s->cluster_bits - 8));
@@ -469,7 +473,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->l2_size << 3);
     s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
                                                  s->cluster_size);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 32/62] qcow: Set large dedup hash block size.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (30 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 31/62] qcow2: Use large L2 table for deduplication Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 33/62] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
                   ` (30 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    4 ++--
 block/qcow2.c          |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f305510..348342a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1063,7 +1063,7 @@ static int check_dedup_l2(BlockDriverState *bs, BdrvCheckResult *res,
     int i, l2_size;
 
     /* Read L2 table from disk */
-    l2_size = s->cluster_size;
+    l2_size = s->hash_block_size;
     l2_table = g_malloc(l2_size);
 
     if (bdrv_pread(bs->file, l2_offset, l2_table, l2_size) != l2_size) {
@@ -1153,7 +1153,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
             inc_refcounts(bs, res, refcount_table, refcount_table_size,
-                l2_offset, s->l2_size << 3);
+                l2_offset, dedup ? s->hash_block_size : s->l2_size << 3);
 
             /* L2 tables are cluster aligned */
             if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index f70c24b..bd7579a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -434,6 +434,8 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->cluster_sectors = 1 << (s->cluster_bits - 9);
     if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
         s->l2_bits = 17; /* 64 * 16 KB L2 to compensate smaller cluster size */
+        s->l2_bits = 16 - 3; /* 64 KB L2 */
+        s->hash_block_size = DEFAULT_CLUSTER_SIZE * 5;
     } else {
         s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
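
Patch 32/62 above sets hash_block_size to DEFAULT_CLUSTER_SIZE * 5, and patch
35/62 later divides that size by (HASH_LENGTH + 8) to get the number of hashes
per block. The following sketch works through that division. Both constant
values are assumptions made for illustration (a 64 KB default cluster and a
32-byte hash); neither value appears in this part of the series, and the
helper names are hypothetical.

```c
#include <stdint.h>

#define ASSUMED_DEFAULT_CLUSTER_SIZE 65536u /* assumption: 64 KB clusters */
#define ASSUMED_HASH_LENGTH          32u    /* assumption: 256-bit hash */

/* hash_block_size as set in qcow2_open() by the patch above. */
static uint32_t demo_hash_block_size(void)
{
    return ASSUMED_DEFAULT_CLUSTER_SIZE * 5;
}

/* Each on-disk entry is the hash plus 8 more bytes, as in the
 * "HASH_LENGTH + 8" divisor used by qcow2_load_valid_hashes(). */
static uint32_t demo_hashes_per_block(uint32_t block_size,
                                      uint32_t hash_length)
{
    return block_size / (hash_length + 8);
}
```

Under these assumptions one hash block is 327680 bytes and holds 8192 entries,
which is the same count as the 1 << (16 - 3) L2 entries selected by the patch.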

* [Qemu-devel] [RFC V5 33/62] qemu-iotests: Filter dedup=on/off so existing tests don't break.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (31 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 32/62] qcow: Set large dedup hash block size Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 34/62] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
                   ` (29 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 tests/qemu-iotests/common.rc |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index aef5f52..72e746d 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -124,7 +124,8 @@ _make_test_img()
             -e "s# compat='[^']*'##g" \
             -e "s# compat6=\\(on\\|off\\)##g" \
             -e "s# static=\\(on\\|off\\)##g" \
-            -e "s# lazy_refcounts=\\(on\\|off\\)##g"
+            -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+            -e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
 
     # Start an NBD server on the image file, which is what we'll be talking to
     if [ $IMGPROTO = "nbd" ]; then
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 34/62] qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (32 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 33/62] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 35/62] qcow2: Add qcow2_co_dedup_resume to restart deduplication Benoît Canet
                   ` (28 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h       |    2 ++
 2 files changed, 99 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 50ffa54..35fcc01 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1019,3 +1019,100 @@ bool qcow2_dedup_is_running(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     return s->has_dedup && s->dedup_status == QCOW_DEDUP_STARTED;
 }
+
+static gint qcow2_dedup_compare_by_hash(gconstpointer a,
+                                        gconstpointer b,
+                                        gpointer data)
+{
+    QCowHash *hash_a = (QCowHash *) a;
+    QCowHash *hash_b = (QCowHash *) b;
+    return memcmp(hash_a->data, hash_b->data, HASH_LENGTH);
+}
+
+static void qcow2_dedup_destroy_qcow_hash_node(gpointer p)
+{
+    QCowHashNode *hash_node = (QCowHashNode *) p;
+    g_free(hash_node);
+}
+
+static gint qcow2_dedup_compare_by_offset(gconstpointer a,
+                                          gconstpointer b,
+                                          gpointer data)
+{
+    uint64_t offset_a = *((uint64_t *) a);
+    uint64_t offset_b = *((uint64_t *) b);
+
+    if (offset_a > offset_b) {
+        return 1;
+    }
+    if (offset_a < offset_b) {
+        return -1;
+    }
+    return 0;
+}
+
+static int qcow2_dedup_alloc(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret;
+
+    ret = qcow2_do_table_init(bs,
+                              &s->dedup_table,
+                              s->dedup_table_offset,
+                              s->dedup_table_size,
+                              false);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    s->dedup_tree_by_hash = g_tree_new_full(qcow2_dedup_compare_by_hash, NULL,
+                                            NULL,
+                                            qcow2_dedup_destroy_qcow_hash_node);
+    s->dedup_tree_by_sect = g_tree_new_full(qcow2_dedup_compare_by_offset,
+                                            NULL, NULL, NULL);
+
+    s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+                                                s->hash_block_size);
+
+    return 0;
+}
+
+static void qcow2_dedup_free(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    g_free(s->dedup_table);
+
+    qcow2_cache_flush(bs, s->dedup_cluster_cache);
+    qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+    g_tree_destroy(s->dedup_tree_by_sect);
+    g_tree_destroy(s->dedup_tree_by_hash);
+}
+
+int qcow2_dedup_init(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    s->has_dedup = true;
+
+    ret = qcow2_dedup_alloc(bs);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if we are read-only we don't load the deduplication table */
+    if (bs->read_only) {
+        return 0;
+    }
+
+    s->dedup_status = QCOW_DEDUP_STARTING;
+
+    return 0;
+}
+
+void qcow2_dedup_close(BlockDriverState *bs)
+{
+    qcow2_dedup_free(bs);
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 9f5d0f0..29267a9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -507,5 +507,7 @@ void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
 void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
                                            uint64_t cluster_index);
 bool qcow2_dedup_is_running(BlockDriverState *bs);
+int qcow2_dedup_init(BlockDriverState *bs);
+void qcow2_dedup_close(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
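
The two tree comparators added by patch 34/62 follow a common pattern: memcmp
for the fixed-size hash, and explicit three-way comparisons for the 64-bit
offset, so that a large difference is never squeezed into a narrower integer
return value. The sketch below shows the same pattern with qsort instead of
GTree so it needs only the C library. All names are hypothetical, the hash
length of 32 bytes is an assumption, and the GLib comparators in the patch
take an extra user-data argument that is omitted here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HASH_LENGTH 32 /* assumption: 256-bit hash */

typedef struct {
    uint8_t data[DEMO_HASH_LENGTH];
} DemoHash;

/* Order two hashes bytewise, like qcow2_dedup_compare_by_hash(). */
static int demo_compare_by_hash(const void *a, const void *b)
{
    const DemoHash *hash_a = a;
    const DemoHash *hash_b = b;
    return memcmp(hash_a->data, hash_b->data, DEMO_HASH_LENGTH);
}

/* Three-way comparison of 64-bit offsets without subtracting them,
 * equivalent to the if/if/return 0 chain in the patch. */
static int demo_compare_by_offset(const void *a, const void *b)
{
    uint64_t offset_a = *(const uint64_t *) a;
    uint64_t offset_b = *(const uint64_t *) b;
    return (offset_a > offset_b) - (offset_a < offset_b);
}

/* Sort an array of offsets in ascending order. */
static void demo_sort_offsets(uint64_t *offsets, size_t count)
{
    qsort(offsets, count, sizeof(*offsets), demo_compare_by_offset);
}
```

Offsets that differ only above bit 31 still sort correctly with this
comparator, which is the case a plain subtraction cast to int would get wrong.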

* [Qemu-devel] [RFC V5 35/62] qcow2: Add qcow2_co_dedup_resume to restart deduplication.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (33 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 34/62] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 36/62] qcow2: Enable the deduplication feature Benoît Canet
                   ` (27 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 35fcc01..6cd1af4 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -34,6 +34,7 @@
 #include <skeinApi.h>
 #endif
 
+static void qcow2_dedup_reset(BlockDriverState *bs);
 static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        QCowHash *hash,
                                        uint64_t *first_logical_sect,
@@ -1020,6 +1021,175 @@ bool qcow2_dedup_is_running(BlockDriverState *bs)
     return s->has_dedup && s->dedup_status == QCOW_DEDUP_STARTED;
 }
 
+static bool hash_is_null(QCowHash *hash)
+{
+    QCowHash null_hash;
+    memset(&null_hash.data, 0, HASH_LENGTH);
+    return !memcmp(hash->data, null_hash.data, HASH_LENGTH);
+}
+
+static void qcow2_dedup_insert_hash_node(BlockDriverState *bs,
+                                         QCowHashNode *hash_node)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+    g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect, hash_node);
+}
+
+/* This load the QCowHashNode corresponding to a given cluster index into ram
+ *
+ * @index: index of the given physical cluster
+ * @ret:   0 on success, negative on error
+ */
+static int qcow2_load_cluster_hash(BlockDriverState *bs,
+                                   uint64_t index)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    QCowHash hash;
+    uint64_t first_logical_sect;
+    QCowHashNode *hash_node;
+
+    /* get the hash */
+    ret = qcow2_dedup_read_write_hash(bs, &hash,
+                                      &first_logical_sect,
+                                      index * s->cluster_sectors,
+                                      false);
+
+    if (ret < 0) {
+        error_report("Failed to load deduplication hash.");
+        return ret;
+    }
+
+    /* if the hash is null don't load it */
+    if (hash_is_null(&hash)) {
+        return ret;
+    }
+
+    hash_node = qcow2_dedup_build_qcow_hash_node(&hash,
+                                                 index * s->cluster_sectors,
+                                                 first_logical_sect);
+    qcow2_dedup_insert_hash_node(bs, hash_node);
+
+    return 0;
+}
+
+/* Load all the active hashes into RAM
+ *
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_load_valid_hashes(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t max_clusters, i;
+    int nb_hash_in_hash_block = s->hash_block_size / (HASH_LENGTH + 8);
+    int ret = 0;
+
+    max_clusters = s->dedup_table_size * nb_hash_in_hash_block;
+
+    /* load all the hashes stored on disk into memory */
+    for (i = 0; i < max_clusters; i++) {
+        if (!(i % nb_hash_in_hash_block)) {
+            co_sleep_ns(rt_clock, s->dedup_co_delay);
+        }
+        qemu_co_mutex_lock(&s->lock);
+        ret = qcow2_load_cluster_hash(bs, i);
+        qemu_co_mutex_unlock(&s->lock);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static int qcow2_drop_to_dedup_stale_hash(BlockDriverState *bs,
+                                          uint64_t index)
+{
+    int ret = 0;
+    bool to_dedup;
+    uint64_t physical_sect;
+
+    to_dedup = qcow2_is_cluster_to_dedup(bs, index, &physical_sect, &ret);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (!to_dedup) {
+        return 0;
+    }
+
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+    return 0;
+}
+
+/* For each l2 entry marked as QCOW_OFLAG_TO_DEDUP drop the obsolete hash
+ * from the trees
+ *
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_drop_to_dedup_hashes(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t i;
+    int ret = 0;
+
+    /* for each l2 entry */
+    for (i = 0; i < s->l2_size * s->l1_size; i++) {
+        if (!(i % s->l2_size)) {
+            co_sleep_ns(rt_clock, s->dedup_co_delay);
+        }
+        qemu_co_mutex_lock(&s->lock);
+        ret = qcow2_drop_to_dedup_stale_hash(bs, i);
+        qemu_co_mutex_unlock(&s->lock);
+
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * This coroutine resumes deduplication
+ *
+ * @opaque: the given BlockDriverState
+ * @ret:    nothing (the coroutine returns void)
+ */
+static void coroutine_fn qcow2_co_dedup_resume(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    ret = qcow2_load_valid_hashes(bs);
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = qcow2_drop_to_dedup_hashes(bs);
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    qemu_co_mutex_lock(&s->lock);
+    s->dedup_status = QCOW_DEDUP_STARTED;
+    qemu_co_mutex_unlock(&s->lock);
+
+    return;
+
+fail:
+    qemu_co_mutex_lock(&s->lock);
+    s->dedup_status = QCOW_DEDUP_STOPPED;
+    qcow2_dedup_reset(bs);
+    qemu_co_mutex_unlock(&s->lock);
+}
+
 static gint qcow2_dedup_compare_by_hash(gconstpointer a,
                                         gconstpointer b,
                                         gpointer data)
@@ -1089,6 +1259,12 @@ static void qcow2_dedup_free(BlockDriverState *bs)
     g_tree_destroy(s->dedup_tree_by_hash);
 }
 
+static void qcow2_dedup_reset(BlockDriverState *bs)
+{
+    qcow2_dedup_free(bs);
+    qcow2_dedup_alloc(bs);
+}
+
 int qcow2_dedup_init(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1109,6 +1285,10 @@ int qcow2_dedup_init(BlockDriverState *bs)
 
     s->dedup_status = QCOW_DEDUP_STARTING;
 
+    /* resume deduplication */
+    s->dedup_resume_co = qemu_coroutine_create(qcow2_co_dedup_resume);
+    qemu_coroutine_enter(s->dedup_resume_co, bs);
+
     return 0;
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
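
The resume path in patch 35/62 above combines two ideas: the loader yields
once per hash block (when i is a multiple of the per-block hash count), and
the status moves from STARTING to STARTED on success or to STOPPED, with the
loaded state dropped, on failure. The sketch below simulates only that control
flow. It does no I/O and uses no coroutines; the struct, enum, and function
names are hypothetical, and the fail_at parameter exists purely to inject an
error.

```c
#include <stdint.h>

typedef enum {
    DEMO_DEDUP_STOPPED,
    DEMO_DEDUP_STARTING,
    DEMO_DEDUP_STARTED
} DemoDedupStatus;

typedef struct {
    uint64_t table_entries;    /* stands in for dedup_table_size */
    uint64_t hashes_per_block; /* stands in for nb_hash_in_hash_block */
    uint64_t yields;           /* how often the loader would have slept */
    uint64_t loaded;           /* hashes currently held in memory */
    DemoDedupStatus status;
} DemoResume;

/* Walk every hash slot; yield at each block boundary, stop at fail_at. */
static int demo_load_hashes(DemoResume *r, uint64_t fail_at)
{
    uint64_t max = r->table_entries * r->hashes_per_block;
    uint64_t i;

    for (i = 0; i < max; i++) {
        if (i % r->hashes_per_block == 0) {
            r->yields++;       /* co_sleep_ns() in the patch */
        }
        if (i == fail_at) {
            return -1;         /* injected load error */
        }
        r->loaded++;
    }
    return 0;
}

/* STARTING -> STARTED on success, STARTING -> STOPPED plus reset on error. */
static void demo_resume(DemoResume *r, uint64_t fail_at)
{
    r->status = DEMO_DEDUP_STARTING;
    if (demo_load_hashes(r, fail_at) < 0) {
        r->status = DEMO_DEDUP_STOPPED;
        r->loaded = 0;         /* mirrors qcow2_dedup_reset() */
        return;
    }
    r->status = DEMO_DEDUP_STARTED;
}
```

With 4 table entries of 8 hashes each, a clean run yields 4 times and ends in
the STARTED state with 32 hashes loaded; an error at slot 10 leaves the state
STOPPED with nothing loaded.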

* [Qemu-devel] [RFC V5 36/62] qcow2: Enable the deduplication feature.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (34 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 35/62] qcow2: Add qcow2_co_dedup_resume to restart deduplication Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 37/62] qcow2: Add deduplication metrics structures Benoît Canet
                   ` (26 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index bd7579a..753fce0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -542,6 +542,13 @@ static int qcow2_open(BlockDriverState *bs, int flags)
         }
     }
 
+    if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
 #ifdef DEBUG_ALLOC
     {
         BdrvCheckResult result = {0};
@@ -1011,11 +1018,11 @@ fail:
 static void qcow2_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
+
     g_free(s->l1_table);
 
     if (s->has_dedup) {
-        qcow2_cache_flush(bs, s->dedup_cluster_cache);
-        qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+        qcow2_dedup_close(bs);
     }
 
     qcow2_cache_flush(bs, s->l2_table_cache);
@@ -1509,8 +1516,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         }
 
         /* minimal init */
-        s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
-                                                    s->hash_block_size);
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto out;
+        }
     }
 
     /* Want a backing file? There you go.*/
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 37/62] qcow2: Add deduplication metrics structures.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (35 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 36/62] qcow2: Enable the deduplication feature Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 38/62] qcow2: Initialize deduplication metrics Benoît Canet
                   ` (25 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2.h         |    3 ++-
 include/block/block.h |   11 +++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 29267a9..0729ff2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -237,9 +237,10 @@ typedef struct BDRVQcowState {
     int32_t dedup_table_size;
     GTree *dedup_tree_by_hash;
     GTree *dedup_tree_by_sect;
+    CoMutex dedup_lock;
+    BlockDeduplicationMetrics dedup_metrics;
 
     CoMutex lock;
-    CoMutex dedup_lock;
 
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
     uint32_t crypt_method_header;
diff --git a/include/block/block.h b/include/block/block.h
index b81d200..16e1cf1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -12,6 +12,17 @@
 typedef struct BlockDriver BlockDriver;
 typedef struct BlockJob BlockJob;
 
+typedef struct {
+    uint64_t deduplicated_clusters;
+    uint64_t non_deduplicated_clusters;
+    uint64_t missing_data_reads; /* reads used to complete partial clusters */
+    uint64_t ram_hash_creations;     /* RAM based lookup */
+    uint64_t ram_hash_deletions;     /* RAM based lookup */
+    uint64_t ram_usage;              /* RAM usage in bytes */
+    uint64_t deleted_clusters;       /* number of deleted clusters */
+    uint64_t refcount_overflows;     /* number of refcount overflows */
+} BlockDeduplicationMetrics;
+
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
     int cluster_size;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
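
As an illustration of how the counters introduced in patch 37/62 above might
be consumed, the sketch below derives two figures from them. Neither helper
exists in the series: the names are hypothetical, the struct is a reduced copy
of BlockDeduplicationMetrics, and the live-node figure assumes that every
removal of a hash node is counted, which may not hold when the trees are
destroyed wholesale.

```c
#include <stdint.h>

/* Reduced, hypothetical copy of the counters used below. */
typedef struct {
    uint64_t deduplicated_clusters;
    uint64_t non_deduplicated_clusters;
    uint64_t ram_hash_creations;
    uint64_t ram_hash_deletions;
} DemoDedupMetrics;

/* Percentage of processed clusters that were deduplicated (0 if none). */
static unsigned demo_dedup_percent(const DemoDedupMetrics *m)
{
    uint64_t total = m->deduplicated_clusters + m->non_deduplicated_clusters;

    if (total == 0) {
        return 0;
    }
    return (unsigned) (m->deduplicated_clusters * 100 / total);
}

/* Hash nodes presumed to be still in RAM: creations minus deletions. */
static uint64_t demo_live_hash_nodes(const DemoDedupMetrics *m)
{
    return m->ram_hash_creations - m->ram_hash_deletions;
}
```

For example, 75 deduplicated and 25 non-deduplicated clusters give 75 percent,
and 10 creations with 4 deletions leave 6 nodes.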

* [Qemu-devel] [RFC V5 38/62] qcow2: Initialize deduplication metrics.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (36 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 37/62] qcow2: Add deduplication metrics structures Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 39/62] qcow2: Collect unaligned writes missing data reads metric Benoît Canet
                   ` (24 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 6cd1af4..997714b 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1272,6 +1272,8 @@ int qcow2_dedup_init(BlockDriverState *bs)
 
     s->has_dedup = true;
 
+    memset(&s->dedup_metrics, 0, sizeof(s->dedup_metrics));
+
     ret = qcow2_dedup_alloc(bs);
 
     if (ret < 0) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 39/62] qcow2: Collect unaligned writes missing data reads metric.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (37 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 38/62] qcow2: Initialize deduplication metrics Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 40/62] qcow2: Collect deduplicated cluster metric Benoît Canet
                   ` (23 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 997714b..e4920d4 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -130,6 +130,7 @@ int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
 
     /* read beginning */
     if (cluster_beginning_nr) {
+        s->dedup_metrics.missing_data_reads++;
         ret = qcow2_read_cluster_data(bs,
                                       *data,
                                       cluster_beginning_sector,
@@ -153,6 +154,7 @@ int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
 
     /* read and add ending */
     if (cluster_ending_nr) {
+        s->dedup_metrics.missing_data_reads++;
         ret = qcow2_read_cluster_data(bs,
                                       *data +
                                       (cluster_beginning_nr +
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 40/62] qcow2: Collect deduplicated cluster metric.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (38 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 39/62] qcow2: Collect unaligned writes missing data reads metric Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 41/62] qcow2: Collect undeduplicated " Benoît Canet
                   ` (22 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index e4920d4..716371c 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -400,10 +400,14 @@ static int qcow2_deduplicate_cluster(BlockDriverState *bs,
     }
 
     /* Increment the refcount of the cluster */
-    return update_refcount(bs,
-                           (hash_node->physical_sect /
-                            s->cluster_sectors) << s->cluster_bits,
-                            1, 1, true);
+    ret = update_refcount(bs,
+                          (hash_node->physical_sect /
+                          s->cluster_sectors) << s->cluster_bits,
+                          1, 1, true);
+
+    s->dedup_metrics.deduplicated_clusters++;
+
+    return ret;
 }
 
 /* This function tries to deduplicate a given cluster.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
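
In the hunk of patch 40/62 above, deduplicated_clusters is incremented whether
or not update_refcount() succeeds. If the metric is meant to count only
successful deduplications, one possible variant is sketched below. This is an
assumption about intent, not code from the series: demo_update_refcount() is a
stand-in with an injected error, and all names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t deduplicated_clusters;
} DemoMetrics;

/* Stand-in for update_refcount(): 0 on success, negative on error. */
static int demo_update_refcount(int simulate_error)
{
    return simulate_error ? -5 : 0;
}

/* Increment the metric only after the refcount update has succeeded. */
static int demo_deduplicate_cluster(DemoMetrics *m, int simulate_error)
{
    int ret = demo_update_refcount(simulate_error);

    if (ret < 0) {
        return ret;
    }

    m->deduplicated_clusters++;
    return 0;
}
```

A failed refcount update then leaves the counter unchanged and still
propagates the error to the caller.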

* [Qemu-devel] [RFC V5 41/62] qcow2: Collect undeduplicated cluster metric.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (39 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 40/62] qcow2: Collect deduplicated cluster metric Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 42/62] qcow2: Count QCowHashNode creation metrics Benoît Canet
                   ` (21 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 716371c..0f095a9 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -524,6 +524,7 @@ static int qcow2_count_next_non_dedupable_clusters(BlockDriverState *bs,
                                                    uint8_t *data,
                                                    int left_to_process)
 {
+    BDRVQcowState *s = bs->opaque;
     int i;
     int ret = 0;
     QCowHashNode *hash_node;
@@ -546,6 +547,7 @@ static int qcow2_count_next_non_dedupable_clusters(BlockDriverState *bs,
 
         qcow2_build_and_insert_hash_node(bs, &ds->phash.hash);
         add_hash_to_undedupable_list(bs, ds);
+        s->dedup_metrics.non_deduplicated_clusters++;
     }
 
     return i;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 42/62] qcow2: Count QCowHashNode creation metrics.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (40 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 41/62] qcow2: Collect undeduplicated " Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 43/62] qcow2: Count QCowHashNode removal from tree for metrics Benoît Canet
                   ` (20 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 0f095a9..d22e2a4 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -289,6 +289,7 @@ static void qcow2_build_and_insert_hash_node(BlockDriverState *bs,
                                                  QCOW_FLAG_EMPTY,
                                                  QCOW_FLAG_EMPTY);
     g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+    s->dedup_metrics.ram_hash_creations++;
 }
 
 /*
@@ -1043,6 +1044,7 @@ static void qcow2_dedup_insert_hash_node(BlockDriverState *bs,
 
     g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
     g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect, hash_node);
+    s->dedup_metrics.ram_hash_creations++;
 }
 
 /* This load the QCowHashNode corresponding to a given cluster index into ram
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 43/62] qcow2: Count QCowHashNode removal from tree for metrics.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (41 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 42/62] qcow2: Count QCowHashNode creation metrics Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 44/62] qcow2: Count cluster deleted metric Benoît Canet
                   ` (19 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index d22e2a4..64e5a13 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -841,6 +841,7 @@ static void qcow2_remove_hash_node(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
     g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+    s->dedup_metrics.ram_hash_deletions++;
 }
 
 /* This function removes a hash_node from the trees given a physical sector
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 44/62] qcow2: Count cluster deleted metric
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (42 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 43/62] qcow2: Count QCowHashNode removal from tree for metrics Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 45/62] qcow2: Count deduplication refcount overflow metric Benoît Canet
                   ` (18 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 64e5a13..4a1b184 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -991,6 +991,7 @@ void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
 
     /* remove from ram if present so we won't dedup with it anymore */
     qcow2_remove_hash_node_by_sector(bs, physical_sect);
+    s->dedup_metrics.deleted_clusters++;
 }
 
 /* Force to use a new physical cluster and QCowHashNode when the refcount pass
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 45/62] qcow2: Count deduplication refcount overflow metric.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (43 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 44/62] qcow2: Count cluster deleted metric Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 46/62] qapi: Add support for deduplication infos in qapi-schema.json Benoît Canet
                   ` (17 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4a1b184..db23b71 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1024,6 +1024,7 @@ void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
 
     /* remove the QCowHashNode from ram so we won't use it anymore for dedup */
     qcow2_remove_hash_node(bs, hash_node);
+    s->dedup_metrics.refcount_overflows++;
 }
 
 bool qcow2_dedup_is_running(BlockDriverState *bs)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [Qemu-devel] [RFC V5 46/62] qapi: Add support for deduplication infos in qapi-schema.json.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (44 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 45/62] qcow2: Count deduplication refcount overflow metric Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 47/62] block: Add deduplication metrics to BlockDriverInfo Benoît Canet
                   ` (16 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 qapi-schema.json |   40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 5dfa052..1a5014c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -720,6 +720,40 @@
 { 'command': 'query-block', 'returns': ['BlockInfo'] }
 
 ##
+# @BlockDeviceDedupInfo
+#
+# Statistics of the deduplication on a virtual block device implementing it
+# since QEMU startup.
+#
+# @deduplicated-clusters:     Number of clusters which were deduplicated.
+#
+# @non-deduplicated-clusters: Number of clusters which were not deduplicated.
+#
+# @missing-data-reads:        Number of reads which were done to complete
+#                             unaligned or sub-cluster-sized writes.
+#
+# @ram-hash-creations:        Number of cluster hashes created in RAM.
+#
+# @ram-hash-deletions:        Number of cluster hashes deleted from RAM.
+#
+# @ram-usage:                 Number of bytes of RAM used.
+#
+# @deleted-clusters:          Number of deleted cluster when refcount < 0
+#
+# @refcount-overflows:        Number of refcount overflows
+#
+# @running:                   True if deduplication is running
+#
+# Since: 1.5.0
+##
+{ 'type': 'BlockDeviceDedupInfo',
+  'data': {'deduplicated-clusters': 'int', 'non-deduplicated-clusters': 'int',
+           'missing-data-reads': 'int', 'ram-hash-creations': 'int',
+           'ram-hash-deletions': 'int', 'ram-usage': 'int',
+           'deleted-clusters': 'int', 'refcount-overflows': 'int',
+           'running': 'bool' } }
+
+##
 # @BlockDeviceStats:
 #
 # Statistics of a virtual block device or a block backing device.
@@ -747,13 +781,17 @@
 #                     growable sparse files (like qcow2) that are used on top
 #                     of a physical device.
 #
+# @deduplication: #optional @BlockDeviceDedupInfo describing deduplication
+#                           metrics (since 1.5)
+#
 # Since: 0.14.0
 ##
 { 'type': 'BlockDeviceStats',
   'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int',
            'wr_operations': 'int', 'flush_operations': 'int',
            'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
-           'rd_total_time_ns': 'int', 'wr_highest_offset': 'int' } }
+           'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
+           '*deduplication': 'BlockDeviceDedupInfo' } }
 
 ##
 # @BlockStats:
-- 
1.7.10.4
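
Reviewer note: a management client would most likely derive a deduplication ratio from these counters. Below is a minimal sketch of that arithmetic, assuming that deduplicated-clusters plus non-deduplicated-clusters equals the total number of clusters written. The struct DedupCounters and the helper dedup_ratio_percent() are hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of two BlockDeviceDedupInfo counters. */
typedef struct DedupCounters {
    uint64_t deduplicated_clusters;
    uint64_t non_deduplicated_clusters;
} DedupCounters;

/* Percentage of written clusters that were deduplicated.
 * Assumption: the two counters together cover every written cluster.
 * Returns 0 when nothing has been written yet.
 */
static unsigned dedup_ratio_percent(const DedupCounters *c)
{
    uint64_t total = c->deduplicated_clusters + c->non_deduplicated_clusters;

    if (total == 0) {
        return 0;
    }
    return (unsigned)(c->deduplicated_clusters * 100 / total);
}
```

Integer division truncates the result, which should be good enough for a monitoring display.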


* [Qemu-devel] [RFC V5 47/62] block: Add deduplication metrics to BlockDriverInfo.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (45 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 46/62] qapi: Add support for deduplication infos in qapi-schema.json Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 48/62] qcow2: Add qcow2_dedup_update_metrics to compute dedup RAM usage Benoît Canet
                   ` (15 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 include/block/block.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 16e1cf1..2043560 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -29,6 +29,9 @@ typedef struct BlockDriverInfo {
     /* offset at which the VM state can be saved (0 if not possible) */
     int64_t vm_state_offset;
     bool is_dirty;
+    bool has_dedup;
+    bool dedup_running;
+    BlockDeduplicationMetrics dedup_metrics;
 } BlockDriverInfo;
 
 typedef struct BlockFragInfo {
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 48/62] qcow2: Add qcow2_dedup_update_metrics to compute dedup RAM usage.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (46 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 47/62] block: Add deduplication metrics to BlockDriverInfo Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 49/62] qcow2: returns deduplication metrics and status via bdrv_get_info() Benoît Canet
                   ` (14 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |   13 +++++++++++++
 block/qcow2.h       |    1 +
 2 files changed, 14 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index db23b71..4305746 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1311,3 +1311,16 @@ void qcow2_dedup_close(BlockDriverState *bs)
 {
     qcow2_dedup_free(bs);
 }
+
+#define GTREE_NODE_SIZE (sizeof(int) * 5)
+
+void qcow2_dedup_update_metrics(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    uint64_t nb_hashs = s->dedup_metrics.ram_hash_creations -
+                        s->dedup_metrics.ram_hash_deletions;
+
+    s->dedup_metrics.ram_usage = nb_hashs * GTREE_NODE_SIZE * 2;
+    s->dedup_metrics.ram_usage += nb_hashs * sizeof(QCowHashNode);
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 0729ff2..d8e8539 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -510,5 +510,6 @@ void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
 bool qcow2_dedup_is_running(BlockDriverState *bs);
 int qcow2_dedup_init(BlockDriverState *bs);
 void qcow2_dedup_close(BlockDriverState *bs);
+void qcow2_dedup_update_metrics(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4
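
Reviewer note: the estimate in qcow2_dedup_update_metrics() can be restated as a standalone function. This is only a sketch of the same arithmetic. The factor of 2 is taken from the patch; that it accounts for two GTree indexes per hash is my assumption. The helper dedup_ram_usage() and its hash_node_size parameter (standing in for sizeof(QCowHashNode)) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough per-node cost of a GTree node, as assumed by the patch.
 * Parenthesized so the macro is safe inside any expression.
 */
#define GTREE_NODE_SIZE (sizeof(int) * 5)

/* Estimate the RAM used by the in-memory hash index.
 * creations/deletions are the running counters from the metrics;
 * hash_node_size stands in for sizeof(QCowHashNode).
 */
static uint64_t dedup_ram_usage(uint64_t creations, uint64_t deletions,
                                size_t hash_node_size)
{
    uint64_t live = creations - deletions;   /* hashes currently in RAM */

    /* tree bookkeeping (counted twice, as in the patch) plus payload */
    return live * GTREE_NODE_SIZE * 2 + live * hash_node_size;
}
```

The result is an approximation: it ignores allocator overhead and depends on the real GTree node layout, which GLib does not expose.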


* [Qemu-devel] [RFC V5 49/62] qcow2: returns deduplication metrics and status via bdrv_get_info()
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (47 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 48/62] qcow2: Add qcow2_dedup_update_metrics to compute dedup RAM usage Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 50/62] qapi: Return virtual block device deduplication metrics in QMP Benoît Canet
                   ` (13 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 753fce0..e442268 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1868,6 +1868,10 @@ static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
     BDRVQcowState *s = bs->opaque;
     bdi->cluster_size = s->cluster_size;
     bdi->vm_state_offset = qcow2_vm_state_offset(s);
+    bdi->has_dedup = s->has_dedup;
+    bdi->dedup_running = s->dedup_status == QCOW_DEDUP_STARTED;
+    qcow2_dedup_update_metrics(bs);
+    memcpy(&bdi->dedup_metrics, &s->dedup_metrics, sizeof(bdi->dedup_metrics));
     return 0;
 }
 
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 50/62] qapi: Return virtual block device deduplication metrics in QMP
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (48 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 49/62] qcow2: returns deduplication metrics and status via bdrv_get_info() Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 51/62] block: Add BlockDriver function prototype to pause and resume deduplication Benoît Canet
                   ` (12 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block.c |   36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/block.c b/block.c
index 4e28c55..a245653 100644
--- a/block.c
+++ b/block.c
@@ -2921,6 +2921,40 @@ BlockInfoList *qmp_query_block(Error **errp)
     return head;
 }
 
+static void bdrv_get_dedup_metrics(const BlockDriverState *bs,
+                                   BlockDeviceStats *stats)
+{
+    BlockDriverInfo bdi;
+
+    if (bdrv_get_info((BlockDriverState *) bs, &bdi) < 0) {
+        return;
+    }
+
+    if (!bdi.has_dedup) {
+        return;
+    }
+
+    stats->has_deduplication = true;
+    stats->deduplication = g_malloc0(sizeof(*stats->deduplication));
+    stats->deduplication->deduplicated_clusters =
+        bdi.dedup_metrics.deduplicated_clusters;
+    stats->deduplication->non_deduplicated_clusters =
+        bdi.dedup_metrics.non_deduplicated_clusters;
+    stats->deduplication->missing_data_reads =
+        bdi.dedup_metrics.missing_data_reads;
+    stats->deduplication->ram_hash_creations =
+        bdi.dedup_metrics.ram_hash_creations;
+    stats->deduplication->ram_hash_deletions =
+        bdi.dedup_metrics.ram_hash_deletions;
+    stats->deduplication->ram_usage =
+        bdi.dedup_metrics.ram_usage;
+    stats->deduplication->deleted_clusters =
+        bdi.dedup_metrics.deleted_clusters;
+    stats->deduplication->refcount_overflows =
+        bdi.dedup_metrics.refcount_overflows;
+    stats->deduplication->running = bdi.dedup_running;
+}
+
 BlockStats *bdrv_query_stats(const BlockDriverState *bs)
 {
     BlockStats *s;
@@ -2943,6 +2977,8 @@ BlockStats *bdrv_query_stats(const BlockDriverState *bs)
     s->stats->rd_total_time_ns = bs->total_time_ns[BDRV_ACCT_READ];
     s->stats->flush_total_time_ns = bs->total_time_ns[BDRV_ACCT_FLUSH];
 
+    bdrv_get_dedup_metrics(bs, s->stats);
+
     if (bs->file) {
         s->has_parent = true;
         s->parent = bdrv_query_stats(bs->file);
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 51/62] block: Add BlockDriver function prototype to pause and resume deduplication.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (49 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 50/62] qapi: Return virtual block device deduplication metrics in QMP Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 52/62] qcow2: Add code to deduplicate cluster flagged with QCOW_OFLAG_TO_DEDUP Benoît Canet
                   ` (11 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 include/block/block_int.h |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index b7ed3e6..bb35df9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -203,6 +203,10 @@ struct BlockDriver {
      */
     int (*bdrv_has_zero_init)(BlockDriverState *bs);
 
+    /* to pause and resume deduplication (mainly qcow2) */
+    void (*bdrv_resume_dedup)(BlockDriverState *bs);
+    void (*bdrv_pause_dedup)(BlockDriverState *bs);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 52/62] qcow2: Add code to deduplicate cluster flagged with QCOW_OFLAG_TO_DEDUP.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (50 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 51/62] block: Add BlockDriver function prototype to pause and resume deduplication Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 53/62] block: Add bdrv_has_dedup Benoît Canet
                   ` (10 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4305746..dd320bf 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1166,6 +1166,130 @@ static int qcow2_drop_to_dedup_hashes(BlockDriverState *bs)
     return 0;
 }
 
+static bool qcow2_try_dedup_on_disk_cluster(BlockDriverState *bs,
+                                            QcowPersistantHash *phash,
+                                            uint64_t index,
+                                            uint64_t physical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint8_t *data;
+    int ret = 0;
+    bool result = false;
+
+    data = qemu_blockalign(bs, s->cluster_size);
+
+    /* read the cluster data from disk */
+    ret = bdrv_pread(bs->file, physical_sect << 9, data, s->cluster_size);
+
+    if (ret < 0) {
+        goto exit;
+    }
+
+    /* force computation of the hash */
+    phash->reuse = false;
+
+    ret = qcow2_try_dedup_cluster(bs,
+                                  phash,
+                                  index * s->cluster_sectors,
+                                  data,
+                                  0);
+
+    if (ret < 0) {
+        goto exit;
+    }
+
+    /* cluster was deduplicated -> result is true */
+    if (ret) {
+        result = true;
+    }
+
+exit:
+    qemu_vfree(data);
+    return result;
+}
+
+static bool qcow2_process_undeduplicated_cluster(BlockDriverState *bs,
+                                                 QcowPersistantHash *phash,
+                                                 uint64_t index,
+                                                 uint64_t physical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    /* not duplicated */
+    ret = qcow2_store_hash(bs, &phash->hash,
+                           index * s->cluster_sectors,
+                           physical_sect);
+
+    if (ret < 0) {
+        error_printf("Error while storing hash\n");
+        return false;
+    }
+
+    /* remove the QCOW_OFLAG_TO_DEDUP flag from l2 entry
+     * note: we should take care of setting QCOW_OFLAG_COPIED if needed
+     */
+    ret = qcow2_dedup_link_l2(bs, index * s->cluster_sectors,
+                              physical_sect, true);
+
+    return ret == 0;
+}
+
+static bool qcow2_process_to_dedup_cluster(BlockDriverState *bs,
+                                           uint64_t index)
+{
+    QcowPersistantHash phash;
+    bool to_dedup, deduplicated;
+    uint64_t physical_sect;
+    int ret = 0;
+
+    to_dedup = qcow2_is_cluster_to_dedup(bs, index, &physical_sect, &ret);
+
+    if (ret < 0) {
+        error_printf("Error checking if cluster must be deduplicated\n");
+        return false;
+    }
+
+    if (!to_dedup) {
+        return false;
+    }
+
+    deduplicated = qcow2_try_dedup_on_disk_cluster(bs,
+                                                   &phash,
+                                                   index,
+                                                   physical_sect);
+
+    if (deduplicated) {
+        return true;
+    }
+
+    qcow2_process_undeduplicated_cluster(bs,
+                                         &phash,
+                                         index,
+                                         physical_sect);
+
+    return true;
+}
+
+/* Try to deduplicate clusters that were written while dedup was not running.
+ */
+static void qcow2_deduplicate_after_resuming(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t i;
+    bool processed;
+
+    /* for each l2 entry */
+    for (i = 0; i < s->l2_size * s->l1_size; i++) {
+        qemu_co_mutex_lock(&s->lock);
+        processed = qcow2_process_to_dedup_cluster(bs, i);
+        qemu_co_mutex_unlock(&s->lock);
+        if (processed || !(i % s->l2_size)) {
+            co_sleep_ns(rt_clock, s->dedup_co_delay);
+        }
+    }
+}
+
 /*
  * This coroutine resume deduplication
  *
@@ -1194,6 +1318,8 @@ static void coroutine_fn qcow2_co_dedup_resume(void *opaque)
     s->dedup_status = QCOW_DEDUP_STARTED;
     qemu_co_mutex_unlock(&s->lock);
 
+    qcow2_deduplicate_after_resuming(bs);
+
     return;
 
 fail:
-- 
1.7.10.4
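
Reviewer note: the throttling rule in qcow2_deduplicate_after_resuming() is easy to misread, so here is a standalone sketch of it. The scan yields after every cluster that needed work and also once per L2 table, even when nothing was processed. The names scan_clusters(), process_fn and every_third() are hypothetical. A counter stands in for co_sleep_ns(), and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns true when cluster `index` needed work. */
typedef bool (*process_fn)(uint64_t index, void *opaque);

/* Walk every cluster index and return how many times the scan would yield. */
static uint64_t scan_clusters(uint64_t l1_size, uint64_t l2_size,
                              process_fn process, void *opaque)
{
    uint64_t yields = 0;
    uint64_t i;

    for (i = 0; i < l1_size * l2_size; i++) {
        bool processed = process(i, opaque);

        /* yield after real work, and at the first entry of each L2 table */
        if (processed || i % l2_size == 0) {
            yields++;   /* stands in for co_sleep_ns() in the patch */
        }
    }
    return yields;
}

/* Example policy: pretend clusters 1, 4, 7, ... were flagged for dedup. */
static bool every_third(uint64_t index, void *opaque)
{
    (void)opaque;
    return index % 3 == 1;
}
```

With 2 L2 tables of 4 entries each and the example policy, the scan yields at indexes 0, 1, 4 and 7. A mostly clean image therefore costs about one sleep per L2 table.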


* [Qemu-devel] [RFC V5 53/62] block: Add bdrv_has_dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (51 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 52/62] qcow2: Add code to deduplicate cluster flagged with QCOW_OFLAG_TO_DEDUP Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 54/62] block: Add bdrv_is_dedup_running Benoît Canet
                   ` (9 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block.c               |   15 +++++++++++++++
 include/block/block.h |    1 +
 2 files changed, 16 insertions(+)

diff --git a/block.c b/block.c
index a245653..aee33e0 100644
--- a/block.c
+++ b/block.c
@@ -4327,6 +4327,21 @@ void bdrv_lock_medium(BlockDriverState *bs, bool locked)
     }
 }
 
+/* Return true if the device has deduplication */
+bool bdrv_has_dedup(BlockDriverState *bs)
+{
+    BlockDriverInfo bdi;
+    int ret = 0;
+
+    ret = bdrv_get_info(bs, &bdi);
+
+    if (ret < 0) {
+        return false;
+    }
+
+    return bdi.has_dedup;
+}
+
 /* needed for generic scsi interface */
 
 int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
diff --git a/include/block/block.h b/include/block/block.h
index 2043560..e6f86ac 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -306,6 +306,7 @@ void bdrv_set_enable_write_cache(BlockDriverState *bs, bool wce);
 int bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
 void bdrv_lock_medium(BlockDriverState *bs, bool locked);
+bool bdrv_has_dedup(BlockDriverState *bs);
 void bdrv_eject(BlockDriverState *bs, bool eject_flag);
 const char *bdrv_get_format_name(BlockDriverState *bs);
 BlockDriverState *bdrv_find(const char *name);
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 54/62] block: Add bdrv_is_dedup_running.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (52 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 53/62] block: Add bdrv_has_dedup Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 55/62] block: Add bdrv_resume_dedup Benoît Canet
                   ` (8 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block.c               |   19 +++++++++++++++++++
 include/block/block.h |    1 +
 2 files changed, 20 insertions(+)

diff --git a/block.c b/block.c
index aee33e0..e83666a 100644
--- a/block.c
+++ b/block.c
@@ -4342,6 +4342,25 @@ bool bdrv_has_dedup(BlockDriverState *bs)
     return bdi.has_dedup;
 }
 
+/* Return true if the device has deduplication and it's running */
+bool bdrv_is_dedup_running(BlockDriverState *bs)
+{
+    BlockDriverInfo bdi;
+    int ret = 0;
+
+    ret = bdrv_get_info(bs, &bdi);
+
+    if (ret < 0) {
+        return false;
+    }
+
+    if (!bdi.has_dedup) {
+        return false;
+    }
+
+    return bdi.dedup_running;
+}
+
 /* needed for generic scsi interface */
 
 int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
diff --git a/include/block/block.h b/include/block/block.h
index e6f86ac..f4e1d2a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -307,6 +307,7 @@ int bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
 void bdrv_lock_medium(BlockDriverState *bs, bool locked);
 bool bdrv_has_dedup(BlockDriverState *bs);
+bool bdrv_is_dedup_running(BlockDriverState *bs);
 void bdrv_eject(BlockDriverState *bs, bool eject_flag);
 const char *bdrv_get_format_name(BlockDriverState *bs);
 BlockDriverState *bdrv_find(const char *name);
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 55/62] block: Add bdrv_resume_dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (53 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 54/62] block: Add bdrv_is_dedup_running Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 56/62] block: Add bdrv_pause_dedup Benoît Canet
                   ` (7 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block.c               |   19 +++++++++++++++++++
 include/block/block.h |    1 +
 2 files changed, 20 insertions(+)

diff --git a/block.c b/block.c
index e83666a..4e80da8 100644
--- a/block.c
+++ b/block.c
@@ -4361,6 +4361,25 @@ bool bdrv_is_dedup_running(BlockDriverState *bs)
     return bdi.dedup_running;
 }
 
+int bdrv_resume_dedup(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!bdrv_has_dedup(bs)) {
+        return -EINVAL;
+    }
+
+    if (bdrv_is_dedup_running(bs)) {
+        return 0;
+    }
+
+    if (drv && drv->bdrv_resume_dedup) {
+        drv->bdrv_resume_dedup(bs);
+    }
+
+    return 0;
+}
+
 /* needed for generic scsi interface */
 
 int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
diff --git a/include/block/block.h b/include/block/block.h
index f4e1d2a..94ac50a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -308,6 +308,7 @@ int bdrv_media_changed(BlockDriverState *bs);
 void bdrv_lock_medium(BlockDriverState *bs, bool locked);
 bool bdrv_has_dedup(BlockDriverState *bs);
 bool bdrv_is_dedup_running(BlockDriverState *bs);
+int bdrv_resume_dedup(BlockDriverState *bs);
 void bdrv_eject(BlockDriverState *bs, bool eject_flag);
 const char *bdrv_get_format_name(BlockDriverState *bs);
 BlockDriverState *bdrv_find(const char *name);
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 56/62] block: Add bdrv_pause_dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (54 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 55/62] block: Add bdrv_resume_dedup Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 57/62] qcow2: Add qcow2_pause_dedup Benoît Canet
                   ` (6 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block.c               |   19 +++++++++++++++++++
 include/block/block.h |    1 +
 2 files changed, 20 insertions(+)

diff --git a/block.c b/block.c
index 4e80da8..8c527b6 100644
--- a/block.c
+++ b/block.c
@@ -4380,6 +4380,25 @@ int bdrv_resume_dedup(BlockDriverState *bs)
     return 0;
 }
 
+int bdrv_pause_dedup(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!bdrv_has_dedup(bs)) {
+        return -EINVAL;
+    }
+
+    if (!bdrv_is_dedup_running(bs)) {
+        return 0;
+    }
+
+    if (drv && drv->bdrv_pause_dedup) {
+        drv->bdrv_pause_dedup(bs);
+    }
+
+    return 0;
+}
+
 /* needed for generic scsi interface */
 
 int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
diff --git a/include/block/block.h b/include/block/block.h
index 94ac50a..1328a27 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -309,6 +309,7 @@ void bdrv_lock_medium(BlockDriverState *bs, bool locked);
 bool bdrv_has_dedup(BlockDriverState *bs);
 bool bdrv_is_dedup_running(BlockDriverState *bs);
 int bdrv_resume_dedup(BlockDriverState *bs);
+int bdrv_pause_dedup(BlockDriverState *bs);
 void bdrv_eject(BlockDriverState *bs, bool eject_flag);
 const char *bdrv_get_format_name(BlockDriverState *bs);
 BlockDriverState *bdrv_find(const char *name);
-- 
1.7.10.4
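
Reviewer note: bdrv_pause_dedup() and bdrv_resume_dedup() share the same dispatch shape. They reject devices without deduplication, treat an already satisfied request as a no-op, and otherwise call the optional driver hook. Below is a self-contained sketch of that shape using a mock device. MockDevice, mock_pause() and device_pause_dedup() are hypothetical and only illustrate the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MockDevice MockDevice;

/* Hypothetical stand-in for BlockDriverState plus its driver hook. */
struct MockDevice {
    bool has_dedup;
    bool running;
    void (*pause)(MockDevice *dev);   /* optional, may be NULL */
};

static void mock_pause(MockDevice *dev)
{
    dev->running = false;
}

/* Same decision order as the patch: capability, current state, then hook. */
static int device_pause_dedup(MockDevice *dev)
{
    if (!dev->has_dedup) {
        return -EINVAL;   /* device cannot deduplicate at all */
    }
    if (!dev->running) {
        return 0;         /* already paused: nothing to do */
    }
    if (dev->pause) {
        dev->pause(dev);
    }
    return 0;
}
```

As in the patch, a missing hook is silently reported as success. Whether that should be an error instead is a design question this sketch does not answer.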


* [Qemu-devel] [RFC V5 57/62] qcow2: Add qcow2_pause_dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (55 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 56/62] block: Add bdrv_pause_dedup Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 58/62] qcow2: Add qcow2_resume_dedup Benoît Canet
                   ` (5 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |   19 +++++++++++++++++++
 block/qcow2.c       |    2 ++
 block/qcow2.h       |    1 +
 3 files changed, 22 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index dd320bf..e007387 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1278,15 +1278,20 @@ static void qcow2_deduplicate_after_resuming(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     uint64_t i;
     bool processed;
+    bool stopping;
 
     /* for each l2 entry */
     for (i = 0; i < s->l2_size * s->l1_size; i++) {
         qemu_co_mutex_lock(&s->lock);
         processed = qcow2_process_to_dedup_cluster(bs, i);
+        stopping = s->dedup_status == QCOW_DEDUP_STOPPING;
         qemu_co_mutex_unlock(&s->lock);
         if (processed || !(i % s->l2_size)) {
             co_sleep_ns(rt_clock, s->dedup_co_delay);
         }
+        if (stopping) {
+            return;
+        }
     }
 }
 
@@ -1450,3 +1455,17 @@ void qcow2_dedup_update_metrics(BlockDriverState *bs)
     s->dedup_metrics.ram_usage = nb_hashs * GTREE_NODE_SIZE * 2;
     s->dedup_metrics.ram_usage += nb_hashs * sizeof(QCowHashNode);
 }
+
+void qcow2_pause_dedup(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    if (s->dedup_status != QCOW_DEDUP_STARTED) {
+        return;
+    }
+
+    s->dedup_status = QCOW_DEDUP_STOPPING;
+    /* must handle half processed write requests */
+    qcow2_dedup_reset(bs);
+    s->dedup_status = QCOW_DEDUP_STOPPED;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index e442268..c17ab63 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2011,6 +2011,8 @@ static BlockDriver bdrv_qcow2 = {
 
     .bdrv_invalidate_cache      = qcow2_invalidate_cache,
 
+    .bdrv_pause_dedup           = qcow2_pause_dedup,
+
     .create_options = qcow2_create_options,
     .bdrv_check = qcow2_check,
 };
diff --git a/block/qcow2.h b/block/qcow2.h
index d8e8539..5940c89 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -511,5 +511,6 @@ bool qcow2_dedup_is_running(BlockDriverState *bs);
 int qcow2_dedup_init(BlockDriverState *bs);
 void qcow2_dedup_close(BlockDriverState *bs);
 void qcow2_dedup_update_metrics(BlockDriverState *bs);
+void qcow2_pause_dedup(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 58/62] qcow2: Add qcow2_resume_dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (56 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 57/62] qcow2: Add qcow2_pause_dedup Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 59/62] qcow2: Make dedup status persists Benoît Canet
                   ` (4 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |   14 ++++++++++++++
 block/qcow2.c       |    1 +
 block/qcow2.h       |    1 +
 3 files changed, 16 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index e007387..93545af 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1469,3 +1469,17 @@ void qcow2_pause_dedup(BlockDriverState *bs)
     qcow2_dedup_reset(bs);
     s->dedup_status = QCOW_DEDUP_STOPPED;
 }
+
+void qcow2_resume_dedup(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    if (s->dedup_status != QCOW_DEDUP_STOPPED) {
+        return;
+    }
+
+    s->dedup_status = QCOW_DEDUP_STARTING;
+
+    s->dedup_resume_co = qemu_coroutine_create(qcow2_co_dedup_resume);
+    qemu_coroutine_enter(s->dedup_resume_co, bs);
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index c17ab63..d5681ad 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2012,6 +2012,7 @@ static BlockDriver bdrv_qcow2 = {
     .bdrv_invalidate_cache      = qcow2_invalidate_cache,
 
     .bdrv_pause_dedup           = qcow2_pause_dedup,
+    .bdrv_resume_dedup          = qcow2_resume_dedup,
 
     .create_options = qcow2_create_options,
     .bdrv_check = qcow2_check,
diff --git a/block/qcow2.h b/block/qcow2.h
index 5940c89..2b5a7d4 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -512,5 +512,6 @@ int qcow2_dedup_init(BlockDriverState *bs);
 void qcow2_dedup_close(BlockDriverState *bs);
 void qcow2_dedup_update_metrics(BlockDriverState *bs);
 void qcow2_pause_dedup(BlockDriverState *bs);
+void qcow2_resume_dedup(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4
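
Reviewer note: qcow2_pause_dedup() and qcow2_resume_dedup() together imply a four-state cycle: STOPPED, STARTING, STARTED, STOPPING, then STOPPED again. Below is a sketch of the happy-path transitions only. The failure path of the resume coroutine is not modeled, and the enum DedupStatus and the helper dedup_transition_allowed() are hypothetical names, not the ones used in qcow2.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the QCOW_DEDUP_* status values. */
typedef enum {
    DEDUP_STOPPED,
    DEDUP_STARTING,
    DEDUP_STARTED,
    DEDUP_STOPPING
} DedupStatus;

/* Happy-path transitions implied by the pause and resume functions. */
static bool dedup_transition_allowed(DedupStatus from, DedupStatus to)
{
    switch (from) {
    case DEDUP_STOPPED:
        return to == DEDUP_STARTING;   /* resume requested */
    case DEDUP_STARTING:
        return to == DEDUP_STARTED;    /* resume coroutine finished setup */
    case DEDUP_STARTED:
        return to == DEDUP_STOPPING;   /* pause requested */
    case DEDUP_STOPPING:
        return to == DEDUP_STOPPED;    /* in-flight state was reset */
    }
    return false;
}
```

Under this model a resume issued while the state is STARTING or STOPPING is ignored. That matches the early returns in both functions.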


* [Qemu-devel] [RFC V5 59/62] qcow2: Make dedup status persists.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (57 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 58/62] qcow2: Add qcow2_resume_dedup Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 60/62] qerror: Add QERR_DEVICE_NOT_DEDUPLICATED Benoît Canet
                   ` (3 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    7 +++++++
 block/qcow2.c       |    5 ++++-
 block/qcow2.h       |    1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 93545af..85ef66f 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1323,6 +1323,8 @@ static void coroutine_fn qcow2_co_dedup_resume(void *opaque)
     s->dedup_status = QCOW_DEDUP_STARTED;
     qemu_co_mutex_unlock(&s->lock);
 
+    qcow2_update_header(bs);
+
     qcow2_deduplicate_after_resuming(bs);
 
     return;
@@ -1429,6 +1431,10 @@ int qcow2_dedup_init(BlockDriverState *bs)
         return 0;
     }
 
+    if (!s->start_dedup) {
+        return 0;
+    }
+
     s->dedup_status = QCOW_DEDUP_STARTING;
 
     /* resume deduplication */
@@ -1465,6 +1471,7 @@ void qcow2_pause_dedup(BlockDriverState *bs)
     }
 
     s->dedup_status = QCOW_DEDUP_STOPPING;
+    qcow2_update_header(bs);
     /* must handle half processed write requests */
     qcow2_dedup_reset(bs);
     s->dedup_status = QCOW_DEDUP_STOPPED;
diff --git a/block/qcow2.c b/block/qcow2.c
index d5681ad..1e61050 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -168,6 +168,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
                 s->dedup_table_size =
                     be32_to_cpu(dedup_table_extension.size);
                 s->dedup_hash_algo = dedup_table_extension.hash_algo;
+                s->start_dedup = dedup_table_extension.strategies & (1 << 2);
             break;
 
         default:
@@ -1221,7 +1222,9 @@ int qcow2_update_header(BlockDriverState *bs)
         dedup_table_extension.size = cpu_to_be32(s->dedup_table_size);
         dedup_table_extension.hash_algo = s->dedup_hash_algo;
         dedup_table_extension.strategies |= 1; /* RAM based lookup */
-        dedup_table_extension.strategies |= 1 << 2; /* deduplication running */
+        if (s->has_dedup && s->dedup_status == QCOW_DEDUP_STARTED) {
+            dedup_table_extension.strategies |= 1 << 2;
+        }
         ret = header_ext_add(buf,
                              QCOW2_EXT_MAGIC_DEDUP_TABLE,
                              &dedup_table_extension,
diff --git a/block/qcow2.h b/block/qcow2.h
index 2b5a7d4..3fdfe14 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -228,6 +228,7 @@ typedef struct BDRVQcowState {
     int64_t free_byte_offset;
 
     bool has_dedup;
+    bool start_dedup;
     QCowDedupStatus dedup_status;
     QCowHashAlgo dedup_hash_algo;
     Coroutine *dedup_resume_co;
-- 
1.7.10.4
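
The persistence added by this patch comes down to one bit of the strategies
byte in the dedup header extension. A minimal sketch of that round trip
follows. The helper names, macro names and the simplified status enum are
hypothetical; only the bit positions and the "started" condition are taken
from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the patch; the macro names are hypothetical. */
#define DEDUP_STRATEGY_RAM_LOOKUP (1u << 0)
#define DEDUP_STRATEGY_RUNNING    (1u << 2)

/* Simplified stand-in for QCowDedupStatus (assumed to have four states). */
typedef enum {
    DEDUP_STOPPED,
    DEDUP_STARTING,
    DEDUP_STARTED,
    DEDUP_STOPPING,
} DedupStatus;

/* Build the strategies byte the way the header update in the patch does:
 * RAM lookup is always set, "running" only while dedup is fully started. */
static uint8_t dedup_strategies_encode(bool has_dedup, DedupStatus status)
{
    uint8_t strategies = DEDUP_STRATEGY_RAM_LOOKUP;

    if (has_dedup && status == DEDUP_STARTED) {
        strategies |= DEDUP_STRATEGY_RUNNING;
    }
    return strategies;
}

/* Mirror of the read side: should dedup be started when the image opens? */
static bool dedup_strategies_wants_start(uint8_t strategies)
{
    return (strategies & DEDUP_STRATEGY_RUNNING) != 0;
}
```

Because qcow2_pause_dedup() sets the status to QCOW_DEDUP_STOPPING before it
calls qcow2_update_header(), the running bit is written as 0 on pause, and
the next open returns early from qcow2_dedup_init().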

* [Qemu-devel] [RFC V5 60/62] qerror: Add QERR_DEVICE_NOT_DEDUPLICATED.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (58 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 59/62] qcow2: Make dedup status persists Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 61/62] qmp: Add block-pause-dedup Benoît Canet
                   ` (2 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 include/qapi/qmp/qerror.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index 6c0a18d..3f99c8c 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -108,6 +108,9 @@ void assert_no_error(Error *err);
 #define QERR_DEVICE_NOT_ACTIVE \
     ERROR_CLASS_DEVICE_NOT_ACTIVE, "Device '%s' has not been activated"
 
+#define QERR_DEVICE_NOT_DEDUPLICATED \
+    ERROR_CLASS_GENERIC_ERROR, "Device '%s' doesn't support deduplication"
+
 #define QERR_DEVICE_NOT_ENCRYPTED \
     ERROR_CLASS_GENERIC_ERROR, "Device '%s' is not encrypted"
 
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 61/62] qmp: Add block-pause-dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (59 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 60/62] qerror: Add QERR_DEVICE_NOT_DEDUPLICATED Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 62/62] qmp: Add block_resume_dedup Benoît Canet
  2013-01-16 16:03 ` [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Eric Blake
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 blockdev.c       |   18 ++++++++++++++++++
 qapi-schema.json |   18 ++++++++++++++++++
 qmp-commands.hx  |   23 +++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index d724e2d..4c5f954 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -896,6 +896,24 @@ void qmp_block_passwd(const char *device, const char *password, Error **errp)
     }
 }
 
+void qmp_block_pause_dedup(const char *device, Error **errp)
+{
+    BlockDriverState *bs;
+    int err;
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    err = bdrv_pause_dedup(bs);
+    if (err == -EINVAL) {
+        error_set(errp, QERR_DEVICE_NOT_DEDUPLICATED, bdrv_get_device_name(bs));
+        return;
+    }
+}
+
 static void qmp_bdrv_open_encrypted(BlockDriverState *bs, const char *filename,
                                     int bdrv_flags, BlockDriver *drv,
                                     const char *password, Error **errp)
diff --git a/qapi-schema.json b/qapi-schema.json
index 1a5014c..d8c8348 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -720,6 +720,24 @@
 { 'command': 'query-block', 'returns': ['BlockInfo'] }
 
 ##
+# @block-pause-dedup:
+#
+# This command pauses deduplication on a device that supports it.
+#
+# @device:   the name of the device to pause the deduplication on
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @device does not support deduplication, GenericError
+#
+# Notes:  Not all block formats support deduplication. Check the optional
+#         deduplication field returned by query-blockstats first.
+#
+# Since: 1.5
+##
+{ 'command': 'block-pause-dedup', 'data': {'device': 'str' } }
+
+##
 # @BlockDeviceDedupInfo
 #
 # Statistics of the deduplication on a virtual block device implementing it
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5c692d0..acc9fd0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1236,6 +1236,29 @@ Example:
 EQMP
 
     {
+        .name       = "block-pause-dedup",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_pause_dedup,
+    },
+
+SQMP
+block-pause-dedup
+-----------------
+
+Pause deduplication on a device that supports it.
+
+Arguments:
+
+- "device": device name (json-string)
+
+Example:
+
+-> { "execute": "block-pause-dedup", "arguments": { "device": "ide0-hd0" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "block_set_io_throttle",
         .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 62/62] qmp: Add block_resume_dedup.
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (60 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 61/62] qmp: Add block-pause-dedup Benoît Canet
@ 2013-01-16 15:48 ` Benoît Canet
  2013-01-16 16:03 ` [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Eric Blake
  62 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 blockdev.c       |   18 ++++++++++++++++++
 qapi-schema.json |   18 ++++++++++++++++++
 qmp-commands.hx  |   23 +++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 4c5f954..02b6535 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -914,6 +914,24 @@ void qmp_block_pause_dedup(const char *device, Error **errp)
     }
 }
 
+void qmp_block_resume_dedup(const char *device, Error **errp)
+{
+    BlockDriverState *bs;
+    int err;
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    err = bdrv_resume_dedup(bs);
+    if (err == -EINVAL) {
+        error_set(errp, QERR_DEVICE_NOT_DEDUPLICATED, bdrv_get_device_name(bs));
+        return;
+    }
+}
+
 static void qmp_bdrv_open_encrypted(BlockDriverState *bs, const char *filename,
                                     int bdrv_flags, BlockDriver *drv,
                                     const char *password, Error **errp)
diff --git a/qapi-schema.json b/qapi-schema.json
index d8c8348..607f24b 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -738,6 +738,24 @@
 { 'command': 'block-pause-dedup', 'data': {'device': 'str' } }
 
 ##
+# @block-resume-dedup:
+#
+# This command resumes deduplication on a device that supports it.
+#
+# @device:   the name of the device to resume the deduplication on
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @device does not support deduplication, GenericError
+#
+# Notes:  Not all block formats support deduplication. Check the optional
+#         deduplication field returned by query-blockstats first.
+#
+# Since: 1.5
+##
+{ 'command': 'block-resume-dedup', 'data': {'device': 'str' } }
+
+##
 # @BlockDeviceDedupInfo
 #
 # Statistics of the deduplication on a virtual block device implementing it
diff --git a/qmp-commands.hx b/qmp-commands.hx
index acc9fd0..d953847 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1259,6 +1259,29 @@ Example:
 EQMP
 
     {
+        .name       = "block-resume-dedup",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_resume_dedup,
+    },
+
+SQMP
+block-resume-dedup
+------------------
+
+Resume deduplication on a device that supports it.
+
+Arguments:
+
+- "device": device name (json-string)
+
+Example:
+
+-> { "execute": "block-resume-dedup", "arguments": { "device": "ide0-hd0" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "block_set_io_throttle",
         .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
-- 
1.7.10.4

* Re: [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication
  2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
                   ` (61 preceding siblings ...)
  2013-01-16 15:48 ` [Qemu-devel] [RFC V5 62/62] qmp: Add block_resume_dedup Benoît Canet
@ 2013-01-16 16:03 ` Eric Blake
  2013-01-16 16:26   ` Benoît Canet
  62 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2013-01-16 16:03 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On 01/16/2013 08:47 AM, Benoît Canet wrote:
> This 3 step patchset implements deduplication in QCOW2.
> 
> First patchset create the core infrastructure for deduplication and enable it
> in QCOW2 image.
> It ends at "qcow2: Enable the deduplication feature."

Psychologically, reviewers tend to shy away from a 62-patch series, as
it implies a major time commitment to go through.  Sending this as three
separate series, with clear instructions in the later ones that they
depend on earlier ones, aids the review process, even if it actually
results in more mail.  This is because each series no longer has quite
as many associated patches and it becomes easier for a reviewer to
tackle one series at a time.

> 
> Second patchset implements some metrics in QMP.
> It ends at "qapi: Return virtual block device deduplication metrics in QMP"
> 
> Third patchset implements asynchronous deduplication.
> It's a work in progress patchset that is included in this post so reviewers
> can have a grasp of where the feature is heading.

Splitting patches into multiple series is especially useful when only
part of the series is ready for inclusion.

> 
> One can compile and install https://github.com/wernerd/Skein3Fish and use the
> --enable-skein-dedup configure option in order to use the faster skein HASH.
> 
> Images must be created with "-o dedup=[skein|sha256]" in order to activate the
> deduplication in the image.
> 
> Deduplication is now fast enough to be usable.
> Nice side effect is that duplicated writes are faster than native QCOW2:
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication
  2013-01-16 16:03 ` [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Eric Blake
@ 2013-01-16 16:26   ` Benoît Canet
  0 siblings, 0 replies; 67+ messages in thread
From: Benoît Canet @ 2013-01-16 16:26 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, qemu-devel, stefanha

> Psychologically, reviewers tend to shy away from a 62-patch series, as
> it implies a major time commitment to go through.  Sending this as three
> separate series, with clear instructions in the later ones that they
> depend on earlier ones, aids the review process, even if it actually
> results in more mail.  This is because each series no longer has quite
> as many associated patches and it becomes easier for a reviewer to
> tackle one series at a time.

Split and reposted as three patchsets.

Regards

Benoît

* Re: [Qemu-devel] [RFC V5 02/62] qcow2: Add deduplication structures and fields.
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 02/62] qcow2: Add deduplication structures and fields Benoît Canet
@ 2013-01-16 16:30   ` Eric Blake
  0 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2013-01-16 16:30 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On 01/16/2013 08:47 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/qcow2.h |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 718b52b..b31b64e 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -43,6 +43,10 @@
>  #define QCOW_OFLAG_COPIED     (1LL << 63)
>  /* indicate that the cluster is compressed (they never have the copied flag) */
>  #define QCOW_OFLAG_COMPRESSED (1LL << 62)
> +/* indicate that the cluster must be processed when deduplication restart
> + * also indicate that the on disk dedup hash must be ignored and discarded

s/restart also/restarts. Also,/


> +/* deduplication node */
> +typedef struct {
> +    QCowHash hash;
> +    uint64_t physical_sect;       /* where the cluster is stored on disk */
> +    uint64_t first_logical_sect;  /* logical sector of the first occurence of

s/occurence/occurrence/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification.
  2013-01-16 15:47 ` [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification Benoît Canet
@ 2013-01-16 16:43   ` Eric Blake
  0 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2013-01-16 16:43 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On 01/16/2013 08:47 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  docs/specs/qcow2.txt |  104 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 102 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..d5f8072 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -80,7 +80,12 @@ in the description of a field.
>                                  tables to repair refcounts before accessing the
>                                  image.
>  
> -                    Bits 1-63:  Reserved (set to 0)
> +                    Bit 1:      Deduplication bit.  If this bit is set then
> +                                deduplication is used on this image.

If this bit is set, you probably want to require that the deduplication
header extension is present.

> +                                L2 tables size 64KB is different from
> +                                cluster size 4KB.

I'm still not sure what this sentence means.  Remember, cluster size of
normal disk data is configurable; are you stating that if dedup is in
effect, then the cluster size MUST be fixed at 4k (or in other words,
that header offsets 20-23 [cluster_bits] must be exactly 12)?  And my
understanding is that with dedup, there are now two L1 and L2 tables -
the normal tables to get at the actual logical data, and the dedup
tables for getting at the hashes.  Are you stating that both L2 tables
are 64k, or that just the dedup L2 is 64k?
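
To make the sizes in this question concrete, here is a small sketch of the
arithmetic involved. It assumes the stricter reading (dedup requires 4KB
clusters, i.e. cluster_bits == 12), which the patch does not confirm. The
function and macro names are hypothetical, and the entry count only applies
to tables whose entries are 64-bit, as in the normal qcow2 L2 tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEDUP_CLUSTER_BITS  12           /* 4KB clusters, per the spec text */
#define DEDUP_L2_TABLE_SIZE (64 * 1024)  /* 64KB L2 tables, per the spec text */

/* The cluster size is always derived from the header's cluster_bits field. */
static uint64_t cluster_size_from_bits(uint32_t cluster_bits)
{
    assert(cluster_bits < 64);
    return UINT64_C(1) << cluster_bits;
}

/* Hypothetical open-time check: only meaningful if dedup really does
 * require a fixed 4KB cluster size. */
static bool dedup_cluster_bits_ok(bool dedup_enabled, uint32_t cluster_bits)
{
    return !dedup_enabled || cluster_bits == DEDUP_CLUSTER_BITS;
}

/* Number of 64-bit entries that fit in a table of the given byte size. */
static uint64_t table_entries(uint64_t table_size)
{
    return table_size / sizeof(uint64_t);
}
```

With these numbers a one-cluster (4KB) L2 table holds 512 entries, while a
64KB L2 table holds 8192, so the two layouts are not interchangeable and the
spec needs to say which tables use which size.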

>  
> +== Deduplication ==
> +
> +The deduplication extension contains information concerning deduplication.

Just as I suggested that the deduplication feature bit field above
should require this extension be present, here, I would probably require
that this extension not be present unless the deduplication feature bit
is set.
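
The two requirements together say that the feature bit and the header
extension must be present or absent as a pair. A minimal sketch of such a
check follows, assuming the dedup bit is bit 1 of the incompatible features
field as in the quoted hunk. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the incompatible features field, per the spec patch.
 * The macro name is an assumption made for this sketch. */
#define QCOW2_INCOMPAT_DEDUP (UINT64_C(1) << 1)

/* Returns true when the feature bit and the dedup header extension agree:
 * either both are present or both are absent. */
static bool dedup_header_consistent(uint64_t incompatible_features,
                                    bool has_dedup_extension)
{
    bool dedup_bit = (incompatible_features & QCOW2_INCOMPAT_DEDUP) != 0;

    return dedup_bit == has_dedup_extension;
}
```

An image that fails this check could be rejected at open time, which keeps
a reader from guessing whether a stray extension or a stray bit is the
intended state.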

> +
> +    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
> +
> +          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
> +                    pointers
> +
> +              12:   Hash algo enum field
> +                        0: SHA-256
> +                        1: SHA3
> +                        2: SKEIN-256
> +
> +              13:   Dedup strategies bitmap
> +                        0: RAM based hash lookup (always set to 1 for now)
> +                        1: Disk based hash lookup

Are these two bits mutually exclusive, or can they both be used at once?

> +                        2: Deduplication running if set to 1
> +
> +        14 - 69:    Set to zero and reserved for future use
> +
> +Disk based lookup structure will be described in a future QCOW2 specification.

If so, it may be better to document in this revision of the file that
the disk-based hash lookup strategy bit must always be 0 for now.
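
For illustration, a reader of the extension could enforce these constraints
while decoding the 70 bytes described above. The sketch below assumes the
fields are big-endian like the rest of the qcow2 header. The struct, macro
and function names are hypothetical, and rejecting an unknown hash algorithm
or a set disk-lookup bit is a design choice of the sketch, not something the
patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEDUP_EXT_SIZE      70         /* bytes 0 - 69 in the proposed layout */
#define DEDUP_STRAT_RAM     (1u << 0)  /* RAM based hash lookup */
#define DEDUP_STRAT_DISK    (1u << 1)  /* disk based hash lookup (undefined) */
#define DEDUP_HASH_ALGO_MAX 2          /* 0: SHA-256, 1: SHA3, 2: SKEIN-256 */

typedef struct {
    uint64_t table_offset;  /* bytes 0 - 7 */
    uint32_t table_size;    /* bytes 8 - 11, number of L1 pointers */
    uint8_t  hash_algo;     /* byte 12 */
    uint8_t  strategies;    /* byte 13 */
} DedupExtension;

/* Read an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t load_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Decode the extension and apply the checks discussed in the review. */
static bool dedup_extension_parse(const uint8_t *buf, size_t len,
                                  DedupExtension *ext)
{
    assert(buf != NULL && ext != NULL);

    if (len < DEDUP_EXT_SIZE) {
        return false;
    }

    ext->table_offset = load_be(buf, 8);
    ext->table_size = (uint32_t)load_be(buf + 8, 4);
    ext->hash_algo = buf[12];
    ext->strategies = buf[13];

    if (ext->hash_algo > DEDUP_HASH_ALGO_MAX) {
        return false;   /* unknown hash algorithm */
    }
    if (!(ext->strategies & DEDUP_STRAT_RAM)) {
        return false;   /* RAM lookup is "always set to 1 for now" */
    }
    if (ext->strategies & DEDUP_STRAT_DISK) {
        return false;   /* disk lookup is not specified yet */
    }
    return true;
}
```

Bytes 14 - 69 are deliberately ignored here so that a later revision can
assign them without breaking this reader.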

> +
> +== Deduplication table (RAM method) ==
> +

>  == Host cluster management ==
>  
>  qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -211,7 +311,7 @@ guest clusters to host clusters. They are called L1 and L2 table.
>  
>  The L1 table has a variable size (stored in the header) and may use multiple
>  clusters, however it must be contiguous in the image file. L2 tables are
> -exactly one cluster in size.
> +exactly one cluster in size excepted for the deduplication case.

s/excepted/except/ - and again, is this for all L2 tables, or just the
dedup L2 tables?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


end of thread, other threads:[~2013-01-16 16:44 UTC | newest]

Thread overview: 67+ messages
2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification Benoît Canet
2013-01-16 16:43   ` Eric Blake
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 02/62] qcow2: Add deduplication structures and fields Benoît Canet
2013-01-16 16:30   ` Eric Blake
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 03/62] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 04/62] qcow2: Make update_refcount public Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 05/62] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 06/62] qcow2: Add qcow2_dedup and related functions Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 07/62] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 08/62] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 09/62] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 10/62] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 11/62] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 12/62] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 13/62] qcow2: Create qcow2_is_cluster_to_dedup Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 14/62] qcow2: Load and save deduplication table header extension Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 15/62] qcow2: Extract qcow2_do_table_init Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 16/62] qcow2-cache: Allow to choose table size at creation Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 17/62] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 18/62] block: Add qemu-img dedup create option Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 19/62] qcow2: Add a deduplication boolean to update_refcount Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 20/62] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 21/62] qcow2: Remove hash when cluster is deleted Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 22/62] qcow2: Add qcow2_dedup_is_running to probe if dedup is running Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 23/62] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 24/62] qcow2: Serialize write requests when deduplication is activated Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 25/62] qcow2: Add verification of dedup table Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 26/62] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 27/62] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 28/62] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 29/62] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 30/62] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 31/62] qcow2: Use large L2 table for deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 32/62] qcow: Set large dedup hash block size Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 33/62] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 34/62] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 35/62] qcow2: Add qcow2_co_dedup_resume to restart deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 36/62] qcow2: Enable the deduplication feature Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 37/62] qcow2: Add deduplication metrics structures Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 38/62] qcow2: Initialize deduplication metrics Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 39/62] qcow2: Collect unaligned writes missing data reads metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 40/62] qcow2: Collect deduplicated cluster metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 41/62] qcow2: Collect undeduplicated " Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 42/62] qcow2: Count QCowHashNode creation metrics Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 43/62] qcow2: Count QCowHashNode removal from tree for metrics Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 44/62] qcow2: Count cluster deleted metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 45/62] qcow2: Count deduplication refcount overflow metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 46/62] qapi: Add support for deduplication infos in qapi-schema.json Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 47/62] block: Add deduplication metrics to BlockDriverInfo Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 48/62] qcow2: Add qcow2_dedup_update_metrics to compute dedup RAM usage Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 49/62] qcow2: returns deduplication metrics and status via bdrv_get_info() Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 50/62] qapi: Return virtual block device deduplication metrics in QMP Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 51/62] block: Add BlockDriver function prototype to pause and resume deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 52/62] qcow2: Add code to deduplicate cluster flagged with QCOW_OFLAG_TO_DEDUP Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 53/62] block: Add bdrv_has_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 54/62] block: Add bdrv_is_dedup_running Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 55/62] block: Add bdrv_resume_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 56/62] block: Add bdrv_pause_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 57/62] qcow2: Add qcow2_pause_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 58/62] qcow2: Add qcow2_resume_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 59/62] qcow2: Make dedup status persists Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 60/62] qerror: Add QERR_DEVICE_NOT_DEDUPLICATED Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 61/62] qmp: Add block-pause-dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 62/62] qmp: Add block_resume_dedup Benoît Canet
2013-01-16 16:03 ` [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Eric Blake
2013-01-16 16:26   ` Benoît Canet
