From: Benoît Canet
Date: Fri, 15 Mar 2013 15:49:14 +0100
Message-Id: <1363358986-8360-1-git-send-email-benoit@irqsave.net>
Subject: [Qemu-devel] [RFC V7 00/32] QCOW2 deduplication core functionality
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Benoît Canet, stefanha@redhat.com

This patchset creates the core infrastructure for deduplication and enables it.

One can compile and install https://github.com/wernerd/Skein3Fish and use the
--enable-skein-dedup configure option in order to use the faster Skein hash.

Images must be created with "-o dedup=[skein|sha256]" in order to activate
the deduplication in the image.

What's new:
-----------

The real new feature of this version is the move to drop the second in-RAM
lookup structure.
If the code is correct, it would allow using only one more disk-based lookup
structure to replace the last in-RAM GTree.

I didn't find a solution to make qemu-iotests work properly with the
coroutine-based loading sequence of the deduplication code, so I am posting the
code for proper review until the loading sequence is dropped.

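As a rough illustration of what dropping the second in-RAM lookup structure implies, here is a minimal standalone sketch. It is not code from the patchset. It assumes the dropped structure mapped a physical cluster back to its hash, so that once it is gone the hash has to be recomputed from the cluster data before the by-hash entry can be removed. All names (HashTable, cluster_hash, table_insert, forget_cluster) are hypothetical, the table is a trivial linear array rather than a GTree or an on-disk store, and FNV-1a is only a stand-in for SHA-256 or Skein so that the example needs no external library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_SIZE 16   /* tiny clusters keep the example readable */
#define TABLE_SLOTS  8    /* hypothetical fixed-size by-hash table */

/* One entry of the hypothetical by-hash lookup structure. */
typedef struct {
    uint64_t hash;          /* stand-in for a SHA-256 or Skein digest */
    uint64_t cluster_index; /* physical cluster holding that content */
    int used;
} HashEntry;

typedef struct {
    HashEntry slots[TABLE_SLOTS];
} HashTable;

void table_init(HashTable *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* 64-bit FNV-1a: NOT cryptographic, used here only as a placeholder. */
uint64_t cluster_hash(const uint8_t *data)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < CLUSTER_SIZE; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Look up an entry by hash; returns NULL when the hash is unknown. */
HashEntry *table_find(HashTable *t, uint64_t hash)
{
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (t->slots[i].used && t->slots[i].hash == hash) {
            return &t->slots[i];
        }
    }
    return NULL;
}

/* Record that cluster_index holds content with this hash; -1 if full. */
int table_insert(HashTable *t, uint64_t hash, uint64_t cluster_index)
{
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (!t->slots[i].used) {
            t->slots[i].hash = hash;
            t->slots[i].cluster_index = cluster_index;
            t->slots[i].used = 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Forget the hash of a physical cluster without any by-offset index:
 * the caller reads the cluster back, we recompute its hash and drop
 * the matching by-hash entry. Returns -1 if no matching entry exists.
 */
int forget_cluster(HashTable *t, const uint8_t *cluster_data,
                   uint64_t cluster_index)
{
    HashEntry *e = table_find(t, cluster_hash(cluster_data));
    if (e == NULL || e->cluster_index != cluster_index) {
        return -1;
    }
    e->used = 0;
    return 0;
}
```

The trade-off in this sketch is one extra cluster read and one hash computation per forgotten cluster, in exchange for not keeping a second index in memory.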
v7:
    - Fix most of the things spotted by Stefan and Eric, except changes
      concerning the on-disk structure that will be dropped.
    - Kill the second gtree by reading clusters and recomputing hashes.
      See 322ea5ba
    - Also do COW on rewrite to finish killing the second gtree.
      See 793c0b4c
    - Some write loop simplifications spotted by Stefan will be done in the
      next iteration.

    [RFC V6 01/33] qcow2: Add deduplication to the qcow2 specification.
        Skip this part of the review as the on-disk format used to store
        hashes will change.

    [RFC V6 02/33] qmp: Add DedupStatus enum.
        Skipped

    [RFC V6 03/33] qcow2: Add deduplication structures and fields.
        rename QCOW_OFLAG_TO_DEDUP to QCOW_OFLAG_PENDING_DEDUP [Stefan]
        s/restart/restarts/ [Stefan]
        s/Persistant/Persistent/ [Stefan]
        Change nb_clusters_processed and nb_undedupable_sectors to uint64_t
        in case of really large writes. [Stefan]
        Change dedup_table_size to size_t [Stefan]
        s/occurence/occurrence/ [Eric]
        drop QCOW_STRATEGY_DISK, reorder QCOW_STRATEGY_RUNNING and
        QCOW_STRATEGY_RAM and use (1 << 0) [Eric]
        rename QCOW_FLAG to QCOW_DEDUP_FLAG [Stefan]
        rename QCOW_STRATEGY* to QCOW_DEDUP_STRATEGY* [Benoît]

    [RFC V6 04/33] qcow2: Add qcow2_dedup_read_missing_and_concatenate
        s/all the required data required/everything required/ [Stefan]
        mark qcow2_read_cluster_data as coroutine_fn [Stefan]
        there is already a test case in qemu-io-test [Stefan]
        call qcow2_co_readv instead of bdrv_co_readv in
        qcow2_read_cluster_data [Stefan]
        specify in the comments that the caller must use qemu_vfree to clean
        data on success after calling
        qcow2_dedup_read_missing_and_concatenate [Eric]

    [RFC V6 05/33] qcow2: Make update_refcount public.
        Drop patch and replace it with another making
        update_cluster_refcount public [Benoît]

    [RFC V6 06/33] qcow2: Create a way to link to l2 tables when deduplicating.
        Drop QCOW_OFLAG_FIRST in favor of QCOW_OFLAG_COPIED [Stefan]

    [RFC V6 07/33] qcow2: Add qcow2_dedup and related functions
        Add cluster_index temporary variable in qcow2_deduplicate_cluster [Benoît]
        Use qcow2_update_cluster_refcount in qcow2_deduplicate_cluster [Benoît]
        rename qcow2_build_dedup_hash to qcow2_dedup_hash_new [Stefan]
        rename qcow2_dedup_build_qcow_hash_node to qcow2_hash_node_new [Stefan]
        Drop QCOW_OFLAG_FIRST in favor of QCOW_OFLAG_COPIED [Stefan]
        s/dont't/don't/ [Stefan]
        replace &= by = [Stefan]
        in qcow2_deduplicate_cluster increment refcount before relinking l2 [Stefan]
        simplify pointer arithmetic on *data [Stefan]
        s/begining_index/start_index/ [Stefan]
        s/compute/computes/ [Stefan]
        s/come/comes/ [Stefan]
        todo: cache flushes ? , nb_clusters > 0,
        not done: putting complete hash node in tree when created.
            "Also, we put an incomplete node into dedup_tree_by_hash. The
            caller must ensure that other coroutines do not use
            dedup_tree_by_hash() before we've filled in real values, or the
            callers need to check for QCOW_FLAG_EMPTY. Seems a little risky,
            so why insert before completing the hash node?" [Stefan]
            reason: the next cluster to be tested may be a duplicate of this
            one, so the tree is filled with an incomplete hash node so that
            the detection can be done.

    [RFC V6 08/33] qcow2: Add qcow2_dedup_store_new_hashes.
        s/QCowHashNoderepresent/QCowHashNode represents/ [Eric]
        s/occurence/occurrence/ [Eric]
        s/cluste/cluster/ [Eric]
        qcow2_create_block() -> qcow2_create_dedup_block() [Stefan]
        simplify qcow2_has_dedup_block [Stefan]
        qcow2_write_hash_to_block_and_dirty -> qcow2_set_hash_block_entry [Stefan]
        don't mention QCOW_OFLAG_COPIED in comment [Stefan]
        s/succes/success/ [Stefan]
        s/errno/-errno/ [Stefan]

    [RFC V6 09/33] qcow2: Implement qcow2_compute_cluster_hash.
        factorize probe with gnu tls probe [Stefan]

    [RFC V6 10/33] qcow2: Extract qcow2_dedup_grow_table
        delete patch [Stefan]

    [RFC V6 11/33] qcow2: Add qcow2_dedup_grow_table and use it.
        Do not try to factorize function -> create it [Stefan]
        create BLKDBG_DEDUP_GROW_* block events and use them

    [RFC V6 12/33] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters
        Merge fields in m->l2_entry_flags [Stefan]

    [RFC V6 13/33] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe
        TODO:

    [RFC V6 14/33] qcow2: Create qcow2_is_cluster_to_dedup.
        Fix/Replace QCOW_OFLAG_COPIED by L1E_OFFSET_MASK [Stefan]

    [RFC V6 15/33] qcow2: Load and save deduplication table header extension.
        Fix buffer overflow [Stefan]
        Do not validate the other fields, as they will go away when the
        on-disk hash store data structure changes in the next revision of
        the patchset.

    [RFC V6 16/33] qcow2: Extract qcow2_do_table_init.
        Add blkevent for dedup table loading -> not done since the data
        structure will change

    [RFC V6 18/33] qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
        Rename to qcow2_set_incompat_feature and
        qcow2_clear_incompat_feature [Stefan]
        Get rid of wrapper functions and update callers [Stefan]

    [RFC V6 19/33] block: Add qcow2_dedup format and image creation code
        Drop qcow2_activate_dedup wrapper function [Stefan]
        make "BDRVQcowState *s = bs->opaque;" a local variable [Stefan]
        comment the creation of the qcow2_dedup format [Stefan]
        always force version to 3 when using qcow_dedup format [Benoît]

    [RFC V6 20/33] qcow2: Add a deduplication boolean to update_refcount.
        Drop this patch [Stefan]

    [RFC V6 21/33] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2.
        Do the work in the dedup code [Stefan]

    [RFC V6 22/33] qcow2: Remove hash when cluster is deleted.
        s/it's/its/ [Stefan]
        s/qcow2_dedup_refcount_zero_reached/qcow2_dedup_destroy_hash/g [Stefan]
        qcow2_dedup_destroy_hash was chosen to fit the future on-disk hash store

    [RFC V6 24/33] qcow2: Integrate deduplication in qcow2_co_writev loop.
        s/"goto fail;"/"break;"/ so we get ret = 0 [Stefan]
        simplify ?:; using ifs [Stefan]
        move down the writing of the hashes after l2 linking [Stefan]
        last simplification will be done in the next iteration.

    [RFC V6 25/33] qcow2: Serialize write requests when deduplication is activated.
        Do not lock around qcow2_dedup_is_running() call [Stefan]
        s/fix/fixes/ [Eric]
        s/more// [Eric]
        Add comment explaining why [Stefan]

    [RFC V6 26/33] qcow2: Add verification of dedup table.
        Move this patch after the one checking the dedup table [Stefan].

    [RFC V6 27/33] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
        Add comment explaining the modification [Stefan, Kevin]

    [RFC V6 29/33] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
        TODO: I do not remember the exact reason for this patch.
        I will test if it's still required with the one serializing writes.

    [RFC V6 31/33] qcow: Set large dedup hash block size
        Use s->l2_size * sizeof(uint64_t) [Stefan]

    [RFC V6 32/33] qemu-iotests: Filter dedup=on/off so existing tests don't break
        Squash in f35972d22477bc3521ad7b4d97a1c469d8f71059 [Eric]

    [RFC V6 33/33] qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
        Skip suggested changes since the on-disk data structure will change

v6:
    Fix typo in "Drop hash for a given cluster..." commit message [Eric]
    Fix spurious whitespace change in "qcow2: Add qcow2_dedup_read_mis..." [Eric]
    Fix spelling mistake in "qcow2: Add qcow2_dedup_read_mis..." [Eric]
    Make #defines for deduplication strategies and use them [Eric]
    Specify that refcount_order must be >= 4 in qcow2 spec [Eric]
    Remove LAZY_REFCOUNT [Benoît]
    Do not modify L2 size [Benoît]
v5:
    Move qemu-io-test dedup patch [Eric]
    Reserve some room at the end of the QCOW header extensions. [Eric]
    Fix the specification. [Eric]
    Now overflow deduplication refcount at 2^16/2 [Stefan]
    Increase L2 table size and deduplication block hash size.
v4:
    Fix and complete qcow2 spec [Stefan]
    Hash the hash_algo field in the header extension [Stefan]
    Fix qcow2 spec [Eric]
    Remove pointer to hash and simplify hash memory management [Stefan]
    Rename and move qcow2_read_cluster_data to qcow2.c [Stefan]
    Document lock dropping behaviour of the previous function [Stefan]
    cleanup qcow2_dedup_read_missing_cluster_data [Stefan]
    rename *_offset to *_sect [Stefan]
    add a ./configure check for ssl [Stefan]
    Replace openssl by gnutls [Stefan]
    Implement Skein hashes
    Rewrite pretty much every qcow2-dedup.c commit after
    "Add qcow2_dedup_read_missing_and_concatenate" to simplify the code
    Use 64KB deduplication hash block to reduce allocation flushes
    Use 64KB l2 tables to reduce allocation flushes [breaks compatibility]
    Use lazy refcounts to avoid qcow2_cache_set_dependency loops resulting
    in frequent cache flushes
    Do not create and load dedup RAM structures when bdrs->read_only is true
v3:
    make it work barely
    replace kernel red black trees by gtree.

Benoît Canet (32):
  qcow2: Add deduplication to the qcow2 specification.
  qmp: Add DedupStatus enum.
  qcow2: Add deduplication structures and fields.
  qcow2: Add qcow2_dedup_read_missing_and_concatenate
  qcow2: Create a way to link to l2 tables when deduplicating.
  qcow2: Make qcow2_update_cluster_refcount public.
  qcow2: Add qcow2_dedup and related functions
  qcow2: Add qcow2_dedup_store_new_hashes.
  qcow2: Do allocate on rewrite on the dedup case.
  qcow2: Implement qcow2_compute_cluster_hash.
  qcow2: Add qcow2_dedup_grow_table and use it.
  qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters.
  qcow2: make the deduplication forget a cluster hash when a cluster is
    to dedupe
  qcow2: Create qcow2_is_cluster_to_dedup.
  qcow2: Load and save deduplication table header extension.
  qcow2: Extract qcow2_do_table_init.
  qcow2-cache: Allow to choose table size at creation.
  qcow2: Extract qcow2_set_incompat_feature and
    qcow2_clear_incompat_feature.
  block: Add qcow2_dedup format and image creation code.
  qcow2: Drop hash for a given cluster when dedup makes refcount >
    2^16/2.
  qcow2: Remove hash when cluster is deleted.
  qcow2: Add qcow2_dedup_is_running to probe if dedup is running.
  qcow2: Integrate deduplication in qcow2_co_writev loop.
  qcow2: Serialize write requests when deduplication is activated.
  qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
  qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
  qcow2: Add verification of dedup table.
  qcow2: Integrate SKEIN hash algorithm in deduplication.
  qcow: Set large dedup hash block size.
  qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  qcow2: Add qcow2_co_dedup_resume to restart deduplication.
  qcow2: Enable the deduplication feature.

 block/Makefile.objs          |    1 +
 block/qcow2-cache.c          |   12 +-
 block/qcow2-cluster.c        |   94 ++-
 block/qcow2-dedup.c          | 1350 ++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c       |  162 ++++-
 block/qcow2.c                |  448 ++++++++++++--
 block/qcow2.h                |  139 ++++-
 configure                    |  121 +++-
 docs/specs/qcow2.txt         |  105 +++-
 include/block/block.h        |    4 +
 include/block/block_int.h    |    1 +
 qapi-schema.json             |   18 +
 tests/qemu-iotests/common.rc |    3 +-
 13 files changed, 2341 insertions(+), 117 deletions(-)
 create mode 100644 block/qcow2-dedup.c

-- 
1.7.10.4