qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Benoît Canet" <benoit@irqsave.net>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com,
	"Benoît Canet" <benoit@irqsave.net>,
	stefanha@redhat.com
Subject: [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification.
Date: Wed, 16 Jan 2013 16:47:40 +0100	[thread overview]
Message-ID: <1358351321-4891-2-git-send-email-benoit@irqsave.net> (raw)
In-Reply-To: <1358351321-4891-1-git-send-email-benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 docs/specs/qcow2.txt |  104 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..d5f8072 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@ in the description of a field.
                                 tables to repair refcounts before accessing the
                                 image.
 
-                    Bits 1-63:  Reserved (set to 0)
+                    Bit 1:      Deduplication bit.  If this bit is set then
+                                deduplication is used on this image.
+                                L2 tables size 64KB is different from
+                                cluster size 4KB.
+
+                    Bits 2-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@ be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0xCD8E819B - Deduplication
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -159,6 +165,100 @@ the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains information concerning deduplication.
+
+    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
+
+          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
+                    pointers
+
+              12:   Hash algo enum field
+                        0: SHA-256
+                        1: SHA3
+                        2: SKEIN-256
+
+              13:   Dedup strategies bitmap
+                        0: RAM based hash lookup (always set to 1 for now)
+                        1: Disk based hash lookup
+                        2: Deduplication running if set to 1
+
+        14 - 69:    Set to zero and reserved for future use
+
+Disk based lookup structure will be described in a future QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and
+logical offset. It is used to permanently store the information to
+do the deduplication. It is loaded at startup into a RAM based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
+0       l1_dedup_index                                              Size
+              |
+|--------------------------------------------------------------------|
+|             |                                                      |
+|             |        L1 Deduplication table                        |
+|             |                                                      |
+|--------------------------------------------------------------------|
+              |
+              |
+              |
+0             |           l2_dedup_block_entries
+              |
+|---------------------------------|
+|                                 |
+|    L2 deduplication block       |
+|                                 |
+|                 l2_dedup_index  |
+|---------------------------------|
+                         |
+         0               |              40
+                         |
+         |-------------------------------|
+         |                               |
+         |    Deduplication table entry  |
+         |                               |
+         |-------------------------------|
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+    Byte  0 - 31:   hash of data cluster
+
+         32 - 39:   Logical offset of first encountered block having
+                    this hash
+
+== Deduplication table arithmetics (RAM method) ==
+
+cluster_size = 4096
+dedup_block_size = 65536 * 5
+l2_size = 65536 * 16 (16 factor is from the smaller cluster_size)
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an l2 deduplication table block is :
+l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8))
+
+The index in the level 1 deduplication table is :
+l1_dedup_index = physical_cluster_index / l2_block_cluster_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_block_cluster_entries
+
+The 16 remaining bytes in each l2 deduplication blocks are set to zero and
+reserved for a future usage.
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -211,7 +311,7 @@ guest clusters to host clusters. They are called L1 and L2 table.
 
 The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
-exactly one cluster in size.
+exactly one cluster in size excepted for the deduplication case.
 
 Given a offset into the virtual disk, the offset into the image file can be
 obtained as follows:
-- 
1.7.10.4

  reply	other threads:[~2013-01-16 15:48 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-16 15:47 [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Benoît Canet
2013-01-16 15:47 ` Benoît Canet [this message]
2013-01-16 16:43   ` [Qemu-devel] [RFC V5 01/62] qcow2: Add deduplication to the qcow2 specification Eric Blake
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 02/62] qcow2: Add deduplication structures and fields Benoît Canet
2013-01-16 16:30   ` Eric Blake
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 03/62] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 04/62] qcow2: Make update_refcount public Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 05/62] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 06/62] qcow2: Add qcow2_dedup and related functions Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 07/62] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 08/62] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 09/62] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 10/62] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 11/62] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 12/62] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 13/62] qcow2: Create qcow2_is_cluster_to_dedup Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 14/62] qcow2: Load and save deduplication table header extension Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 15/62] qcow2: Extract qcow2_do_table_init Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 16/62] qcow2-cache: Allow to choose table size at creation Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 17/62] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 18/62] block: Add qemu-img dedup create option Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 19/62] qcow2: Add a deduplication boolean to update_refcount Benoît Canet
2013-01-16 15:47 ` [Qemu-devel] [RFC V5 20/62] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 21/62] qcow2: Remove hash when cluster is deleted Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 22/62] qcow2: Add qcow2_dedup_is_running to probe if dedup is running Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 23/62] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 24/62] qcow2: Serialize write requests when deduplication is activated Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 25/62] qcow2: Add verification of dedup table Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 26/62] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 27/62] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 28/62] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 29/62] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 30/62] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 31/62] qcow2: Use large L2 table for deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 32/62] qcow: Set large dedup hash block size Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 33/62] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 34/62] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 35/62] qcow2: Add qcow2_co_dedup_resume to restart deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 36/62] qcow2: Enable the deduplication feature Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 37/62] qcow2: Add deduplication metrics structures Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 38/62] qcow2: Initialize deduplication metrics Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 39/62] qcow2: Collect unaligned writes missing data reads metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 40/62] qcow2: Collect deduplicated cluster metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 41/62] qcow2: Collect undeduplicated " Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 42/62] qcow2: Count QCowHashNode creation metrics Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 43/62] qcow2: Count QCowHashNode removal from tree for metrics Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 44/62] qcow2: Count cluster deleted metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 45/62] qcow2: Count deduplication refcount overflow metric Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 46/62] qapi: Add support for deduplication infos in qapi-schema.json Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 47/62] block: Add deduplication metrics to BlockDriverInfo Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 48/62] qcow2: Add qcow2_dedup_update_metrics to compute dedup RAM usage Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 49/62] qcow2: returns deduplication metrics and status via bdrv_get_info() Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 50/62] qapi: Return virtual block device deduplication metrics in QMP Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 51/62] block: Add BlockDriver function prototype to pause and resume deduplication Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 52/62] qcow2: Add code to deduplicate cluster flagged with QCOW_OFLAG_TO_DEDUP Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 53/62] block: Add bdrv_has_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 54/62] block: Add bdrv_is_dedup_running Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 55/62] block: Add bdrv_resume_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 56/62] block: Add bdrv_pause_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 57/62] qcow2: Add qcow2_pause_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 58/62] qcow2: Add qcow2_resume_dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 59/62] qcow2: Make dedup status persists Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 60/62] qerror: Add QERR_DEVICE_NOT_DEDUPLICATED Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 61/62] qmp: Add block-pause-dedup Benoît Canet
2013-01-16 15:48 ` [Qemu-devel] [RFC V5 62/62] qmp: Add block_resume_dedup Benoît Canet
2013-01-16 16:03 ` [Qemu-devel] [RFC V5 00/62] QCOW2 deduplication Eric Blake
2013-01-16 16:26   ` Benoît Canet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1358351321-4891-2-git-send-email-benoit@irqsave.net \
    --to=benoit@irqsave.net \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).