From: Benoît Canet
Date: Wed, 2 Jan 2013 17:16:04 +0100
Message-Id: <1357143393-29832-2-git-send-email-benoit@irqsave.net>
In-Reply-To: <1357143393-29832-1-git-send-email-benoit@irqsave.net>
References: <1357143393-29832-1-git-send-email-benoit@irqsave.net>
Subject: [Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification.
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com, Benoît Canet, stefanha@redhat.com

Signed-off-by: Benoit Canet
---
 docs/specs/qcow2.txt | 100 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 99 insertions(+), 1 deletion(-)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..c9c0d47 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@ in the description of a field.
                                 tables to repair refcounts before accessing the
                                 image.
 
-                    Bits 1-63:  Reserved (set to 0)
+                    Bit 1:      Deduplication bit.  If this bit is set then
+                                deduplication is used on this image.
+                                The L2 table size (64KB) then differs from the
+                                cluster size (4KB).
+
+                    Bits 2-63:  Reserved (set to 0)
 
          80 -  87:   compatible_features
                      Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@ be stored.
Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0xCD8E819B - Deduplication
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -159,6 +165,98 @@ the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains the information concerning the
+deduplication.
+
+          Byte  0 -  7:   Offset of the RAM deduplication table
+
+                8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
+                          pointers
+
+                    12:   Hash algorithm enum field
+                                        0: SHA-256
+                                        1: SHA3
+                                        2: SKEIN-256
+
+                    13:   Dedup strategies bitmap
+                                        0: RAM based hash lookup
+                                        1: Disk based hash lookup
+
+Disk based lookup structure will be described in a future QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and
+logical offset. It permanently stores the information required to
+do the deduplication. It is loaded at startup into a RAM based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
+0     l1_dedup_index                                              Size
+            |
+|--------------------------------------------------------------------|
+|           |                                                        |
+|           |     L1 Deduplication table                             |
+|           |                                                        |
+|--------------------------------------------------------------------|
+            |
+            |
+            |
+0           |    l2_dedup_block_entries
+            |
+|---------------------------------|
+|                                 |
+|     L2 deduplication block      |
+|                                 |
+|     l2_dedup_index              |
+|---------------------------------|
+            |
+         0  |                           40
+            |
+         |-------------------------------|
+         |                               |
+         |   Deduplication table entry   |
+         |                               |
+         |-------------------------------|
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+    Byte  0 - 31:   hash of data cluster
+
+         32 - 39:   Logical offset of first encountered block having
+                    this hash
+
+== Deduplication table arithmetic (RAM method) ==
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an l2 deduplication table block is:
+l2_dedup_block_entries = dedup_block_size / (32 + 8)
+
+The index in the level 1 deduplication table is:
+l1_dedup_index = physical_cluster_index / l2_dedup_block_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_dedup_block_entries
+
+cluster_size = 4096
+dedup_block_size = 65536
+l2_size = 65536
+
+The 16 remaining bytes in each l2 deduplication block are set to zero and
+reserved for future use.
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
-- 
1.7.10.4
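
For reviewers who want to check the numbers, the RAM-method table arithmetic
from the spec text can be sketched in Python. This is only an illustration,
not code from the series: the helper names (dedup_indexes, entry_byte_offset)
are hypothetical, and deriving physical_cluster_index as
physical_offset / cluster_size is an assumption, since the spec text does not
state it explicitly.

```python
# Sizes taken from the spec text in this patch.
CLUSTER_SIZE = 4096
DEDUP_BLOCK_SIZE = 65536
HASH_SIZE = 32          # bytes 0 - 31 of an entry: hash of the data cluster
OFFSET_SIZE = 8         # bytes 32 - 39 of an entry: 64-bit logical offset
ENTRY_SIZE = HASH_SIZE + OFFSET_SIZE

# l2_dedup_block_entries = dedup_block_size / (32 + 8)
L2_DEDUP_BLOCK_ENTRIES = DEDUP_BLOCK_SIZE // ENTRY_SIZE

# Bytes left over at the end of each L2 deduplication block.
RESERVED_TAIL_BYTES = DEDUP_BLOCK_SIZE - L2_DEDUP_BLOCK_ENTRIES * ENTRY_SIZE


def dedup_indexes(physical_offset):
    """Return (l1_dedup_index, l2_dedup_index) for a physical byte offset.

    Hypothetical helper. Assumes the physical cluster index is the
    physical offset divided by the cluster size.
    """
    physical_cluster_index = physical_offset // CLUSTER_SIZE
    l1_dedup_index = physical_cluster_index // L2_DEDUP_BLOCK_ENTRIES
    l2_dedup_index = physical_cluster_index % L2_DEDUP_BLOCK_ENTRIES
    return l1_dedup_index, l2_dedup_index


def entry_byte_offset(l2_dedup_index):
    """Byte offset of an entry inside its L2 deduplication block (hypothetical)."""
    return l2_dedup_index * ENTRY_SIZE


if __name__ == "__main__":
    print(L2_DEDUP_BLOCK_ENTRIES, RESERVED_TAIL_BYTES)
    print(dedup_indexes(1639 * CLUSTER_SIZE))
```

With the sizes given in the spec, a 64KB block holds 1638 entries of 40 bytes
and leaves 16 bytes unused, which agrees with the reserved tail the spec
mentions.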