* [Qemu-devel] [PATCH v2 0/5] Initial VHDX support
@ 2013-04-23 14:24 Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 1/5] qemu: add castagnoli crc32c checksum algorithm Jeff Cody
` (4 more replies)
0 siblings, 5 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 14:24 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha
This adds the initial support for VHDX image files.
It currently only supports read operations of VHDX, for fixed and dynamic files.
Notably, the following is not yet supported:
* Differencing files
* Log replay (so we will refuse to open any images that are not 'clean')
* .bdrv_create()
* write operations other than to the header
Jeff Cody (5):
qemu: add castagnoli crc32c checksum algorithm
block: vhdx header for the QEMU support of VHDX images
block: initial VHDX driver support framework - supports open and probe
block: add read-only support to VHDX image format.
block: add header update capability for VHDX images.
block/Makefile.objs | 1 +
block/vhdx.c | 1041 +++++++++++++++++++++++++++++++++++++++++++++++++
block/vhdx.h | 352 +++++++++++++++++
configure | 13 +
include/qemu/crc32c.h | 35 ++
util/Makefile.objs | 1 +
util/crc32c.c | 115 ++++++
7 files changed, 1558 insertions(+)
create mode 100644 block/vhdx.c
create mode 100644 block/vhdx.h
create mode 100644 include/qemu/crc32c.h
create mode 100644 util/crc32c.c
--
1.8.1.4
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH v2 1/5] qemu: add castagnoli crc32c checksum algorithm
2013-04-23 14:24 [Qemu-devel] [PATCH v2 0/5] Initial VHDX support Jeff Cody
@ 2013-04-23 14:24 ` Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images Jeff Cody
` (3 subsequent siblings)
4 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 14:24 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha
This adds the Castagnoli CRC32C algorithm, using the 0x11EDC6F41
polynomial.
This is extracted from the linux kernel cryptographic crc32.c module.
The algorithm is based on:
Castagnoli93: Guy Castagnoli and Stefan Braeuer and Martin Herrman
"Optimization of Cyclic Redundancy-Check Codes with 24
and 32 Parity Bits", IEEE Transactions on Communication,
Volume 41, Number 6, June 1993
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
include/qemu/crc32c.h | 35 +++++++++++++++
util/Makefile.objs | 1 +
util/crc32c.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 151 insertions(+)
create mode 100644 include/qemu/crc32c.h
create mode 100644 util/crc32c.c
diff --git a/include/qemu/crc32c.h b/include/qemu/crc32c.h
new file mode 100644
index 0000000..56d1c3b
--- /dev/null
+++ b/include/qemu/crc32c.h
@@ -0,0 +1,35 @@
+/*
+ * Castagnoli CRC32C Checksum Algorithm
+ *
+ * Polynomial: 0x11EDC6F41
+ *
+ * Castagnoli93: Guy Castagnoli and Stefan Braeuer and Martin Herrman
+ * "Optimization of Cyclic Redundancy-Check Codes with 24
+ * and 32 Parity Bits",IEEE Transactions on Communication,
+ * Volume 41, Number 6, June 1993
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ * Jeff Cody <jcody@redhat.com>
+ *
+ * Based on the Linux kernel cryptographic crc32c module,
+ *
+ * Copyright (c) 2004 Cisco Systems, Inc.
+ * Copyright (c) 2008 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#ifndef QEMU_CRC32_H
+#define QEMU_CRC32_H
+
+#include "qemu-common.h"
+
+uint32_t crc32c(uint32_t crc, const uint8_t *data, unsigned int length);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index c5652f5..4a1bd4e 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -10,3 +10,4 @@ util-obj-$(CONFIG_POSIX) += compatfd.o
util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
util-obj-y += qemu-option.o qemu-progress.o
util-obj-y += hexdump.o
+util-obj-y += crc32c.o
diff --git a/util/crc32c.c b/util/crc32c.c
new file mode 100644
index 0000000..8866327
--- /dev/null
+++ b/util/crc32c.c
@@ -0,0 +1,115 @@
+/*
+ * Castagnoli CRC32C Checksum Algorithm
+ *
+ * Polynomial: 0x11EDC6F41
+ *
+ * Castagnoli93: Guy Castagnoli and Stefan Braeuer and Martin Herrman
+ * "Optimization of Cyclic Redundancy-Check Codes with 24
+ * and 32 Parity Bits",IEEE Transactions on Communication,
+ * Volume 41, Number 6, June 1993
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ * Jeff Cody <jcody@redhat.com>
+ *
+ * Based on the Linux kernel cryptographic crc32c module,
+ *
+ * Copyright (c) 2004 Cisco Systems, Inc.
+ * Copyright (c) 2008 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include "qemu-common.h"
+#include "qemu/crc32c.h"
+
+/*
+ * This is the CRC-32C table
+ * Generated with:
+ * width = 32 bits
+ * poly = 0x1EDC6F41
+ * reflect input bytes = true
+ * reflect output bytes = true
+ */
+
+static const uint32_t crc32c_table[256] = {
+ 0x00000000L, 0xF26B8303L, 0xE13B70F7L, 0x1350F3F4L,
+ 0xC79A971FL, 0x35F1141CL, 0x26A1E7E8L, 0xD4CA64EBL,
+ 0x8AD958CFL, 0x78B2DBCCL, 0x6BE22838L, 0x9989AB3BL,
+ 0x4D43CFD0L, 0xBF284CD3L, 0xAC78BF27L, 0x5E133C24L,
+ 0x105EC76FL, 0xE235446CL, 0xF165B798L, 0x030E349BL,
+ 0xD7C45070L, 0x25AFD373L, 0x36FF2087L, 0xC494A384L,
+ 0x9A879FA0L, 0x68EC1CA3L, 0x7BBCEF57L, 0x89D76C54L,
+ 0x5D1D08BFL, 0xAF768BBCL, 0xBC267848L, 0x4E4DFB4BL,
+ 0x20BD8EDEL, 0xD2D60DDDL, 0xC186FE29L, 0x33ED7D2AL,
+ 0xE72719C1L, 0x154C9AC2L, 0x061C6936L, 0xF477EA35L,
+ 0xAA64D611L, 0x580F5512L, 0x4B5FA6E6L, 0xB93425E5L,
+ 0x6DFE410EL, 0x9F95C20DL, 0x8CC531F9L, 0x7EAEB2FAL,
+ 0x30E349B1L, 0xC288CAB2L, 0xD1D83946L, 0x23B3BA45L,
+ 0xF779DEAEL, 0x05125DADL, 0x1642AE59L, 0xE4292D5AL,
+ 0xBA3A117EL, 0x4851927DL, 0x5B016189L, 0xA96AE28AL,
+ 0x7DA08661L, 0x8FCB0562L, 0x9C9BF696L, 0x6EF07595L,
+ 0x417B1DBCL, 0xB3109EBFL, 0xA0406D4BL, 0x522BEE48L,
+ 0x86E18AA3L, 0x748A09A0L, 0x67DAFA54L, 0x95B17957L,
+ 0xCBA24573L, 0x39C9C670L, 0x2A993584L, 0xD8F2B687L,
+ 0x0C38D26CL, 0xFE53516FL, 0xED03A29BL, 0x1F682198L,
+ 0x5125DAD3L, 0xA34E59D0L, 0xB01EAA24L, 0x42752927L,
+ 0x96BF4DCCL, 0x64D4CECFL, 0x77843D3BL, 0x85EFBE38L,
+ 0xDBFC821CL, 0x2997011FL, 0x3AC7F2EBL, 0xC8AC71E8L,
+ 0x1C661503L, 0xEE0D9600L, 0xFD5D65F4L, 0x0F36E6F7L,
+ 0x61C69362L, 0x93AD1061L, 0x80FDE395L, 0x72966096L,
+ 0xA65C047DL, 0x5437877EL, 0x4767748AL, 0xB50CF789L,
+ 0xEB1FCBADL, 0x197448AEL, 0x0A24BB5AL, 0xF84F3859L,
+ 0x2C855CB2L, 0xDEEEDFB1L, 0xCDBE2C45L, 0x3FD5AF46L,
+ 0x7198540DL, 0x83F3D70EL, 0x90A324FAL, 0x62C8A7F9L,
+ 0xB602C312L, 0x44694011L, 0x5739B3E5L, 0xA55230E6L,
+ 0xFB410CC2L, 0x092A8FC1L, 0x1A7A7C35L, 0xE811FF36L,
+ 0x3CDB9BDDL, 0xCEB018DEL, 0xDDE0EB2AL, 0x2F8B6829L,
+ 0x82F63B78L, 0x709DB87BL, 0x63CD4B8FL, 0x91A6C88CL,
+ 0x456CAC67L, 0xB7072F64L, 0xA457DC90L, 0x563C5F93L,
+ 0x082F63B7L, 0xFA44E0B4L, 0xE9141340L, 0x1B7F9043L,
+ 0xCFB5F4A8L, 0x3DDE77ABL, 0x2E8E845FL, 0xDCE5075CL,
+ 0x92A8FC17L, 0x60C37F14L, 0x73938CE0L, 0x81F80FE3L,
+ 0x55326B08L, 0xA759E80BL, 0xB4091BFFL, 0x466298FCL,
+ 0x1871A4D8L, 0xEA1A27DBL, 0xF94AD42FL, 0x0B21572CL,
+ 0xDFEB33C7L, 0x2D80B0C4L, 0x3ED04330L, 0xCCBBC033L,
+ 0xA24BB5A6L, 0x502036A5L, 0x4370C551L, 0xB11B4652L,
+ 0x65D122B9L, 0x97BAA1BAL, 0x84EA524EL, 0x7681D14DL,
+ 0x2892ED69L, 0xDAF96E6AL, 0xC9A99D9EL, 0x3BC21E9DL,
+ 0xEF087A76L, 0x1D63F975L, 0x0E330A81L, 0xFC588982L,
+ 0xB21572C9L, 0x407EF1CAL, 0x532E023EL, 0xA145813DL,
+ 0x758FE5D6L, 0x87E466D5L, 0x94B49521L, 0x66DF1622L,
+ 0x38CC2A06L, 0xCAA7A905L, 0xD9F75AF1L, 0x2B9CD9F2L,
+ 0xFF56BD19L, 0x0D3D3E1AL, 0x1E6DCDEEL, 0xEC064EEDL,
+ 0xC38D26C4L, 0x31E6A5C7L, 0x22B65633L, 0xD0DDD530L,
+ 0x0417B1DBL, 0xF67C32D8L, 0xE52CC12CL, 0x1747422FL,
+ 0x49547E0BL, 0xBB3FFD08L, 0xA86F0EFCL, 0x5A048DFFL,
+ 0x8ECEE914L, 0x7CA56A17L, 0x6FF599E3L, 0x9D9E1AE0L,
+ 0xD3D3E1ABL, 0x21B862A8L, 0x32E8915CL, 0xC083125FL,
+ 0x144976B4L, 0xE622F5B7L, 0xF5720643L, 0x07198540L,
+ 0x590AB964L, 0xAB613A67L, 0xB831C993L, 0x4A5A4A90L,
+ 0x9E902E7BL, 0x6CFBAD78L, 0x7FAB5E8CL, 0x8DC0DD8FL,
+ 0xE330A81AL, 0x115B2B19L, 0x020BD8EDL, 0xF0605BEEL,
+ 0x24AA3F05L, 0xD6C1BC06L, 0xC5914FF2L, 0x37FACCF1L,
+ 0x69E9F0D5L, 0x9B8273D6L, 0x88D28022L, 0x7AB90321L,
+ 0xAE7367CAL, 0x5C18E4C9L, 0x4F48173DL, 0xBD23943EL,
+ 0xF36E6F75L, 0x0105EC76L, 0x12551F82L, 0xE03E9C81L,
+ 0x34F4F86AL, 0xC69F7B69L, 0xD5CF889DL, 0x27A40B9EL,
+ 0x79B737BAL, 0x8BDCB4B9L, 0x988C474DL, 0x6AE7C44EL,
+ 0xBE2DA0A5L, 0x4C4623A6L, 0x5F16D052L, 0xAD7D5351L
+};
+
+
+uint32_t crc32c(uint32_t crc, const uint8_t *data, unsigned int length)
+{
+ while (length--) {
+ crc = crc32c_table[(crc ^ *data++) & 0xFFL] ^ (crc >> 8);
+ }
+ return crc^0xffffffff;
+}
+
--
1.8.1.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images
2013-04-23 14:24 [Qemu-devel] [PATCH v2 0/5] Initial VHDX support Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 1/5] qemu: add castagnoli crc32c checksum algorithm Jeff Cody
@ 2013-04-23 14:24 ` Jeff Cody
2013-04-23 15:10 ` Kevin Wolf
` (2 more replies)
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
` (2 subsequent siblings)
4 siblings, 3 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 14:24 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha
This is based on Microsoft's VHDX specification:
"VHDX Format Specification v0.95", published 4/12/2012
https://www.microsoft.com/en-us/download/details.aspx?id=29681
These structures define the various header, metadata, and other
block structures defined in the VHDX specification.
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
block/vhdx.h | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 327 insertions(+)
create mode 100644 block/vhdx.h
diff --git a/block/vhdx.h b/block/vhdx.h
new file mode 100644
index 0000000..f5cf1ed
--- /dev/null
+++ b/block/vhdx.h
@@ -0,0 +1,327 @@
+/*
+ * Block driver for Hyper-V VHDX Images
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ * Jeff Cody <jcody@redhat.com>
+ *
+ * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
+ * by Microsoft:
+ * https://www.microsoft.com/en-us/download/details.aspx?id=29681
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BLOCK_VHDX_H
+#define BLOCK_VHDX_H
+
+/* Structures and fields present in the VHDX file */
+
+/* The header section has the following blocks,
+ * each block is 64KB:
+ *
+ * _____________________________________________________________________________
+ * | File Id. | Header 1 | Header 2 | Region Table | Reserved (768KB) |
+ * |----------|---------------|------------|--------------|--------------------|
+ * | | | | | |
+ * 0.........64KB...........128KB........192KB..........256KB................1MB
+ */
+
+#define VHDX_HEADER_BLOCK_SIZE (64*1024)
+
+#define VHDX_FILE_ID_OFFSET 0
+#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE*1)
+#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE*2)
+#define VHDX_REGION_TABLE_OFFSET (VHDX_HEADER_BLOCK_SIZE*3)
+
+
+/*
+ * A note on the use of MS-GUID fields. For more details on the GUID,
+ * please see: https://en.wikipedia.org/wiki/Globally_unique_identifier.
+ *
+ * The VHDX specification only states that these are MS GUIDs, and which
+ * bytes are data1-data4. It makes no mention of what algorithm should be used
+ * to generate the GUID, nor what standard. However, looking at the specified
+ * known GUID fields, it appears the GUIDs are:
+ * Standard/DCE GUID type (noted by 10b in the MSB of byte 0 of .data4)
+ * Random algorithm (noted by 0x4XXX for .data3)
+ */
+
+/* ---- HEADER SECTION STRUCTURES ---- */
+
+/* Important note: these structures are as defined in the VHDX specification,
+ * including byte order and size. However, without being packed structures,
+ * they will not match 1:1 data read from disk. Rather than use potentially
+ * non-portable packed structures, data is copied from read buffers into
+ * the structures below. However, for reference, please refrain from
+ * modifying these structures to something that does not represent the spec */
+
+#define VHDX_FILE_ID_MAGIC 0x656C696678646876 /* 'vhdxfile' */
+typedef struct vhdx_file_identifier {
+ uint64_t signature; /* "vhdxfile" in ASCII */
+ uint16_t creator[256]; /* optional; utf-16 string to identify
+ the vhdx file creator. Diagnotistic
+ only */
+} vhdx_file_identifier;
+
+
+/* the guid is a 16 byte unique ID - the definition for this used by
+ * Microsoft is not just 16 bytes though - it is a structure that is defined,
+ * so we need to follow it here so that endianness does not trip us up */
+
+typedef struct ms_guid {
+ uint32_t data1;
+ uint16_t data2;
+ uint16_t data3;
+ uint8_t data4[8];
+} ms_guid;
+
+#define guid_eq(a, b) \
+ (memcmp(&(a), &(b), sizeof(ms_guid)) == 0)
+
+#define VHDX_HEADER_SIZE (4*1024) /* although the vhdx_header struct in disk
+ is only 582 bytes, for purposes of crc
+ the header is the first 4KB of the 64KB
+ block */
+
+#define VHDX_HDR_MAGIC 0x64616568 /* 'head' */
+typedef struct QEMU_PACKED vhdx_header {
+ uint32_t signature; /* "head" in ASCII */
+ uint32_t checksum; /* CRC-32C hash of the whole header */
+ uint64_t sequence_number; /* Seq number of this header. Each
+ VHDX file has 2 of these headers,
+ and only the header with the highest
+ sequence number is valid */
+ ms_guid file_write_guid; /* 128 bit unique identifier. Must be
+ updated to new, unique value before
+ the first modification is made to
+ file */
+ ms_guid data_write_guid; /* 128 bit unique identifier. Must be
+ updated to new, unique value before
+ the first modification is made to
+ visible data. Visbile data is
+ defined as:
+ - system & user metadata
+ - raw block data
+ - disk size
+ - any change that will
+ cause the virtual disk
+ sector read to differ
+
+ This does not need to change if
+ blocks are re-arranged */
+ ms_guid log_guid; /* 128 bit unique identifier. If zero,
+ there is no valid log. If non-zero,
+ log entries with this guid are
+ valid. */
+ uint16_t log_version; /* version of the log format. Mustn't be
+ zero, unless log_guid is also zero */
+ uint16_t version; /* version of th evhdx file. Currently,
+ only supported version is "1" */
+ uint32_t log_length; /* length of the log. Must be multiple
+ of 1MB */
+ uint64_t log_offset; /* byte offset in the file of the log.
+ Must also be a multiple of 1MB */
+} vhdx_header;
+
+/* 4KB in packed data size, not to be used except for initial data read */
+typedef struct QEMU_PACKED vhdx_header_padded {
+ vhdx_header header;
+ uint8_t reserved[502]; /* per the VHDX spec */
+ uint8_t reserved_[3514]; /* for the initial packed struct read */
+} vhdx_header_padded;
+
+/* Header for the region table block */
+#define VHDX_RT_MAGIC 0x69676572 /* 'regi ' */
+typedef struct QEMU_PACKED vhdx_region_table_header {
+ uint32_t signature; /* "regi" in ASCII */
+ uint32_t checksum; /* CRC-32C hash of the 64KB table */
+ uint32_t entry_count; /* number of valid entries */
+ uint32_t reserved;
+} vhdx_region_table_header;
+
+/* Individual region table entry. There may be a maximum of 2047 of these
+ *
+ * There are two known region table properties. Both are required.
+ * BAT (block allocation table): 2DC27766F62342009D64115E9BFD4A08
+ * Metadata: 8B7CA20647904B9AB8FE575F050F886E
+ */
+#define VHDX_REGION_ENTRY_REQUIRED 0x01 /* if set, parser must understand
+ this entry in order to open
+ file */
+typedef struct QEMU_PACKED vhdx_region_table_entry {
+ ms_guid guid; /* 128-bit unique identifier */
+ uint64_t file_offset; /* offset of the object in the file.
+ Must be multiple of 1MB */
+ uint32_t length; /* length, in bytes, of the object */
+ uint32_t data_bits;
+} vhdx_region_table_entry;
+
+
+/* ---- LOG ENTRY STRUCTURES ---- */
+#define VHDX_LOG_HDR_SIZE 64
+#define VHDX_LOGE_MAGIC 0x65676F6C /* 'loge' */
+typedef struct QEMU_PACKED vhdx_log_entry_header {
+ uint32_t signature; /* "loge" in ASCII */
+ uint32_t checksum; /* CRC-32C hash of the 64KB table */
+ uint32_t entry_length; /* length in bytes, multiple of 1MB */
+ uint32_t tail; /* byte offset of first log entry of a
+ seq, where this entry is the last
+ entry */
+ uint64_t sequence_number; /* incremented with each log entry.
+ May not be zero. */
+ uint32_t descriptor_count; /* number of descriptors in this log
+ entry, must be >= 0 */
+ uint32_t reserved;
+ ms_guid log_guid; /* value of the log_guid from
+ vhdx_header. If not found in
+ vhdx_header, it is invalid */
+ uint64_t flushed_file_offset; /* see spec for full details - this
+ sould be vhdx file size in bytes */
+ uint64_t last_file_offset; /* size in bytes that all allocated
+ file structures fit into */
+} vhdx_log_entry_header;
+
+#define VHDX_LOG_DESC_SIZE 32
+
+#define VHDX_ZERO_MAGIC 0x6F72657A /* 'zero' */
+#define VHDX_DATA_MAGIC 0x63736564 /* 'desc' */
+typedef struct QEMU_PACKED vhdx_log_descriptor {
+ uint32_t signature; /* "zero" or "desc" in ASCII */
+ union {
+ uint32_t reserved; /* zero desc */
+ uint32_t trailing_bytes; /* data desc: bytes 4092-4096 of the
+ data sector */
+ };
+ union {
+ uint64_t zero_length; /* zero desc: length of the section to
+ zero */
+ uint64_t leading_bytes; /* data desc: bytes 0-7 of the data
+ sector */
+ };
+ uint64_t file_offset; /* file offset to write zeros - multiple
+ of 4kB */
+ uint64_t sequence_number; /* must match same field in
+ vhdx_log_entry_header */
+} vhdx_log_descriptor;
+
+#define VHDX_DATAS_MAGIC 0x61746164 /* 'data' */
+typedef struct QEMU_PACKED vhdx_log_data_sector {
+ uint32_t data_signature; /* "data" in ASCII */
+ uint32_t sequence_high; /* 4 MSB of 8 byte sequence_number */
+ uint8_t data[4084]; /* raw data, bytes 8-4091 (inclusive).
+ see the data descriptor field for the
+ other mising bytes */
+ uint32_t sequence_low; /* 4 LSB of 8 byte sequence_number */
+} vhdx_log_data_sector;
+
+
+
+/* block states - different state values depending on whether it is a
+ * payload block, or a sector block. */
+
+#define PAYLOAD_BLOCK_NOT_PRESENT 0
+#define PAYLOAD_BLOCK_UNDEFINED 1
+#define PAYLOAD_BLOCK_ZERO 2
+#define PAYLOAD_BLOCK_UNMAPPED 5
+#define PAYLOAD_BLOCK_FULL_PRESENT 6
+#define PAYLOAD_BLOCK_PARTIALLY_PRESENT 7
+
+#define SB_BLOCK_NOT_PRESENT 0
+#define SB_BLOCK_PRESENT 6
+
+/* per the spec */
+#define VHDX_MAX_SECTORS_PER_BLOCK (1<<23)
+
+/* upper 44 bits are the file offset in 1MB units lower 3 bits are the state
+ other bits are reserved */
+#define VHDX_BAT_STATE_BIT_MASK 0x07
+#define VHDX_BAT_FILE_OFF_BITS (64-44)
+typedef uint64_t vhdx_bat_entry;
+
+/* ---- METADATA REGION STRUCTURES ---- */
+
+#define VHDX_METADATA_ENTRY_SIZE 32
+#define VHDX_METADATA_MAX_ENTRIES 2047 /* not including the header */
+#define VHDX_METADATA_TABLE_MAX_SIZE \
+ (VHDX_METADATA_ENTRY_SIZE * (VHDX_METADATA_MAX_ENTRIES+1))
+#define VHDX_METADATA_MAGIC 0x617461646174656D /* 'metadata' */
+typedef struct QEMU_PACKED vhdx_metadata_table_header {
+ uint64_t signature; /* "metadata" in ASCII */
+ uint16_t reserved;
+ uint16_t entry_count; /* number table entries. <= 2047 */
+ uint32_t reserved2[5];
+} vhdx_metadata_table_header;
+
+#define VHDX_META_FLAGS_IS_USER 0x01 /* max 1024 entries */
+#define VHDX_META_FLAGS_IS_VIRTUAL_DISK 0x02 /* virtual disk metadata if set,
+ otherwise file metdata */
+#define VHDX_META_FLAGS_IS_REQUIRED 0x04 /* parse must understand this
+ entry to open the file */
+typedef struct QEMU_PACKED vhdx_metadata_table_entry {
+ ms_guid item_id; /* 128-bit identifier for metadata */
+ uint32_t offset; /* byte offset of the metadata. At
+ least 64kB. Relative to start of
+ metadata region */
+ /* note: if length = 0, so is offset */
+ uint32_t length; /* length of metadata. <= 1MB. */
+ uint32_t data_bits; /* least-significant 3 bits are flags, the
+ rest are reserved (see above) */
+ uint32_t reserved2;
+} vhdx_metadata_table_entry;
+
+#define VHDX_PARAMS_LEAVE_BLOCKS_ALLOCED 0x01 /* Do not change any blocks to
+ be BLOCK_NOT_PRESENT.
+ If set indicates a fixed
+ size VHDX file */
+#define VHDX_PARAMS_HAS_PARENT 0x02 /* has parent / backing file */
+typedef struct QEMU_PACKED vhdx_file_parameters {
+ uint32_t block_size; /* size of each payload block, always
+ power of 2, <= 256MB and >= 1MB. */
+ uint32_t data_bits; /* least-significant 2 bits are flags, the rest
+ are reserved (see above) */
+} vhdx_file_parameters;
+
+typedef struct QEMU_PACKED vhdx_virtual_disk_size {
+ uint64_t virtual_disk_size; /* Size of the virtual disk, in bytes.
+ Must be multiple of the sector size,
+ max of 64TB */
+} vhdx_virtual_disk_size;
+
+typedef struct QEMU_PACKED vhdx_page83_data {
+ uint8_t page_83_data[16]; /* unique id for scsi devices that
+ support page 0x83 */
+} vhdx_page83_data;
+
+typedef struct QEMU_PACKED vhdx_virtual_disk_logical_sector_size {
+ uint32_t logical_sector_size; /* virtual disk sector size (in bytes).
+ Can only be 512 or 4096 bytes */
+} vhdx_virtual_disk_logical_sector_size;
+
+typedef struct QEMU_PACKED vhdx_virtual_disk_physical_sector_size {
+ uint32_t physical_sector_size; /* physical sector size (in bytes).
+ Can only be 512 or 4096 bytes */
+} vhdx_virtual_disk_physical_sector_size;
+
+typedef struct QEMU_PACKED vhdx_parent_locator_header {
+ uint8_t locator_type[16]; /* type of the parent virtual disk. */
+ uint16_t reserved;
+ uint16_t key_value_count; /* number of key/value pairs for this
+ locator */
+} vhdx_parent_locator_header;
+
+/* key and value strings are UNICODE strings, UTF-16 LE encoding, no NULs */
+typedef struct QEMU_PACKED vhdx_parent_locator_entry {
+ uint32_t key_offset; /* offset in metadata for key, > 0 */
+ uint32_t value_offset; /* offset in metadata for value, >0 */
+ uint16_t key_length; /* length of entry key, > 0 */
+ uint16_t value_length; /* length of entry value, > 0 */
+} vhdx_parent_locator_entry;
+
+
+/* ----- END VHDX SPECIFICATION STRUCTURES ---- */
+
+#endif
--
1.8.1.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 14:24 [Qemu-devel] [PATCH v2 0/5] Initial VHDX support Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 1/5] qemu: add castagnoli crc32c checksum algorithm Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images Jeff Cody
@ 2013-04-23 14:24 ` Jeff Cody
2013-04-23 15:46 ` Kevin Wolf
` (4 more replies)
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 4/5] block: add read-only support to VHDX image format Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images Jeff Cody
4 siblings, 5 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 14:24 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha
This is the initial block driver framework for VHDX image support (
i.e. Hyper-V image file formats), that supports opening VHDX files, and
parsing the headers.
This commit does not yet enable:
- reading
- writing
- updating the header
- differencing files (images with parents)
- log replay / dirty logs (only clean images)
This is based on Microsoft's VHDX specification:
"VHDX Format Specification v0.95", published 4/12/2012
https://www.microsoft.com/en-us/download/details.aspx?id=29681
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
block/Makefile.objs | 1 +
block/vhdx.c | 769 ++++++++++++++++++++++++++++++++++++++++++++++++++++
block/vhdx.h | 25 ++
3 files changed, 795 insertions(+)
create mode 100644 block/vhdx.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 6c4b5bc..5f0358a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
+block-obj-y += vhdx.o
block-obj-y += parallels.o blkdebug.o blkverify.o
block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/vhdx.c b/block/vhdx.c
new file mode 100644
index 0000000..b0ea2ba
--- /dev/null
+++ b/block/vhdx.c
@@ -0,0 +1,769 @@
+/*
+ * Block driver for Hyper-V VHDX Images
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ * Jeff Cody <jcody@redhat.com>
+ *
+ * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
+ * by Microsoft:
+ * https://www.microsoft.com/en-us/download/details.aspx?id=29681
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "qemu/module.h"
+#include "qemu/crc32c.h"
+#include "block/vhdx.h"
+
+
+/* Several metadata and region table data entries are identified by
+ * guids in a MS-specific GUID format. */
+
+
+/* ------- Known Region Table GUIDs ---------------------- */
+static const ms_guid bat_guid = { .data1 = 0x2dc27766,
+ .data2 = 0xf623,
+ .data3 = 0x4200,
+ .data4 = { 0x9d, 0x64, 0x11, 0x5e,
+ 0x9b, 0xfd, 0x4a, 0x08} };
+
+static const ms_guid metadata_guid = { .data1 = 0x8b7ca206,
+ .data2 = 0x4790,
+ .data3 = 0x4b9a,
+ .data4 = { 0xb8, 0xfe, 0x57, 0x5f,
+ 0x05, 0x0f, 0x88, 0x6e} };
+
+
+
+/* ------- Known Metadata Entry GUIDs ---------------------- */
+static const ms_guid file_param_guid = { .data1 = 0xcaa16737,
+ .data2 = 0xfa36,
+ .data3 = 0x4d43,
+ .data4 = { 0xb3, 0xb6, 0x33, 0xf0,
+ 0xaa, 0x44, 0xe7, 0x6b} };
+
+static const ms_guid virtual_size_guid = { .data1 = 0x2FA54224,
+ .data2 = 0xcd1b,
+ .data3 = 0x4876,
+ .data4 = { 0xb2, 0x11, 0x5d, 0xbe,
+ 0xd8, 0x3b, 0xf4, 0xb8} };
+
+static const ms_guid page83_guid = { .data1 = 0xbeca12ab,
+ .data2 = 0xb2e6,
+ .data3 = 0x4523,
+ .data4 = { 0x93, 0xef, 0xc3, 0x09,
+ 0xe0, 0x00, 0xc7, 0x46} };
+
+
+static const ms_guid phys_sector_guid = { .data1 = 0xcda348c7,
+ .data2 = 0x445d,
+ .data3 = 0x4471,
+ .data4 = { 0x9c, 0xc9, 0xe9, 0x88,
+ 0x52, 0x51, 0xc5, 0x56} };
+
+static const ms_guid parent_locator_guid = { .data1 = 0xa8d35f2d,
+ .data2 = 0xb30b,
+ .data3 = 0x454d,
+ .data4 = { 0xab, 0xf7, 0xd3,
+ 0xd8, 0x48, 0x34,
+ 0xab, 0x0c} };
+
+static const ms_guid logical_sector_guid = { .data1 = 0x8141bf1d,
+ .data2 = 0xa96f,
+ .data3 = 0x4709,
+ .data4 = { 0xba, 0x47, 0xf2,
+ 0x33, 0xa8, 0xfa,
+ 0xab, 0x5f} };
+
+/* Each parent type must have a valid GUID; this is for parent images
+ * of type 'VHDX'. If we were to allow e.g. a QCOW2 parent, we would
+ * need to make up our own QCOW2 GUID type */
+static const ms_guid parent_vhdx_guid = { .data1 = 0xb04aefb7,
+ .data2 = 0xd19e,
+ .data3 = 0x4a81,
+ .data4 = { 0xb7, 0x89, 0x25, 0xb8,
+ 0xe9, 0x44, 0x59, 0x13} };
+
+
+#define META_FILE_PARAMETER_PRESENT 0x01
+#define META_VIRTUAL_DISK_SIZE_PRESENT 0x02
+#define META_PAGE_83_PRESENT 0x04
+#define META_LOGICAL_SECTOR_SIZE_PRESENT 0x08
+#define META_PHYS_SECTOR_SIZE_PRESENT 0x10
+#define META_PARENT_LOCATOR_PRESENT 0x20
+
+#define META_ALL_PRESENT \
+ (META_FILE_PARAMETER_PRESENT | META_VIRTUAL_DISK_SIZE_PRESENT | \
+ META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \
+ META_PHYS_SECTOR_SIZE_PRESENT)
+
+typedef struct vhdx_metadata_entries {
+ vhdx_metadata_table_entry file_parameters_entry;
+ vhdx_metadata_table_entry virtual_disk_size_entry;
+ vhdx_metadata_table_entry page83_data_entry;
+ vhdx_metadata_table_entry logical_sector_size_entry;
+ vhdx_metadata_table_entry phys_sector_size_entry;
+ vhdx_metadata_table_entry parent_locator_entry;
+ uint16_t present;
+} vhdx_metadata_entries;
+
+
+typedef struct BDRVVHDXState {
+ CoMutex lock;
+
+ int curr_header;
+ vhdx_header *headers[2];
+
+ vhdx_region_table_header rt;
+ vhdx_region_table_entry bat_rt; /* region table for the BAT */
+ vhdx_region_table_entry metadata_rt; /* region table for the metadata */
+
+ vhdx_metadata_table_header metadata_hdr;
+ vhdx_metadata_entries metadata_entries;
+
+ vhdx_file_parameters params;
+ uint32_t block_size;
+ uint32_t block_size_bits;
+ uint32_t sectors_per_block;
+ uint32_t sectors_per_block_bits;
+
+ uint64_t virtual_disk_size;
+ uint32_t logical_sector_size;
+ uint32_t physical_sector_size;
+
+ uint64_t chunk_ratio;
+ uint32_t chunk_ratio_bits;
+ uint32_t logical_sector_size_bits;
+
+ uint32_t bat_entries;
+ vhdx_bat_entry *bat;
+ uint64_t bat_offset;
+
+ vhdx_parent_locator_header parent_header;
+ vhdx_parent_locator_entry *parent_entries;
+
+} BDRVVHDXState;
+
+uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
+ int crc_offset)
+{
+ uint32_t crc_new;
+ uint32_t crc_orig;
+ assert(buf != NULL);
+
+ if (crc_offset > 0) {
+ memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
+ memset(buf+crc_offset, 0, sizeof(crc_orig));
+ }
+
+ crc_new = crc32c(crc, buf, size);
+ if (crc_offset > 0) {
+ memcpy(buf+crc_offset, &crc_orig, sizeof(crc_orig));
+ }
+
+ return crc_new;
+}
+
+/* Validates the checksum of the buffer, with an in-place CRC.
+ *
+ * Zero is substituted during crc calculation for the original crc field,
+ * and the crc field is restored afterwards. But the buffer will be modifed
+ * during the calculation, so this may not be not suitable for multi-threaded
+ * use.
+ *
+ * crc_offset: byte offset in buf of the buffer crc
+ * buf: buffer pointer
+ * size: size of buffer (must be > crc_offset+4)
+ *
+ * returns true if checksum is valid, false otherwise
+ */
+bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
+{
+ uint32_t crc_orig;
+ uint32_t crc;
+
+ assert(buf != NULL);
+ assert(size > (crc_offset+4));
+
+ memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
+ crc_orig = le32_to_cpu(crc_orig);
+
+ crc = vhdx_checksum_calc(0xffffffff, buf, size, crc_offset);
+
+ return crc == crc_orig;
+}
+
+
+/*
+ * Per the MS VHDX Specification, for every VHDX file:
+ * - The header section is fixed size - 1 MB
+ * - The header section is always the first "object"
+ * - The first 64KB of the header is the File Identifier
+ * - The first uint64 (8 bytes) is the VHDX Signature ("vhdxfile")
+ * - The following 512 bytes constitute a UTF-16 string identifiying the
+ * software that created the file, and is optional and diagnostic only.
+ *
+ * Therefore, we probe by looking for the vhdxfile signature "vhdxfile"
+ */
+static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+ if (buf_size >= 8 && !memcmp(buf, "vhdxfile", 8)) {
+ return 100;
+ }
+ return 0;
+}
+
+/* All VHDX structures on disk are little endian */
+static void vhdx_header_le_import(vhdx_header *h)
+{
+ assert(h != NULL);
+
+ le32_to_cpus(&h->signature);
+ le32_to_cpus(&h->checksum);
+ le64_to_cpus(&h->sequence_number);
+
+ leguid_to_cpus(&h->file_write_guid);
+ leguid_to_cpus(&h->data_write_guid);
+ leguid_to_cpus(&h->log_guid);
+
+ le16_to_cpus(&h->log_version);
+ le16_to_cpus(&h->version);
+ le32_to_cpus(&h->log_length);
+ le64_to_cpus(&h->log_offset);
+}
+
+
+/* opens the specified header block from the VHDX file header section */
+static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
+{
+ int ret = 0;
+ vhdx_header *header1;
+ vhdx_header *header2;
+ uint64_t h1_seq = 0;
+ uint64_t h2_seq = 0;
+ uint8_t *buffer;
+
+ header1 = qemu_blockalign(bs, sizeof(vhdx_header));
+ header2 = qemu_blockalign(bs, sizeof(vhdx_header));
+
+ buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
+
+ s->headers[0] = header1;
+ s->headers[1] = header2;
+
+ /* We have to read the whole VHDX_HEADER_SIZE instead of
+ * sizeof(vhdx_header), because the checksum is over the whole
+ * region */
+ ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer, VHDX_HEADER_SIZE);
+ if (ret < 0) {
+ goto fail;
+ }
+ /* copy over just the relevant portion that we need */
+ memcpy(header1, buffer, sizeof(vhdx_header));
+ vhdx_header_le_import(header1);
+
+ if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
+ header1->signature == VHDX_HDR_MAGIC) {
+ h1_seq = header1->sequence_number;
+ }
+
+ ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
+ if (ret < 0) {
+ goto fail;
+ }
+ /* copy over just the relevant portion that we need */
+ memcpy(header2, buffer, sizeof(vhdx_header));
+ vhdx_header_le_import(header2);
+
+ if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
+ header2->signature == VHDX_HDR_MAGIC) {
+ h2_seq = header2->sequence_number;
+ }
+
+ if (h1_seq > h2_seq) {
+ s->curr_header = 0;
+ } else if (h2_seq > h1_seq) {
+ s->curr_header = 1;
+ } else {
+ printf("NO VALID HEADER\n");
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ ret = 0;
+
+ goto exit;
+
+fail:
+ qemu_vfree(header1);
+ qemu_vfree(header2);
+ s->headers[0] = NULL;
+ s->headers[1] = NULL;
+exit:
+ qemu_vfree(buffer);
+ return ret;
+}
+
+
+static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s)
+{
+ int ret = 0;
+ uint8_t *buffer;
+ int offset = 0;
+ vhdx_region_table_entry rt_entry;
+ int i;
+
+ /* We have to read the whole 64KB block, because the crc32 is over the
+ * whole block */
+ buffer = qemu_blockalign(bs, VHDX_HEADER_BLOCK_SIZE);
+
+ ret = bdrv_pread(bs->file, VHDX_REGION_TABLE_OFFSET, buffer,
+ VHDX_HEADER_BLOCK_SIZE);
+ if (ret < 0) {
+ goto fail;
+ }
+ memcpy(&s->rt, buffer, sizeof(s->rt));
+ le32_to_cpus(&s->rt.signature);
+ le32_to_cpus(&s->rt.checksum);
+ le32_to_cpus(&s->rt.entry_count);
+ le32_to_cpus(&s->rt.reserved);
+ offset += sizeof(s->rt);
+
+ if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
+ s->rt.signature != VHDX_RT_MAGIC) {
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ for (i = 0; i < s->rt.entry_count; i++) {
+ memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
+ offset += sizeof(rt_entry);
+
+ leguid_to_cpus(&rt_entry.guid);
+ le64_to_cpus(&rt_entry.file_offset);
+ le32_to_cpus(&rt_entry.length);
+ le32_to_cpus(&rt_entry.data_bits);
+
+ /* see if we recognize the entry */
+ if (guid_eq(rt_entry.guid, bat_guid)) {
+ s->bat_rt = rt_entry;
+ continue;
+ }
+
+ if (guid_eq(rt_entry.guid, metadata_guid)) {
+ s->metadata_rt = rt_entry;
+ continue;
+ }
+
+ if (rt_entry.data_bits & VHDX_REGION_ENTRY_REQUIRED) {
+ /* cannot read vhdx file - required region table entry that
+ * we do not understand. per spec, we must fail to open */
+ ret = -ENOTSUP;
+ goto fail;
+ }
+ }
+ ret = 0;
+
+fail:
+ qemu_vfree(buffer);
+ return ret;
+}
+
+
+
+/* Metadata initial parser
+ *
+ * This loads all the metadata entry fields. This may cause additional
+ * fields to be processed (e.g. parent locator, etc..).
+ *
+ * There are 5 Metadata items that are always required:
+ * - File Parameters (block size, has a parent)
+ * - Virtual Disk Size (size, in bytes, of the virtual drive)
+ * - Page 83 Data (scsi page 83 guid)
+ * - Logical Sector Size (logical sector size in bytes, either 512 or
+ * 4096. We only support 512 currently)
+ * - Physical Sector Size (512 or 4096)
+ *
+ * Also, if the File Parameters indicate this is a differencing file,
+ * we must also look for the Parent Locator metadata item.
+ */
+static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
+{
+ int ret = 0;
+ uint8_t *buffer;
+ int offset = 0;
+ int i = 0;
+ uint32_t block_size, sectors_per_block, logical_sector_size;
+ uint64_t chunk_ratio;
+ vhdx_metadata_table_entry md_entry;
+
+ buffer = qemu_blockalign(bs, VHDX_METADATA_TABLE_MAX_SIZE);
+
+ ret = bdrv_pread(bs->file, s->metadata_rt.file_offset, buffer,
+ VHDX_METADATA_TABLE_MAX_SIZE);
+ if (ret < 0) {
+ goto exit;
+ }
+ memcpy(&s->metadata_hdr, buffer, sizeof(s->metadata_hdr));
+ offset += sizeof(s->metadata_hdr);
+
+ le64_to_cpus(&s->metadata_hdr.signature);
+ le16_to_cpus(&s->metadata_hdr.reserved);
+ le16_to_cpus(&s->metadata_hdr.entry_count);
+
+ if (s->metadata_hdr.signature != VHDX_METADATA_MAGIC) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ s->metadata_entries.present = 0;
+
+ if ((s->metadata_hdr.entry_count * sizeof(md_entry)) >
+ (VHDX_METADATA_TABLE_MAX_SIZE - offset)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ for (i = 0; i < s->metadata_hdr.entry_count; i++) {
+ memcpy(&md_entry, buffer+offset, sizeof(md_entry));
+ offset += sizeof(md_entry);
+
+ leguid_to_cpus(&md_entry.item_id);
+ le32_to_cpus(&md_entry.offset);
+ le32_to_cpus(&md_entry.length);
+ le32_to_cpus(&md_entry.data_bits);
+ le32_to_cpus(&md_entry.reserved2);
+
+ if (guid_eq(md_entry.item_id, file_param_guid)) {
+ s->metadata_entries.file_parameters_entry = md_entry;
+ s->metadata_entries.present |= META_FILE_PARAMETER_PRESENT;
+ continue;
+ }
+
+ if (guid_eq(md_entry.item_id, virtual_size_guid)) {
+ s->metadata_entries.virtual_disk_size_entry = md_entry;
+ s->metadata_entries.present |= META_VIRTUAL_DISK_SIZE_PRESENT;
+ continue;
+ }
+
+ if (guid_eq(md_entry.item_id, page83_guid)) {
+ s->metadata_entries.page83_data_entry = md_entry;
+ s->metadata_entries.present |= META_PAGE_83_PRESENT;
+ continue;
+ }
+
+ if (guid_eq(md_entry.item_id, logical_sector_guid)) {
+ s->metadata_entries.logical_sector_size_entry = md_entry;
+ s->metadata_entries.present |= META_LOGICAL_SECTOR_SIZE_PRESENT;
+ continue;
+ }
+
+ if (guid_eq(md_entry.item_id, phys_sector_guid)) {
+ s->metadata_entries.phys_sector_size_entry = md_entry;
+ s->metadata_entries.present |= META_PHYS_SECTOR_SIZE_PRESENT;
+ continue;
+ }
+
+ if (guid_eq(md_entry.item_id, parent_locator_guid)) {
+ s->metadata_entries.parent_locator_entry = md_entry;
+ s->metadata_entries.present |= META_PARENT_LOCATOR_PRESENT;
+ continue;
+ }
+
+ if (md_entry.data_bits & VHDX_META_FLAGS_IS_REQUIRED) {
+ /* cannot read vhdx file - required region table entry that
+ * we do not understand. per spec, we must fail to open */
+ ret = -ENOTSUP;
+ goto exit;
+ }
+ }
+
+ if (s->metadata_entries.present != META_ALL_PRESENT) {
+ ret = -ENOTSUP;
+ goto exit;
+ }
+
+ ret = bdrv_pread(bs->file,
+ s->metadata_entries.file_parameters_entry.offset
+ + s->metadata_rt.file_offset,
+ &s->params,
+ sizeof(s->params));
+
+ if (ret < 0) {
+ goto exit;
+ }
+
+ le32_to_cpus(&s->params.block_size);
+ le32_to_cpus(&s->params.data_bits);
+
+
+ /* We now have the file parameters, so we can tell if this is a
+ * differencing file (i.e.. has_parent), is dynamic or fixed
+ * sized (leave_blocks_allocated), and the block size */
+
+ /* The parent locator required iff the file parameters has_parent set */
+ if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
+ if (s->metadata_entries.present & ~META_PARENT_LOCATOR_PRESENT) {
+ /* TODO: parse parent locator fields */
+ ret = -ENOTSUP; /* temp, until differencing files are supported */
+ goto exit;
+ } else {
+ /* if has_parent is set, but there is not parent locator present,
+ * then that is an invalid combination */
+ ret = -EINVAL;
+ goto exit;
+ }
+ }
+
+ /* determine virtual disk size, logical sector size,
+ * and phys sector size */
+
+ ret = bdrv_pread(bs->file,
+ s->metadata_entries.virtual_disk_size_entry.offset
+ + s->metadata_rt.file_offset,
+ &s->virtual_disk_size,
+ sizeof(uint64_t));
+ if (ret < 0) {
+ goto exit;
+ }
+ ret = bdrv_pread(bs->file,
+ s->metadata_entries.logical_sector_size_entry.offset
+ + s->metadata_rt.file_offset,
+ &s->logical_sector_size,
+ sizeof(uint32_t));
+ if (ret < 0) {
+ goto exit;
+ }
+ ret = bdrv_pread(bs->file,
+ s->metadata_entries.phys_sector_size_entry.offset
+ + s->metadata_rt.file_offset,
+ &s->physical_sector_size,
+ sizeof(uint32_t));
+ if (ret < 0) {
+ goto exit;
+ }
+
+ le64_to_cpus(&s->virtual_disk_size);
+ le32_to_cpus(&s->logical_sector_size);
+ le32_to_cpus(&s->physical_sector_size);
+
+ if (s->logical_sector_size == 0 || s->params.block_size == 0) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ /* both block_size and sector_size are guaranteed powers of 2 */
+ s->sectors_per_block = s->params.block_size / s->logical_sector_size;
+ s->chunk_ratio = (VHDX_MAX_SECTORS_PER_BLOCK) *
+ (uint64_t)s->logical_sector_size /
+ (uint64_t)s->params.block_size;
+
+ /* These values are ones we will want to use for division / multiplication
+ * later on, and they are all guaranteed (per the spec) to be powers of 2,
+ * so we can take advantage of that for shift operations during
+ * reads/writes */
+ logical_sector_size = s->logical_sector_size;
+ if (logical_sector_size & (logical_sector_size - 1)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ sectors_per_block = s->sectors_per_block;
+ if (sectors_per_block & (sectors_per_block - 1)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+ chunk_ratio = s->chunk_ratio;
+ if (chunk_ratio & (chunk_ratio - 1)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+ block_size = s->params.block_size;
+ if (block_size & (block_size - 1)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ while (logical_sector_size >>= 1) {
+ s->logical_sector_size_bits++;
+ }
+ while (sectors_per_block >>= 1) {
+ s->sectors_per_block_bits++;
+ }
+ while (chunk_ratio >>= 1) {
+ s->chunk_ratio_bits++;
+ }
+ while (block_size >>= 1) {
+ s->block_size_bits++;
+ }
+
+ if (s->logical_sector_size != BDRV_SECTOR_SIZE) {
+ printf("VHDX error - QEMU only supports 512 byte sector sizes\n");
+ ret = -ENOTSUP;
+ goto exit;
+ }
+
+ ret = 0;
+
+exit:
+ qemu_vfree(buffer);
+ return ret;
+}
+
+/* Parse the replay log. Per the VHDX spec, if the log is present
+ * it must be replayed prior to opening the file, even read-only.
+ *
+ * If read-only, we must replay the log in RAM (or refuse to open
+ * a dirty VHDX file read-only */
+static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
+{
+ int ret = 0;
+ int i;
+ vhdx_header *hdr;
+
+ hdr = s->headers[s->curr_header];
+
+ /* either either the log guid, or log length is zero,
+ * then a replay log is present */
+ for (i = 0; i < sizeof(hdr->log_guid.data4); i++) {
+ ret |= hdr->log_guid.data4[i];
+ }
+ if (hdr->log_guid.data1 == 0 &&
+ hdr->log_guid.data2 == 0 &&
+ hdr->log_guid.data3 == 0 &&
+ ret == 0) {
+ goto exit;
+ }
+
+ /* per spec, only log version of 0 is supported */
+ if (hdr->log_version != 0) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ if (hdr->log_length == 0) {
+ goto exit;
+ }
+
+ /* We currently do not support images with logs to replay */
+ ret = -ENOTSUP;
+
+exit:
+ return ret;
+}
+
+
+static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
+{
+ BDRVVHDXState *s = bs->opaque;
+ int ret = 0;
+ int i;
+
+ s->bat = NULL;
+
+ qemu_co_mutex_init(&s->lock);
+
+ ret = vhdx_parse_header(bs, s);
+ if (ret) {
+ goto fail;
+ }
+
+ ret = vhdx_parse_log(bs, s);
+ if (ret) {
+ goto fail;
+ }
+
+ ret = vhdx_open_region_tables(bs, s);
+ if (ret) {
+ goto fail;
+ }
+
+ ret = vhdx_parse_metadata(bs, s);
+ if (ret) {
+ goto fail;
+ }
+ s->block_size = s->params.block_size;
+
+ /* the VHDX spec dictates that virtual_disk_size is always a multiple of
+ * logical_sector_size */
+ bs->total_sectors = s->virtual_disk_size / s->logical_sector_size;
+
+ s->bat_offset = s->bat_rt.file_offset;
+ s->bat_entries = s->bat_rt.length / sizeof(vhdx_bat_entry);
+ s->bat = qemu_blockalign(bs, s->bat_rt.length);
+
+ ret = bdrv_pread(bs->file, s->bat_offset, s->bat, s->bat_rt.length);
+
+ for (i = 0; i < s->bat_entries; i++) {
+ le64_to_cpus(&s->bat[i]);
+ }
+
+ if (flags & BDRV_O_RDWR) {
+ ret = -ENOTSUP;
+ goto fail;
+ }
+
+ /* TODO: differencing files, read, write */
+
+ return 0;
+fail:
+ qemu_vfree(s->bat);
+ return ret;
+}
+
+static int vhdx_reopen_prepare(BDRVReopenState *state,
+ BlockReopenQueue *queue, Error **errp)
+{
+ return 0;
+}
+
+
+static coroutine_fn int vhdx_co_readv(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
+{
+ return -ENOTSUP;
+}
+
+
+
+static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
+{
+ return -ENOTSUP;
+}
+
+
+static void vhdx_close(BlockDriverState *bs)
+{
+ BDRVVHDXState *s = bs->opaque;
+
+ qemu_vfree(s->headers[0]);
+ qemu_vfree(s->headers[1]);
+ qemu_vfree(s->bat);
+ qemu_vfree(s->parent_entries);
+}
+
+static BlockDriver bdrv_vhdx = {
+ .format_name = "vhdx",
+ .instance_size = sizeof(BDRVVHDXState),
+
+ .bdrv_probe = vhdx_probe,
+ .bdrv_open = vhdx_open,
+ .bdrv_close = vhdx_close,
+ .bdrv_reopen_prepare = vhdx_reopen_prepare,
+ .bdrv_co_readv = vhdx_co_readv,
+ .bdrv_co_writev = vhdx_co_writev,
+};
+
+static void bdrv_vhdx_init(void)
+{
+ bdrv_register(&bdrv_vhdx);
+}
+
+block_init(bdrv_vhdx_init);
diff --git a/block/vhdx.h b/block/vhdx.h
index f5cf1ed..d71efac 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -324,4 +324,29 @@ typedef struct QEMU_PACKED vhdx_parent_locator_entry {
/* ----- END VHDX SPECIFICATION STRUCTURES ---- */
+
+uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
+ int crc_offset);
+
+bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
+
+
+static void leguid_to_cpus(ms_guid *guid)
+{
+ le32_to_cpus(&guid->data1);
+ le16_to_cpus(&guid->data2);
+ le16_to_cpus(&guid->data3);
+}
+
+static void cpu_to_leguids(ms_guid *guid)
+{
+ cpu_to_le32s(&guid->data1);
+ cpu_to_le16s(&guid->data2);
+ cpu_to_le16s(&guid->data3);
+}
+
+
+
+
+
#endif
--
1.8.1.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH v2 4/5] block: add read-only support to VHDX image format.
2013-04-23 14:24 [Qemu-devel] [PATCH v2 0/5] Initial VHDX support Jeff Cody
` (2 preceding siblings ...)
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
@ 2013-04-23 14:24 ` Jeff Cody
2013-04-24 14:38 ` Stefan Hajnoczi
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images Jeff Cody
4 siblings, 1 reply; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 14:24 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha
This adds in read-only support to the VHDX image format. This supports
reads for fixed-size, and dynamic sized VHDX images.
Differencing files are still unsupported.
The image must be opened without BDRV_O_RDWR set, because we do not
yet update the headers. I.e., pass 'readonly=on' in the drive image
options from the QEMU commandline.
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
block/vhdx.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 121 insertions(+), 2 deletions(-)
diff --git a/block/vhdx.c b/block/vhdx.c
index b0ea2ba..ca4a298 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -114,6 +114,17 @@ typedef struct vhdx_metadata_entries {
} vhdx_metadata_entries;
+typedef struct vhdx_sector_info {
+ uint32_t bat_idx; /* BAT entry index */
+ uint32_t sectors_avail; /* sectors available in payload block */
+ uint32_t bytes_left; /* bytes left in the block after data to r/w */
+ uint32_t bytes_avail; /* bytes available in payload block */
+ uint64_t file_offset; /* absolute offset in bytes, in file */
+ uint64_t block_offset; /* block offset, in bytes */
+} vhdx_sector_info;
+
+
+
typedef struct BDRVVHDXState {
CoMutex lock;
@@ -709,7 +720,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
goto fail;
}
- /* TODO: differencing files, read, write */
+ /* TODO: differencing files, write */
return 0;
fail:
@@ -724,10 +735,118 @@ static int vhdx_reopen_prepare(BDRVReopenState *state,
}
+/*
+ * Perform sector to block offset translations, to get various
+ * sector and file offsets into the image. See vhdx_sector_info
+ */
+static void vhdx_block_translate(BDRVVHDXState *s, int64_t sector_num,
+ int nb_sectors, vhdx_sector_info *sinfo)
+{
+ uint32_t block_offset;
+
+ sinfo->bat_idx = sector_num >> s->sectors_per_block_bits;
+ /* effectively a modulo - this gives us the offset into the block
+ * (in sector sizes) for our sector number */
+ block_offset = sector_num - (sinfo->bat_idx << s->sectors_per_block_bits);
+ /* the chunk ratio gives us the interleaving of the sector
+ * bitmaps, so we need to advance our page block index by the
+ * sector bitmaps entry number */
+ sinfo->bat_idx += sinfo->bat_idx >> s->chunk_ratio_bits;
+
+ /* the number of sectors we can read/write in this cycle */
+ sinfo->sectors_avail = s->sectors_per_block - block_offset;
+
+ sinfo->bytes_left = sinfo->sectors_avail << s->logical_sector_size_bits;
+
+ if (sinfo->sectors_avail > nb_sectors) {
+ sinfo->sectors_avail = nb_sectors;
+ }
+
+ sinfo->bytes_avail = sinfo->sectors_avail << s->logical_sector_size_bits;
+
+ sinfo->file_offset = s->bat[sinfo->bat_idx] >> VHDX_BAT_FILE_OFF_BITS;
+
+ sinfo->block_offset = block_offset << s->logical_sector_size_bits;
+
+ /* The file offset must be past the header section, so must be > 0 */
+ if (sinfo->file_offset == 0) {
+ return;
+ }
+
+ /* block offset is the offset in vhdx logical sectors, in
+ * the payload data block. Convert that to a byte offset
+ * in the block, and add in the payload data block offset
+ * in the file, in bytes, to get the final read address */
+
+ sinfo->file_offset <<= 20; /* now in bytes, rather than 1MB units */
+ sinfo->file_offset += sinfo->block_offset;
+}
+
+
+
static coroutine_fn int vhdx_co_readv(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov)
{
- return -ENOTSUP;
+ BDRVVHDXState *s = bs->opaque;
+ int ret = 0;
+ vhdx_sector_info sinfo;
+ uint64_t bytes_done = 0;
+ QEMUIOVector hd_qiov;
+
+ qemu_iovec_init(&hd_qiov, qiov->niov);
+
+ qemu_co_mutex_lock(&s->lock);
+
+ while (nb_sectors > 0) {
+ /* We are a differencing file, so we need to inspect the sector bitmap
+ * to see if we have the data or not */
+ if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
+ /* not supported yet */
+ ret = -ENOTSUP;
+ goto exit;
+ } else {
+ vhdx_block_translate(s, sector_num, nb_sectors, &sinfo);
+
+ qemu_iovec_reset(&hd_qiov);
+ qemu_iovec_concat(&hd_qiov, qiov, bytes_done, sinfo.bytes_avail);
+
+ /* check the payload block state */
+ switch (s->bat[sinfo.bat_idx] & VHDX_BAT_STATE_BIT_MASK) {
+ case PAYLOAD_BLOCK_NOT_PRESENT: /* fall through */
+ case PAYLOAD_BLOCK_UNDEFINED: /* fall through */
+ case PAYLOAD_BLOCK_UNMAPPED: /* fall through */
+ case PAYLOAD_BLOCK_ZERO:
+ /* return zero */
+ qemu_iovec_memset(&hd_qiov, 0, 0, sinfo.bytes_avail);
+ break;
+ case PAYLOAD_BLOCK_FULL_PRESENT:
+ qemu_co_mutex_unlock(&s->lock);
+ ret = bdrv_co_readv(bs->file,
+ sinfo.file_offset >> BDRV_SECTOR_BITS,
+ sinfo.sectors_avail, &hd_qiov);
+ qemu_co_mutex_lock(&s->lock);
+ if (ret < 0) {
+ goto exit;
+ }
+ break;
+ case PAYLOAD_BLOCK_PARTIALLY_PRESENT:
+ /* we don't yet support difference files, fall through
+ * to error */
+ default:
+ ret = -EIO;
+ goto exit;
+ break;
+ }
+ nb_sectors -= sinfo.sectors_avail;
+ sector_num += sinfo.sectors_avail;
+ bytes_done += sinfo.bytes_avail;
+ }
+ }
+ ret = 0;
+exit:
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_iovec_destroy(&hd_qiov);
+ return ret;
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images.
2013-04-23 14:24 [Qemu-devel] [PATCH v2 0/5] Initial VHDX support Jeff Cody
` (3 preceding siblings ...)
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 4/5] block: add read-only support to VHDX image format Jeff Cody
@ 2013-04-23 14:24 ` Jeff Cody
2013-04-24 14:47 ` Stefan Hajnoczi
2013-04-28 10:05 ` Fam Zheng
4 siblings, 2 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 14:24 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha
This adds the ability to update the headers in a VHDX image, including
generating a new MS-compatible GUID.
As VHDX depends on uuid.h, VHDX is now a configurable build option.
If VHDX support is enabled, that will also enable uuid as well.
To enable/disable VHDX: --enable-vhdx, --disable-vhdx
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
block/Makefile.objs | 2 +-
block/vhdx.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++-
configure | 13 +++++
3 files changed, 169 insertions(+), 3 deletions(-)
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 5f0358a..da0a990 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
-block-obj-y += vhdx.o
+block-obj-$(CONFIG_VHDX) += vhdx.o
block-obj-y += parallels.o blkdebug.o blkverify.o
block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/vhdx.c b/block/vhdx.c
index ca4a298..272304d 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -21,6 +21,7 @@
#include "qemu/crc32c.h"
#include "block/vhdx.h"
+#include <uuid/uuid.h>
/* Several metadata and region table data entries are identified by
* guids in a MS-specific GUID format. */
@@ -156,11 +157,40 @@ typedef struct BDRVVHDXState {
vhdx_bat_entry *bat;
uint64_t bat_offset;
+ ms_guid session_guid;
+
+
vhdx_parent_locator_header parent_header;
vhdx_parent_locator_entry *parent_entries;
} BDRVVHDXState;
+/* Calculates new checksum.
+ *
+ * Zero is substituted during crc calculation for the original crc field
+ * crc_offset: byte offset in buf of the buffer crc
+ * buf: buffer pointer
+ * size: size of buffer (must be > crc_offset+4)
+ *
+ * Note: The resulting checksum is in the CPU endianness, not necessarily
+ * in the file format endianness (LE). Any header export to disk should
+ * make sure that vhdx_header_le_export() is used to convert to the
+ * correct endianness
+ */
+static uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset)
+{
+ uint32_t crc;
+
+ assert(buf != NULL);
+ assert(size > (crc_offset+4));
+
+ memset(buf+crc_offset, 0, sizeof(crc));
+ crc = crc32c(0xffffffff, buf, size);
+ memcpy(buf+crc_offset, &crc, sizeof(crc));
+
+ return crc;
+}
+
uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
int crc_offset)
{
@@ -212,6 +242,24 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
/*
+ * This generates a UUID that is compliant with the MS GUIDs used
+ * in the VHDX spec (and elsewhere).
+ *
+ * We can do this with uuid_generate if uuid.h is present,
+ * however not all systems have uuid and the generation is
+ * pretty straightforward for the DCE + random usage case
+ *
+ */
+static void vhdx_guid_generate(ms_guid *guid)
+{
+ uuid_t uuid;
+ assert(guid != NULL);
+
+ uuid_generate(uuid);
+ memcpy(guid, uuid, 16);
+}
+
+/*
* Per the MS VHDX Specification, for every VHDX file:
* - The header section is fixed size - 1 MB
* - The header section is always the first "object"
@@ -249,6 +297,107 @@ static void vhdx_header_le_import(vhdx_header *h)
le64_to_cpus(&h->log_offset);
}
+/* All VHDX structures on disk are little endian */
+static void vhdx_header_le_export(vhdx_header *orig_h, vhdx_header *new_h)
+{
+ assert(orig_h != NULL);
+ assert(new_h != NULL);
+
+ new_h->signature = cpu_to_le32(orig_h->signature);
+ new_h->checksum = cpu_to_le32(orig_h->checksum);
+ new_h->sequence_number = cpu_to_le64(orig_h->sequence_number);
+
+ memcpy(&new_h->file_write_guid, &orig_h->file_write_guid, sizeof(ms_guid));
+ memcpy(&new_h->data_write_guid, &orig_h->data_write_guid, sizeof(ms_guid));
+ memcpy(&new_h->log_guid, &orig_h->log_guid, sizeof(ms_guid));
+
+ cpu_to_leguids(&new_h->file_write_guid);
+ cpu_to_leguids(&new_h->data_write_guid);
+ cpu_to_leguids(&new_h->log_guid);
+
+ new_h->log_version = cpu_to_le16(orig_h->log_version);
+ new_h->version = cpu_to_le16(orig_h->version);
+ new_h->log_length = cpu_to_le32(orig_h->log_length);
+ new_h->log_offset = cpu_to_le64(orig_h->log_offset);
+}
+
+/* Update the VHDX headers
+ *
+ * This follows the VHDX spec procedures for header updates.
+ *
+ * - non-current header is updated with largest sequence number
+ */
+static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
+{
+ int ret = 0;
+ int hdr_idx = 0;
+ uint64_t header_offset = VHDX_HEADER1_OFFSET;
+
+ vhdx_header *active_header;
+ vhdx_header *inactive_header;
+ vhdx_header header_le;
+ uint8_t *buffer;
+
+ /* operate on the non-current header */
+ if (s->curr_header == 0) {
+ hdr_idx = 1;
+ header_offset = VHDX_HEADER2_OFFSET;
+ }
+
+ active_header = s->headers[s->curr_header];
+ inactive_header = s->headers[hdr_idx];
+
+ inactive_header->sequence_number = active_header->sequence_number + 1;
+
+ /* a new file guid must be generate before any file write, including
+ * headers */
+ memcpy(&inactive_header->file_write_guid, &s->session_guid,
+ sizeof(ms_guid));
+
+ /* a new data guid only needs to be generate before any guest-visisble
+ * writes, so update it if the image is opened r/w. */
+ if (rw) {
+ vhdx_guid_generate(&inactive_header->data_write_guid);
+ }
+
+ /* the header checksum is not over just the packed size of vhdx_header,
+ * but rather over the entire 'reserved' range for the header, which is
+ * 4KB (VHDX_HEADER_SIZE). */
+
+ buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
+ /* we can't assume the extra reserved bytes are 0 */
+ ret = bdrv_pread(bs->file, header_offset, buffer, VHDX_HEADER_SIZE);
+ if (ret < 0) {
+ goto fail;
+ }
+ /* overwrite the actual vhdx_header portion */
+ memcpy(buffer, inactive_header, sizeof(vhdx_header));
+ inactive_header->checksum = vhdx_update_checksum(buffer,
+ VHDX_HEADER_SIZE, 4);
+ vhdx_header_le_export(inactive_header, &header_le);
+ bdrv_pwrite_sync(bs->file, header_offset, &header_le, sizeof(vhdx_header));
+ s->curr_header = hdr_idx;
+
+fail:
+ qemu_vfree(buffer);
+ return ret;
+}
+
+/*
+ * The VHDX spec calls for header updates to be performed twice, so that both
+ * the current and non-current header have valid info
+ */
+static int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
+{
+ int ret;
+
+ ret = vhdx_update_header(bs, s, rw);
+ if (ret < 0) {
+ return ret;
+ }
+ ret = vhdx_update_header(bs, s, rw);
+ return ret;
+}
/* opens the specified header block from the VHDX file header section */
static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
@@ -680,6 +829,11 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
qemu_co_mutex_init(&s->lock);
+ /* This is used for any header updates, for the file_write_guid.
+ * The spec dictates that a new value should be used for the first
+ * header update */
+ vhdx_guid_generate(&s->session_guid);
+
ret = vhdx_parse_header(bs, s);
if (ret) {
goto fail;
@@ -716,8 +870,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
}
if (flags & BDRV_O_RDWR) {
- ret = -ENOTSUP;
- goto fail;
+ vhdx_update_headers(bs, s, false);
}
/* TODO: differencing files, write */
diff --git a/configure b/configure
index 51a6c56..9b048ab 100755
--- a/configure
+++ b/configure
@@ -242,6 +242,7 @@ gtk=""
gtkabi="2.0"
tpm="no"
libssh2=""
+vhdx="no"
# parse CC options first
for opt do
@@ -934,6 +935,11 @@ for opt do
;;
--enable-libssh2) libssh2="yes"
;;
+ --enable-vhdx) vhdx="yes" ;
+ uuid="yes"
+ ;;
+ --disable-vhdx) vhdx="no"
+ ;;
*) echo "ERROR: unknown option $opt"; show_help="yes"
;;
esac
@@ -1201,6 +1207,8 @@ echo " --gcov=GCOV use specified gcov [$gcov_tool]"
echo " --enable-tpm enable TPM support"
echo " --disable-libssh2 disable ssh block device support"
echo " --enable-libssh2 enable ssh block device support"
+echo " --disable-vhdx disables support for the Microsoft VHDX image format"
+echo " --enable-vhdx enable support for the Microsoft VHDX image format"
echo ""
echo "NOTE: The object files are built at the place where configure is launched"
exit 1
@@ -3580,6 +3588,7 @@ echo "gcov enabled $gcov"
echo "TPM support $tpm"
echo "libssh2 support $libssh2"
echo "TPM passthrough $tpm_passthrough"
+echo "vhdx $vhdx"
if test "$sdl_too_old" = "yes"; then
echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3968,6 +3977,10 @@ if test "$virtio_blk_data_plane" = "yes" ; then
echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
fi
+if test "$vhdx" = "yes" ; then
+ echo "CONFIG_VHDX=y" >> $config_host_mak
+fi
+
# USB host support
case "$usb" in
linux)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images Jeff Cody
@ 2013-04-23 15:10 ` Kevin Wolf
2013-04-23 16:32 ` Jeff Cody
2013-04-24 12:31 ` Stefan Hajnoczi
2013-04-25 13:05 ` Kevin Wolf
2 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2013-04-23 15:10 UTC (permalink / raw)
To: Jeff Cody; +Cc: qemu-devel, stefanha
Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> This is based on Microsoft's VHDX specification:
> "VHDX Format Specification v0.95", published 4/12/2012
> https://www.microsoft.com/en-us/download/details.aspx?id=29681
>
> These structures define the various header, metadata, and other
> block structures defined in the VHDX specification.
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
> block/vhdx.h | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 327 insertions(+)
> create mode 100644 block/vhdx.h
>
> diff --git a/block/vhdx.h b/block/vhdx.h
> new file mode 100644
> index 0000000..f5cf1ed
> --- /dev/null
> +++ b/block/vhdx.h
> @@ -0,0 +1,327 @@
> +/*
> + * Block driver for Hyper-V VHDX Images
> + *
> + * Copyright (c) 2013 Red Hat, Inc.,
> + *
> + * Authors:
> + * Jeff Cody <jcody@redhat.com>
> + *
> + * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
> + * by Microsoft:
> + * https://www.microsoft.com/en-us/download/details.aspx?id=29681
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef BLOCK_VHDX_H
> +#define BLOCK_VHDX_H
> +
> +/* Structures and fields present in the VHDX file */
> +
> +/* The header section has the following blocks,
> + * each block is 64KB:
> + *
> + * _____________________________________________________________________________
> + * | File Id. | Header 1 | Header 2 | Region Table | Reserved (768KB) |
> + * |----------|---------------|------------|--------------|--------------------|
> + * | | | | | |
> + * 0.........64KB...........128KB........192KB..........256KB................1MB
> + */
> +
> +#define VHDX_HEADER_BLOCK_SIZE (64*1024)
> +
> +#define VHDX_FILE_ID_OFFSET 0
> +#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE*1)
> +#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE*2)
> +#define VHDX_REGION_TABLE_OFFSET (VHDX_HEADER_BLOCK_SIZE*3)
> +
> +
> +/*
> + * A note on the use of MS-GUID fields. For more details on the GUID,
> + * please see: https://en.wikipedia.org/wiki/Globally_unique_identifier.
> + *
> + * The VHDX specification only states that these are MS GUIDs, and which
> + * bytes are data1-data4. It makes no mention of what algorithm should be used
> + * to generate the GUID, nor what standard. However, looking at the specified
> + * known GUID fields, it appears the GUIDs are:
> + * Standard/DCE GUID type (noted by 10b in the MSB of byte 0 of .data4)
> + * Random algorithm (noted by 0x4XXX for .data3)
> + */
> +
> +/* ---- HEADER SECTION STRUCTURES ---- */
> +
> +/* Important note: these structures are as defined in the VHDX specification,
> + * including byte order and size. However, without being packed structures,
> + * they will not match 1:1 data read from disk. Rather than use potentially
> + * non-portable packed structures, data is copied from read buffers into
> + * the structures below. However, for reference, please refrain from
> + * modifying these structures to something that does not represent the spec */
> +
> +#define VHDX_FILE_ID_MAGIC 0x656C696678646876 /* 'vhdxfile' */
> +typedef struct vhdx_file_identifier {
According to the qemu coding style, this is not a valid name for a
struct. I think you can choose between VHDXFileIdentifier and
VhdxFileIdentifier, where existing code tends towards the former.
More instances of this follow, but I won't comment on each.
> + uint64_t signature; /* "vhdxfile" in ASCII */
> + uint16_t creator[256]; /* optional; utf-16 string to identify
> + the vhdx file creator. Diagnotistic
> + only */
> +} vhdx_file_identifier;
> +
> +
> +/* the guid is a 16 byte unique ID - the definition for this used by
> + * Microsoft is not just 16 bytes though - it is a structure that is defined,
> + * so we need to follow it here so that endianness does not trip us up */
> +
> +typedef struct ms_guid {
> + uint32_t data1;
> + uint16_t data2;
> + uint16_t data3;
> + uint8_t data4[8];
> +} ms_guid;
> +
> +#define guid_eq(a, b) \
> + (memcmp(&(a), &(b), sizeof(ms_guid)) == 0)
> +
> +#define VHDX_HEADER_SIZE (4*1024) /* although the vhdx_header struct in disk
> + is only 582 bytes, for purposes of crc
> + the header is the first 4KB of the 64KB
> + block */
> +
> +#define VHDX_HDR_MAGIC 0x64616568 /* 'head' */
> +typedef struct QEMU_PACKED vhdx_header {
> + uint32_t signature; /* "head" in ASCII */
> + uint32_t checksum; /* CRC-32C hash of the whole header */
> + uint64_t sequence_number; /* Seq number of this header. Each
> + VHDX file has 2 of these headers,
> + and only the header with the highest
> + sequence number is valid */
> + ms_guid file_write_guid; /* 128 bit unique identifier. Must be
> + updated to new, unique value before
> + the first modification is made to
> + file */
> + ms_guid data_write_guid; /* 128 bit unique identifier. Must be
> + updated to new, unique value before
> + the first modification is made to
> + visible data. Visbile data is
> + defined as:
> + - system & user metadata
> + - raw block data
> + - disk size
> + - any change that will
> + cause the virtual disk
> + sector read to differ
> +
> + This does not need to change if
> + blocks are re-arranged */
> + ms_guid log_guid; /* 128 bit unique identifier. If zero,
> + there is no valid log. If non-zero,
> + log entries with this guid are
> + valid. */
> + uint16_t log_version; /* version of the log format. Mustn't be
> + zero, unless log_guid is also zero */
> + uint16_t version; /* version of th evhdx file. Currently,
> + only supported version is "1" */
> + uint32_t log_length; /* length of the log. Must be multiple
> + of 1MB */
> + uint64_t log_offset; /* byte offset in the file of the log.
> + Must also be a multiple of 1MB */
> +} vhdx_header;
> +
> +/* 4KB in packed data size, not to be used except for initial data read */
> +typedef struct QEMU_PACKED vhdx_header_padded {
> + vhdx_header header;
> + uint8_t reserved[502]; /* per the VHDX spec */
> + uint8_t reserved_[3514]; /* for the initial packed struct read */
> +} vhdx_header_padded;
> +
> +/* Header for the region table block */
> +#define VHDX_RT_MAGIC 0x69676572 /* 'regi ' */
> +typedef struct QEMU_PACKED vhdx_region_table_header {
> + uint32_t signature; /* "regi" in ASCII */
> + uint32_t checksum; /* CRC-32C hash of the 64KB table */
> + uint32_t entry_count; /* number of valid entries */
> + uint32_t reserved;
> +} vhdx_region_table_header;
> +
> +/* Individual region table entry. There may be a maximum of 2047 of these
> + *
> + * There are two known region table properties. Both are required.
> + * BAT (block allocation table): 2DC27766F62342009D64115E9BFD4A08
> + * Metadata: 8B7CA20647904B9AB8FE575F050F886E
> + */
> +#define VHDX_REGION_ENTRY_REQUIRED 0x01 /* if set, parser must understand
> + this entry in order to open
> + file */
> +typedef struct QEMU_PACKED vhdx_region_table_entry {
> + ms_guid guid; /* 128-bit unique identifier */
> + uint64_t file_offset; /* offset of the object in the file.
> + Must be multiple of 1MB */
> + uint32_t length; /* length, in bytes, of the object */
> + uint32_t data_bits;
> +} vhdx_region_table_entry;
> +
> +
> +/* ---- LOG ENTRY STRUCTURES ---- */
> +#define VHDX_LOG_HDR_SIZE 64
> +#define VHDX_LOGE_MAGIC 0x65676F6C /* 'loge' */
> +typedef struct QEMU_PACKED vhdx_log_entry_header {
> + uint32_t signature; /* "loge" in ASCII */
> + uint32_t checksum; /* CRC-32C hash of the 64KB table */
> + uint32_t entry_length; /* length in bytes, multiple of 1MB */
> + uint32_t tail; /* byte offset of first log entry of a
> + seq, where this entry is the last
> + entry */
> + uint64_t sequence_number; /* incremented with each log entry.
> + May not be zero. */
> + uint32_t descriptor_count; /* number of descriptors in this log
> + entry, must be >= 0 */
> + uint32_t reserved;
> + ms_guid log_guid; /* value of the log_guid from
> + vhdx_header. If not found in
> + vhdx_header, it is invalid */
> + uint64_t flushed_file_offset; /* see spec for full details - this
> + sould be vhdx file size in bytes */
> + uint64_t last_file_offset; /* size in bytes that all allocated
> + file structures fit into */
> +} vhdx_log_entry_header;
> +
> +#define VHDX_LOG_DESC_SIZE 32
> +
> +#define VHDX_ZERO_MAGIC 0x6F72657A /* 'zero' */
> +#define VHDX_DATA_MAGIC 0x63736564 /* 'desc' */
> +typedef struct QEMU_PACKED vhdx_log_descriptor {
> + uint32_t signature; /* "zero" or "desc" in ASCII */
> + union {
> + uint32_t reserved; /* zero desc */
> + uint32_t trailing_bytes; /* data desc: bytes 4092-4096 of the
> + data sector */
> + };
Indentation inside the union is off.
> + union {
> + uint64_t zero_length; /* zero desc: length of the section to
> + zero */
> + uint64_t leading_bytes; /* data desc: bytes 0-7 of the data
> + sector */
> + };
Here as well.
> + uint64_t file_offset; /* file offset to write zeros - multiple
> + of 4kB */
> + uint64_t sequence_number; /* must match same field in
> + vhdx_log_entry_header */
> +} vhdx_log_descriptor;
> +
> +#define VHDX_DATAS_MAGIC 0x61746164 /* 'data' */
> +typedef struct QEMU_PACKED vhdx_log_data_sector {
> + uint32_t data_signature; /* "data" in ASCII */
> + uint32_t sequence_high; /* 4 MSB of 8 byte sequence_number */
> + uint8_t data[4084]; /* raw data, bytes 8-4091 (inclusive).
> + see the data descriptor field for the
> + other mising bytes */
> + uint32_t sequence_low; /* 4 LSB of 8 byte sequence_number */
> +} vhdx_log_data_sector;
> +
> +
> +
> +/* block states - different state values depending on whether it is a
> + * payload block, or a sector block. */
> +
> +#define PAYLOAD_BLOCK_NOT_PRESENT 0
> +#define PAYLOAD_BLOCK_UNDEFINED 1
> +#define PAYLOAD_BLOCK_ZERO 2
> +#define PAYLOAD_BLOCK_UNMAPPED 5
> +#define PAYLOAD_BLOCK_FULL_PRESENT 6
> +#define PAYLOAD_BLOCK_PARTIALLY_PRESENT 7
Looks like an enum?
> +#define SB_BLOCK_NOT_PRESENT 0
> +#define SB_BLOCK_PRESENT 6
> +
> +/* per the spec */
> +#define VHDX_MAX_SECTORS_PER_BLOCK (1<<23)
> +
> +/* upper 44 bits are the file offset in 1MB units lower 3 bits are the state
> + other bits are reserved */
> +#define VHDX_BAT_STATE_BIT_MASK 0x07
> +#define VHDX_BAT_FILE_OFF_BITS (64-44)
> +typedef uint64_t vhdx_bat_entry;
> +
> +/* ---- METADATA REGION STRUCTURES ---- */
> +
> +#define VHDX_METADATA_ENTRY_SIZE 32
> +#define VHDX_METADATA_MAX_ENTRIES 2047 /* not including the header */
> +#define VHDX_METADATA_TABLE_MAX_SIZE \
> + (VHDX_METADATA_ENTRY_SIZE * (VHDX_METADATA_MAX_ENTRIES+1))
> +#define VHDX_METADATA_MAGIC 0x617461646174656D /* 'metadata' */
> +typedef struct QEMU_PACKED vhdx_metadata_table_header {
> + uint64_t signature; /* "metadata" in ASCII */
> + uint16_t reserved;
> + uint16_t entry_count; /* number table entries. <= 2047 */
> + uint32_t reserved2[5];
> +} vhdx_metadata_table_header;
> +
> +#define VHDX_META_FLAGS_IS_USER 0x01 /* max 1024 entries */
> +#define VHDX_META_FLAGS_IS_VIRTUAL_DISK 0x02 /* virtual disk metadata if set,
> + otherwise file metdata */
> +#define VHDX_META_FLAGS_IS_REQUIRED 0x04 /* parse must understand this
> + entry to open the file */
> +typedef struct QEMU_PACKED vhdx_metadata_table_entry {
> + ms_guid item_id; /* 128-bit identifier for metadata */
> + uint32_t offset; /* byte offset of the metadata. At
> + least 64kB. Relative to start of
> + metadata region */
> + /* note: if length = 0, so is offset */
> + uint32_t length; /* length of metadata. <= 1MB. */
> + uint32_t data_bits; /* least-significant 3 bits are flags, the
> + rest are reserved (see above) */
> + uint32_t reserved2;
> +} vhdx_metadata_table_entry;
> +
> +#define VHDX_PARAMS_LEAVE_BLOCKS_ALLOCED 0x01 /* Do not change any blocks to
> + be BLOCK_NOT_PRESENT.
> + If set indicates a fixed
> + size VHDX file */
> +#define VHDX_PARAMS_HAS_PARENT 0x02 /* has parent / backing file */
> +typedef struct QEMU_PACKED vhdx_file_parameters {
> + uint32_t block_size; /* size of each payload block, always
> + power of 2, <= 256MB and >= 1MB. */
> + uint32_t data_bits; /* least-significant 2 bits are flags, the rest
> + are reserved (see above) */
> +} vhdx_file_parameters;
> +
> +typedef struct QEMU_PACKED vhdx_virtual_disk_size {
> + uint64_t virtual_disk_size; /* Size of the virtual disk, in bytes.
> + Must be multiple of the sector size,
> + max of 64TB */
> +} vhdx_virtual_disk_size;
> +
> +typedef struct QEMU_PACKED vhdx_page83_data {
> + uint8_t page_83_data[16]; /* unique id for scsi devices that
> + support page 0x83 */
> +} vhdx_page83_data;
> +
> +typedef struct QEMU_PACKED vhdx_virtual_disk_logical_sector_size {
> + uint32_t logical_sector_size; /* virtual disk sector size (in bytes).
> + Can only be 512 or 4096 bytes */
> +} vhdx_virtual_disk_logical_sector_size;
> +
> +typedef struct QEMU_PACKED vhdx_virtual_disk_physical_sector_size {
> + uint32_t physical_sector_size; /* physical sector size (in bytes).
> + Can only be 512 or 4096 bytes */
> +} vhdx_virtual_disk_physical_sector_size;
What's the point with all the single-field structs?
> +typedef struct QEMU_PACKED vhdx_parent_locator_header {
> + uint8_t locator_type[16]; /* type of the parent virtual disk. */
> + uint16_t reserved;
> + uint16_t key_value_count; /* number of key/value pairs for this
> + locator */
> +} vhdx_parent_locator_header;
> +
> +/* key and value strings are UNICODE strings, UTF-16 LE encoding, no NULs */
> +typedef struct QEMU_PACKED vhdx_parent_locator_entry {
> + uint32_t key_offset; /* offset in metadata for key, > 0 */
> + uint32_t value_offset; /* offset in metadata for value, >0 */
> + uint16_t key_length; /* length of entry key, > 0 */
> + uint16_t value_length; /* length of entry value, > 0 */
> +} vhdx_parent_locator_entry;
> +
> +
> +/* ----- END VHDX SPECIFICATION STRUCTURES ---- */
> +
> +#endif
Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
@ 2013-04-23 15:46 ` Kevin Wolf
2013-04-23 16:11 ` Jeff Cody
2013-04-24 13:21 ` Stefan Hajnoczi
` (3 subsequent siblings)
4 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2013-04-23 15:46 UTC (permalink / raw)
To: Jeff Cody; +Cc: qemu-devel, stefanha
Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> This is the initial block driver framework for VHDX image support (
> i.e. Hyper-V image file formats), that supports opening VHDX files, and
> parsing the headers.
>
> This commit does not yet enable:
> - reading
> - writing
> - updating the header
> - differencing files (images with parents)
> - log replay / dirty logs (only clean images)
>
> This is based on Microsoft's VHDX specification:
> "VHDX Format Specification v0.95", published 4/12/2012
> https://www.microsoft.com/en-us/download/details.aspx?id=29681
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
Okay, I need more time actually reading the spec and comparing this code
to it before I can really say anything about the logic (I hope I'll get
to it tomorrow), but for a start, let me point out the obvious things.
> ---
> block/Makefile.objs | 1 +
> block/vhdx.c | 769 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> block/vhdx.h | 25 ++
> 3 files changed, 795 insertions(+)
> create mode 100644 block/vhdx.c
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 6c4b5bc..5f0358a 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> block-obj-y += qed-check.o
> +block-obj-y += vhdx.o
> block-obj-y += parallels.o blkdebug.o blkverify.o
> block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> block-obj-$(CONFIG_POSIX) += raw-posix.o
> diff --git a/block/vhdx.c b/block/vhdx.c
> new file mode 100644
> index 0000000..b0ea2ba
> --- /dev/null
> +++ b/block/vhdx.c
> @@ -0,0 +1,769 @@
> +/*
> + * Block driver for Hyper-V VHDX Images
> + *
> + * Copyright (c) 2013 Red Hat, Inc.,
> + *
> + * Authors:
> + * Jeff Cody <jcody@redhat.com>
> + *
> + * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
> + * by Microsoft:
> + * https://www.microsoft.com/en-us/download/details.aspx?id=29681
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "qemu/module.h"
> +#include "qemu/crc32c.h"
> +#include "block/vhdx.h"
> +
> +
> +/* Several metadata and region table data entries are identified by
> + * guids in a MS-specific GUID format. */
> +
> +
> +/* ------- Known Region Table GUIDs ---------------------- */
> +static const ms_guid bat_guid = { .data1 = 0x2dc27766,
> + .data2 = 0xf623,
> + .data3 = 0x4200,
> + .data4 = { 0x9d, 0x64, 0x11, 0x5e,
> + 0x9b, 0xfd, 0x4a, 0x08} };
> +
> +static const ms_guid metadata_guid = { .data1 = 0x8b7ca206,
> + .data2 = 0x4790,
> + .data3 = 0x4b9a,
> + .data4 = { 0xb8, 0xfe, 0x57, 0x5f,
> + 0x05, 0x0f, 0x88, 0x6e} };
> +
> +
> +
> +/* ------- Known Metadata Entry GUIDs ---------------------- */
> +static const ms_guid file_param_guid = { .data1 = 0xcaa16737,
> + .data2 = 0xfa36,
> + .data3 = 0x4d43,
> + .data4 = { 0xb3, 0xb6, 0x33, 0xf0,
> + 0xaa, 0x44, 0xe7, 0x6b} };
> +
> +static const ms_guid virtual_size_guid = { .data1 = 0x2FA54224,
> + .data2 = 0xcd1b,
> + .data3 = 0x4876,
> + .data4 = { 0xb2, 0x11, 0x5d, 0xbe,
> + 0xd8, 0x3b, 0xf4, 0xb8} };
> +
> +static const ms_guid page83_guid = { .data1 = 0xbeca12ab,
> + .data2 = 0xb2e6,
> + .data3 = 0x4523,
> + .data4 = { 0x93, 0xef, 0xc3, 0x09,
> + 0xe0, 0x00, 0xc7, 0x46} };
> +
> +
> +static const ms_guid phys_sector_guid = { .data1 = 0xcda348c7,
> + .data2 = 0x445d,
> + .data3 = 0x4471,
> + .data4 = { 0x9c, 0xc9, 0xe9, 0x88,
> + 0x52, 0x51, 0xc5, 0x56} };
> +
> +static const ms_guid parent_locator_guid = { .data1 = 0xa8d35f2d,
> + .data2 = 0xb30b,
> + .data3 = 0x454d,
> + .data4 = { 0xab, 0xf7, 0xd3,
> + 0xd8, 0x48, 0x34,
> + 0xab, 0x0c} };
> +
> +static const ms_guid logical_sector_guid = { .data1 = 0x8141bf1d,
> + .data2 = 0xa96f,
> + .data3 = 0x4709,
> + .data4 = { 0xba, 0x47, 0xf2,
> + 0x33, 0xa8, 0xfa,
> + 0xab, 0x5f} };
> +
> +/* Each parent type must have a valid GUID; this is for parent images
> + * of type 'VHDX'. If we were to allow e.g. a QCOW2 parent, we would
> + * need to make up our own QCOW2 GUID type */
> +static const ms_guid parent_vhdx_guid = { .data1 = 0xb04aefb7,
> + .data2 = 0xd19e,
> + .data3 = 0x4a81,
> + .data4 = { 0xb7, 0x89, 0x25, 0xb8,
> + 0xe9, 0x44, 0x59, 0x13} };
> +
> +
> +#define META_FILE_PARAMETER_PRESENT 0x01
> +#define META_VIRTUAL_DISK_SIZE_PRESENT 0x02
> +#define META_PAGE_83_PRESENT 0x04
> +#define META_LOGICAL_SECTOR_SIZE_PRESENT 0x08
> +#define META_PHYS_SECTOR_SIZE_PRESENT 0x10
> +#define META_PARENT_LOCATOR_PRESENT 0x20
> +
> +#define META_ALL_PRESENT \
> + (META_FILE_PARAMETER_PRESENT | META_VIRTUAL_DISK_SIZE_PRESENT | \
> + META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \
> + META_PHYS_SECTOR_SIZE_PRESENT)
> +
> +typedef struct vhdx_metadata_entries {
> + vhdx_metadata_table_entry file_parameters_entry;
> + vhdx_metadata_table_entry virtual_disk_size_entry;
> + vhdx_metadata_table_entry page83_data_entry;
> + vhdx_metadata_table_entry logical_sector_size_entry;
> + vhdx_metadata_table_entry phys_sector_size_entry;
> + vhdx_metadata_table_entry parent_locator_entry;
> + uint16_t present;
> +} vhdx_metadata_entries;
Should everything before this line actually be in the header?
> +
> +typedef struct BDRVVHDXState {
> + CoMutex lock;
> +
> + int curr_header;
> + vhdx_header *headers[2];
> +
> + vhdx_region_table_header rt;
> + vhdx_region_table_entry bat_rt; /* region table for the BAT */
> + vhdx_region_table_entry metadata_rt; /* region table for the metadata */
> +
> + vhdx_metadata_table_header metadata_hdr;
> + vhdx_metadata_entries metadata_entries;
> +
> + vhdx_file_parameters params;
> + uint32_t block_size;
> + uint32_t block_size_bits;
> + uint32_t sectors_per_block;
> + uint32_t sectors_per_block_bits;
> +
> + uint64_t virtual_disk_size;
> + uint32_t logical_sector_size;
> + uint32_t physical_sector_size;
> +
> + uint64_t chunk_ratio;
> + uint32_t chunk_ratio_bits;
> + uint32_t logical_sector_size_bits;
> +
> + uint32_t bat_entries;
> + vhdx_bat_entry *bat;
> + uint64_t bat_offset;
> +
> + vhdx_parent_locator_header parent_header;
> + vhdx_parent_locator_entry *parent_entries;
> +
> +} BDRVVHDXState;
> +
> +uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
> + int crc_offset)
> +{
> + uint32_t crc_new;
> + uint32_t crc_orig;
> + assert(buf != NULL);
> +
> + if (crc_offset > 0) {
> + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
Please add spaces before and after operators. I saw this a lot in this
series. I'm relatively sure that you can use checkpatch.pl to get all
instances pointed out.
> + memset(buf+crc_offset, 0, sizeof(crc_orig));
> + }
> +
> + crc_new = crc32c(crc, buf, size);
> + if (crc_offset > 0) {
> + memcpy(buf+crc_offset, &crc_orig, sizeof(crc_orig));
> + }
> +
> + return crc_new;
> +}
> +
> +/* Validates the checksum of the buffer, with an in-place CRC.
> + *
> + * Zero is substituted during crc calculation for the original crc field,
> + * and the crc field is restored afterwards. But the buffer will be modifed
> + * during the calculation, so this may not be not suitable for multi-threaded
> + * use.
> + *
> + * crc_offset: byte offset in buf of the buffer crc
> + * buf: buffer pointer
> + * size: size of buffer (must be > crc_offset+4)
> + *
> + * returns true if checksum is valid, false otherwise
> + */
> +bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
> +{
> + uint32_t crc_orig;
> + uint32_t crc;
> +
> + assert(buf != NULL);
> + assert(size > (crc_offset+4));
> +
> + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
> + crc_orig = le32_to_cpu(crc_orig);
> +
> + crc = vhdx_checksum_calc(0xffffffff, buf, size, crc_offset);
> +
> + return crc == crc_orig;
> +}
> +
> +
> +/*
> + * Per the MS VHDX Specification, for every VHDX file:
> + * - The header section is fixed size - 1 MB
> + * - The header section is always the first "object"
> + * - The first 64KB of the header is the File Identifier
> + * - The first uint64 (8 bytes) is the VHDX Signature ("vhdxfile")
> + * - The following 512 bytes constitute a UTF-16 string identifiying the
> + * software that created the file, and is optional and diagnostic only.
> + *
> + * Therefore, we probe by looking for the vhdxfile signature "vhdxfile"
> + */
> +static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> + if (buf_size >= 8 && !memcmp(buf, "vhdxfile", 8)) {
There's a VHDX_FILE_ID_MAGIC constant in the header. Don't you want to
use it?
> + return 100;
> + }
> + return 0;
> +}
> +
> +/* All VHDX structures on disk are little endian */
> +static void vhdx_header_le_import(vhdx_header *h)
> +{
> + assert(h != NULL);
> +
> + le32_to_cpus(&h->signature);
> + le32_to_cpus(&h->checksum);
> + le64_to_cpus(&h->sequence_number);
> +
> + leguid_to_cpus(&h->file_write_guid);
> + leguid_to_cpus(&h->data_write_guid);
> + leguid_to_cpus(&h->log_guid);
> +
> + le16_to_cpus(&h->log_version);
> + le16_to_cpus(&h->version);
> + le32_to_cpus(&h->log_length);
> + le64_to_cpus(&h->log_offset);
> +}
> +
> +
> +/* opens the specified header block from the VHDX file header section */
> +static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + vhdx_header *header1;
> + vhdx_header *header2;
> + uint64_t h1_seq = 0;
> + uint64_t h2_seq = 0;
> + uint8_t *buffer;
> +
> + header1 = qemu_blockalign(bs, sizeof(vhdx_header));
> + header2 = qemu_blockalign(bs, sizeof(vhdx_header));
> +
> + buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> +
> + s->headers[0] = header1;
> + s->headers[1] = header2;
> +
> + /* We have to read the whole VHDX_HEADER_SIZE instead of
> + * sizeof(vhdx_header), because the checksum is over the whole
> + * region */
> + ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer, VHDX_HEADER_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + /* copy over just the relevant portion that we need */
> + memcpy(header1, buffer, sizeof(vhdx_header));
> + vhdx_header_le_import(header1);
> +
> + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> + header1->signature == VHDX_HDR_MAGIC) {
> + h1_seq = header1->sequence_number;
> + }
> +
> + ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + /* copy over just the relevant portion that we need */
> + memcpy(header2, buffer, sizeof(vhdx_header));
> + vhdx_header_le_import(header2);
> +
> + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> + header2->signature == VHDX_HDR_MAGIC) {
> + h2_seq = header2->sequence_number;
> + }
> +
> + if (h1_seq > h2_seq) {
> + s->curr_header = 0;
> + } else if (h2_seq > h1_seq) {
> + s->curr_header = 1;
> + } else {
> + printf("NO VALID HEADER\n");
Make that an qerror_report() at least.
> + ret = -EINVAL;
> + goto fail;
> + }
> +
> + ret = 0;
> +
> + goto exit;
> +
> +fail:
> + qemu_vfree(header1);
> + qemu_vfree(header2);
> + s->headers[0] = NULL;
> + s->headers[1] = NULL;
> +exit:
> + qemu_vfree(buffer);
> + return ret;
> +}
> +
> +
> +static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + uint8_t *buffer;
> + int offset = 0;
> + vhdx_region_table_entry rt_entry;
> + int i;
> +
> + /* We have to read the whole 64KB block, because the crc32 is over the
> + * whole block */
> + buffer = qemu_blockalign(bs, VHDX_HEADER_BLOCK_SIZE);
> +
> + ret = bdrv_pread(bs->file, VHDX_REGION_TABLE_OFFSET, buffer,
> + VHDX_HEADER_BLOCK_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + memcpy(&s->rt, buffer, sizeof(s->rt));
> + le32_to_cpus(&s->rt.signature);
> + le32_to_cpus(&s->rt.checksum);
> + le32_to_cpus(&s->rt.entry_count);
> + le32_to_cpus(&s->rt.reserved);
> + offset += sizeof(s->rt);
> +
> + if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
> + s->rt.signature != VHDX_RT_MAGIC) {
> + ret = -EINVAL;
> + goto fail;
> + }
> +
> + for (i = 0; i < s->rt.entry_count; i++) {
> + memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
> + offset += sizeof(rt_entry);
> +
> + leguid_to_cpus(&rt_entry.guid);
> + le64_to_cpus(&rt_entry.file_offset);
> + le32_to_cpus(&rt_entry.length);
> + le32_to_cpus(&rt_entry.data_bits);
> +
> + /* see if we recognize the entry */
> + if (guid_eq(rt_entry.guid, bat_guid)) {
> + s->bat_rt = rt_entry;
> + continue;
> + }
> +
> + if (guid_eq(rt_entry.guid, metadata_guid)) {
> + s->metadata_rt = rt_entry;
> + continue;
> + }
> +
> + if (rt_entry.data_bits & VHDX_REGION_ENTRY_REQUIRED) {
> + /* cannot read vhdx file - required region table entry that
> + * we do not understand. per spec, we must fail to open */
> + ret = -ENOTSUP;
> + goto fail;
> + }
> + }
> + ret = 0;
> +
> +fail:
> + qemu_vfree(buffer);
> + return ret;
> +}
> +
> +
> +
> +/* Metadata initial parser
> + *
> + * This loads all the metadata entry fields. This may cause additional
> + * fields to be processed (e.g. parent locator, etc..).
> + *
> + * There are 5 Metadata items that are always required:
> + * - File Parameters (block size, has a parent)
> + * - Virtual Disk Size (size, in bytes, of the virtual drive)
> + * - Page 83 Data (scsi page 83 guid)
> + * - Logical Sector Size (logical sector size in bytes, either 512 or
> + * 4096. We only support 512 currently)
> + * - Physical Sector Size (512 or 4096)
> + *
> + * Also, if the File Parameters indicate this is a differencing file,
> + * we must also look for the Parent Locator metadata item.
> + */
> +static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + uint8_t *buffer;
> + int offset = 0;
> + int i = 0;
> + uint32_t block_size, sectors_per_block, logical_sector_size;
> + uint64_t chunk_ratio;
> + vhdx_metadata_table_entry md_entry;
> +
> + buffer = qemu_blockalign(bs, VHDX_METADATA_TABLE_MAX_SIZE);
> +
> + ret = bdrv_pread(bs->file, s->metadata_rt.file_offset, buffer,
> + VHDX_METADATA_TABLE_MAX_SIZE);
> + if (ret < 0) {
> + goto exit;
> + }
> + memcpy(&s->metadata_hdr, buffer, sizeof(s->metadata_hdr));
> + offset += sizeof(s->metadata_hdr);
> +
> + le64_to_cpus(&s->metadata_hdr.signature);
> + le16_to_cpus(&s->metadata_hdr.reserved);
> + le16_to_cpus(&s->metadata_hdr.entry_count);
> +
> + if (s->metadata_hdr.signature != VHDX_METADATA_MAGIC) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + s->metadata_entries.present = 0;
> +
> + if ((s->metadata_hdr.entry_count * sizeof(md_entry)) >
> + (VHDX_METADATA_TABLE_MAX_SIZE - offset)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + for (i = 0; i < s->metadata_hdr.entry_count; i++) {
> + memcpy(&md_entry, buffer+offset, sizeof(md_entry));
> + offset += sizeof(md_entry);
> +
> + leguid_to_cpus(&md_entry.item_id);
> + le32_to_cpus(&md_entry.offset);
> + le32_to_cpus(&md_entry.length);
> + le32_to_cpus(&md_entry.data_bits);
> + le32_to_cpus(&md_entry.reserved2);
> +
> + if (guid_eq(md_entry.item_id, file_param_guid)) {
> + s->metadata_entries.file_parameters_entry = md_entry;
> + s->metadata_entries.present |= META_FILE_PARAMETER_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, virtual_size_guid)) {
> + s->metadata_entries.virtual_disk_size_entry = md_entry;
> + s->metadata_entries.present |= META_VIRTUAL_DISK_SIZE_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, page83_guid)) {
> + s->metadata_entries.page83_data_entry = md_entry;
> + s->metadata_entries.present |= META_PAGE_83_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, logical_sector_guid)) {
> + s->metadata_entries.logical_sector_size_entry = md_entry;
> + s->metadata_entries.present |= META_LOGICAL_SECTOR_SIZE_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, phys_sector_guid)) {
> + s->metadata_entries.phys_sector_size_entry = md_entry;
> + s->metadata_entries.present |= META_PHYS_SECTOR_SIZE_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, parent_locator_guid)) {
> + s->metadata_entries.parent_locator_entry = md_entry;
> + s->metadata_entries.present |= META_PARENT_LOCATOR_PRESENT;
> + continue;
> + }
> +
> + if (md_entry.data_bits & VHDX_META_FLAGS_IS_REQUIRED) {
> + /* cannot read vhdx file - required region table entry that
> + * we do not understand. per spec, we must fail to open */
> + ret = -ENOTSUP;
> + goto exit;
> + }
> + }
> +
> + if (s->metadata_entries.present != META_ALL_PRESENT) {
> + ret = -ENOTSUP;
> + goto exit;
> + }
> +
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.file_parameters_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->params,
> + sizeof(s->params));
> +
> + if (ret < 0) {
> + goto exit;
> + }
> +
> + le32_to_cpus(&s->params.block_size);
> + le32_to_cpus(&s->params.data_bits);
> +
> +
> + /* We now have the file parameters, so we can tell if this is a
> + * differencing file (i.e.. has_parent), is dynamic or fixed
> + * sized (leave_blocks_allocated), and the block size */
> +
> + /* The parent locator required iff the file parameters has_parent set */
> + if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
> + if (s->metadata_entries.present & ~META_PARENT_LOCATOR_PRESENT) {
> + /* TODO: parse parent locator fields */
> + ret = -ENOTSUP; /* temp, until differencing files are supported */
> + goto exit;
> + } else {
> + /* if has_parent is set, but there is not parent locator present,
> + * then that is an invalid combination */
> + ret = -EINVAL;
> + goto exit;
> + }
> + }
> +
> + /* determine virtual disk size, logical sector size,
> + * and phys sector size */
> +
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.virtual_disk_size_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->virtual_disk_size,
> + sizeof(uint64_t));
> + if (ret < 0) {
> + goto exit;
> + }
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.logical_sector_size_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->logical_sector_size,
> + sizeof(uint32_t));
> + if (ret < 0) {
> + goto exit;
> + }
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.phys_sector_size_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->physical_sector_size,
> + sizeof(uint32_t));
> + if (ret < 0) {
> + goto exit;
> + }
> +
> + le64_to_cpus(&s->virtual_disk_size);
> + le32_to_cpus(&s->logical_sector_size);
> + le32_to_cpus(&s->physical_sector_size);
> +
> + if (s->logical_sector_size == 0 || s->params.block_size == 0) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + /* both block_size and sector_size are guaranteed powers of 2 */
> + s->sectors_per_block = s->params.block_size / s->logical_sector_size;
> + s->chunk_ratio = (VHDX_MAX_SECTORS_PER_BLOCK) *
> + (uint64_t)s->logical_sector_size /
> + (uint64_t)s->params.block_size;
> +
> + /* These values are ones we will want to use for division / multiplication
> + * later on, and they are all guaranteed (per the spec) to be powers of 2,
> + * so we can take advantage of that for shift operations during
> + * reads/writes */
> + logical_sector_size = s->logical_sector_size;
> + if (logical_sector_size & (logical_sector_size - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + sectors_per_block = s->sectors_per_block;
> + if (sectors_per_block & (sectors_per_block - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> + chunk_ratio = s->chunk_ratio;
> + if (chunk_ratio & (chunk_ratio - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> + block_size = s->params.block_size;
> + if (block_size & (block_size - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + while (logical_sector_size >>= 1) {
> + s->logical_sector_size_bits++;
> + }
> + while (sectors_per_block >>= 1) {
> + s->sectors_per_block_bits++;
> + }
> + while (chunk_ratio >>= 1) {
> + s->chunk_ratio_bits++;
> + }
> + while (block_size >>= 1) {
> + s->block_size_bits++;
> + }
> +
> + if (s->logical_sector_size != BDRV_SECTOR_SIZE) {
> + printf("VHDX error - QEMU only supports 512 byte sector sizes\n");
> + ret = -ENOTSUP;
> + goto exit;
> + }
> +
> + ret = 0;
> +
> +exit:
> + qemu_vfree(buffer);
> + return ret;
> +}
> +
> +/* Parse the replay log. Per the VHDX spec, if the log is present
> + * it must be replayed prior to opening the file, even read-only.
> + *
> + * If read-only, we must replay the log in RAM (or refuse to open
> + * a dirty VHDX file read-only */
> +static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + int i;
> + vhdx_header *hdr;
> +
> + hdr = s->headers[s->curr_header];
> +
> + /* either either the log guid, or log length is zero,
> + * then a replay log is present */
> + for (i = 0; i < sizeof(hdr->log_guid.data4); i++) {
> + ret |= hdr->log_guid.data4[i];
> + }
> + if (hdr->log_guid.data1 == 0 &&
> + hdr->log_guid.data2 == 0 &&
> + hdr->log_guid.data3 == 0 &&
> + ret == 0) {
> + goto exit;
> + }
> +
> + /* per spec, only log version of 0 is supported */
> + if (hdr->log_version != 0) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + if (hdr->log_length == 0) {
> + goto exit;
> + }
> +
> + /* We currently do not support images with logs to replay */
> + ret = -ENOTSUP;
> +
> +exit:
> + return ret;
> +}
> +
> +
> +static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
> +{
> + BDRVVHDXState *s = bs->opaque;
> + int ret = 0;
> + int i;
> +
> + s->bat = NULL;
> +
> + qemu_co_mutex_init(&s->lock);
> +
> + ret = vhdx_parse_header(bs, s);
> + if (ret) {
> + goto fail;
> + }
> +
> + ret = vhdx_parse_log(bs, s);
> + if (ret) {
> + goto fail;
> + }
> +
> + ret = vhdx_open_region_tables(bs, s);
> + if (ret) {
> + goto fail;
> + }
> +
> + ret = vhdx_parse_metadata(bs, s);
> + if (ret) {
> + goto fail;
> + }
> + s->block_size = s->params.block_size;
> +
> + /* the VHDX spec dictates that virtual_disk_size is always a multiple of
> + * logical_sector_size */
> + bs->total_sectors = s->virtual_disk_size / s->logical_sector_size;
> +
> + s->bat_offset = s->bat_rt.file_offset;
> + s->bat_entries = s->bat_rt.length / sizeof(vhdx_bat_entry);
> + s->bat = qemu_blockalign(bs, s->bat_rt.length);
> +
> + ret = bdrv_pread(bs->file, s->bat_offset, s->bat, s->bat_rt.length);
No error checking?
> + for (i = 0; i < s->bat_entries; i++) {
> + le64_to_cpus(&s->bat[i]);
> + }
> +
> + if (flags & BDRV_O_RDWR) {
> + ret = -ENOTSUP;
> + goto fail;
> + }
> +
> + /* TODO: differencing files, read, write */
> +
> + return 0;
> +fail:
> + qemu_vfree(s->bat);
This doesn't consider all the structs that were allocated by functions
called in vhdx_open(). Here the same things should be freed as in
vhdx_close().
> + return ret;
> +}
> +
> +static int vhdx_reopen_prepare(BDRVReopenState *state,
> + BlockReopenQueue *queue, Error **errp)
> +{
> + return 0;
> +}
> +
> +
> +static coroutine_fn int vhdx_co_readv(BlockDriverState *bs, int64_t sector_num,
> + int nb_sectors, QEMUIOVector *qiov)
> +{
> + return -ENOTSUP;
> +}
> +
> +
> +
> +static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
> + int nb_sectors, QEMUIOVector *qiov)
> +{
> + return -ENOTSUP;
> +}
> +
> +
> +static void vhdx_close(BlockDriverState *bs)
> +{
> + BDRVVHDXState *s = bs->opaque;
> +
> + qemu_vfree(s->headers[0]);
> + qemu_vfree(s->headers[1]);
> + qemu_vfree(s->bat);
> + qemu_vfree(s->parent_entries);
> +}
> +
> +static BlockDriver bdrv_vhdx = {
> + .format_name = "vhdx",
> + .instance_size = sizeof(BDRVVHDXState),
Use the same alignment for = as below?
> + .bdrv_probe = vhdx_probe,
> + .bdrv_open = vhdx_open,
> + .bdrv_close = vhdx_close,
> + .bdrv_reopen_prepare = vhdx_reopen_prepare,
> + .bdrv_co_readv = vhdx_co_readv,
> + .bdrv_co_writev = vhdx_co_writev,
> +};
Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 15:46 ` Kevin Wolf
@ 2013-04-23 16:11 ` Jeff Cody
2013-04-23 16:18 ` Kevin Wolf
0 siblings, 1 reply; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 16:11 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, stefanha
On Tue, Apr 23, 2013 at 05:46:28PM +0200, Kevin Wolf wrote:
> Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> > This is the initial block driver framework for VHDX image support (
> > i.e. Hyper-V image file formats), that supports opening VHDX files, and
> > parsing the headers.
> >
> > This commit does not yet enable:
> > - reading
> > - writing
> > - updating the header
> > - differencing files (images with parents)
> > - log replay / dirty logs (only clean images)
> >
> > This is based on Microsoft's VHDX specification:
> > "VHDX Format Specification v0.95", published 4/12/2012
> > https://www.microsoft.com/en-us/download/details.aspx?id=29681
> >
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
>
> Okay, I need more time actually reading the spec and comparing this code
> to it before I can really say anything about the logic (I hope I'll get
> to it tomorrow), but for a start, let me point out the obvious things.
>
> > ---
> > block/Makefile.objs | 1 +
> > block/vhdx.c | 769 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > block/vhdx.h | 25 ++
> > 3 files changed, 795 insertions(+)
> > create mode 100644 block/vhdx.c
> >
> > diff --git a/block/Makefile.objs b/block/Makefile.objs
> > index 6c4b5bc..5f0358a 100644
> > --- a/block/Makefile.objs
> > +++ b/block/Makefile.objs
> > @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> > block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> > block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> > block-obj-y += qed-check.o
> > +block-obj-y += vhdx.o
> > block-obj-y += parallels.o blkdebug.o blkverify.o
> > block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> > block-obj-$(CONFIG_POSIX) += raw-posix.o
> > diff --git a/block/vhdx.c b/block/vhdx.c
> > new file mode 100644
> > index 0000000..b0ea2ba
> > --- /dev/null
> > +++ b/block/vhdx.c
> > @@ -0,0 +1,769 @@
> > +/*
> > + * Block driver for Hyper-V VHDX Images
> > + *
> > + * Copyright (c) 2013 Red Hat, Inc.,
> > + *
> > + * Authors:
> > + * Jeff Cody <jcody@redhat.com>
> > + *
> > + * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
> > + * by Microsoft:
> > + * https://www.microsoft.com/en-us/download/details.aspx?id=29681
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING.LIB file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu-common.h"
> > +#include "block/block_int.h"
> > +#include "qemu/module.h"
> > +#include "qemu/crc32c.h"
> > +#include "block/vhdx.h"
> > +
> > +
> > +/* Several metadata and region table data entries are identified by
> > + * guids in a MS-specific GUID format. */
> > +
> > +
> > +/* ------- Known Region Table GUIDs ---------------------- */
> > +static const ms_guid bat_guid = { .data1 = 0x2dc27766,
> > + .data2 = 0xf623,
> > + .data3 = 0x4200,
> > + .data4 = { 0x9d, 0x64, 0x11, 0x5e,
> > + 0x9b, 0xfd, 0x4a, 0x08} };
> > +
> > +static const ms_guid metadata_guid = { .data1 = 0x8b7ca206,
> > + .data2 = 0x4790,
> > + .data3 = 0x4b9a,
> > + .data4 = { 0xb8, 0xfe, 0x57, 0x5f,
> > + 0x05, 0x0f, 0x88, 0x6e} };
> > +
> > +
> > +
> > +/* ------- Known Metadata Entry GUIDs ---------------------- */
> > +static const ms_guid file_param_guid = { .data1 = 0xcaa16737,
> > + .data2 = 0xfa36,
> > + .data3 = 0x4d43,
> > + .data4 = { 0xb3, 0xb6, 0x33, 0xf0,
> > + 0xaa, 0x44, 0xe7, 0x6b} };
> > +
> > +static const ms_guid virtual_size_guid = { .data1 = 0x2FA54224,
> > + .data2 = 0xcd1b,
> > + .data3 = 0x4876,
> > + .data4 = { 0xb2, 0x11, 0x5d, 0xbe,
> > + 0xd8, 0x3b, 0xf4, 0xb8} };
> > +
> > +static const ms_guid page83_guid = { .data1 = 0xbeca12ab,
> > + .data2 = 0xb2e6,
> > + .data3 = 0x4523,
> > + .data4 = { 0x93, 0xef, 0xc3, 0x09,
> > + 0xe0, 0x00, 0xc7, 0x46} };
> > +
> > +
> > +static const ms_guid phys_sector_guid = { .data1 = 0xcda348c7,
> > + .data2 = 0x445d,
> > + .data3 = 0x4471,
> > + .data4 = { 0x9c, 0xc9, 0xe9, 0x88,
> > + 0x52, 0x51, 0xc5, 0x56} };
> > +
> > +static const ms_guid parent_locator_guid = { .data1 = 0xa8d35f2d,
> > + .data2 = 0xb30b,
> > + .data3 = 0x454d,
> > + .data4 = { 0xab, 0xf7, 0xd3,
> > + 0xd8, 0x48, 0x34,
> > + 0xab, 0x0c} };
> > +
> > +static const ms_guid logical_sector_guid = { .data1 = 0x8141bf1d,
> > + .data2 = 0xa96f,
> > + .data3 = 0x4709,
> > + .data4 = { 0xba, 0x47, 0xf2,
> > + 0x33, 0xa8, 0xfa,
> > + 0xab, 0x5f} };
> > +
> > +/* Each parent type must have a valid GUID; this is for parent images
> > + * of type 'VHDX'. If we were to allow e.g. a QCOW2 parent, we would
> > + * need to make up our own QCOW2 GUID type */
> > +static const ms_guid parent_vhdx_guid = { .data1 = 0xb04aefb7,
> > + .data2 = 0xd19e,
> > + .data3 = 0x4a81,
> > + .data4 = { 0xb7, 0x89, 0x25, 0xb8,
> > + 0xe9, 0x44, 0x59, 0x13} };
> > +
> > +
> > +#define META_FILE_PARAMETER_PRESENT 0x01
> > +#define META_VIRTUAL_DISK_SIZE_PRESENT 0x02
> > +#define META_PAGE_83_PRESENT 0x04
> > +#define META_LOGICAL_SECTOR_SIZE_PRESENT 0x08
> > +#define META_PHYS_SECTOR_SIZE_PRESENT 0x10
> > +#define META_PARENT_LOCATOR_PRESENT 0x20
> > +
> > +#define META_ALL_PRESENT \
> > + (META_FILE_PARAMETER_PRESENT | META_VIRTUAL_DISK_SIZE_PRESENT | \
> > + META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \
> > + META_PHYS_SECTOR_SIZE_PRESENT)
> > +
> > +typedef struct vhdx_metadata_entries {
> > + vhdx_metadata_table_entry file_parameters_entry;
> > + vhdx_metadata_table_entry virtual_disk_size_entry;
> > + vhdx_metadata_table_entry page83_data_entry;
> > + vhdx_metadata_table_entry logical_sector_size_entry;
> > + vhdx_metadata_table_entry phys_sector_size_entry;
> > + vhdx_metadata_table_entry parent_locator_entry;
> > + uint16_t present;
> > +} vhdx_metadata_entries;
>
> Should everything before this line actually be in the header?
>
> > +
> > +typedef struct BDRVVHDXState {
> > + CoMutex lock;
> > +
> > + int curr_header;
> > + vhdx_header *headers[2];
> > +
> > + vhdx_region_table_header rt;
> > + vhdx_region_table_entry bat_rt; /* region table for the BAT */
> > + vhdx_region_table_entry metadata_rt; /* region table for the metadata */
> > +
> > + vhdx_metadata_table_header metadata_hdr;
> > + vhdx_metadata_entries metadata_entries;
> > +
> > + vhdx_file_parameters params;
> > + uint32_t block_size;
> > + uint32_t block_size_bits;
> > + uint32_t sectors_per_block;
> > + uint32_t sectors_per_block_bits;
> > +
> > + uint64_t virtual_disk_size;
> > + uint32_t logical_sector_size;
> > + uint32_t physical_sector_size;
> > +
> > + uint64_t chunk_ratio;
> > + uint32_t chunk_ratio_bits;
> > + uint32_t logical_sector_size_bits;
> > +
> > + uint32_t bat_entries;
> > + vhdx_bat_entry *bat;
> > + uint64_t bat_offset;
> > +
> > + vhdx_parent_locator_header parent_header;
> > + vhdx_parent_locator_entry *parent_entries;
> > +
> > +} BDRVVHDXState;
> > +
> > +uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
> > + int crc_offset)
> > +{
> > + uint32_t crc_new;
> > + uint32_t crc_orig;
> > + assert(buf != NULL);
> > +
> > + if (crc_offset > 0) {
> > + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
>
> Please add spaces before and after operators. I saw this a lot in this
> series. I'm relatively sure that you can use checkpatch.pl to get all
> instances pointed out.
>
Sure, I will seek those out and change them. BTW, I ran it
checkpatch.pl, and it apparently missed the one above & below.
> > + memset(buf+crc_offset, 0, sizeof(crc_orig));
> > + }
> > +
> > + crc_new = crc32c(crc, buf, size);
> > + if (crc_offset > 0) {
> > + memcpy(buf+crc_offset, &crc_orig, sizeof(crc_orig));
> > + }
> > +
> > + return crc_new;
> > +}
> > +
> > +/* Validates the checksum of the buffer, with an in-place CRC.
> > + *
> > + * Zero is substituted during crc calculation for the original crc field,
> > + * and the crc field is restored afterwards. But the buffer will be modifed
> > + * during the calculation, so this may not be not suitable for multi-threaded
> > + * use.
> > + *
> > + * crc_offset: byte offset in buf of the buffer crc
> > + * buf: buffer pointer
> > + * size: size of buffer (must be > crc_offset+4)
> > + *
> > + * returns true if checksum is valid, false otherwise
> > + */
> > +bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
> > +{
> > + uint32_t crc_orig;
> > + uint32_t crc;
> > +
> > + assert(buf != NULL);
> > + assert(size > (crc_offset+4));
> > +
> > + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
> > + crc_orig = le32_to_cpu(crc_orig);
> > +
> > + crc = vhdx_checksum_calc(0xffffffff, buf, size, crc_offset);
> > +
> > + return crc == crc_orig;
> > +}
> > +
> > +
> > +/*
> > + * Per the MS VHDX Specification, for every VHDX file:
> > + * - The header section is fixed size - 1 MB
> > + * - The header section is always the first "object"
> > + * - The first 64KB of the header is the File Identifier
> > + * - The first uint64 (8 bytes) is the VHDX Signature ("vhdxfile")
> > + * - The following 512 bytes constitute a UTF-16 string identifiying the
> > + * software that created the file, and is optional and diagnostic only.
> > + *
> > + * Therefore, we probe by looking for the vhdxfile signature "vhdxfile"
> > + */
> > +static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
> > +{
> > + if (buf_size >= 8 && !memcmp(buf, "vhdxfile", 8)) {
>
> There's a VHDX_FILE_ID_MAGIC constant in the header. Don't you want to
> use it?
>
Maybe I should remove all the magics from the header. I could use it,
but I think this is more readable and perhaps easier to follow against
the spec. For the upcoming log patches, there are other magic items
that are compared against strings rather than the magic defines as
well.
If you prefer I compared against the _MAGIC defines in the header, I
have no problem with that.
> > + return 100;
> > + }
> > + return 0;
> > +}
> > +
> > +/* All VHDX structures on disk are little endian */
> > +static void vhdx_header_le_import(vhdx_header *h)
> > +{
> > + assert(h != NULL);
> > +
> > + le32_to_cpus(&h->signature);
> > + le32_to_cpus(&h->checksum);
> > + le64_to_cpus(&h->sequence_number);
> > +
> > + leguid_to_cpus(&h->file_write_guid);
> > + leguid_to_cpus(&h->data_write_guid);
> > + leguid_to_cpus(&h->log_guid);
> > +
> > + le16_to_cpus(&h->log_version);
> > + le16_to_cpus(&h->version);
> > + le32_to_cpus(&h->log_length);
> > + le64_to_cpus(&h->log_offset);
> > +}
> > +
> > +
> > +/* opens the specified header block from the VHDX file header section */
> > +static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + vhdx_header *header1;
> > + vhdx_header *header2;
> > + uint64_t h1_seq = 0;
> > + uint64_t h2_seq = 0;
> > + uint8_t *buffer;
> > +
> > + header1 = qemu_blockalign(bs, sizeof(vhdx_header));
> > + header2 = qemu_blockalign(bs, sizeof(vhdx_header));
> > +
> > + buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> > +
> > + s->headers[0] = header1;
> > + s->headers[1] = header2;
> > +
> > + /* We have to read the whole VHDX_HEADER_SIZE instead of
> > + * sizeof(vhdx_header), because the checksum is over the whole
> > + * region */
> > + ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer, VHDX_HEADER_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + /* copy over just the relevant portion that we need */
> > + memcpy(header1, buffer, sizeof(vhdx_header));
> > + vhdx_header_le_import(header1);
> > +
> > + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> > + header1->signature == VHDX_HDR_MAGIC) {
> > + h1_seq = header1->sequence_number;
> > + }
> > +
> > + ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + /* copy over just the relevant portion that we need */
> > + memcpy(header2, buffer, sizeof(vhdx_header));
> > + vhdx_header_le_import(header2);
> > +
> > + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> > + header2->signature == VHDX_HDR_MAGIC) {
> > + h2_seq = header2->sequence_number;
> > + }
> > +
> > + if (h1_seq > h2_seq) {
> > + s->curr_header = 0;
> > + } else if (h2_seq > h1_seq) {
> > + s->curr_header = 1;
> > + } else {
> > + printf("NO VALID HEADER\n");
>
> Make that an qerror_report() at least.
>
OK
> > + ret = -EINVAL;
> > + goto fail;
> > + }
> > +
> > + ret = 0;
> > +
> > + goto exit;
> > +
> > +fail:
> > + qemu_vfree(header1);
> > + qemu_vfree(header2);
> > + s->headers[0] = NULL;
> > + s->headers[1] = NULL;
> > +exit:
> > + qemu_vfree(buffer);
> > + return ret;
> > +}
> > +
> > +
> > +static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + uint8_t *buffer;
> > + int offset = 0;
> > + vhdx_region_table_entry rt_entry;
> > + int i;
> > +
> > + /* We have to read the whole 64KB block, because the crc32 is over the
> > + * whole block */
> > + buffer = qemu_blockalign(bs, VHDX_HEADER_BLOCK_SIZE);
> > +
> > + ret = bdrv_pread(bs->file, VHDX_REGION_TABLE_OFFSET, buffer,
> > + VHDX_HEADER_BLOCK_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + memcpy(&s->rt, buffer, sizeof(s->rt));
> > + le32_to_cpus(&s->rt.signature);
> > + le32_to_cpus(&s->rt.checksum);
> > + le32_to_cpus(&s->rt.entry_count);
> > + le32_to_cpus(&s->rt.reserved);
> > + offset += sizeof(s->rt);
> > +
> > + if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
> > + s->rt.signature != VHDX_RT_MAGIC) {
> > + ret = -EINVAL;
> > + goto fail;
> > + }
> > +
> > + for (i = 0; i < s->rt.entry_count; i++) {
> > + memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
> > + offset += sizeof(rt_entry);
> > +
> > + leguid_to_cpus(&rt_entry.guid);
> > + le64_to_cpus(&rt_entry.file_offset);
> > + le32_to_cpus(&rt_entry.length);
> > + le32_to_cpus(&rt_entry.data_bits);
> > +
> > + /* see if we recognize the entry */
> > + if (guid_eq(rt_entry.guid, bat_guid)) {
> > + s->bat_rt = rt_entry;
> > + continue;
> > + }
> > +
> > + if (guid_eq(rt_entry.guid, metadata_guid)) {
> > + s->metadata_rt = rt_entry;
> > + continue;
> > + }
> > +
> > + if (rt_entry.data_bits & VHDX_REGION_ENTRY_REQUIRED) {
> > + /* cannot read vhdx file - required region table entry that
> > + * we do not understand. per spec, we must fail to open */
> > + ret = -ENOTSUP;
> > + goto fail;
> > + }
> > + }
> > + ret = 0;
> > +
> > +fail:
> > + qemu_vfree(buffer);
> > + return ret;
> > +}
> > +
> > +
> > +
> > +/* Metadata initial parser
> > + *
> > + * This loads all the metadata entry fields. This may cause additional
> > + * fields to be processed (e.g. parent locator, etc..).
> > + *
> > + * There are 5 Metadata items that are always required:
> > + * - File Parameters (block size, has a parent)
> > + * - Virtual Disk Size (size, in bytes, of the virtual drive)
> > + * - Page 83 Data (scsi page 83 guid)
> > + * - Logical Sector Size (logical sector size in bytes, either 512 or
> > + * 4096. We only support 512 currently)
> > + * - Physical Sector Size (512 or 4096)
> > + *
> > + * Also, if the File Parameters indicate this is a differencing file,
> > + * we must also look for the Parent Locator metadata item.
> > + */
> > +static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + uint8_t *buffer;
> > + int offset = 0;
> > + int i = 0;
> > + uint32_t block_size, sectors_per_block, logical_sector_size;
> > + uint64_t chunk_ratio;
> > + vhdx_metadata_table_entry md_entry;
> > +
> > + buffer = qemu_blockalign(bs, VHDX_METADATA_TABLE_MAX_SIZE);
> > +
> > + ret = bdrv_pread(bs->file, s->metadata_rt.file_offset, buffer,
> > + VHDX_METADATA_TABLE_MAX_SIZE);
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > + memcpy(&s->metadata_hdr, buffer, sizeof(s->metadata_hdr));
> > + offset += sizeof(s->metadata_hdr);
> > +
> > + le64_to_cpus(&s->metadata_hdr.signature);
> > + le16_to_cpus(&s->metadata_hdr.reserved);
> > + le16_to_cpus(&s->metadata_hdr.entry_count);
> > +
> > + if (s->metadata_hdr.signature != VHDX_METADATA_MAGIC) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + s->metadata_entries.present = 0;
> > +
> > + if ((s->metadata_hdr.entry_count * sizeof(md_entry)) >
> > + (VHDX_METADATA_TABLE_MAX_SIZE - offset)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + for (i = 0; i < s->metadata_hdr.entry_count; i++) {
> > + memcpy(&md_entry, buffer+offset, sizeof(md_entry));
> > + offset += sizeof(md_entry);
> > +
> > + leguid_to_cpus(&md_entry.item_id);
> > + le32_to_cpus(&md_entry.offset);
> > + le32_to_cpus(&md_entry.length);
> > + le32_to_cpus(&md_entry.data_bits);
> > + le32_to_cpus(&md_entry.reserved2);
> > +
> > + if (guid_eq(md_entry.item_id, file_param_guid)) {
> > + s->metadata_entries.file_parameters_entry = md_entry;
> > + s->metadata_entries.present |= META_FILE_PARAMETER_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, virtual_size_guid)) {
> > + s->metadata_entries.virtual_disk_size_entry = md_entry;
> > + s->metadata_entries.present |= META_VIRTUAL_DISK_SIZE_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, page83_guid)) {
> > + s->metadata_entries.page83_data_entry = md_entry;
> > + s->metadata_entries.present |= META_PAGE_83_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, logical_sector_guid)) {
> > + s->metadata_entries.logical_sector_size_entry = md_entry;
> > + s->metadata_entries.present |= META_LOGICAL_SECTOR_SIZE_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, phys_sector_guid)) {
> > + s->metadata_entries.phys_sector_size_entry = md_entry;
> > + s->metadata_entries.present |= META_PHYS_SECTOR_SIZE_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, parent_locator_guid)) {
> > + s->metadata_entries.parent_locator_entry = md_entry;
> > + s->metadata_entries.present |= META_PARENT_LOCATOR_PRESENT;
> > + continue;
> > + }
> > +
> > + if (md_entry.data_bits & VHDX_META_FLAGS_IS_REQUIRED) {
> > + /* cannot read vhdx file - required region table entry that
> > + * we do not understand. per spec, we must fail to open */
> > + ret = -ENOTSUP;
> > + goto exit;
> > + }
> > + }
> > +
> > + if (s->metadata_entries.present != META_ALL_PRESENT) {
> > + ret = -ENOTSUP;
> > + goto exit;
> > + }
> > +
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.file_parameters_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->params,
> > + sizeof(s->params));
> > +
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > +
> > + le32_to_cpus(&s->params.block_size);
> > + le32_to_cpus(&s->params.data_bits);
> > +
> > +
> > + /* We now have the file parameters, so we can tell if this is a
> > + * differencing file (i.e.. has_parent), is dynamic or fixed
> > + * sized (leave_blocks_allocated), and the block size */
> > +
> > + /* The parent locator required iff the file parameters has_parent set */
> > + if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
> > + if (s->metadata_entries.present & ~META_PARENT_LOCATOR_PRESENT) {
> > + /* TODO: parse parent locator fields */
> > + ret = -ENOTSUP; /* temp, until differencing files are supported */
> > + goto exit;
> > + } else {
> > + /* if has_parent is set, but there is not parent locator present,
> > + * then that is an invalid combination */
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + }
> > +
> > + /* determine virtual disk size, logical sector size,
> > + * and phys sector size */
> > +
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.virtual_disk_size_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->virtual_disk_size,
> > + sizeof(uint64_t));
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.logical_sector_size_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->logical_sector_size,
> > + sizeof(uint32_t));
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.phys_sector_size_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->physical_sector_size,
> > + sizeof(uint32_t));
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > +
> > + le64_to_cpus(&s->virtual_disk_size);
> > + le32_to_cpus(&s->logical_sector_size);
> > + le32_to_cpus(&s->physical_sector_size);
> > +
> > + if (s->logical_sector_size == 0 || s->params.block_size == 0) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + /* both block_size and sector_size are guaranteed powers of 2 */
> > + s->sectors_per_block = s->params.block_size / s->logical_sector_size;
> > + s->chunk_ratio = (VHDX_MAX_SECTORS_PER_BLOCK) *
> > + (uint64_t)s->logical_sector_size /
> > + (uint64_t)s->params.block_size;
> > +
> > + /* These values are ones we will want to use for division / multiplication
> > + * later on, and they are all guaranteed (per the spec) to be powers of 2,
> > + * so we can take advantage of that for shift operations during
> > + * reads/writes */
> > + logical_sector_size = s->logical_sector_size;
> > + if (logical_sector_size & (logical_sector_size - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + sectors_per_block = s->sectors_per_block;
> > + if (sectors_per_block & (sectors_per_block - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + chunk_ratio = s->chunk_ratio;
> > + if (chunk_ratio & (chunk_ratio - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + block_size = s->params.block_size;
> > + if (block_size & (block_size - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + while (logical_sector_size >>= 1) {
> > + s->logical_sector_size_bits++;
> > + }
> > + while (sectors_per_block >>= 1) {
> > + s->sectors_per_block_bits++;
> > + }
> > + while (chunk_ratio >>= 1) {
> > + s->chunk_ratio_bits++;
> > + }
> > + while (block_size >>= 1) {
> > + s->block_size_bits++;
> > + }
> > +
> > + if (s->logical_sector_size != BDRV_SECTOR_SIZE) {
> > + printf("VHDX error - QEMU only supports 512 byte sector sizes\n");
> > + ret = -ENOTSUP;
> > + goto exit;
> > + }
> > +
> > + ret = 0;
> > +
> > +exit:
> > + qemu_vfree(buffer);
> > + return ret;
> > +}
> > +
> > +/* Parse the replay log. Per the VHDX spec, if the log is present
> > + * it must be replayed prior to opening the file, even read-only.
> > + *
> > + * If read-only, we must replay the log in RAM (or refuse to open
> > + * a dirty VHDX file read-only */
> > +static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + int i;
> > + vhdx_header *hdr;
> > +
> > + hdr = s->headers[s->curr_header];
> > +
> > + /* either either the log guid, or log length is zero,
> > + * then a replay log is present */
> > + for (i = 0; i < sizeof(hdr->log_guid.data4); i++) {
> > + ret |= hdr->log_guid.data4[i];
> > + }
> > + if (hdr->log_guid.data1 == 0 &&
> > + hdr->log_guid.data2 == 0 &&
> > + hdr->log_guid.data3 == 0 &&
> > + ret == 0) {
> > + goto exit;
> > + }
> > +
> > + /* per spec, only log version of 0 is supported */
> > + if (hdr->log_version != 0) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + if (hdr->log_length == 0) {
> > + goto exit;
> > + }
> > +
> > + /* We currently do not support images with logs to replay */
> > + ret = -ENOTSUP;
> > +
> > +exit:
> > + return ret;
> > +}
> > +
> > +
> > +static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
> > +{
> > + BDRVVHDXState *s = bs->opaque;
> > + int ret = 0;
> > + int i;
> > +
> > + s->bat = NULL;
> > +
> > + qemu_co_mutex_init(&s->lock);
> > +
> > + ret = vhdx_parse_header(bs, s);
> > + if (ret) {
> > + goto fail;
> > + }
> > +
> > + ret = vhdx_parse_log(bs, s);
> > + if (ret) {
> > + goto fail;
> > + }
> > +
> > + ret = vhdx_open_region_tables(bs, s);
> > + if (ret) {
> > + goto fail;
> > + }
> > +
> > + ret = vhdx_parse_metadata(bs, s);
> > + if (ret) {
> > + goto fail;
> > + }
> > + s->block_size = s->params.block_size;
> > +
> > + /* the VHDX spec dictates that virtual_disk_size is always a multiple of
> > + * logical_sector_size */
> > + bs->total_sectors = s->virtual_disk_size / s->logical_sector_size;
> > +
> > + s->bat_offset = s->bat_rt.file_offset;
> > + s->bat_entries = s->bat_rt.length / sizeof(vhdx_bat_entry);
> > + s->bat = qemu_blockalign(bs, s->bat_rt.length);
> > +
> > + ret = bdrv_pread(bs->file, s->bat_offset, s->bat, s->bat_rt.length);
>
> No error checking?
>
Thanks, missed that, adding it in.
> > + for (i = 0; i < s->bat_entries; i++) {
> > + le64_to_cpus(&s->bat[i]);
> > + }
> > +
> > + if (flags & BDRV_O_RDWR) {
> > + ret = -ENOTSUP;
> > + goto fail;
> > + }
> > +
> > + /* TODO: differencing files, read, write */
> > +
> > + return 0;
> > +fail:
> > + qemu_vfree(s->bat);
>
> This doesn't consider all the structs that were allocated by functions
> called in vhdx_open(). Here the same things should be freed as in
> vhdx_close().
>
OK. Those functions clean up after themselves, but you are right, if
a failure happens later they should be cleaned up here as well.
> > + return ret;
> > +}
> > +
> > +static int vhdx_reopen_prepare(BDRVReopenState *state,
> > + BlockReopenQueue *queue, Error **errp)
> > +{
> > + return 0;
> > +}
> > +
> > +
> > +static coroutine_fn int vhdx_co_readv(BlockDriverState *bs, int64_t sector_num,
> > + int nb_sectors, QEMUIOVector *qiov)
> > +{
> > + return -ENOTSUP;
> > +}
> > +
> > +
> > +
> > +static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
> > + int nb_sectors, QEMUIOVector *qiov)
> > +{
> > + return -ENOTSUP;
> > +}
> > +
> > +
> > +static void vhdx_close(BlockDriverState *bs)
> > +{
> > + BDRVVHDXState *s = bs->opaque;
> > +
> > + qemu_vfree(s->headers[0]);
> > + qemu_vfree(s->headers[1]);
> > + qemu_vfree(s->bat);
> > + qemu_vfree(s->parent_entries);
> > +}
> > +
> > +static BlockDriver bdrv_vhdx = {
> > + .format_name = "vhdx",
> > + .instance_size = sizeof(BDRVVHDXState),
>
> Use the same alignment for = as below?
>
OK
> > + .bdrv_probe = vhdx_probe,
> > + .bdrv_open = vhdx_open,
> > + .bdrv_close = vhdx_close,
> > + .bdrv_reopen_prepare = vhdx_reopen_prepare,
> > + .bdrv_co_readv = vhdx_co_readv,
> > + .bdrv_co_writev = vhdx_co_writev,
> > +};
>
> Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 16:11 ` Jeff Cody
@ 2013-04-23 16:18 ` Kevin Wolf
0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2013-04-23 16:18 UTC (permalink / raw)
To: Jeff Cody; +Cc: qemu-devel, stefanha
Am 23.04.2013 um 18:11 hat Jeff Cody geschrieben:
> On Tue, Apr 23, 2013 at 05:46:28PM +0200, Kevin Wolf wrote:
> > Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> > > +/*
> > > + * Per the MS VHDX Specification, for every VHDX file:
> > > + * - The header section is fixed size - 1 MB
> > > + * - The header section is always the first "object"
> > > + * - The first 64KB of the header is the File Identifier
> > > + * - The first uint64 (8 bytes) is the VHDX Signature ("vhdxfile")
> > > + * - The following 512 bytes constitute a UTF-16 string identifiying the
> > > + * software that created the file, and is optional and diagnostic only.
> > > + *
> > > + * Therefore, we probe by looking for the vhdxfile signature "vhdxfile"
> > > + */
> > > +static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
> > > +{
> > > + if (buf_size >= 8 && !memcmp(buf, "vhdxfile", 8)) {
> >
> > There's a VHDX_FILE_ID_MAGIC constant in the header. Don't you want to
> > use it?
> >
>
> Maybe I should remove all the magics from the header. I could use it,
> but I think this is more readable and perhaps easier to follow against
> the spec. For the upcoming log patches, there are other magic items
> that are compared against strings rather than the magic defines as
> well.
>
> If you prefer I compared against the _MAGIC defines in the header, I
> have no problem with that.
I don't mind using the strings, but then I think removing the numeric
magic from the header is the right thing indeed.
> > > + for (i = 0; i < s->bat_entries; i++) {
> > > + le64_to_cpus(&s->bat[i]);
> > > + }
> > > +
> > > + if (flags & BDRV_O_RDWR) {
> > > + ret = -ENOTSUP;
> > > + goto fail;
> > > + }
> > > +
> > > + /* TODO: differencing files, read, write */
> > > +
> > > + return 0;
> > > +fail:
> > > + qemu_vfree(s->bat);
> >
> > This doesn't consider all the structs that were allocated by functions
> > called in vhdx_open(). Here the same things should be freed as in
> > vhdx_close().
> >
>
> OK. Those functions clean up after themselves, but you are right, if
> a failure happens later they should be cleaned up here as well.
Yes, you can probably get away with not freeing whatever the last
function call could have allocated if it cleans up after itself, but
doing the full thing here is more obviously correct. (You may have to
move the cleanup here instead of duplicating it, or make the functions'
cleanup reset the pointers to NULL)
Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images
2013-04-23 15:10 ` Kevin Wolf
@ 2013-04-23 16:32 ` Jeff Cody
0 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-23 16:32 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, stefanha
On Tue, Apr 23, 2013 at 05:10:18PM +0200, Kevin Wolf wrote:
> Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> > This is based on Microsoft's VHDX specification:
> > "VHDX Format Specification v0.95", published 4/12/2012
> > https://www.microsoft.com/en-us/download/details.aspx?id=29681
> >
> > These structures define the various header, metadata, and other
> > block structures defined in the VHDX specification.
> >
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> > block/vhdx.h | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 327 insertions(+)
> > create mode 100644 block/vhdx.h
> >
> > diff --git a/block/vhdx.h b/block/vhdx.h
> > new file mode 100644
> > index 0000000..f5cf1ed
> > --- /dev/null
> > +++ b/block/vhdx.h
> > @@ -0,0 +1,327 @@
> > +/*
> > + * Block driver for Hyper-V VHDX Images
> > + *
> > + * Copyright (c) 2013 Red Hat, Inc.,
> > + *
> > + * Authors:
> > + * Jeff Cody <jcody@redhat.com>
> > + *
> > + * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
> > + * by Microsoft:
> > + * https://www.microsoft.com/en-us/download/details.aspx?id=29681
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING.LIB file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef BLOCK_VHDX_H
> > +#define BLOCK_VHDX_H
> > +
> > +/* Structures and fields present in the VHDX file */
> > +
> > +/* The header section has the following blocks,
> > + * each block is 64KB:
> > + *
> > + * _____________________________________________________________________________
> > + * | File Id. | Header 1 | Header 2 | Region Table | Reserved (768KB) |
> > + * |----------|---------------|------------|--------------|--------------------|
> > + * | | | | | |
> > + * 0.........64KB...........128KB........192KB..........256KB................1MB
> > + */
> > +
> > +#define VHDX_HEADER_BLOCK_SIZE (64*1024)
> > +
> > +#define VHDX_FILE_ID_OFFSET 0
> > +#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE*1)
> > +#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE*2)
> > +#define VHDX_REGION_TABLE_OFFSET (VHDX_HEADER_BLOCK_SIZE*3)
> > +
> > +
> > +/*
> > + * A note on the use of MS-GUID fields. For more details on the GUID,
> > + * please see: https://en.wikipedia.org/wiki/Globally_unique_identifier.
> > + *
> > + * The VHDX specification only states that these are MS GUIDs, and which
> > + * bytes are data1-data4. It makes no mention of what algorithm should be used
> > + * to generate the GUID, nor what standard. However, looking at the specified
> > + * known GUID fields, it appears the GUIDs are:
> > + * Standard/DCE GUID type (noted by 10b in the MSB of byte 0 of .data4)
> > + * Random algorithm (noted by 0x4XXX for .data3)
> > + */
> > +
> > +/* ---- HEADER SECTION STRUCTURES ---- */
> > +
> > +/* Important note: these structures are as defined in the VHDX specification,
> > + * including byte order and size. However, without being packed structures,
> > + * they will not match 1:1 data read from disk. Rather than use potentially
> > + * non-portable packed structures, data is copied from read buffers into
> > + * the structures below. However, for reference, please refrain from
> > + * modifying these structures to something that does not represent the spec */
> > +
> > +#define VHDX_FILE_ID_MAGIC 0x656C696678646876 /* 'vhdxfile' */
> > +typedef struct vhdx_file_identifier {
>
> According to the qemu coding style, this is not a valid name for a
> struct. I think you can choose between VHDXFileIdentifier and
> VhdxFileIdentifier, where existing code tends towards the former.
>
> More instances of this follow, but I won't comment on each.
>
OK. I'll go through and change them all; almost all of them will need
to be changed.
> > + uint64_t signature; /* "vhdxfile" in ASCII */
> > + uint16_t creator[256]; /* optional; utf-16 string to identify
> > + the vhdx file creator. Diagnotistic
> > + only */
> > +} vhdx_file_identifier;
> > +
> > +
> > +/* the guid is a 16 byte unique ID - the definition for this used by
> > + * Microsoft is not just 16 bytes though - it is a structure that is defined,
> > + * so we need to follow it here so that endianness does not trip us up */
> > +
> > +typedef struct ms_guid {
> > + uint32_t data1;
> > + uint16_t data2;
> > + uint16_t data3;
> > + uint8_t data4[8];
> > +} ms_guid;
> > +
> > +#define guid_eq(a, b) \
> > + (memcmp(&(a), &(b), sizeof(ms_guid)) == 0)
> > +
> > +#define VHDX_HEADER_SIZE (4*1024) /* although the vhdx_header struct in disk
> > + is only 582 bytes, for purposes of crc
> > + the header is the first 4KB of the 64KB
> > + block */
> > +
> > +#define VHDX_HDR_MAGIC 0x64616568 /* 'head' */
> > +typedef struct QEMU_PACKED vhdx_header {
> > + uint32_t signature; /* "head" in ASCII */
> > + uint32_t checksum; /* CRC-32C hash of the whole header */
> > + uint64_t sequence_number; /* Seq number of this header. Each
> > + VHDX file has 2 of these headers,
> > + and only the header with the highest
> > + sequence number is valid */
> > + ms_guid file_write_guid; /* 128 bit unique identifier. Must be
> > + updated to new, unique value before
> > + the first modification is made to
> > + file */
> > + ms_guid data_write_guid; /* 128 bit unique identifier. Must be
> > + updated to new, unique value before
> > + the first modification is made to
> > + visible data. Visbile data is
> > + defined as:
> > + - system & user metadata
> > + - raw block data
> > + - disk size
> > + - any change that will
> > + cause the virtual disk
> > + sector read to differ
> > +
> > + This does not need to change if
> > + blocks are re-arranged */
> > + ms_guid log_guid; /* 128 bit unique identifier. If zero,
> > + there is no valid log. If non-zero,
> > + log entries with this guid are
> > + valid. */
> > + uint16_t log_version; /* version of the log format. Mustn't be
> > + zero, unless log_guid is also zero */
> > + uint16_t version; /* version of th evhdx file. Currently,
> > + only supported version is "1" */
> > + uint32_t log_length; /* length of the log. Must be multiple
> > + of 1MB */
> > + uint64_t log_offset; /* byte offset in the file of the log.
> > + Must also be a multiple of 1MB */
> > +} vhdx_header;
> > +
> > +/* 4KB in packed data size, not to be used except for initial data read */
> > +typedef struct QEMU_PACKED vhdx_header_padded {
> > + vhdx_header header;
> > + uint8_t reserved[502]; /* per the VHDX spec */
> > + uint8_t reserved_[3514]; /* for the initial packed struct read */
> > +} vhdx_header_padded;
> > +
> > +/* Header for the region table block */
> > +#define VHDX_RT_MAGIC 0x69676572 /* 'regi ' */
> > +typedef struct QEMU_PACKED vhdx_region_table_header {
> > + uint32_t signature; /* "regi" in ASCII */
> > + uint32_t checksum; /* CRC-32C hash of the 64KB table */
> > + uint32_t entry_count; /* number of valid entries */
> > + uint32_t reserved;
> > +} vhdx_region_table_header;
> > +
> > +/* Individual region table entry. There may be a maximum of 2047 of these
> > + *
> > + * There are two known region table properties. Both are required.
> > + * BAT (block allocation table): 2DC27766F62342009D64115E9BFD4A08
> > + * Metadata: 8B7CA20647904B9AB8FE575F050F886E
> > + */
> > +#define VHDX_REGION_ENTRY_REQUIRED 0x01 /* if set, parser must understand
> > + this entry in order to open
> > + file */
> > +typedef struct QEMU_PACKED vhdx_region_table_entry {
> > + ms_guid guid; /* 128-bit unique identifier */
> > + uint64_t file_offset; /* offset of the object in the file.
> > + Must be multiple of 1MB */
> > + uint32_t length; /* length, in bytes, of the object */
> > + uint32_t data_bits;
> > +} vhdx_region_table_entry;
> > +
> > +
> > +/* ---- LOG ENTRY STRUCTURES ---- */
> > +#define VHDX_LOG_HDR_SIZE 64
> > +#define VHDX_LOGE_MAGIC 0x65676F6C /* 'loge' */
> > +typedef struct QEMU_PACKED vhdx_log_entry_header {
> > + uint32_t signature; /* "loge" in ASCII */
> > + uint32_t checksum; /* CRC-32C hash of the 64KB table */
> > + uint32_t entry_length; /* length in bytes, multiple of 1MB */
> > + uint32_t tail; /* byte offset of first log entry of a
> > + seq, where this entry is the last
> > + entry */
> > + uint64_t sequence_number; /* incremented with each log entry.
> > + May not be zero. */
> > + uint32_t descriptor_count; /* number of descriptors in this log
> > + entry, must be >= 0 */
> > + uint32_t reserved;
> > + ms_guid log_guid; /* value of the log_guid from
> > + vhdx_header. If not found in
> > + vhdx_header, it is invalid */
> > + uint64_t flushed_file_offset; /* see spec for full details - this
> > + sould be vhdx file size in bytes */
> > + uint64_t last_file_offset; /* size in bytes that all allocated
> > + file structures fit into */
> > +} vhdx_log_entry_header;
> > +
> > +#define VHDX_LOG_DESC_SIZE 32
> > +
> > +#define VHDX_ZERO_MAGIC 0x6F72657A /* 'zero' */
> > +#define VHDX_DATA_MAGIC 0x63736564 /* 'desc' */
> > +typedef struct QEMU_PACKED vhdx_log_descriptor {
> > + uint32_t signature; /* "zero" or "desc" in ASCII */
> > + union {
> > + uint32_t reserved; /* zero desc */
> > + uint32_t trailing_bytes; /* data desc: bytes 4092-4096 of the
> > + data sector */
> > + };
>
> Indentation inside the union is off.
>
> > + union {
> > + uint64_t zero_length; /* zero desc: length of the section to
> > + zero */
> > + uint64_t leading_bytes; /* data desc: bytes 0-7 of the data
> > + sector */
> > + };
>
> Here as well.
>
OK, thanks.
> > + uint64_t file_offset; /* file offset to write zeros - multiple
> > + of 4kB */
> > + uint64_t sequence_number; /* must match same field in
> > + vhdx_log_entry_header */
> > +} vhdx_log_descriptor;
> > +
> > +#define VHDX_DATAS_MAGIC 0x61746164 /* 'data' */
> > +typedef struct QEMU_PACKED vhdx_log_data_sector {
> > + uint32_t data_signature; /* "data" in ASCII */
> > + uint32_t sequence_high; /* 4 MSB of 8 byte sequence_number */
> > + uint8_t data[4084]; /* raw data, bytes 8-4091 (inclusive).
> > + see the data descriptor field for the
> > + other mising bytes */
> > + uint32_t sequence_low; /* 4 LSB of 8 byte sequence_number */
> > +} vhdx_log_data_sector;
> > +
> > +
> > +
> > +/* block states - different state values depending on whether it is a
> > + * payload block, or a sector block. */
> > +
> > +#define PAYLOAD_BLOCK_NOT_PRESENT 0
> > +#define PAYLOAD_BLOCK_UNDEFINED 1
> > +#define PAYLOAD_BLOCK_ZERO 2
> > +#define PAYLOAD_BLOCK_UNMAPPED 5
> > +#define PAYLOAD_BLOCK_FULL_PRESENT 6
> > +#define PAYLOAD_BLOCK_PARTIALLY_PRESENT 7
>
> Looks like an enum?
>
The spec lays them out as #define's, so this is a case where I figured
it helped with a reader comparing against the spec. But I can make
this (and the corresponding ones below) into enums, if you are
worried about namespace pollution.
> > +#define SB_BLOCK_NOT_PRESENT 0
> > +#define SB_BLOCK_PRESENT 6
> > +
> > +/* per the spec */
> > +#define VHDX_MAX_SECTORS_PER_BLOCK (1<<23)
> > +
> > +/* upper 44 bits are the file offset in 1MB units lower 3 bits are the state
> > + other bits are reserved */
> > +#define VHDX_BAT_STATE_BIT_MASK 0x07
> > +#define VHDX_BAT_FILE_OFF_BITS (64-44)
> > +typedef uint64_t vhdx_bat_entry;
> > +
> > +/* ---- METADATA REGION STRUCTURES ---- */
> > +
> > +#define VHDX_METADATA_ENTRY_SIZE 32
> > +#define VHDX_METADATA_MAX_ENTRIES 2047 /* not including the header */
> > +#define VHDX_METADATA_TABLE_MAX_SIZE \
> > + (VHDX_METADATA_ENTRY_SIZE * (VHDX_METADATA_MAX_ENTRIES+1))
> > +#define VHDX_METADATA_MAGIC 0x617461646174656D /* 'metadata' */
> > +typedef struct QEMU_PACKED vhdx_metadata_table_header {
> > + uint64_t signature; /* "metadata" in ASCII */
> > + uint16_t reserved;
> > + uint16_t entry_count; /* number table entries. <= 2047 */
> > + uint32_t reserved2[5];
> > +} vhdx_metadata_table_header;
> > +
> > +#define VHDX_META_FLAGS_IS_USER 0x01 /* max 1024 entries */
> > +#define VHDX_META_FLAGS_IS_VIRTUAL_DISK 0x02 /* virtual disk metadata if set,
> > + otherwise file metdata */
> > +#define VHDX_META_FLAGS_IS_REQUIRED 0x04 /* parse must understand this
> > + entry to open the file */
> > +typedef struct QEMU_PACKED vhdx_metadata_table_entry {
> > + ms_guid item_id; /* 128-bit identifier for metadata */
> > + uint32_t offset; /* byte offset of the metadata. At
> > + least 64kB. Relative to start of
> > + metadata region */
> > + /* note: if length = 0, so is offset */
> > + uint32_t length; /* length of metadata. <= 1MB. */
> > + uint32_t data_bits; /* least-significant 3 bits are flags, the
> > + rest are reserved (see above) */
> > + uint32_t reserved2;
> > +} vhdx_metadata_table_entry;
> > +
> > +#define VHDX_PARAMS_LEAVE_BLOCKS_ALLOCED 0x01 /* Do not change any blocks to
> > + be BLOCK_NOT_PRESENT.
> > + If set indicates a fixed
> > + size VHDX file */
> > +#define VHDX_PARAMS_HAS_PARENT 0x02 /* has parent / backing file */
> > +typedef struct QEMU_PACKED vhdx_file_parameters {
> > + uint32_t block_size; /* size of each payload block, always
> > + power of 2, <= 256MB and >= 1MB. */
> > + uint32_t data_bits; /* least-significant 2 bits are flags, the rest
> > + are reserved (see above) */
> > +} vhdx_file_parameters;
> > +
> > +typedef struct QEMU_PACKED vhdx_virtual_disk_size {
> > + uint64_t virtual_disk_size; /* Size of the virtual disk, in bytes.
> > + Must be multiple of the sector size,
> > + max of 64TB */
> > +} vhdx_virtual_disk_size;
> > +
> > +typedef struct QEMU_PACKED vhdx_page83_data {
> > + uint8_t page_83_data[16]; /* unique id for scsi devices that
> > + support page 0x83 */
> > +} vhdx_page83_data;
> > +
> > +typedef struct QEMU_PACKED vhdx_virtual_disk_logical_sector_size {
> > + uint32_t logical_sector_size; /* virtual disk sector size (in bytes).
> > + Can only be 512 or 4096 bytes */
> > +} vhdx_virtual_disk_logical_sector_size;
> > +
> > +typedef struct QEMU_PACKED vhdx_virtual_disk_physical_sector_size {
> > + uint32_t physical_sector_size; /* physical sector size (in bytes).
> > + Can only be 512 or 4096 bytes */
> > +} vhdx_virtual_disk_physical_sector_size;
>
> What's the point with all the single-field structs?
>
Another case of laying out as it was in the spec, for readability. I
tried to keep the header as faithful to the spec as possible; if you
have a strong preference, I can drop these and just use the single
field types inside vhdx.c.
> > +typedef struct QEMU_PACKED vhdx_parent_locator_header {
> > + uint8_t locator_type[16]; /* type of the parent virtual disk. */
> > + uint16_t reserved;
> > + uint16_t key_value_count; /* number of key/value pairs for this
> > + locator */
> > +} vhdx_parent_locator_header;
> > +
> > +/* key and value strings are UNICODE strings, UTF-16 LE encoding, no NULs */
> > +typedef struct QEMU_PACKED vhdx_parent_locator_entry {
> > + uint32_t key_offset; /* offset in metadata for key, > 0 */
> > + uint32_t value_offset; /* offset in metadata for value, >0 */
> > + uint16_t key_length; /* length of entry key, > 0 */
> > + uint16_t value_length; /* length of entry value, > 0 */
> > +} vhdx_parent_locator_entry;
> > +
> > +
> > +/* ----- END VHDX SPECIFICATION STRUCTURES ---- */
> > +
> > +#endif
>
> Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images Jeff Cody
2013-04-23 15:10 ` Kevin Wolf
@ 2013-04-24 12:31 ` Stefan Hajnoczi
2013-04-24 12:34 ` Jeff Cody
2013-04-25 13:05 ` Kevin Wolf
2 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2013-04-24 12:31 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha
On Tue, Apr 23, 2013 at 10:24:21AM -0400, Jeff Cody wrote:
> +/* ---- HEADER SECTION STRUCTURES ---- */
> +
> +/* Important note: these structures are as defined in the VHDX specification,
> + * including byte order and size. However, without being packed structures,
> + * they will not match 1:1 data read from disk. Rather than use potentially
> + * non-portable packed structures, data is copied from read buffers into
> + * the structures below. However, for reference, please refrain from
> + * modifying these structures to something that does not represent the spec */
Outdated comment? The patch uses QEMU_PACKED.
> +/* Header for the region table block */
> +#define VHDX_RT_MAGIC 0x69676572 /* 'regi ' */
Not worth respinnning, but should be 'regi' (4 bytes).
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images
2013-04-24 12:31 ` Stefan Hajnoczi
@ 2013-04-24 12:34 ` Jeff Cody
0 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-24 12:34 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha
On Wed, Apr 24, 2013 at 02:31:27PM +0200, Stefan Hajnoczi wrote:
> On Tue, Apr 23, 2013 at 10:24:21AM -0400, Jeff Cody wrote:
> > +/* ---- HEADER SECTION STRUCTURES ---- */
> > +
> > +/* Important note: these structures are as defined in the VHDX specification,
> > + * including byte order and size. However, without being packed structures,
> > + * they will not match 1:1 data read from disk. Rather than use potentially
> > + * non-portable packed structures, data is copied from read buffers into
> > + * the structures below. However, for reference, please refrain from
> > + * modifying these structures to something that does not represent the spec */
>
> Outdated comment? The patch uses QEMU_PACKED.
>
Yes, outdated comment, thanks.
> > +/* Header for the region table block */
> > +#define VHDX_RT_MAGIC 0x69676572 /* 'regi ' */
>
> Not worth respinnning, but should be 'regi' (4 bytes).
Per Kevin's comments, I am removing all the _MAGIC's from the
headers anyway, and using memcmp against the strings in the source
file. So I'll take care of the outdated comment as well then.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
2013-04-23 15:46 ` Kevin Wolf
@ 2013-04-24 13:21 ` Stefan Hajnoczi
2013-04-24 13:40 ` Jeff Cody
2013-04-25 13:04 ` Kevin Wolf
` (2 subsequent siblings)
4 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2013-04-24 13:21 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha
On Tue, Apr 23, 2013 at 10:24:22AM -0400, Jeff Cody wrote:
> + if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
> + s->rt.signature != VHDX_RT_MAGIC) {
> + ret = -EINVAL;
> + goto fail;
> + }
> +
> + for (i = 0; i < s->rt.entry_count; i++) {
It's nice to avoid signed/unsigned comparisons. i should be uint32_t
just like entry_count.
> + memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
> + offset += sizeof(rt_entry);
Looks like we're trusting rt.entry_count to be a sane value? Need to
prevent offset from exceeding buffer size.
> + while (logical_sector_size >>= 1) {
> + s->logical_sector_size_bits++;
> + }
> + while (sectors_per_block >>= 1) {
> + s->sectors_per_block_bits++;
> + }
> + while (chunk_ratio >>= 1) {
> + s->chunk_ratio_bits++;
> + }
> + while (block_size >>= 1) {
> + s->block_size_bits++;
> + }
ctz()/clo() do this.
> +static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + int i;
> + vhdx_header *hdr;
> +
> + hdr = s->headers[s->curr_header];
> +
> + /* either either the log guid, or log length is zero,
either either
> + s->bat_offset = s->bat_rt.file_offset;
> + s->bat_entries = s->bat_rt.length / sizeof(vhdx_bat_entry);
> + s->bat = qemu_blockalign(bs, s->bat_rt.length);
No sanity check was done on bat_rt.length. If this allocation fails
QEMU will exit. Could be used as a DoS if you can get someone to attach
a malicious VHDX to their VM?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-24 13:21 ` Stefan Hajnoczi
@ 2013-04-24 13:40 ` Jeff Cody
0 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-24 13:40 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha
On Wed, Apr 24, 2013 at 03:21:10PM +0200, Stefan Hajnoczi wrote:
> On Tue, Apr 23, 2013 at 10:24:22AM -0400, Jeff Cody wrote:
> > + if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
> > + s->rt.signature != VHDX_RT_MAGIC) {
> > + ret = -EINVAL;
> > + goto fail;
> > + }
> > +
> > + for (i = 0; i < s->rt.entry_count; i++) {
>
> It's nice to avoid signed/unsigned comparisons. i should be uint32_t
> just like entry_count.
I agree. I will also double check the other parsing routines (e.g.
vhdx_parse_metadata()).
>
> > + memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
> > + offset += sizeof(rt_entry);
>
> Looks like we're trusting rt.entry_count to be a sane value? Need to
> prevent offset from exceeding buffer size.
>
Agree again, and I will also check the other parsers as well.
> > + while (logical_sector_size >>= 1) {
> > + s->logical_sector_size_bits++;
> > + }
> > + while (sectors_per_block >>= 1) {
> > + s->sectors_per_block_bits++;
> > + }
> > + while (chunk_ratio >>= 1) {
> > + s->chunk_ratio_bits++;
> > + }
> > + while (block_size >>= 1) {
> > + s->block_size_bits++;
> > + }
>
> ctz()/clo() do this.
>
Ah, yes! I will switch over to using those.
> > +static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + int i;
> > + vhdx_header *hdr;
> > +
> > + hdr = s->headers[s->curr_header];
> > +
> > + /* either either the log guid, or log length is zero,
>
> either either
>
Thanks
> > + s->bat_offset = s->bat_rt.file_offset;
> > + s->bat_entries = s->bat_rt.length / sizeof(vhdx_bat_entry);
> > + s->bat = qemu_blockalign(bs, s->bat_rt.length);
>
> No sanity check was done on bat_rt.length. If this allocation fails
> QEMU will exit. Could be used as a DoS if you can get someone to attach
> a malicious VHDX to their VM?
Yes, bat_rt.length needs to be verified here as well. I will add that
in.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 4/5] block: add read-only support to VHDX image format.
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 4/5] block: add read-only support to VHDX image format Jeff Cody
@ 2013-04-24 14:38 ` Stefan Hajnoczi
0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2013-04-24 14:38 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha
On Tue, Apr 23, 2013 at 10:24:23AM -0400, Jeff Cody wrote:
> +static void vhdx_block_translate(BDRVVHDXState *s, int64_t sector_num,
> + int nb_sectors, vhdx_sector_info *sinfo)
> +{
> + uint32_t block_offset;
> +
> + sinfo->bat_idx = sector_num >> s->sectors_per_block_bits;
> + /* effectively a modulo - this gives us the offset into the block
> + * (in sector sizes) for our sector number */
> + block_offset = sector_num - (sinfo->bat_idx << s->sectors_per_block_bits);
> + /* the chunk ratio gives us the interleaving of the sector
> + * bitmaps, so we need to advance our page block index by the
> + * sector bitmaps entry number */
> + sinfo->bat_idx += sinfo->bat_idx >> s->chunk_ratio_bits;
> +
> + /* the number of sectors we can read/write in this cycle */
> + sinfo->sectors_avail = s->sectors_per_block - block_offset;
> +
> + sinfo->bytes_left = sinfo->sectors_avail << s->logical_sector_size_bits;
> +
> + if (sinfo->sectors_avail > nb_sectors) {
> + sinfo->sectors_avail = nb_sectors;
> + }
> +
> + sinfo->bytes_avail = sinfo->sectors_avail << s->logical_sector_size_bits;
> +
> + sinfo->file_offset = s->bat[sinfo->bat_idx] >> VHDX_BAT_FILE_OFF_BITS;
If my calculation is correct, a 2 TB image would result in 16 MB of BAT
entries for payload blocks. The BAT entries for sector bitmap blocks
would be smaller.
On a slow disk the startup time with many VHDX files could be be several
seconds though since we need to read all this data into memory.
Perhaps something to consider for the future but not critical right now.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images.
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images Jeff Cody
@ 2013-04-24 14:47 ` Stefan Hajnoczi
2013-04-24 14:56 ` Jeff Cody
2013-04-28 10:05 ` Fam Zheng
1 sibling, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2013-04-24 14:47 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha
On Tue, Apr 23, 2013 at 10:24:24AM -0400, Jeff Cody wrote:
> This adds the ability to update the headers in a VHDX image, including
> generating a new MS-compatible GUID.
>
> As VHDX depends on uuid.h, VHDX is now a configurable build option.
> If VHDX support is enabled, that will also enable uuid as well.
>
> To enable/disable VHDX: --enable-vhdx, --disable-vhdx
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
> block/Makefile.objs | 2 +-
> block/vhdx.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> configure | 13 +++++
> 3 files changed, 169 insertions(+), 3 deletions(-)
Why is this part of the series?
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images.
2013-04-24 14:47 ` Stefan Hajnoczi
@ 2013-04-24 14:56 ` Jeff Cody
2013-04-25 7:20 ` Stefan Hajnoczi
0 siblings, 1 reply; 30+ messages in thread
From: Jeff Cody @ 2013-04-24 14:56 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha
On Wed, Apr 24, 2013 at 04:47:15PM +0200, Stefan Hajnoczi wrote:
> On Tue, Apr 23, 2013 at 10:24:24AM -0400, Jeff Cody wrote:
> > This adds the ability to update the headers in a VHDX image, including
> > generating a new MS-compatible GUID.
> >
> > As VHDX depends on uuid.h, VHDX is now a configurable build option.
> > If VHDX support is enabled, that will also enable uuid as well.
> >
> > To enable/disable VHDX: --enable-vhdx, --disable-vhdx
> >
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> > block/Makefile.objs | 2 +-
> > block/vhdx.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > configure | 13 +++++
> > 3 files changed, 169 insertions(+), 3 deletions(-)
>
> Why is this part of the series?
>
> Stefan
This could technically be dropped from the series, since the current
series only supports r/o functions.
I included it because it is not log-dependent - This patch provides
the basic mechanism to update the headers (header updates do not go
through the log), so it is safe. It is not required yet, but it will
be in the future.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images.
2013-04-24 14:56 ` Jeff Cody
@ 2013-04-25 7:20 ` Stefan Hajnoczi
0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2013-04-25 7:20 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, Stefan Hajnoczi, qemu-devel
On Wed, Apr 24, 2013 at 10:56:19AM -0400, Jeff Cody wrote:
> On Wed, Apr 24, 2013 at 04:47:15PM +0200, Stefan Hajnoczi wrote:
> > On Tue, Apr 23, 2013 at 10:24:24AM -0400, Jeff Cody wrote:
> > > This adds the ability to update the headers in a VHDX image, including
> > > generating a new MS-compatible GUID.
> > >
> > > As VHDX depends on uuid.h, VHDX is now a configurable build option.
> > > If VHDX support is enabled, that will also enable uuid as well.
> > >
> > > To enable/disable VHDX: --enable-vhdx, --disable-vhdx
> > >
> > > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > > ---
> > > block/Makefile.objs | 2 +-
> > > block/vhdx.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > > configure | 13 +++++
> > > 3 files changed, 169 insertions(+), 3 deletions(-)
> >
> > Why is this part of the series?
> >
> > Stefan
>
> This could technically be dropped from the series, since the current
> series only supports r/o functions.
>
> I included it because it is not log-dependent - This patch provides
> the basic mechanism to update the headers (header updates do not go
> through the log), so it is safe. It is not required yet, but it will
> be in the future.
I suggest putting this patch into the series that makes use of it.
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
2013-04-23 15:46 ` Kevin Wolf
2013-04-24 13:21 ` Stefan Hajnoczi
@ 2013-04-25 13:04 ` Kevin Wolf
2013-04-25 15:03 ` Jeff Cody
2013-04-28 7:29 ` Fam Zheng
2013-04-28 9:58 ` Fam Zheng
4 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2013-04-25 13:04 UTC (permalink / raw)
To: Jeff Cody; +Cc: qemu-devel, stefanha
Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> This is the initial block driver framework for VHDX image support (
> i.e. Hyper-V image file formats), that supports opening VHDX files, and
> parsing the headers.
>
> This commit does not yet enable:
> - reading
> - writing
> - updating the header
> - differencing files (images with parents)
> - log replay / dirty logs (only clean images)
>
> This is based on Microsoft's VHDX specification:
> "VHDX Format Specification v0.95", published 4/12/2012
> https://www.microsoft.com/en-us/download/details.aspx?id=29681
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
Okay, now I'm starting to actually compare the spec to your
implementation. Comments inline.
> block/Makefile.objs | 1 +
> block/vhdx.c | 769 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> block/vhdx.h | 25 ++
> 3 files changed, 795 insertions(+)
> create mode 100644 block/vhdx.c
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 6c4b5bc..5f0358a 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> block-obj-y += qed-check.o
> +block-obj-y += vhdx.o
> block-obj-y += parallels.o blkdebug.o blkverify.o
> block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> block-obj-$(CONFIG_POSIX) += raw-posix.o
> diff --git a/block/vhdx.c b/block/vhdx.c
> new file mode 100644
> index 0000000..b0ea2ba
> --- /dev/null
> +++ b/block/vhdx.c
> @@ -0,0 +1,769 @@
> +/*
> + * Block driver for Hyper-V VHDX Images
> + *
> + * Copyright (c) 2013 Red Hat, Inc.,
> + *
> + * Authors:
> + * Jeff Cody <jcody@redhat.com>
> + *
> + * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
> + * by Microsoft:
> + * https://www.microsoft.com/en-us/download/details.aspx?id=29681
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "qemu/module.h"
> +#include "qemu/crc32c.h"
> +#include "block/vhdx.h"
> +
> +
> +/* Several metadata and region table data entries are identified by
> + * guids in a MS-specific GUID format. */
> +
> +
> +/* ------- Known Region Table GUIDs ---------------------- */
> +static const ms_guid bat_guid = { .data1 = 0x2dc27766,
> + .data2 = 0xf623,
> + .data3 = 0x4200,
> + .data4 = { 0x9d, 0x64, 0x11, 0x5e,
> + 0x9b, 0xfd, 0x4a, 0x08} };
> +
> +static const ms_guid metadata_guid = { .data1 = 0x8b7ca206,
> + .data2 = 0x4790,
> + .data3 = 0x4b9a,
> + .data4 = { 0xb8, 0xfe, 0x57, 0x5f,
> + 0x05, 0x0f, 0x88, 0x6e} };
> +
> +
> +
> +/* ------- Known Metadata Entry GUIDs ---------------------- */
> +static const ms_guid file_param_guid = { .data1 = 0xcaa16737,
> + .data2 = 0xfa36,
> + .data3 = 0x4d43,
> + .data4 = { 0xb3, 0xb6, 0x33, 0xf0,
> + 0xaa, 0x44, 0xe7, 0x6b} };
> +
> +static const ms_guid virtual_size_guid = { .data1 = 0x2FA54224,
> + .data2 = 0xcd1b,
> + .data3 = 0x4876,
> + .data4 = { 0xb2, 0x11, 0x5d, 0xbe,
> + 0xd8, 0x3b, 0xf4, 0xb8} };
> +
> +static const ms_guid page83_guid = { .data1 = 0xbeca12ab,
> + .data2 = 0xb2e6,
> + .data3 = 0x4523,
> + .data4 = { 0x93, 0xef, 0xc3, 0x09,
> + 0xe0, 0x00, 0xc7, 0x46} };
> +
> +
> +static const ms_guid phys_sector_guid = { .data1 = 0xcda348c7,
> + .data2 = 0x445d,
> + .data3 = 0x4471,
> + .data4 = { 0x9c, 0xc9, 0xe9, 0x88,
> + 0x52, 0x51, 0xc5, 0x56} };
> +
> +static const ms_guid parent_locator_guid = { .data1 = 0xa8d35f2d,
> + .data2 = 0xb30b,
> + .data3 = 0x454d,
> + .data4 = { 0xab, 0xf7, 0xd3,
> + 0xd8, 0x48, 0x34,
> + 0xab, 0x0c} };
> +
> +static const ms_guid logical_sector_guid = { .data1 = 0x8141bf1d,
> + .data2 = 0xa96f,
> + .data3 = 0x4709,
> + .data4 = { 0xba, 0x47, 0xf2,
> + 0x33, 0xa8, 0xfa,
> + 0xab, 0x5f} };
> +
> +/* Each parent type must have a valid GUID; this is for parent images
> + * of type 'VHDX'. If we were to allow e.g. a QCOW2 parent, we would
> + * need to make up our own QCOW2 GUID type */
> +static const ms_guid parent_vhdx_guid = { .data1 = 0xb04aefb7,
> + .data2 = 0xd19e,
> + .data3 = 0x4a81,
> + .data4 = { 0xb7, 0x89, 0x25, 0xb8,
> + 0xe9, 0x44, 0x59, 0x13} };
> +
> +
> +#define META_FILE_PARAMETER_PRESENT 0x01
> +#define META_VIRTUAL_DISK_SIZE_PRESENT 0x02
> +#define META_PAGE_83_PRESENT 0x04
> +#define META_LOGICAL_SECTOR_SIZE_PRESENT 0x08
> +#define META_PHYS_SECTOR_SIZE_PRESENT 0x10
> +#define META_PARENT_LOCATOR_PRESENT 0x20
> +
> +#define META_ALL_PRESENT \
> + (META_FILE_PARAMETER_PRESENT | META_VIRTUAL_DISK_SIZE_PRESENT | \
> + META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \
> + META_PHYS_SECTOR_SIZE_PRESENT)
> +
> +typedef struct vhdx_metadata_entries {
> + vhdx_metadata_table_entry file_parameters_entry;
> + vhdx_metadata_table_entry virtual_disk_size_entry;
> + vhdx_metadata_table_entry page83_data_entry;
> + vhdx_metadata_table_entry logical_sector_size_entry;
> + vhdx_metadata_table_entry phys_sector_size_entry;
> + vhdx_metadata_table_entry parent_locator_entry;
> + uint16_t present;
> +} vhdx_metadata_entries;
> +
> +
> +typedef struct BDRVVHDXState {
> + CoMutex lock;
> +
> + int curr_header;
> + vhdx_header *headers[2];
> +
> + vhdx_region_table_header rt;
> + vhdx_region_table_entry bat_rt; /* region table for the BAT */
> + vhdx_region_table_entry metadata_rt; /* region table for the metadata */
> +
> + vhdx_metadata_table_header metadata_hdr;
> + vhdx_metadata_entries metadata_entries;
> +
> + vhdx_file_parameters params;
> + uint32_t block_size;
> + uint32_t block_size_bits;
> + uint32_t sectors_per_block;
> + uint32_t sectors_per_block_bits;
> +
> + uint64_t virtual_disk_size;
> + uint32_t logical_sector_size;
> + uint32_t physical_sector_size;
> +
> + uint64_t chunk_ratio;
> + uint32_t chunk_ratio_bits;
> + uint32_t logical_sector_size_bits;
> +
> + uint32_t bat_entries;
> + vhdx_bat_entry *bat;
> + uint64_t bat_offset;
> +
> + vhdx_parent_locator_header parent_header;
> + vhdx_parent_locator_entry *parent_entries;
> +
> +} BDRVVHDXState;
> +
> +uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
> + int crc_offset)
> +{
> + uint32_t crc_new;
> + uint32_t crc_orig;
> + assert(buf != NULL);
> +
> + if (crc_offset > 0) {
> + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
> + memset(buf+crc_offset, 0, sizeof(crc_orig));
> + }
> +
> + crc_new = crc32c(crc, buf, size);
> + if (crc_offset > 0) {
> + memcpy(buf+crc_offset, &crc_orig, sizeof(crc_orig));
> + }
> +
> + return crc_new;
> +}
> +
> +/* Validates the checksum of the buffer, with an in-place CRC.
> + *
> + * Zero is substituted during crc calculation for the original crc field,
> + * and the crc field is restored afterwards. But the buffer will be modifed
> + * during the calculation, so this may not be not suitable for multi-threaded
> + * use.
> + *
> + * crc_offset: byte offset in buf of the buffer crc
> + * buf: buffer pointer
> + * size: size of buffer (must be > crc_offset+4)
> + *
> + * returns true if checksum is valid, false otherwise
> + */
> +bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
> +{
> + uint32_t crc_orig;
> + uint32_t crc;
> +
> + assert(buf != NULL);
> + assert(size > (crc_offset+4));
> +
> + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
> + crc_orig = le32_to_cpu(crc_orig);
> +
> + crc = vhdx_checksum_calc(0xffffffff, buf, size, crc_offset);
> +
> + return crc == crc_orig;
> +}
> +
> +
> +/*
> + * Per the MS VHDX Specification, for every VHDX file:
> + * - The header section is fixed size - 1 MB
> + * - The header section is always the first "object"
> + * - The first 64KB of the header is the File Identifier
> + * - The first uint64 (8 bytes) is the VHDX Signature ("vhdxfile")
> + * - The following 512 bytes constitute a UTF-16 string identifiying the
> + * software that created the file, and is optional and diagnostic only.
> + *
> + * Therefore, we probe by looking for the vhdxfile signature "vhdxfile"
> + */
> +static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> + if (buf_size >= 8 && !memcmp(buf, "vhdxfile", 8)) {
> + return 100;
> + }
> + return 0;
> +}
> +
> +/* All VHDX structures on disk are little endian */
> +static void vhdx_header_le_import(vhdx_header *h)
> +{
> + assert(h != NULL);
> +
> + le32_to_cpus(&h->signature);
> + le32_to_cpus(&h->checksum);
> + le64_to_cpus(&h->sequence_number);
> +
> + leguid_to_cpus(&h->file_write_guid);
> + leguid_to_cpus(&h->data_write_guid);
> + leguid_to_cpus(&h->log_guid);
> +
> + le16_to_cpus(&h->log_version);
> + le16_to_cpus(&h->version);
> + le32_to_cpus(&h->log_length);
> + le64_to_cpus(&h->log_offset);
> +}
> +
> +
> +/* opens the specified header block from the VHDX file header section */
> +static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + vhdx_header *header1;
> + vhdx_header *header2;
> + uint64_t h1_seq = 0;
> + uint64_t h2_seq = 0;
> + uint8_t *buffer;
Makes sense to use uint8_t* (or possible rather void*) here, but it
means that struct vhdx_header_padded is unused and could be dropped from
the header file.
> + header1 = qemu_blockalign(bs, sizeof(vhdx_header));
> + header2 = qemu_blockalign(bs, sizeof(vhdx_header));
> +
> + buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> +
> + s->headers[0] = header1;
> + s->headers[1] = header2;
The spec requires that the signature field of the File Type Identifier
must be validated when loading a VHDX file. We only have the check in
vhdx_probe(), but not in vhdx_open().
(This means that our struct vhdx_file_identifier is unused.)
> + /* We have to read the whole VHDX_HEADER_SIZE instead of
> + * sizeof(vhdx_header), because the checksum is over the whole
> + * region */
> + ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer, VHDX_HEADER_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + /* copy over just the relevant portion that we need */
> + memcpy(header1, buffer, sizeof(vhdx_header));
> + vhdx_header_le_import(header1);
> +
> + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> + header1->signature == VHDX_HDR_MAGIC) {
> + h1_seq = header1->sequence_number;
> + }
> +
> + ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + /* copy over just the relevant portion that we need */
> + memcpy(header2, buffer, sizeof(vhdx_header));
> + vhdx_header_le_import(header2);
> +
> + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> + header2->signature == VHDX_HDR_MAGIC) {
> + h2_seq = header2->sequence_number;
> + }
> +
> + if (h1_seq > h2_seq) {
> + s->curr_header = 0;
> + } else if (h2_seq > h1_seq) {
> + s->curr_header = 1;
> + } else {
> + printf("NO VALID HEADER\n");
> + ret = -EINVAL;
> + goto fail;
> + }
This may be nit-picking, but according to the spec, I think a header
with a sequence number of 0 can be valid (if the other header is
invalid). This code would get 0 for both and error out.
What does a new VHDX file look like in practice? Sequence numbers 1
and 2?
> +
> + ret = 0;
> +
> + goto exit;
> +
> +fail:
> + qemu_vfree(header1);
> + qemu_vfree(header2);
> + s->headers[0] = NULL;
> + s->headers[1] = NULL;
> +exit:
> + qemu_vfree(buffer);
> + return ret;
> +}
> +
> +
> +static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + uint8_t *buffer;
> + int offset = 0;
> + vhdx_region_table_entry rt_entry;
> + int i;
> +
> + /* We have to read the whole 64KB block, because the crc32 is over the
> + * whole block */
> + buffer = qemu_blockalign(bs, VHDX_HEADER_BLOCK_SIZE);
> +
> + ret = bdrv_pread(bs->file, VHDX_REGION_TABLE_OFFSET, buffer,
> + VHDX_HEADER_BLOCK_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + memcpy(&s->rt, buffer, sizeof(s->rt));
> + le32_to_cpus(&s->rt.signature);
> + le32_to_cpus(&s->rt.checksum);
> + le32_to_cpus(&s->rt.entry_count);
> + le32_to_cpus(&s->rt.reserved);
> + offset += sizeof(s->rt);
Maybe it would safer with respect to forgotting endianness conversion if
we didn't copy and then convert fields, but only copy converted values.
Like:
vhdx_region_table_header *rt = buffer;
s->rt = (vhdx_region_table_header) {
.signature = le32_to_cpu(rt->signature),
.checksum = le32_to_cpu(rt->checksum),
...
};
Matter of taste, I guess, so this is just a suggestion, not a request.
You could do this is quite a few places, I think.
> +
> + if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
> + s->rt.signature != VHDX_RT_MAGIC) {
> + ret = -EINVAL;
> + goto fail;
> + }
> +
> + for (i = 0; i < s->rt.entry_count; i++) {
> + memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
> + offset += sizeof(rt_entry);
> +
> + leguid_to_cpus(&rt_entry.guid);
> + le64_to_cpus(&rt_entry.file_offset);
> + le32_to_cpus(&rt_entry.length);
> + le32_to_cpus(&rt_entry.data_bits);
> +
> + /* see if we recognize the entry */
> + if (guid_eq(rt_entry.guid, bat_guid)) {
> + s->bat_rt = rt_entry;
> + continue;
> + }
> +
> + if (guid_eq(rt_entry.guid, metadata_guid)) {
> + s->metadata_rt = rt_entry;
> + continue;
> + }
If the same regions occurs multiple times, the latest wins. Such images
aren't valid, but what should we do with such cases? Is this good enough
or should we detect it?
> + if (rt_entry.data_bits & VHDX_REGION_ENTRY_REQUIRED) {
> + /* cannot read vhdx file - required region table entry that
> + * we do not understand. per spec, we must fail to open */
> + ret = -ENOTSUP;
> + goto fail;
> + }
> + }
> + ret = 0;
> +
> +fail:
> + qemu_vfree(buffer);
> + return ret;
> +}
> +
> +
> +
> +/* Metadata initial parser
> + *
> + * This loads all the metadata entry fields. This may cause additional
> + * fields to be processed (e.g. parent locator, etc..).
> + *
> + * There are 5 Metadata items that are always required:
> + * - File Parameters (block size, has a parent)
> + * - Virtual Disk Size (size, in bytes, of the virtual drive)
> + * - Page 83 Data (scsi page 83 guid)
> + * - Logical Sector Size (logical sector size in bytes, either 512 or
> + * 4096. We only support 512 currently)
> + * - Physical Sector Size (512 or 4096)
> + *
> + * Also, if the File Parameters indicate this is a differencing file,
> + * we must also look for the Parent Locator metadata item.
> + */
> +static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + uint8_t *buffer;
> + int offset = 0;
> + int i = 0;
> + uint32_t block_size, sectors_per_block, logical_sector_size;
> + uint64_t chunk_ratio;
> + vhdx_metadata_table_entry md_entry;
> +
> + buffer = qemu_blockalign(bs, VHDX_METADATA_TABLE_MAX_SIZE);
> +
> + ret = bdrv_pread(bs->file, s->metadata_rt.file_offset, buffer,
> + VHDX_METADATA_TABLE_MAX_SIZE);
> + if (ret < 0) {
> + goto exit;
> + }
> + memcpy(&s->metadata_hdr, buffer, sizeof(s->metadata_hdr));
> + offset += sizeof(s->metadata_hdr);
> +
> + le64_to_cpus(&s->metadata_hdr.signature);
> + le16_to_cpus(&s->metadata_hdr.reserved);
> + le16_to_cpus(&s->metadata_hdr.entry_count);
> +
> + if (s->metadata_hdr.signature != VHDX_METADATA_MAGIC) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + s->metadata_entries.present = 0;
> +
> + if ((s->metadata_hdr.entry_count * sizeof(md_entry)) >
> + (VHDX_METADATA_TABLE_MAX_SIZE - offset)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + for (i = 0; i < s->metadata_hdr.entry_count; i++) {
> + memcpy(&md_entry, buffer+offset, sizeof(md_entry));
> + offset += sizeof(md_entry);
> +
> + leguid_to_cpus(&md_entry.item_id);
> + le32_to_cpus(&md_entry.offset);
> + le32_to_cpus(&md_entry.length);
> + le32_to_cpus(&md_entry.data_bits);
> + le32_to_cpus(&md_entry.reserved2);
> +
> + if (guid_eq(md_entry.item_id, file_param_guid)) {
> + s->metadata_entries.file_parameters_entry = md_entry;
> + s->metadata_entries.present |= META_FILE_PARAMETER_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, virtual_size_guid)) {
> + s->metadata_entries.virtual_disk_size_entry = md_entry;
> + s->metadata_entries.present |= META_VIRTUAL_DISK_SIZE_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, page83_guid)) {
> + s->metadata_entries.page83_data_entry = md_entry;
> + s->metadata_entries.present |= META_PAGE_83_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, logical_sector_guid)) {
> + s->metadata_entries.logical_sector_size_entry = md_entry;
> + s->metadata_entries.present |= META_LOGICAL_SECTOR_SIZE_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, phys_sector_guid)) {
> + s->metadata_entries.phys_sector_size_entry = md_entry;
> + s->metadata_entries.present |= META_PHYS_SECTOR_SIZE_PRESENT;
> + continue;
> + }
> +
> + if (guid_eq(md_entry.item_id, parent_locator_guid)) {
> + s->metadata_entries.parent_locator_entry = md_entry;
> + s->metadata_entries.present |= META_PARENT_LOCATOR_PRESENT;
> + continue;
> + }
> +
> + if (md_entry.data_bits & VHDX_META_FLAGS_IS_REQUIRED) {
> + /* cannot read vhdx file - required region table entry that
> + * we do not understand. per spec, we must fail to open */
> + ret = -ENOTSUP;
> + goto exit;
> + }
> + }
> +
> + if (s->metadata_entries.present != META_ALL_PRESENT) {
> + ret = -ENOTSUP;
> + goto exit;
> + }
> +
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.file_parameters_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->params,
> + sizeof(s->params));
> +
> + if (ret < 0) {
> + goto exit;
> + }
> +
> + le32_to_cpus(&s->params.block_size);
> + le32_to_cpus(&s->params.data_bits);
> +
> +
> + /* We now have the file parameters, so we can tell if this is a
> + * differencing file (i.e.. has_parent), is dynamic or fixed
> + * sized (leave_blocks_allocated), and the block size */
> +
> + /* The parent locator required iff the file parameters has_parent set */
> + if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
> + if (s->metadata_entries.present & ~META_PARENT_LOCATOR_PRESENT) {
Not sure what you're trying to achieve here. We get -ENOTSUP if any
metadata entry except the parent locator is present, and -EINVAL if
there is none?
Of course, this doesn't matter currently, because above we've verified
that s->metadata_entries.present == META_ALL_PRESENT, so we'll always
get the -ENOTSUP case.
> + /* TODO: parse parent locator fields */
> + ret = -ENOTSUP; /* temp, until differencing files are supported */
> + goto exit;
> + } else {
> + /* if has_parent is set, but there is not parent locator present,
> + * then that is an invalid combination */
> + ret = -EINVAL;
> + goto exit;
> + }
> + }
> +
> + /* determine virtual disk size, logical sector size,
> + * and phys sector size */
> +
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.virtual_disk_size_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->virtual_disk_size,
> + sizeof(uint64_t));
> + if (ret < 0) {
> + goto exit;
> + }
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.logical_sector_size_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->logical_sector_size,
> + sizeof(uint32_t));
> + if (ret < 0) {
> + goto exit;
> + }
> + ret = bdrv_pread(bs->file,
> + s->metadata_entries.phys_sector_size_entry.offset
> + + s->metadata_rt.file_offset,
> + &s->physical_sector_size,
> + sizeof(uint32_t));
> + if (ret < 0) {
> + goto exit;
> + }
> +
> + le64_to_cpus(&s->virtual_disk_size);
> + le32_to_cpus(&s->logical_sector_size);
> + le32_to_cpus(&s->physical_sector_size);
> +
> + if (s->logical_sector_size == 0 || s->params.block_size == 0) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + /* both block_size and sector_size are guaranteed powers of 2 */
> + s->sectors_per_block = s->params.block_size / s->logical_sector_size;
> + s->chunk_ratio = (VHDX_MAX_SECTORS_PER_BLOCK) *
> + (uint64_t)s->logical_sector_size /
> + (uint64_t)s->params.block_size;
> +
> + /* These values are ones we will want to use for division / multiplication
> + * later on, and they are all guaranteed (per the spec) to be powers of 2,
> + * so we can take advantage of that for shift operations during
> + * reads/writes */
> + logical_sector_size = s->logical_sector_size;
> + if (logical_sector_size & (logical_sector_size - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + sectors_per_block = s->sectors_per_block;
> + if (sectors_per_block & (sectors_per_block - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> + chunk_ratio = s->chunk_ratio;
> + if (chunk_ratio & (chunk_ratio - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> + block_size = s->params.block_size;
> + if (block_size & (block_size - 1)) {
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + while (logical_sector_size >>= 1) {
> + s->logical_sector_size_bits++;
> + }
> + while (sectors_per_block >>= 1) {
> + s->sectors_per_block_bits++;
> + }
> + while (chunk_ratio >>= 1) {
> + s->chunk_ratio_bits++;
> + }
> + while (block_size >>= 1) {
> + s->block_size_bits++;
> + }
> +
> + if (s->logical_sector_size != BDRV_SECTOR_SIZE) {
> + printf("VHDX error - QEMU only supports 512 byte sector sizes\n");
This is not true. It depends on the device model, and in particular the
qdev properties of it. The block layer doesn't have access to them, so
checking that it matches the image file isn't trivial. It would have to
be the device model that checks that the BlockDriverState is valid for
the device it emulates.
I suggest dropping this check; otherwise make it (q)error_report at
least.
> + ret = -ENOTSUP;
> + goto exit;
> + }
> +
> + ret = 0;
> +
> +exit:
> + qemu_vfree(buffer);
> + return ret;
> +}
Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images Jeff Cody
2013-04-23 15:10 ` Kevin Wolf
2013-04-24 12:31 ` Stefan Hajnoczi
@ 2013-04-25 13:05 ` Kevin Wolf
2013-04-25 14:29 ` Jeff Cody
2 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2013-04-25 13:05 UTC (permalink / raw)
To: Jeff Cody; +Cc: qemu-devel, stefanha
Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> This is based on Microsoft's VHDX specification:
> "VHDX Format Specification v0.95", published 4/12/2012
> https://www.microsoft.com/en-us/download/details.aspx?id=29681
>
> These structures define the various header, metadata, and other
> block structures defined in the VHDX specification.
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
> block/vhdx.h | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 327 insertions(+)
> create mode 100644 block/vhdx.h
> +typedef struct QEMU_PACKED vhdx_page83_data {
> + uint8_t page_83_data[16]; /* unique id for scsi devices that
> + support page 0x83 */
> +} vhdx_page83_data;
Why uint8_t[16] instead of ms_guid?
> +
> +typedef struct QEMU_PACKED vhdx_virtual_disk_logical_sector_size {
> + uint32_t logical_sector_size; /* virtual disk sector size (in bytes).
> + Can only be 512 or 4096 bytes */
> +} vhdx_virtual_disk_logical_sector_size;
> +
> +typedef struct QEMU_PACKED vhdx_virtual_disk_physical_sector_size {
> + uint32_t physical_sector_size; /* physical sector size (in bytes).
> + Can only be 512 or 4096 bytes */
> +} vhdx_virtual_disk_physical_sector_size;
> +
> +typedef struct QEMU_PACKED vhdx_parent_locator_header {
> + uint8_t locator_type[16]; /* type of the parent virtual disk. */
Same question here.
> + uint16_t reserved;
> + uint16_t key_value_count; /* number of key/value pairs for this
> + locator */
> +} vhdx_parent_locator_header;
Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images
2013-04-25 13:05 ` Kevin Wolf
@ 2013-04-25 14:29 ` Jeff Cody
0 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-25 14:29 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, stefanha
On Thu, Apr 25, 2013 at 03:05:37PM +0200, Kevin Wolf wrote:
> Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> > This is based on Microsoft's VHDX specification:
> > "VHDX Format Specification v0.95", published 4/12/2012
> > https://www.microsoft.com/en-us/download/details.aspx?id=29681
> >
> > These structures define the various header, metadata, and other
> > block structures defined in the VHDX specification.
> >
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> > block/vhdx.h | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 327 insertions(+)
> > create mode 100644 block/vhdx.h
>
> > +typedef struct QEMU_PACKED vhdx_page83_data {
> > + uint8_t page_83_data[16]; /* unique id for scsi devices that
> > + support page 0x83 */
> > +} vhdx_page83_data;
>
> Why uint8_t[16] instead of ms_guid?
>
> > +
> > +typedef struct QEMU_PACKED vhdx_virtual_disk_logical_sector_size {
> > + uint32_t logical_sector_size; /* virtual disk sector size (in bytes).
> > + Can only be 512 or 4096 bytes */
> > +} vhdx_virtual_disk_logical_sector_size;
> > +
> > +typedef struct QEMU_PACKED vhdx_virtual_disk_physical_sector_size {
> > + uint32_t physical_sector_size; /* physical sector size (in bytes).
> > + Can only be 512 or 4096 bytes */
> > +} vhdx_virtual_disk_physical_sector_size;
> > +
> > +typedef struct QEMU_PACKED vhdx_parent_locator_header {
> > + uint8_t locator_type[16]; /* type of the parent virtual disk. */
>
> Same question here.
>
You are right, those should both be ms_guid.
> > + uint16_t reserved;
> > + uint16_t key_value_count; /* number of key/value pairs for this
> > + locator */
> > +} vhdx_parent_locator_header;
>
> Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-25 13:04 ` Kevin Wolf
@ 2013-04-25 15:03 ` Jeff Cody
2013-04-25 16:52 ` Kevin Wolf
0 siblings, 1 reply; 30+ messages in thread
From: Jeff Cody @ 2013-04-25 15:03 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, stefanha
On Thu, Apr 25, 2013 at 03:04:23PM +0200, Kevin Wolf wrote:
> Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> > This is the initial block driver framework for VHDX image support (
> > i.e. Hyper-V image file formats), that supports opening VHDX files, and
> > parsing the headers.
> >
> > This commit does not yet enable:
> > - reading
> > - writing
> > - updating the header
> > - differencing files (images with parents)
> > - log replay / dirty logs (only clean images)
> >
> > This is based on Microsoft's VHDX specification:
> > "VHDX Format Specification v0.95", published 4/12/2012
> > https://www.microsoft.com/en-us/download/details.aspx?id=29681
> >
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
>
> Okay, now I'm starting to actually compare the spec to your
> implementation. Comments inline.
>
> > block/Makefile.objs | 1 +
> > block/vhdx.c | 769 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > block/vhdx.h | 25 ++
> > 3 files changed, 795 insertions(+)
> > create mode 100644 block/vhdx.c
> >
> > diff --git a/block/Makefile.objs b/block/Makefile.objs
> > index 6c4b5bc..5f0358a 100644
> > --- a/block/Makefile.objs
> > +++ b/block/Makefile.objs
> > @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> > block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> > block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> > block-obj-y += qed-check.o
> > +block-obj-y += vhdx.o
> > block-obj-y += parallels.o blkdebug.o blkverify.o
> > block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> > block-obj-$(CONFIG_POSIX) += raw-posix.o
> > diff --git a/block/vhdx.c b/block/vhdx.c
> > new file mode 100644
> > index 0000000..b0ea2ba
> > --- /dev/null
> > +++ b/block/vhdx.c
> > @@ -0,0 +1,769 @@
> > +/*
> > + * Block driver for Hyper-V VHDX Images
> > + *
> > + * Copyright (c) 2013 Red Hat, Inc.,
> > + *
> > + * Authors:
> > + * Jeff Cody <jcody@redhat.com>
> > + *
> > + * This is based on the "VHDX Format Specification v0.95", published 4/12/2012
> > + * by Microsoft:
> > + * https://www.microsoft.com/en-us/download/details.aspx?id=29681
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING.LIB file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu-common.h"
> > +#include "block/block_int.h"
> > +#include "qemu/module.h"
> > +#include "qemu/crc32c.h"
> > +#include "block/vhdx.h"
> > +
> > +
> > +/* Several metadata and region table data entries are identified by
> > + * guids in a MS-specific GUID format. */
> > +
> > +
> > +/* ------- Known Region Table GUIDs ---------------------- */
> > +static const ms_guid bat_guid = { .data1 = 0x2dc27766,
> > + .data2 = 0xf623,
> > + .data3 = 0x4200,
> > + .data4 = { 0x9d, 0x64, 0x11, 0x5e,
> > + 0x9b, 0xfd, 0x4a, 0x08} };
> > +
> > +static const ms_guid metadata_guid = { .data1 = 0x8b7ca206,
> > + .data2 = 0x4790,
> > + .data3 = 0x4b9a,
> > + .data4 = { 0xb8, 0xfe, 0x57, 0x5f,
> > + 0x05, 0x0f, 0x88, 0x6e} };
> > +
> > +
> > +
> > +/* ------- Known Metadata Entry GUIDs ---------------------- */
> > +static const ms_guid file_param_guid = { .data1 = 0xcaa16737,
> > + .data2 = 0xfa36,
> > + .data3 = 0x4d43,
> > + .data4 = { 0xb3, 0xb6, 0x33, 0xf0,
> > + 0xaa, 0x44, 0xe7, 0x6b} };
> > +
> > +static const ms_guid virtual_size_guid = { .data1 = 0x2FA54224,
> > + .data2 = 0xcd1b,
> > + .data3 = 0x4876,
> > + .data4 = { 0xb2, 0x11, 0x5d, 0xbe,
> > + 0xd8, 0x3b, 0xf4, 0xb8} };
> > +
> > +static const ms_guid page83_guid = { .data1 = 0xbeca12ab,
> > + .data2 = 0xb2e6,
> > + .data3 = 0x4523,
> > + .data4 = { 0x93, 0xef, 0xc3, 0x09,
> > + 0xe0, 0x00, 0xc7, 0x46} };
> > +
> > +
> > +static const ms_guid phys_sector_guid = { .data1 = 0xcda348c7,
> > + .data2 = 0x445d,
> > + .data3 = 0x4471,
> > + .data4 = { 0x9c, 0xc9, 0xe9, 0x88,
> > + 0x52, 0x51, 0xc5, 0x56} };
> > +
> > +static const ms_guid parent_locator_guid = { .data1 = 0xa8d35f2d,
> > + .data2 = 0xb30b,
> > + .data3 = 0x454d,
> > + .data4 = { 0xab, 0xf7, 0xd3,
> > + 0xd8, 0x48, 0x34,
> > + 0xab, 0x0c} };
> > +
> > +static const ms_guid logical_sector_guid = { .data1 = 0x8141bf1d,
> > + .data2 = 0xa96f,
> > + .data3 = 0x4709,
> > + .data4 = { 0xba, 0x47, 0xf2,
> > + 0x33, 0xa8, 0xfa,
> > + 0xab, 0x5f} };
> > +
> > +/* Each parent type must have a valid GUID; this is for parent images
> > + * of type 'VHDX'. If we were to allow e.g. a QCOW2 parent, we would
> > + * need to make up our own QCOW2 GUID type */
> > +static const ms_guid parent_vhdx_guid = { .data1 = 0xb04aefb7,
> > + .data2 = 0xd19e,
> > + .data3 = 0x4a81,
> > + .data4 = { 0xb7, 0x89, 0x25, 0xb8,
> > + 0xe9, 0x44, 0x59, 0x13} };
> > +
> > +
> > +#define META_FILE_PARAMETER_PRESENT 0x01
> > +#define META_VIRTUAL_DISK_SIZE_PRESENT 0x02
> > +#define META_PAGE_83_PRESENT 0x04
> > +#define META_LOGICAL_SECTOR_SIZE_PRESENT 0x08
> > +#define META_PHYS_SECTOR_SIZE_PRESENT 0x10
> > +#define META_PARENT_LOCATOR_PRESENT 0x20
> > +
> > +#define META_ALL_PRESENT \
> > + (META_FILE_PARAMETER_PRESENT | META_VIRTUAL_DISK_SIZE_PRESENT | \
> > + META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \
> > + META_PHYS_SECTOR_SIZE_PRESENT)
> > +
> > +typedef struct vhdx_metadata_entries {
> > + vhdx_metadata_table_entry file_parameters_entry;
> > + vhdx_metadata_table_entry virtual_disk_size_entry;
> > + vhdx_metadata_table_entry page83_data_entry;
> > + vhdx_metadata_table_entry logical_sector_size_entry;
> > + vhdx_metadata_table_entry phys_sector_size_entry;
> > + vhdx_metadata_table_entry parent_locator_entry;
> > + uint16_t present;
> > +} vhdx_metadata_entries;
> > +
> > +
> > +typedef struct BDRVVHDXState {
> > + CoMutex lock;
> > +
> > + int curr_header;
> > + vhdx_header *headers[2];
> > +
> > + vhdx_region_table_header rt;
> > + vhdx_region_table_entry bat_rt; /* region table for the BAT */
> > + vhdx_region_table_entry metadata_rt; /* region table for the metadata */
> > +
> > + vhdx_metadata_table_header metadata_hdr;
> > + vhdx_metadata_entries metadata_entries;
> > +
> > + vhdx_file_parameters params;
> > + uint32_t block_size;
> > + uint32_t block_size_bits;
> > + uint32_t sectors_per_block;
> > + uint32_t sectors_per_block_bits;
> > +
> > + uint64_t virtual_disk_size;
> > + uint32_t logical_sector_size;
> > + uint32_t physical_sector_size;
> > +
> > + uint64_t chunk_ratio;
> > + uint32_t chunk_ratio_bits;
> > + uint32_t logical_sector_size_bits;
> > +
> > + uint32_t bat_entries;
> > + vhdx_bat_entry *bat;
> > + uint64_t bat_offset;
> > +
> > + vhdx_parent_locator_header parent_header;
> > + vhdx_parent_locator_entry *parent_entries;
> > +
> > +} BDRVVHDXState;
> > +
> > +uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
> > + int crc_offset)
> > +{
> > + uint32_t crc_new;
> > + uint32_t crc_orig;
> > + assert(buf != NULL);
> > +
> > + if (crc_offset > 0) {
> > + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
> > + memset(buf+crc_offset, 0, sizeof(crc_orig));
> > + }
> > +
> > + crc_new = crc32c(crc, buf, size);
> > + if (crc_offset > 0) {
> > + memcpy(buf+crc_offset, &crc_orig, sizeof(crc_orig));
> > + }
> > +
> > + return crc_new;
> > +}
> > +
> > +/* Validates the checksum of the buffer, with an in-place CRC.
> > + *
> > + * Zero is substituted during crc calculation for the original crc field,
> > + * and the crc field is restored afterwards. But the buffer will be modifed
> > + * during the calculation, so this may not be not suitable for multi-threaded
> > + * use.
> > + *
> > + * crc_offset: byte offset in buf of the buffer crc
> > + * buf: buffer pointer
> > + * size: size of buffer (must be > crc_offset+4)
> > + *
> > + * returns true if checksum is valid, false otherwise
> > + */
> > +bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
> > +{
> > + uint32_t crc_orig;
> > + uint32_t crc;
> > +
> > + assert(buf != NULL);
> > + assert(size > (crc_offset+4));
> > +
> > + memcpy(&crc_orig, buf+crc_offset, sizeof(crc_orig));
> > + crc_orig = le32_to_cpu(crc_orig);
> > +
> > + crc = vhdx_checksum_calc(0xffffffff, buf, size, crc_offset);
> > +
> > + return crc == crc_orig;
> > +}
> > +
> > +
> > +/*
> > + * Per the MS VHDX Specification, for every VHDX file:
> > + * - The header section is fixed size - 1 MB
> > + * - The header section is always the first "object"
> > + * - The first 64KB of the header is the File Identifier
> > + * - The first uint64 (8 bytes) is the VHDX Signature ("vhdxfile")
> > + * - The following 512 bytes constitute a UTF-16 string identifiying the
> > + * software that created the file, and is optional and diagnostic only.
> > + *
> > + * Therefore, we probe by looking for the vhdxfile signature "vhdxfile"
> > + */
> > +static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
> > +{
> > + if (buf_size >= 8 && !memcmp(buf, "vhdxfile", 8)) {
> > + return 100;
> > + }
> > + return 0;
> > +}
> > +
> > +/* All VHDX structures on disk are little endian */
> > +static void vhdx_header_le_import(vhdx_header *h)
> > +{
> > + assert(h != NULL);
> > +
> > + le32_to_cpus(&h->signature);
> > + le32_to_cpus(&h->checksum);
> > + le64_to_cpus(&h->sequence_number);
> > +
> > + leguid_to_cpus(&h->file_write_guid);
> > + leguid_to_cpus(&h->data_write_guid);
> > + leguid_to_cpus(&h->log_guid);
> > +
> > + le16_to_cpus(&h->log_version);
> > + le16_to_cpus(&h->version);
> > + le32_to_cpus(&h->log_length);
> > + le64_to_cpus(&h->log_offset);
> > +}
> > +
> > +
> > +/* opens the specified header block from the VHDX file header section */
> > +static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + vhdx_header *header1;
> > + vhdx_header *header2;
> > + uint64_t h1_seq = 0;
> > + uint64_t h2_seq = 0;
> > + uint8_t *buffer;
>
> Makes sense to use uint8_t* (or possible rather void*) here, but it
> means that struct vhdx_header_padded is unused and could be dropped from
> the header file.
>
OK, I'll drop it from the header file (or maybe convert the meaning of
it into a comment in the header, to preserve the info).
> > + header1 = qemu_blockalign(bs, sizeof(vhdx_header));
> > + header2 = qemu_blockalign(bs, sizeof(vhdx_header));
> > +
> > + buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> > +
> > + s->headers[0] = header1;
> > + s->headers[1] = header2;
>
> The spec requires that the signature field of the File Type Identifier
> must be validated when loading a VHDX file. We only have the check in
> vhdx_probe(), but not in vhdx_open().
>
Yes, true - I'll add that in first.
> (This means that our struct vhdx_file_identifier is unused.)
>
> > + /* We have to read the whole VHDX_HEADER_SIZE instead of
> > + * sizeof(vhdx_header), because the checksum is over the whole
> > + * region */
> > + ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer, VHDX_HEADER_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + /* copy over just the relevant portion that we need */
> > + memcpy(header1, buffer, sizeof(vhdx_header));
> > + vhdx_header_le_import(header1);
> > +
> > + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> > + header1->signature == VHDX_HDR_MAGIC) {
> > + h1_seq = header1->sequence_number;
> > + }
> > +
> > + ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + /* copy over just the relevant portion that we need */
> > + memcpy(header2, buffer, sizeof(vhdx_header));
> > + vhdx_header_le_import(header2);
> > +
> > + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> > + header2->signature == VHDX_HDR_MAGIC) {
> > + h2_seq = header2->sequence_number;
> > + }
> > +
> > + if (h1_seq > h2_seq) {
> > + s->curr_header = 0;
> > + } else if (h2_seq > h1_seq) {
> > + s->curr_header = 1;
> > + } else {
> > + printf("NO VALID HEADER\n");
> > + ret = -EINVAL;
> > + goto fail;
> > + }
>
> This may be nit-picking, but according to the spec, I think a header
> with a sequence number of 0 can be valid (if the other header is
> invalid). This code would get 0 for both and error out.
>
I think you are right. The other header would need to have either an
invalid signature or checksum, but if that was the case technically 0
could indicate the active header.
> What does a new VHDX file look like in practice? Sequence numbers 1
> and 2?
>
I'll check that out - my Hyper-V test server is at home, and I am
working at a co-working place today. I'll check it this evening -
it'll have to be on a freshly created image before even opening it up
with Hyper-V.
> > +
> > + ret = 0;
> > +
> > + goto exit;
> > +
> > +fail:
> > + qemu_vfree(header1);
> > + qemu_vfree(header2);
> > + s->headers[0] = NULL;
> > + s->headers[1] = NULL;
> > +exit:
> > + qemu_vfree(buffer);
> > + return ret;
> > +}
> > +
> > +
> > +static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + uint8_t *buffer;
> > + int offset = 0;
> > + vhdx_region_table_entry rt_entry;
> > + int i;
> > +
> > + /* We have to read the whole 64KB block, because the crc32 is over the
> > + * whole block */
> > + buffer = qemu_blockalign(bs, VHDX_HEADER_BLOCK_SIZE);
> > +
> > + ret = bdrv_pread(bs->file, VHDX_REGION_TABLE_OFFSET, buffer,
> > + VHDX_HEADER_BLOCK_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + memcpy(&s->rt, buffer, sizeof(s->rt));
> > + le32_to_cpus(&s->rt.signature);
> > + le32_to_cpus(&s->rt.checksum);
> > + le32_to_cpus(&s->rt.entry_count);
> > + le32_to_cpus(&s->rt.reserved);
> > + offset += sizeof(s->rt);
>
> Maybe it would safer with respect to forgotting endianness conversion if
> we didn't copy and then convert fields, but only copy converted values.
> Like:
>
> vhdx_region_table_header *rt = buffer;
> s->rt = (vhdx_region_table_header) {
> .signature = le32_to_cpu(rt->signature),
> .checksum = le32_to_cpu(rt->checksum),
> ...
> };
>
> Matter of taste, I guess, so this is just a suggestion, not a request.
> You could do this is quite a few places, I think.
>
> > +
> > + if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
> > + s->rt.signature != VHDX_RT_MAGIC) {
> > + ret = -EINVAL;
> > + goto fail;
> > + }
> > +
> > + for (i = 0; i < s->rt.entry_count; i++) {
> > + memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
> > + offset += sizeof(rt_entry);
> > +
> > + leguid_to_cpus(&rt_entry.guid);
> > + le64_to_cpus(&rt_entry.file_offset);
> > + le32_to_cpus(&rt_entry.length);
> > + le32_to_cpus(&rt_entry.data_bits);
> > +
> > + /* see if we recognize the entry */
> > + if (guid_eq(rt_entry.guid, bat_guid)) {
> > + s->bat_rt = rt_entry;
> > + continue;
> > + }
> > +
> > + if (guid_eq(rt_entry.guid, metadata_guid)) {
> > + s->metadata_rt = rt_entry;
> > + continue;
> > + }
>
> If the same regions occurs multiple times, the latest wins. Such images
> aren't valid, but what should we do with such cases? Is this good enough
> or should we detect it?
I think such images are more undefined rather than explicitly invalid.
I don't think the spec touches on the idea of multiple regions of the
same type. That said, I don't what to make of an image file with
multiple BAT regions, for instance - I think error is the only sane
option.
Maybe keep a list of all entries, and if there are duplicates error
out with -EINVAL?
>
> > + if (rt_entry.data_bits & VHDX_REGION_ENTRY_REQUIRED) {
> > + /* cannot read vhdx file - required region table entry that
> > + * we do not understand. per spec, we must fail to open */
> > + ret = -ENOTSUP;
> > + goto fail;
> > + }
> > + }
> > + ret = 0;
> > +
> > +fail:
> > + qemu_vfree(buffer);
> > + return ret;
> > +}
> > +
> > +
> > +
> > +/* Metadata initial parser
> > + *
> > + * This loads all the metadata entry fields. This may cause additional
> > + * fields to be processed (e.g. parent locator, etc..).
> > + *
> > + * There are 5 Metadata items that are always required:
> > + * - File Parameters (block size, has a parent)
> > + * - Virtual Disk Size (size, in bytes, of the virtual drive)
> > + * - Page 83 Data (scsi page 83 guid)
> > + * - Logical Sector Size (logical sector size in bytes, either 512 or
> > + * 4096. We only support 512 currently)
> > + * - Physical Sector Size (512 or 4096)
> > + *
> > + * Also, if the File Parameters indicate this is a differencing file,
> > + * we must also look for the Parent Locator metadata item.
> > + */
> > +static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + uint8_t *buffer;
> > + int offset = 0;
> > + int i = 0;
> > + uint32_t block_size, sectors_per_block, logical_sector_size;
> > + uint64_t chunk_ratio;
> > + vhdx_metadata_table_entry md_entry;
> > +
> > + buffer = qemu_blockalign(bs, VHDX_METADATA_TABLE_MAX_SIZE);
> > +
> > + ret = bdrv_pread(bs->file, s->metadata_rt.file_offset, buffer,
> > + VHDX_METADATA_TABLE_MAX_SIZE);
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > + memcpy(&s->metadata_hdr, buffer, sizeof(s->metadata_hdr));
> > + offset += sizeof(s->metadata_hdr);
> > +
> > + le64_to_cpus(&s->metadata_hdr.signature);
> > + le16_to_cpus(&s->metadata_hdr.reserved);
> > + le16_to_cpus(&s->metadata_hdr.entry_count);
> > +
> > + if (s->metadata_hdr.signature != VHDX_METADATA_MAGIC) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + s->metadata_entries.present = 0;
> > +
> > + if ((s->metadata_hdr.entry_count * sizeof(md_entry)) >
> > + (VHDX_METADATA_TABLE_MAX_SIZE - offset)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + for (i = 0; i < s->metadata_hdr.entry_count; i++) {
> > + memcpy(&md_entry, buffer+offset, sizeof(md_entry));
> > + offset += sizeof(md_entry);
> > +
> > + leguid_to_cpus(&md_entry.item_id);
> > + le32_to_cpus(&md_entry.offset);
> > + le32_to_cpus(&md_entry.length);
> > + le32_to_cpus(&md_entry.data_bits);
> > + le32_to_cpus(&md_entry.reserved2);
> > +
> > + if (guid_eq(md_entry.item_id, file_param_guid)) {
> > + s->metadata_entries.file_parameters_entry = md_entry;
> > + s->metadata_entries.present |= META_FILE_PARAMETER_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, virtual_size_guid)) {
> > + s->metadata_entries.virtual_disk_size_entry = md_entry;
> > + s->metadata_entries.present |= META_VIRTUAL_DISK_SIZE_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, page83_guid)) {
> > + s->metadata_entries.page83_data_entry = md_entry;
> > + s->metadata_entries.present |= META_PAGE_83_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, logical_sector_guid)) {
> > + s->metadata_entries.logical_sector_size_entry = md_entry;
> > + s->metadata_entries.present |= META_LOGICAL_SECTOR_SIZE_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, phys_sector_guid)) {
> > + s->metadata_entries.phys_sector_size_entry = md_entry;
> > + s->metadata_entries.present |= META_PHYS_SECTOR_SIZE_PRESENT;
> > + continue;
> > + }
> > +
> > + if (guid_eq(md_entry.item_id, parent_locator_guid)) {
> > + s->metadata_entries.parent_locator_entry = md_entry;
> > + s->metadata_entries.present |= META_PARENT_LOCATOR_PRESENT;
> > + continue;
> > + }
> > +
> > + if (md_entry.data_bits & VHDX_META_FLAGS_IS_REQUIRED) {
> > + /* cannot read vhdx file - required region table entry that
> > + * we do not understand. per spec, we must fail to open */
> > + ret = -ENOTSUP;
> > + goto exit;
> > + }
> > + }
> > +
> > + if (s->metadata_entries.present != META_ALL_PRESENT) {
> > + ret = -ENOTSUP;
> > + goto exit;
> > + }
> > +
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.file_parameters_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->params,
> > + sizeof(s->params));
> > +
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > +
> > + le32_to_cpus(&s->params.block_size);
> > + le32_to_cpus(&s->params.data_bits);
> > +
> > +
> > + /* We now have the file parameters, so we can tell if this is a
> > + * differencing file (i.e.. has_parent), is dynamic or fixed
> > + * sized (leave_blocks_allocated), and the block size */
> > +
> > + /* The parent locator required iff the file parameters has_parent set */
> > + if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
> > + if (s->metadata_entries.present & ~META_PARENT_LOCATOR_PRESENT) {
>
> Not sure what you're trying to achieve here. We get -ENOTSUP if any
> metadata entry except the parent locator is present, and -EINVAL if
> there is none?
>
That is a mistake - that check should be sans tilde:
if (s->metadata_entries.present & META_PARENT_LOCATOR_PRESENT)
There are currently 2 failure cases (while parent locators are
unsupported):
1.) There is a parent present, and a parent locator field.
- This is proper per the spec. Normally, we would parse
the parent locator field. But this is not supported, so
we return -ENOTSUP.
2.) There is a parent present, but not parent locator field.
- This is invalid, because if the parent is present the
parent locator field per spec is a required field. So,
return -EINVAL to indicate invalid field.
> Of course, this doesn't matter currently, because above we've verified
> that s->metadata_entries.present == META_ALL_PRESENT, so we'll always
> get the -ENOTSUP case.
>
Once we support parent locator fields, that check should be masked
against META_ALL_PRESENT, so that we are only checking for the base
required fields. It wouldn't hurt to got ahead and do it now.
> > + /* TODO: parse parent locator fields */
> > + ret = -ENOTSUP; /* temp, until differencing files are supported */
> > + goto exit;
> > + } else {
> > + /* if has_parent is set, but there is not parent locator present,
> > + * then that is an invalid combination */
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + }
> > +
> > + /* determine virtual disk size, logical sector size,
> > + * and phys sector size */
> > +
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.virtual_disk_size_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->virtual_disk_size,
> > + sizeof(uint64_t));
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.logical_sector_size_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->logical_sector_size,
> > + sizeof(uint32_t));
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > + ret = bdrv_pread(bs->file,
> > + s->metadata_entries.phys_sector_size_entry.offset
> > + + s->metadata_rt.file_offset,
> > + &s->physical_sector_size,
> > + sizeof(uint32_t));
> > + if (ret < 0) {
> > + goto exit;
> > + }
> > +
> > + le64_to_cpus(&s->virtual_disk_size);
> > + le32_to_cpus(&s->logical_sector_size);
> > + le32_to_cpus(&s->physical_sector_size);
> > +
> > + if (s->logical_sector_size == 0 || s->params.block_size == 0) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + /* both block_size and sector_size are guaranteed powers of 2 */
> > + s->sectors_per_block = s->params.block_size / s->logical_sector_size;
> > + s->chunk_ratio = (VHDX_MAX_SECTORS_PER_BLOCK) *
> > + (uint64_t)s->logical_sector_size /
> > + (uint64_t)s->params.block_size;
> > +
> > + /* These values are ones we will want to use for division / multiplication
> > + * later on, and they are all guaranteed (per the spec) to be powers of 2,
> > + * so we can take advantage of that for shift operations during
> > + * reads/writes */
> > + logical_sector_size = s->logical_sector_size;
> > + if (logical_sector_size & (logical_sector_size - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + sectors_per_block = s->sectors_per_block;
> > + if (sectors_per_block & (sectors_per_block - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + chunk_ratio = s->chunk_ratio;
> > + if (chunk_ratio & (chunk_ratio - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + block_size = s->params.block_size;
> > + if (block_size & (block_size - 1)) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + while (logical_sector_size >>= 1) {
> > + s->logical_sector_size_bits++;
> > + }
> > + while (sectors_per_block >>= 1) {
> > + s->sectors_per_block_bits++;
> > + }
> > + while (chunk_ratio >>= 1) {
> > + s->chunk_ratio_bits++;
> > + }
> > + while (block_size >>= 1) {
> > + s->block_size_bits++;
> > + }
> > +
> > + if (s->logical_sector_size != BDRV_SECTOR_SIZE) {
> > + printf("VHDX error - QEMU only supports 512 byte sector sizes\n");
>
> This is not true. It depends on the device model, and in particular the
> qdev properties of it. The block layer doesn't have access to them, so
> checking that it matches the image file isn't trivial. It would have to
> be the device model that checks that the BlockDriverState is valid for
> the device it emulates.
>
> I suggest dropping this check; otherwise make it (q)error_report at
> least.
>
OK.
> > + ret = -ENOTSUP;
> > + goto exit;
> > + }
> > +
> > + ret = 0;
> > +
> > +exit:
> > + qemu_vfree(buffer);
> > + return ret;
> > +}
>
> Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-25 15:03 ` Jeff Cody
@ 2013-04-25 16:52 ` Kevin Wolf
0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2013-04-25 16:52 UTC (permalink / raw)
To: Jeff Cody; +Cc: qemu-devel, stefanha
Am 25.04.2013 um 17:03 hat Jeff Cody geschrieben:
> On Thu, Apr 25, 2013 at 03:04:23PM +0200, Kevin Wolf wrote:
> > Am 23.04.2013 um 16:24 hat Jeff Cody geschrieben:
> > > + if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
> > > + s->rt.signature != VHDX_RT_MAGIC) {
> > > + ret = -EINVAL;
> > > + goto fail;
> > > + }
> > > +
> > > + for (i = 0; i < s->rt.entry_count; i++) {
> > > + memcpy(&rt_entry, buffer+offset, sizeof(rt_entry));
> > > + offset += sizeof(rt_entry);
> > > +
> > > + leguid_to_cpus(&rt_entry.guid);
> > > + le64_to_cpus(&rt_entry.file_offset);
> > > + le32_to_cpus(&rt_entry.length);
> > > + le32_to_cpus(&rt_entry.data_bits);
> > > +
> > > + /* see if we recognize the entry */
> > > + if (guid_eq(rt_entry.guid, bat_guid)) {
> > > + s->bat_rt = rt_entry;
> > > + continue;
> > > + }
> > > +
> > > + if (guid_eq(rt_entry.guid, metadata_guid)) {
> > > + s->metadata_rt = rt_entry;
> > > + continue;
> > > + }
> >
> > If the same regions occurs multiple times, the latest wins. Such images
> > aren't valid, but what should we do with such cases? Is this good enough
> > or should we detect it?
>
> I think such images are more undefined rather than explicitly invalid.
> I don't think the spec touches on the idea of multiple regions of the
> same type.
It says "The Guid field specifies a 128-bit identifier for the object
and must be unique within the table.", so it's explicitly forbidden to
have two regions of the same type.
> That said, I don't what to make of an image file with
> multiple BAT regions, for instance - I think error is the only sane
> option.
>
> Maybe keep a list of all entries, and if there are duplicates error
> out with -EINVAL?
We could do that if we really care. Or just check for the ones that we
recognise that they aren't set yet when we assign them.
> > > + /* We now have the file parameters, so we can tell if this is a
> > > + * differencing file (i.e.. has_parent), is dynamic or fixed
> > > + * sized (leave_blocks_allocated), and the block size */
> > > +
> > > + /* The parent locator required iff the file parameters has_parent set */
> > > + if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
> > > + if (s->metadata_entries.present & ~META_PARENT_LOCATOR_PRESENT) {
> >
> > Not sure what you're trying to achieve here. We get -ENOTSUP if any
> > metadata entry except the parent locator is present, and -EINVAL if
> > there is none?
> >
>
> That is a mistake - that check should be sans tilde:
> if (s->metadata_entries.present & META_PARENT_LOCATOR_PRESENT)
Right, that makes more sense.
Kevin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
` (2 preceding siblings ...)
2013-04-25 13:04 ` Kevin Wolf
@ 2013-04-28 7:29 ` Fam Zheng
2013-04-29 17:25 ` Jeff Cody
2013-04-28 9:58 ` Fam Zheng
4 siblings, 1 reply; 30+ messages in thread
From: Fam Zheng @ 2013-04-28 7:29 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha
> +static void cpu_to_leguids(ms_guid *guid)
> +{
> + cpu_to_le32s(&guid->data1);
> + cpu_to_le16s(&guid->data2);
> + cpu_to_le16s(&guid->data3);
> +}
This one seems used in 5/5 only, so this patch fails compiling (defined
but not used with -Werror=unused-function). Maybe move to 5/5?
--
Fam
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
` (3 preceding siblings ...)
2013-04-28 7:29 ` Fam Zheng
@ 2013-04-28 9:58 ` Fam Zheng
2013-04-29 17:24 ` Jeff Cody
4 siblings, 1 reply; 30+ messages in thread
From: Fam Zheng @ 2013-04-28 9:58 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha
On Tue, 04/23 10:24, Jeff Cody wrote:
> +/* opens the specified header block from the VHDX file header section */
> +static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> + int ret = 0;
> + vhdx_header *header1;
> + vhdx_header *header2;
> + uint64_t h1_seq = 0;
> + uint64_t h2_seq = 0;
> + uint8_t *buffer;
> +
> + header1 = qemu_blockalign(bs, sizeof(vhdx_header));
> + header2 = qemu_blockalign(bs, sizeof(vhdx_header));
> +
> + buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> +
> + s->headers[0] = header1;
> + s->headers[1] = header2;
> +
Logic for header1 and header2 are completely the same, IMO it might be
better to avoid code with a loop.
> + /* We have to read the whole VHDX_HEADER_SIZE instead of
> + * sizeof(vhdx_header), because the checksum is over the whole
> + * region */
> + ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer, VHDX_HEADER_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + /* copy over just the relevant portion that we need */
> + memcpy(header1, buffer, sizeof(vhdx_header));
> + vhdx_header_le_import(header1);
> +
> + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> + header1->signature == VHDX_HDR_MAGIC) {
> + h1_seq = header1->sequence_number;
> + }
Do we need to check the version here? As the specification page 15 says:
The Version field specifies the version of the VHDX format used
within the VHDX file. This field must be set to 1. If it is not, a
parser must not attempt to parse the file using the details from
this format specification.
> +
> + ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + /* copy over just the relevant portion that we need */
> + memcpy(header2, buffer, sizeof(vhdx_header));
> + vhdx_header_le_import(header2);
> +
> + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> + header2->signature == VHDX_HDR_MAGIC) {
> + h2_seq = header2->sequence_number;
> + }
Same as above.
> +
> + if (h1_seq > h2_seq) {
> + s->curr_header = 0;
> + } else if (h2_seq > h1_seq) {
> + s->curr_header = 1;
> + } else {
> + printf("NO VALID HEADER\n");
> + ret = -EINVAL;
> + goto fail;
> + }
> +
> + ret = 0;
> +
> + goto exit;
> +
> +fail:
> + qemu_vfree(header1);
> + qemu_vfree(header2);
> + s->headers[0] = NULL;
> + s->headers[1] = NULL;
> +exit:
> + qemu_vfree(buffer);
> + return ret;
> +}
--
Fam
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images.
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images Jeff Cody
2013-04-24 14:47 ` Stefan Hajnoczi
@ 2013-04-28 10:05 ` Fam Zheng
2013-04-29 17:19 ` Jeff Cody
1 sibling, 1 reply; 30+ messages in thread
From: Fam Zheng @ 2013-04-28 10:05 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha
On Tue, 04/23 10:24, Jeff Cody wrote:
> if (flags & BDRV_O_RDWR) {
> - ret = -ENOTSUP;
> - goto fail;
> + vhdx_update_headers(bs, s, false);
Do we really have to update the header immediately on RW open? I assume
when implementing vhdx_co_writev, this is guaranteed to get called
before other write. In that case we don't need it here, which means we
don't change anything if no user write, even for RW opened file.
--
Fam
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images.
2013-04-28 10:05 ` Fam Zheng
@ 2013-04-29 17:19 ` Jeff Cody
0 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-29 17:19 UTC (permalink / raw)
To: qemu-devel, kwolf, stefanha
On Sun, Apr 28, 2013 at 06:05:53PM +0800, Fam Zheng wrote:
> On Tue, 04/23 10:24, Jeff Cody wrote:
> > if (flags & BDRV_O_RDWR) {
> > - ret = -ENOTSUP;
> > - goto fail;
> > + vhdx_update_headers(bs, s, false);
>
> Do we really have to update the header immediately on RW open? I assume
> when implementing vhdx_co_writev, this is guaranteed to get called
> before other write. In that case we don't need it here, which means we
> don't change anything if no user write, even for RW opened file.
>
There are two file modification GUIDS that need to be generated: a
'file write' guid, and a 'data write' guid. The data write GUID
encompasses anything visible to the guest. The file write GUID
includes any modifications to metadata.
This will update the file write GUID in the header, but not the data
write GUID. According the spec, the file write GUID is for any data
modification, including metadata. So once the log support is
implemented, we could latch that this has not occurred and do it
during a log write if we wanted (since all metdata updates will go
through the log).
In the vhdx_co_writev patches from the RFC, it does writes the data
guid if it detects it is the first write.
However, the spec says (with regard to the file write guid):
"The parser can skip updating this field if the storage media on which
the file is stored is read-only, or if the file is opened in read-only
mode."
Which I think implies that we should go ahead and update the file
write guid field if the image is opened r/w. This would also
encompass any modifications to resizing the image, changing backing
files (once differencing files are supported), which would not go
through vhdx_co_writev().
All that said, I am getting ready to submit v3, and I am dropping this
patch from the series for now.
Thanks,
Jeff
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-28 9:58 ` Fam Zheng
@ 2013-04-29 17:24 ` Jeff Cody
0 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-29 17:24 UTC (permalink / raw)
To: qemu-devel, kwolf, stefanha
On Sun, Apr 28, 2013 at 05:58:55PM +0800, Fam Zheng wrote:
> On Tue, 04/23 10:24, Jeff Cody wrote:
> > +/* opens the specified header block from the VHDX file header section */
> > +static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > + int ret = 0;
> > + vhdx_header *header1;
> > + vhdx_header *header2;
> > + uint64_t h1_seq = 0;
> > + uint64_t h2_seq = 0;
> > + uint8_t *buffer;
> > +
> > + header1 = qemu_blockalign(bs, sizeof(vhdx_header));
> > + header2 = qemu_blockalign(bs, sizeof(vhdx_header));
> > +
> > + buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> > +
> > + s->headers[0] = header1;
> > + s->headers[1] = header2;
> > +
> Logic for header1 and header2 are completely the same, IMO it might be
> better to avoid code with a loop.
> > + /* We have to read the whole VHDX_HEADER_SIZE instead of
> > + * sizeof(vhdx_header), because the checksum is over the whole
> > + * region */
> > + ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer, VHDX_HEADER_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + /* copy over just the relevant portion that we need */
> > + memcpy(header1, buffer, sizeof(vhdx_header));
> > + vhdx_header_le_import(header1);
> > +
> > + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> > + header1->signature == VHDX_HDR_MAGIC) {
> > + h1_seq = header1->sequence_number;
> > + }
> Do we need to check the version here? As the specification page 15 says:
>
> The Version field specifies the version of the VHDX format used
> within the VHDX file. This field must be set to 1. If it is not, a
> parser must not attempt to parse the file using the details from
> this format specification.
>
Yes, you are correct - I will add that in for v3.
> > +
> > + ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
> > + if (ret < 0) {
> > + goto fail;
> > + }
> > + /* copy over just the relevant portion that we need */
> > + memcpy(header2, buffer, sizeof(vhdx_header));
> > + vhdx_header_le_import(header2);
> > +
> > + if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
> > + header2->signature == VHDX_HDR_MAGIC) {
> > + h2_seq = header2->sequence_number;
> > + }
> Same as above.
> > +
> > + if (h1_seq > h2_seq) {
> > + s->curr_header = 0;
> > + } else if (h2_seq > h1_seq) {
> > + s->curr_header = 1;
> > + } else {
> > + printf("NO VALID HEADER\n");
> > + ret = -EINVAL;
> > + goto fail;
> > + }
> > +
> > + ret = 0;
> > +
> > + goto exit;
> > +
> > +fail:
> > + qemu_vfree(header1);
> > + qemu_vfree(header2);
> > + s->headers[0] = NULL;
> > + s->headers[1] = NULL;
> > +exit:
> > + qemu_vfree(buffer);
> > + return ret;
> > +}
>
> --
> Fam
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe
2013-04-28 7:29 ` Fam Zheng
@ 2013-04-29 17:25 ` Jeff Cody
0 siblings, 0 replies; 30+ messages in thread
From: Jeff Cody @ 2013-04-29 17:25 UTC (permalink / raw)
To: qemu-devel, kwolf, stefanha
On Sun, Apr 28, 2013 at 03:29:14PM +0800, Fam Zheng wrote:
> > +static void cpu_to_leguids(ms_guid *guid)
> > +{
> > + cpu_to_le32s(&guid->data1);
> > + cpu_to_le16s(&guid->data2);
> > + cpu_to_le16s(&guid->data3);
> > +}
>
> This one seems used in 5/5 only, so this patch fails compiling (defined
> but not used with -Werror=unused-function). Maybe move to 5/5?
>
Thanks. Yes, that should be in patch 5 (git rebase user error there),
which means this will go away completely for v3.
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2013-04-29 17:25 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-23 14:24 [Qemu-devel] [PATCH v2 0/5] Initial VHDX support Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 1/5] qemu: add castagnoli crc32c checksum algorithm Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 2/5] block: vhdx header for the QEMU support of VHDX images Jeff Cody
2013-04-23 15:10 ` Kevin Wolf
2013-04-23 16:32 ` Jeff Cody
2013-04-24 12:31 ` Stefan Hajnoczi
2013-04-24 12:34 ` Jeff Cody
2013-04-25 13:05 ` Kevin Wolf
2013-04-25 14:29 ` Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 3/5] block: initial VHDX driver support framework - supports open and probe Jeff Cody
2013-04-23 15:46 ` Kevin Wolf
2013-04-23 16:11 ` Jeff Cody
2013-04-23 16:18 ` Kevin Wolf
2013-04-24 13:21 ` Stefan Hajnoczi
2013-04-24 13:40 ` Jeff Cody
2013-04-25 13:04 ` Kevin Wolf
2013-04-25 15:03 ` Jeff Cody
2013-04-25 16:52 ` Kevin Wolf
2013-04-28 7:29 ` Fam Zheng
2013-04-29 17:25 ` Jeff Cody
2013-04-28 9:58 ` Fam Zheng
2013-04-29 17:24 ` Jeff Cody
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 4/5] block: add read-only support to VHDX image format Jeff Cody
2013-04-24 14:38 ` Stefan Hajnoczi
2013-04-23 14:24 ` [Qemu-devel] [PATCH v2 5/5] block: add header update capability for VHDX images Jeff Cody
2013-04-24 14:47 ` Stefan Hajnoczi
2013-04-24 14:56 ` Jeff Cody
2013-04-25 7:20 ` Stefan Hajnoczi
2013-04-28 10:05 ` Fam Zheng
2013-04-29 17:19 ` Jeff Cody
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).