From: Gerhard Wiesinger <lists@wiesinger.com>
To: Fam Zheng <famcool@gmail.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] block.c, block/vmdk.c: Fixed major bug in VMDK WRITE and READ handling - FIXES DATA CORRUPTION
Date: Fri, 09 Nov 2012 07:51:37 +0100 [thread overview]
Message-ID: <509CA7F9.7020400@wiesinger.com> (raw)
In-Reply-To: <CABgc4wQ7GKhxHrHXzp0JvnYDfWOEAv4ABVFJujmp6RppLeauEg@mail.gmail.com>
Hello,
It was created with VMWare Server 2.0.x or maybe even with 1.x. under Linux.
Ciao,
Gerhard
On 09.11.2012 06:12, Fam Zheng wrote:
> Yes, I can confirm the presence of this bug and this is a valid fix.
>
> May I ask where is this kind of vmdk from? Because regularly we see
> extents in identical size.
>
> ---
> Best regards!
> Fam Zheng
>
>
> On Fri, Nov 9, 2012 at 3:05 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> Fixed a MAJOR BUG in VMDK files on file boundaries on reads
>> and ALSO ON WRITES WHICH MIGHT CORRUPT THE IMAGE AND DATA!!!!!!
>>
>> Triggered for example with the following VMDK file (partly listed):
>> # Extent description
>> RW 4193792 FLAT "XP-W1-f001.vmdk" 0
>> RW 2097664 FLAT "XP-W1-f002.vmdk" 0
>> RW 4193792 FLAT "XP-W1-f003.vmdk" 0
>> RW 512 FLAT "XP-W1-f004.vmdk" 0
>> RW 4193792 FLAT "XP-W1-f005.vmdk" 0
>> RW 2097664 FLAT "XP-W1-f006.vmdk" 0
>> RW 4193792 FLAT "XP-W1-f007.vmdk" 0
>> RW 512 FLAT "XP-W1-f008.vmdk" 0
>>
>> Patch includes:
>> 1.) Patch fixes wrong calculation on extent boundaries. Especially it fixes
>> the relativeness of the sector number to the current extent.
>> 2.) Added debug code to block.c and to block/vmdk.c to verify correctness
>> 3.) Also optimized code which avoids multiplication and uses shifts.
>>
>> Verfied correctness with:
>> 1.) Converted either with Virtualbox to VDI and then with qemu-img and then
>> with qemu-img only
>> VBoxManage clonehd --format vdi /VM/XP-W/new/XP-W1.vmdk
>> ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi
>> ./qemu-img convert -O raw ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi
>> /root/QEMU/VM-XP-W1/XP-W1-via-VBOX.img
>> md5sum /root/QEMU/VM-XP-W/XP-W1-direct.img
>> md5sum /root/QEMU/VM-XP-W/XP-W1-via-VBOX.img
>> => same MD5 hash
>> 2.) Verified debug log files
>> 3.) Run Windows XP successfully
>> 4.) chkdsk run successfully without any errors
>>
>> Signed-off-by: Gerhard Wiesinger <lists@wiesinger.com>
>> ---
>> block.c | 50 +++++++++++++++++++++++
>> block/vmdk.c | 129
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>> 2 files changed, 170 insertions(+), 9 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index da1fdca..69259f2 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -49,6 +49,12 @@
>> #include <windows.h>
>> #endif
>>
>> +#if 0
>> + #define DEBUG_BLOCK
>> +#endif
>> +
>> +#define DEBUG_BLOCK_PREFIX "BLOCK: "
>> +
>> #define NOT_DONE 0x7fffffff /* used while emulated sync operation in
>> progress */
>>
>> typedef enum {
>> @@ -789,6 +795,18 @@ int bdrv_open(BlockDriverState *bs, const char
>> *filename, int flags,
>> int ret;
>> char tmp_filename[PATH_MAX];
>>
>> +#ifdef DEBUG_BLOCK
>> + const char *format = "(nil)";
>> + if (drv) {
>> + format = drv->format_name;
>> + }
>> +
>> + printf(DEBUG_BLOCK
>> + "bdrv_open: filename=%s, BlockDriver=%p, format=%s\n",
>> + filename, drv, format
>> + );
>> +#endif
>> +
>> if (flags & BDRV_O_SNAPSHOT) {
>> BlockDriverState *bs1;
>> int64_t total_size;
>> @@ -2004,6 +2022,22 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t
>> sector_num, uint8_t *buf,
>> int bdrv_read(BlockDriverState *bs, int64_t sector_num,
>> uint8_t *buf, int nb_sectors)
>> {
>> +#ifdef DEBUG_BLOCK
>> + BlockDriver *drv = bs->drv;
>> + const char *format_name = "(nil)";
>> + if (drv) {
>> + if (drv->format_name) {
>> + format_name = drv->format_name;
>> + }
>> + }
>> +
>> + printf(DEBUG_BLOCK
>> + "bdrv_read: driver=%s, filename=%s, sector_num=%" PRId64
>> + ", nb_sectors=%i\n",
>> + format_name, bs->filename, sector_num, nb_sectors
>> + );
>> +#endif
>> +
>> return bdrv_rw_co(bs, sector_num, buf, nb_sectors, false);
>> }
>>
>> @@ -2060,6 +2094,22 @@ static void set_dirty_bitmap(BlockDriverState *bs,
>> int64_t sector_num,
>> int bdrv_write(BlockDriverState *bs, int64_t sector_num,
>> const uint8_t *buf, int nb_sectors)
>> {
>> +#ifdef DEBUG_BLOCK
>> + BlockDriver *drv = bs->drv;
>> + const char *format_name = "(nil)";
>> + if (drv) {
>> + if (drv->format_name) {
>> + format_name = drv->format_name;
>> + }
>> + }
>> +
>> + printf(DEBUG_BLOCK
>> + "bdrv_write: driver=%s, filename=%s, sector_num=%" PRId64
>> + ", nb_sectors=%i\n",
>> + format_name, bs->filename, sector_num, nb_sectors
>> + );
>> +#endif
>> +
>> return bdrv_rw_co(bs, sector_num, (uint8_t *)buf, nb_sectors, true);
>> }
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index 1a80e5a..92ab92c 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -29,6 +29,15 @@
>> #include "migration.h"
>> #include <zlib.h>
>>
>> +#if 0
>> + #define DEBUG_VMDK
>> +#endif
>> +
>> +#define DEBUG_VMDK_PREFIX "VMDK: "
>> +#define DEBUG_VMDK_SEPARATOR \
>> + "########################################" \
>> + "########################################"
>> +
>> #define VMDK3_MAGIC (('C' << 24) | ('O' << 16) | ('W' << 8) | 'D')
>> #define VMDK4_MAGIC (('K' << 24) | ('D' << 16) | ('M' << 8) | 'V')
>> #define VMDK4_COMPRESSION_DEFLATE 1
>> @@ -377,6 +386,16 @@ static VmdkExtent *vmdk_add_extent(BlockDriverState
>> *bs,
>> VmdkExtent *extent;
>> BDRVVmdkState *s = bs->opaque;
>>
>> +#ifdef DEBUG_VMDK
>> + printf(DEBUG_VMDK_PREFIX
>> + "vmdk_add_extent: flat=%i, sectors=%" PRId64
>> + ", l1_backup_offset=%" PRId64
>> + ", l1_size=%u, l2_size=%i, cluster_sectors=%u\n",
>> + (int)flat, sectors, l1_backup_offset,
>> + l1_size, l2_size, cluster_sectors
>> + );
>> +#endif
>> +
>> s->extents = g_realloc(s->extents,
>> (s->num_extents + 1) * sizeof(VmdkExtent));
>> extent = &s->extents[s->num_extents];
>> @@ -674,6 +693,14 @@ static int vmdk_parse_extents(const char *desc,
>> BlockDriverState *bs,
>> return ret;
>> }
>>
>> +#ifdef DEBUG_VMDK
>> + printf(DEBUG_VMDK_PREFIX
>> + "valid extent found: access=%s, sectors=%" PRId64
>> + ", type=%s, filename=%s, flat_offset=%" PRId64 "\n",
>> + access, sectors, type, fname, flat_offset
>> + );
>> +#endif
>> +
>> /* save to extents array */
>> if (!strcmp(type, "FLAT")) {
>> /* FLAT extent */
>> @@ -704,6 +731,45 @@ next_line:
>> return 0;
>> }
>>
>> +#ifdef DEBUG_VMDK
>> +static void vmdk_debug_print_one_extent(VmdkExtent *extent)
>> +{
>> + printf(DEBUG_VMDK_PREFIX DEBUG_VMDK_SEPARATOR "\n");
>> + printf(DEBUG_VMDK_PREFIX "filename%s\n", extent->file->filename);
>> + printf(DEBUG_VMDK_PREFIX "flat=%i\n", (int)extent->flat);
>> + printf(DEBUG_VMDK_PREFIX "compressed=%i\n", (int)extent->compressed);
>> + printf(DEBUG_VMDK_PREFIX "has_marker=%i\n", (int)extent->has_marker);
>> + printf(DEBUG_VMDK_PREFIX "sectors=%" PRId64 "\n", extent->sectors);
>> + printf(DEBUG_VMDK_PREFIX "end_sector=%" PRId64 "\n",
>> extent->end_sector);
>> + printf(DEBUG_VMDK_PREFIX "flat_start_offset=%" PRId64 "\n",
>> + extent->flat_start_offset);
>> + printf(DEBUG_VMDK_PREFIX "l1_table_offsett=%" PRId64 "\n",
>> + extent->l1_table_offset);
>> + printf(DEBUG_VMDK_PREFIX "l1_backup_table_offsett=%"
>> + PRId64 "\n", extent->l1_backup_table_offset);
>> + printf(DEBUG_VMDK_PREFIX "l1_size=%u\n", extent->l1_size);
>> + printf(DEBUG_VMDK_PREFIX "l1_entry_sectors=%u\n",
>> extent->l1_entry_sectors);
>> + printf(DEBUG_VMDK_PREFIX "l2_size=%u\n", extent->l2_size);
>> + printf(DEBUG_VMDK_PREFIX "cluster_sectors=%u\n",
>> extent->cluster_sectors);
>> + fflush(stdout);
>> +}
>> +
>> +static void vmdk_debug_print_extents(BlockDriverState *bs)
>> +{
>> + BDRVVmdkState *s = bs->opaque;
>> + VmdkExtent *extent = &s->extents[0];
>> +
>> + printf(DEBUG_VMDK_PREFIX DEBUG_VMDK_SEPARATOR "\n");
>> + printf(DEBUG_VMDK_PREFIX "total_sectors=%" PRId64 "\n",
>> bs->total_sectors);
>> + printf(DEBUG_VMDK_PREFIX "num_extents=%u\n", s->num_extents);
>> +
>> + while (extent < &s->extents[s->num_extents]) {
>> + vmdk_debug_print_one_extent(extent);
>> + extent++;
>> + }
>> +}
>> +#endif
>> +
>> static int vmdk_open_desc_file(BlockDriverState *bs, int flags,
>> int64_t desc_offset)
>> {
>> @@ -728,7 +794,13 @@ static int vmdk_open_desc_file(BlockDriverState *bs,
>> int flags,
>> return -ENOTSUP;
>> }
>> s->desc_offset = 0;
>> - return vmdk_parse_extents(buf, bs, bs->file->filename);
>> + ret = vmdk_parse_extents(buf, bs, bs->file->filename);
>> +
>> +#ifdef DEBUG_VMDK
>> + vmdk_debug_print_extents(bs);
>> +#endif
>> +
>> + return ret;
>> }
>>
>> static int vmdk_open(BlockDriverState *bs, int flags)
>> @@ -774,6 +846,14 @@ static int get_whole_cluster(BlockDriverState *bs,
>> /* 128 sectors * 512 bytes each = grain size 64KB */
>> uint8_t whole_grain[extent->cluster_sectors * 512];
>>
>> +#ifdef DEBUG_VMDK
>> + printf(DEBUG_VMDK_PREFIX
>> + "vmdk_get_whole_cluster: filename=%s, cluster_offset=%" PRId64
>> + ", offset=%" PRId64 "\n",
>> + extent->file->filename, cluster_offset, offset
>> + );
>> +#endif
>> +
>> /* we will be here if it's first write on non-exist grain(cluster).
>> * try to read from parent image, if exist */
>> if (bs->backing_hd) {
>> @@ -1032,18 +1112,35 @@ static int vmdk_read_extent(VmdkExtent *extent,
>> int64_t cluster_offset,
>> VmdkGrainMarker *marker;
>> uLongf buf_len;
>>
>> +#ifdef DEBUG_VMDK
>> + printf(DEBUG_VMDK_PREFIX
>> + "vmdk_read_extent: filename=%s, cluster_offset=%" PRId64
>> + ", offset_in_cluster=%" PRId64 ", nb_sectors=%i\n",
>> + extent->file->filename, cluster_offset,
>> + offset_in_cluster, nb_sectors
>> + );
>> +#endif
>>
>> if (!extent->compressed) {
>> +
>> +#ifdef DEBUG_VMDK
>> + printf(DEBUG_VMDK_PREFIX
>> + "vmdk_read_extent:bdrv_pread filename=%s, offset=%" PRId64
>> + ", length=%i\n",
>> + extent->file->filename, offset_in_cluster, nb_sectors << 9
>> + );
>> +#endif
>> +
>> ret = bdrv_pread(extent->file,
>> cluster_offset + offset_in_cluster,
>> - buf, nb_sectors * 512);
>> - if (ret == nb_sectors * 512) {
>> + buf, nb_sectors << 9);
>> + if (ret == nb_sectors << 9) {
>> return 0;
>> } else {
>> return -EIO;
>> }
>> }
>> - cluster_bytes = extent->cluster_sectors * 512;
>> + cluster_bytes = extent->cluster_sectors << 9;
>> /* Read two clusters in case GrainMarker + compressed data > one
>> cluster */
>> buf_bytes = cluster_bytes * 2;
>> cluster_buf = g_malloc(buf_bytes);
>> @@ -1092,9 +1189,18 @@ static int vmdk_read(BlockDriverState *bs, int64_t
>> sector_num,
>> BDRVVmdkState *s = bs->opaque;
>> int ret;
>> uint64_t n, index_in_cluster;
>> + uint64_t extent_begin_sector, extent_relative_sector_num;
>> VmdkExtent *extent = NULL;
>> uint64_t cluster_offset;
>>
>> +#ifdef DEBUG_VMDK
>> + printf(DEBUG_VMDK_PREFIX
>> + "vmdk_read: sectornum=%" PRId64
>> + ", nb_sectors=%i\n",
>> + sector_num, nb_sectors
>> + );
>> +#endif
>> +
>> while (nb_sectors > 0) {
>> extent = find_extent(s, sector_num, extent);
>> if (!extent) {
>> @@ -1103,7 +1209,9 @@ static int vmdk_read(BlockDriverState *bs, int64_t
>> sector_num,
>> ret = get_cluster_offset(
>> bs, extent, NULL,
>> sector_num << 9, 0, &cluster_offset);
>> - index_in_cluster = sector_num % extent->cluster_sectors;
>> + extent_begin_sector = extent->end_sector - extent->sectors;
>> + extent_relative_sector_num = sector_num - extent_begin_sector;
>> + index_in_cluster = extent_relative_sector_num %
>> extent->cluster_sectors;
>> n = extent->cluster_sectors - index_in_cluster;
>> if (n > nb_sectors) {
>> n = nb_sectors;
>> @@ -1119,11 +1227,11 @@ static int vmdk_read(BlockDriverState *bs, int64_t
>> sector_num,
>> return ret;
>> }
>> } else {
>> - memset(buf, 0, 512 * n);
>> + memset(buf, 0, n << 9);
>> }
>> } else {
>> ret = vmdk_read_extent(extent,
>> - cluster_offset, index_in_cluster * 512,
>> + cluster_offset, index_in_cluster << 9,
>> buf, n);
>> if (ret) {
>> return ret;
>> @@ -1131,7 +1239,7 @@ static int vmdk_read(BlockDriverState *bs, int64_t
>> sector_num,
>> }
>> nb_sectors -= n;
>> sector_num += n;
>> - buf += n * 512;
>> + buf += n << 9;
>> }
>> return 0;
>> }
>> @@ -1154,6 +1262,7 @@ static int vmdk_write(BlockDriverState *bs, int64_t
>> sector_num,
>> VmdkExtent *extent = NULL;
>> int n, ret;
>> int64_t index_in_cluster;
>> + uint64_t extent_begin_sector, extent_relative_sector_num;
>> uint64_t cluster_offset;
>> VmdkMetaData m_data;
>>
>> @@ -1196,7 +1305,9 @@ static int vmdk_write(BlockDriverState *bs, int64_t
>> sector_num,
>> if (ret) {
>> return -EINVAL;
>> }
>> - index_in_cluster = sector_num % extent->cluster_sectors;
>> + extent_begin_sector = extent->end_sector - extent->sectors;
>> + extent_relative_sector_num = sector_num - extent_begin_sector;
>> + index_in_cluster = extent_relative_sector_num %
>> extent->cluster_sectors;
>> n = extent->cluster_sectors - index_in_cluster;
>> if (n > nb_sectors) {
>> n = nb_sectors;
>> --
>> 1.7.11.7
>>
next prev parent reply other threads:[~2012-11-09 6:52 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-08 19:05 [Qemu-devel] [PATCH] block.c, block/vmdk.c: Fixed major bug in VMDK WRITE and READ handling - FIXES DATA CORRUPTION Gerhard Wiesinger
2012-11-09 5:12 ` Fam Zheng
2012-11-09 6:51 ` Gerhard Wiesinger [this message]
2012-11-09 7:38 ` Lei Li
2012-11-09 8:26 ` Paolo Bonzini
2012-11-10 8:30 ` Gerhard Wiesinger
2012-11-10 8:55 ` Paolo Bonzini
2012-11-10 9:59 ` Gerhard Wiesinger
2012-11-13 12:27 ` Kevin Wolf
2012-11-13 13:36 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=509CA7F9.7020400@wiesinger.com \
--to=lists@wiesinger.com \
--cc=famcool@gmail.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).