From: Anthony Liguori <anthony@codemonkey.ws>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
Date: Sat, 14 Feb 2009 17:13:49 -0600
Message-ID: <4997502D.1080401@codemonkey.ws>
In-Reply-To: <20090213190419.GB20328@shareable.org>
Jamie Lokier wrote:
> Anthony Liguori wrote:
>
>>> Simply reverting the qcow2 code appears to fix those problems, so it
>>> needn't hold up cutting a release. That's what I recommend.
>>>
>> Send some patches.
>>
>
> I did already.
>
> Here it is again. This should fix my bug and Marc's bug according to
> his report that reverting qcow2.c fixes it.
>
Well, such a large reversion is a bad idea. Can you git bisect to the
actual changeset that introduced the bug you see?

You're effectively reverting a very large number of changes, whereas only
one of them is likely causing your problem.
Regards,
Anthony Liguori
> -- Jamie
>
>
> Subject: Revert block-qcow2.c to kvm-72 version due to corruption reports
>
> This fixes two kinds of qcow2 corruption observed in kvm-83 (actually
> kvm-73 and later), from three bug reports.
>
>
> Bug 1: Windows 2000 guests complain of corrupt registry.
>
> Many Windows 2000 guests which boot and run fine in kvm-72 fail with
> a blue-screen indicating file corruption errors in kvm-73 through to
> kvm-83 (the latest), and succeed if we replace block-qcow2.c with the
> version from kvm-72.
>
> The blue screen appears towards the end of the boot sequence, and
> shows only briefly before rebooting. It says:
>
> STOP: c0000218 (Registry File Failure)
> The registry cannot load the hive (file):
> \SystemRoot\System32\Config\SOFTWARE
> or its log or alternate.
> It is corrupt, absent, or not writable.
>
> Beginning dump of physical memory
> Physical memory dump complete. Contact your system administrator or
> technical support [...?]
>
> This is narrowed down to the difference in block-qcow2.c between
> kvm-72 and kvm-73 (not -83). From kvm-73 to kvm-83, there have been
> more changes to block-qcow2.c, but the observed corruption still occurs.
>
> The bug isn't evident when only reading. When using "qemu-img
> convert" to convert a qcow2 file to a raw file, the broken and fixed
> versions of block-qcow2.c produce the same raw file. Also, when
> using "-snapshot" with qemu, the blue screen doesn't occur.
>
> This bug was observed by Jamie Lokier <jamie@shareable.org> and
> confirmed for multiple Windows 2000 guests by
> Marc Bevand <m.bevand@gmail.com>.
>
>
> Bug 2: Windows 2003 guests complain of corrupt registry.
>
> According to
> http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
>
> Windows 2003 32-bit guests randomly spew disk corruption messages
> like this:
>
> Windows – Registry Hive Recovered
> Registry hive (file): SOFTWARE was corrupted and it has
> been recovered. Some data might have been lost.
>
> and
>
> The system cannot log on due to the following error:
> Unable to complete the requested operation because of
> either a catastrophic media failure or a data structure
> corruption on the disk.
>
> This bug was reported by <gerdwachs@users.sourceforge.net> and
> confirmed by Marc Bevand, noting:
>
> kvm-73+ also causes some of my Windows 2003 guests to exhibit this
> exact registry corruption error. [...] This bug is also fixed by
> reverting block-qcow2.c to the version from kvm-72.
>
> Worryingly, gerdwachs' bug report says it's for kvm-70, implying this
> patch may not fix all the Windows 2003 guest corruption problems.
>
> At least Marc says his observed problem goes away with kvm-72's qcow2.
>
>
> Bug 3: Corruption of qcow2 index rendering the file unusable.
>
> Marc Bevand writes:
>
> I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because
> of the qcow2 performance regression caused by the default writethrough
> caching policy) but it randomly triggers an even worse bug: the moment
> I shut down a guest by typing "quit" in the monitor, it sometimes
> overwrites the first 4kB of the disk image with mostly NUL bytes (!)
> which completely destroys it. I am familiar with the qcow2 format, and
> this 4kB block appears to be an L2 table with most entries
> set to zero. I have had to restore at least 6 or 7 disk images from
> backup after occurrences of that bug. My intuition tells me this may be
> the qcow2 code trying to allocate a cluster to write a new L2 table,
> but not noticing the allocation failed (represented by a 0 offset),
> and writing the L2 table at that 0 offset, overwriting the qcow2
> header.
>
> Fortunately this bug is also fixed by running kvm-75 with
> block-qcow2.c reverted to its kvm-72 version.
>
> Basically qcow2 in kvm-73 or newer is completely unreliable.
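>
> As a rough sketch of that suspected failure mode (purely illustrative,
> not a proposed fix, and assuming a zero return from alloc_clusters()
> really does indicate a failed allocation), the check that appears to be
> missing would look something like this, using alloc_clusters() and
> bdrv_pwrite() as they are used elsewhere in block-qcow2.c:
>
>     /* Hypothetical guard: if alloc_clusters() fails and yields offset
>      * 0, writing the new L2 table there would land on the qcow2 header
>      * at the start of the file, which matches the roughly 4kB of
>      * mostly-NUL bytes described above. */
>     l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
>     if (!l2_offset)
>         return 0;   /* bail out rather than write the table at offset 0 */
>     if (bdrv_pwrite(s->hd, l2_offset, l2_table,
>                     s->l2_size * sizeof(uint64_t)) !=
>         s->l2_size * sizeof(uint64_t))
>         return 0;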
>
>
> Reverting block-qcow2.c to the version in kvm-72 appears to fix the
> corruption symptoms reported by Marc and Jamie, although gerdwachs'
> related bug is against kvm-70 so it may not fix that.
>
> Unfortunately this reverts some optimisations, but fixing corruption
> is more important until the new code is reliable.
>
> This patch reverts block-qcow2.c in kvm-83 to the version in kvm-72,
> except the "cache=writeback" default performance tweak is retained and
> there's no need to define "offsetof".
>
> Signed-off-by: Jamie Lokier <jamie@shareable.org>
>
>
> --- kvm-83-real/qemu/block-qcow2.c 2009-01-13 13:29:42.000000000 +0000
> +++ kvm-83/qemu/block-qcow2.c 2009-02-13 18:51:12.000000000 +0000
> @@ -52,8 +52,6 @@
> #define QCOW_CRYPT_NONE 0
> #define QCOW_CRYPT_AES 1
>
> -#define QCOW_MAX_CRYPT_CLUSTERS 32
> -
> /* indicate that the refcount of the referenced cluster is exactly one. */
> #define QCOW_OFLAG_COPIED (1LL << 63)
> /* indicate that the cluster is compressed (they never have the copied flag) */
> @@ -269,8 +267,7 @@
> if (!s->cluster_cache)
> goto fail;
> /* one more sector for decompressed data alignment */
> - s->cluster_data = qemu_malloc(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size
> - + 512);
> + s->cluster_data = qemu_malloc(s->cluster_size + 512);
> if (!s->cluster_data)
> goto fail;
> s->cluster_cache_offset = -1;
> @@ -437,7 +434,8 @@
> int new_l1_size, new_l1_size2, ret, i;
> uint64_t *new_l1_table;
> uint64_t new_l1_table_offset;
> - uint8_t data[12];
> + uint64_t data64;
> + uint32_t data32;
>
> new_l1_size = s->l1_size;
> if (min_size <= new_l1_size)
> @@ -467,10 +465,13 @@
> new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
>
> /* set new table */
> - cpu_to_be32w((uint32_t*)data, new_l1_size);
> - cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
> - if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), data,
> - sizeof(data)) != sizeof(data))
> + data64 = cpu_to_be64(new_l1_table_offset);
> + if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_table_offset),
> + &data64, sizeof(data64)) != sizeof(data64))
> + goto fail;
> + data32 = cpu_to_be32(new_l1_size);
> + if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size),
> + &data32, sizeof(data32)) != sizeof(data32))
> goto fail;
> qemu_free(s->l1_table);
> free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
> @@ -483,549 +484,169 @@
> return -EIO;
> }
>
> -/*
> - * seek_l2_table
> +/* 'allocate' is:
> *
> - * seek l2_offset in the l2_cache table
> - * if not found, return NULL,
> - * if found,
> - * increments the l2 cache hit count of the entry,
> - * if counter overflow, divide by two all counters
> - * return the pointer to the l2 cache entry
> + * 0 not to allocate.
> *
> - */
> -
> -static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
> -{
> - int i, j;
> -
> - for(i = 0; i < L2_CACHE_SIZE; i++) {
> - if (l2_offset == s->l2_cache_offsets[i]) {
> - /* increment the hit count */
> - if (++s->l2_cache_counts[i] == 0xffffffff) {
> - for(j = 0; j < L2_CACHE_SIZE; j++) {
> - s->l2_cache_counts[j] >>= 1;
> - }
> - }
> - return s->l2_cache + (i << s->l2_bits);
> - }
> - }
> - return NULL;
> -}
> -
> -/*
> - * l2_load
> + * 1 to allocate a normal cluster (for sector indexes 'n_start' to
> + * 'n_end')
> *
> - * Loads a L2 table into memory. If the table is in the cache, the cache
> - * is used; otherwise the L2 table is loaded from the image file.
> + * 2 to allocate a compressed cluster of size
> + * 'compressed_size'. 'compressed_size' must be > 0 and <
> + * cluster_size
> *
> - * Returns a pointer to the L2 table on success, or NULL if the read from
> - * the image file failed.
> + * return 0 if not allocated.
> */
> -
> -static uint64_t *l2_load(BlockDriverState *bs, uint64_t l2_offset)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int min_index;
> - uint64_t *l2_table;
> -
> - /* seek if the table for the given offset is in the cache */
> -
> - l2_table = seek_l2_table(s, l2_offset);
> - if (l2_table != NULL)
> - return l2_table;
> -
> - /* not found: load a new entry in the least used one */
> -
> - min_index = l2_cache_new_entry(bs);
> - l2_table = s->l2_cache + (min_index << s->l2_bits);
> - if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
> - s->l2_size * sizeof(uint64_t))
> - return NULL;
> - s->l2_cache_offsets[min_index] = l2_offset;
> - s->l2_cache_counts[min_index] = 1;
> -
> - return l2_table;
> -}
> -
> -/*
> - * l2_allocate
> - *
> - * Allocate a new l2 entry in the file. If l1_index points to an already
> - * used entry in the L2 table (i.e. we are doing a copy on write for the L2
> - * table) copy the contents of the old L2 table into the newly allocated one.
> - * Otherwise the new table is initialized with zeros.
> - *
> - */
> -
> -static uint64_t *l2_allocate(BlockDriverState *bs, int l1_index)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int min_index;
> - uint64_t old_l2_offset, tmp;
> - uint64_t *l2_table, l2_offset;
> -
> - old_l2_offset = s->l1_table[l1_index];
> -
> - /* allocate a new l2 entry */
> -
> - l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
> -
> - /* update the L1 entry */
> -
> - s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
> -
> - tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
> - if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
> - &tmp, sizeof(tmp)) != sizeof(tmp))
> - return NULL;
> -
> - /* allocate a new entry in the l2 cache */
> -
> - min_index = l2_cache_new_entry(bs);
> - l2_table = s->l2_cache + (min_index << s->l2_bits);
> -
> - if (old_l2_offset == 0) {
> - /* if there was no old l2 table, clear the new table */
> - memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
> - } else {
> - /* if there was an old l2 table, read it from the disk */
> - if (bdrv_pread(s->hd, old_l2_offset,
> - l2_table, s->l2_size * sizeof(uint64_t)) !=
> - s->l2_size * sizeof(uint64_t))
> - return NULL;
> - }
> - /* write the l2 table to the file */
> - if (bdrv_pwrite(s->hd, l2_offset,
> - l2_table, s->l2_size * sizeof(uint64_t)) !=
> - s->l2_size * sizeof(uint64_t))
> - return NULL;
> -
> - /* update the l2 cache entry */
> -
> - s->l2_cache_offsets[min_index] = l2_offset;
> - s->l2_cache_counts[min_index] = 1;
> -
> - return l2_table;
> -}
> -
> -static int size_to_clusters(BDRVQcowState *s, int64_t size)
> -{
> - return (size + (s->cluster_size - 1)) >> s->cluster_bits;
> -}
> -
> -static int count_contiguous_clusters(uint64_t nb_clusters, int cluster_size,
> - uint64_t *l2_table, uint64_t start, uint64_t mask)
> -{
> - int i;
> - uint64_t offset = be64_to_cpu(l2_table[0]) & ~mask;
> -
> - if (!offset)
> - return 0;
> -
> - for (i = start; i < start + nb_clusters; i++)
> - if (offset + i * cluster_size != (be64_to_cpu(l2_table[i]) & ~mask))
> - break;
> -
> - return (i - start);
> -}
> -
> -static int count_contiguous_free_clusters(uint64_t nb_clusters, uint64_t *l2_table)
> -{
> - int i = 0;
> -
> - while(nb_clusters-- && l2_table[i] == 0)
> - i++;
> -
> - return i;
> -}
> -
> -/*
> - * get_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * on entry, *num is the number of contiguous clusters we'd like to
> - * access following offset.
> - *
> - * on exit, *num is the number of contiguous clusters we can read.
> - *
> - * Return 1, if the offset is found
> - * Return 0, otherwise.
> - *
> - */
> -
> static uint64_t get_cluster_offset(BlockDriverState *bs,
> - uint64_t offset, int *num)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int l1_index, l2_index;
> - uint64_t l2_offset, *l2_table, cluster_offset;
> - int l1_bits, c;
> - int index_in_cluster, nb_available, nb_needed, nb_clusters;
> -
> - index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
> - nb_needed = *num + index_in_cluster;
> -
> - l1_bits = s->l2_bits + s->cluster_bits;
> -
> - /* compute how many bytes there are between the offset and
> - * the end of the l1 entry
> - */
> -
> - nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
> -
> - /* compute the number of available sectors */
> -
> - nb_available = (nb_available >> 9) + index_in_cluster;
> -
> - cluster_offset = 0;
> -
> - /* seek the the l2 offset in the l1 table */
> -
> - l1_index = offset >> l1_bits;
> - if (l1_index >= s->l1_size)
> - goto out;
> -
> - l2_offset = s->l1_table[l1_index];
> -
> - /* seek the l2 table of the given l2 offset */
> -
> - if (!l2_offset)
> - goto out;
> -
> - /* load the l2 table in memory */
> -
> - l2_offset &= ~QCOW_OFLAG_COPIED;
> - l2_table = l2_load(bs, l2_offset);
> - if (l2_table == NULL)
> - return 0;
> -
> - /* find the cluster offset for the given disk offset */
> -
> - l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> - cluster_offset = be64_to_cpu(l2_table[l2_index]);
> - nb_clusters = size_to_clusters(s, nb_needed << 9);
> -
> - if (!cluster_offset) {
> - /* how many empty clusters ? */
> - c = count_contiguous_free_clusters(nb_clusters, &l2_table[l2_index]);
> - } else {
> - /* how many allocated clusters ? */
> - c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> - &l2_table[l2_index], 0, QCOW_OFLAG_COPIED);
> - }
> -
> - nb_available = (c * s->cluster_sectors);
> -out:
> - if (nb_available > nb_needed)
> - nb_available = nb_needed;
> -
> - *num = nb_available - index_in_cluster;
> -
> - return cluster_offset & ~QCOW_OFLAG_COPIED;
> -}
> -
> -/*
> - * free_any_clusters
> - *
> - * free clusters according to its type: compressed or not
> - *
> - */
> -
> -static void free_any_clusters(BlockDriverState *bs,
> - uint64_t cluster_offset, int nb_clusters)
> -{
> - BDRVQcowState *s = bs->opaque;
> -
> - /* free the cluster */
> -
> - if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> - int nb_csectors;
> - nb_csectors = ((cluster_offset >> s->csize_shift) &
> - s->csize_mask) + 1;
> - free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> - nb_csectors * 512);
> - return;
> - }
> -
> - free_clusters(bs, cluster_offset, nb_clusters << s->cluster_bits);
> -
> - return;
> -}
> -
> -/*
> - * get_cluster_table
> - *
> - * for a given disk offset, load (and allocate if needed)
> - * the l2 table.
> - *
> - * the l2 table offset in the qcow2 file and the cluster index
> - * in the l2 table are given to the caller.
> - *
> - */
> -
> -static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
> - uint64_t **new_l2_table,
> - uint64_t *new_l2_offset,
> - int *new_l2_index)
> + uint64_t offset, int allocate,
> + int compressed_size,
> + int n_start, int n_end)
> {
> BDRVQcowState *s = bs->opaque;
> - int l1_index, l2_index, ret;
> - uint64_t l2_offset, *l2_table;
> -
> - /* seek the the l2 offset in the l1 table */
> + int min_index, i, j, l1_index, l2_index, ret;
> + uint64_t l2_offset, *l2_table, cluster_offset, tmp, old_l2_offset;
>
> l1_index = offset >> (s->l2_bits + s->cluster_bits);
> if (l1_index >= s->l1_size) {
> - ret = grow_l1_table(bs, l1_index + 1);
> - if (ret < 0)
> + /* outside l1 table is allowed: we grow the table if needed */
> + if (!allocate)
> + return 0;
> + if (grow_l1_table(bs, l1_index + 1) < 0)
> return 0;
> }
> l2_offset = s->l1_table[l1_index];
> + if (!l2_offset) {
> + if (!allocate)
> + return 0;
> + l2_allocate:
> + old_l2_offset = l2_offset;
> + /* allocate a new l2 entry */
> + l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
> + /* update the L1 entry */
> + s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
> + tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
> + if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
> + &tmp, sizeof(tmp)) != sizeof(tmp))
> + return 0;
> + min_index = l2_cache_new_entry(bs);
> + l2_table = s->l2_cache + (min_index << s->l2_bits);
>
> - /* seek the l2 table of the given l2 offset */
> -
> - if (l2_offset & QCOW_OFLAG_COPIED) {
> - /* load the l2 table in memory */
> - l2_offset &= ~QCOW_OFLAG_COPIED;
> - l2_table = l2_load(bs, l2_offset);
> - if (l2_table == NULL)
> + if (old_l2_offset == 0) {
> + memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
> + } else {
> + if (bdrv_pread(s->hd, old_l2_offset,
> + l2_table, s->l2_size * sizeof(uint64_t)) !=
> + s->l2_size * sizeof(uint64_t))
> + return 0;
> + }
> + if (bdrv_pwrite(s->hd, l2_offset,
> + l2_table, s->l2_size * sizeof(uint64_t)) !=
> + s->l2_size * sizeof(uint64_t))
> return 0;
> } else {
> - if (l2_offset)
> - free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
> - l2_table = l2_allocate(bs, l1_index);
> - if (l2_table == NULL)
> + if (!(l2_offset & QCOW_OFLAG_COPIED)) {
> + if (allocate) {
> + free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
> + goto l2_allocate;
> + }
> + } else {
> + l2_offset &= ~QCOW_OFLAG_COPIED;
> + }
> + for(i = 0; i < L2_CACHE_SIZE; i++) {
> + if (l2_offset == s->l2_cache_offsets[i]) {
> + /* increment the hit count */
> + if (++s->l2_cache_counts[i] == 0xffffffff) {
> + for(j = 0; j < L2_CACHE_SIZE; j++) {
> + s->l2_cache_counts[j] >>= 1;
> + }
> + }
> + l2_table = s->l2_cache + (i << s->l2_bits);
> + goto found;
> + }
> + }
> + /* not found: load a new entry in the least used one */
> + min_index = l2_cache_new_entry(bs);
> + l2_table = s->l2_cache + (min_index << s->l2_bits);
> + if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
> + s->l2_size * sizeof(uint64_t))
> return 0;
> - l2_offset = s->l1_table[l1_index] & ~QCOW_OFLAG_COPIED;
> }
> -
> - /* find the cluster offset for the given disk offset */
> -
> + s->l2_cache_offsets[min_index] = l2_offset;
> + s->l2_cache_counts[min_index] = 1;
> + found:
> l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> -
> - *new_l2_table = l2_table;
> - *new_l2_offset = l2_offset;
> - *new_l2_index = l2_index;
> -
> - return 1;
> -}
> -
> -/*
> - * alloc_compressed_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * If the offset is not found, allocate a new compressed cluster.
> - *
> - * Return the cluster offset if successful,
> - * Return 0, otherwise.
> - *
> - */
> -
> -static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs,
> - uint64_t offset,
> - int compressed_size)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int l2_index, ret;
> - uint64_t l2_offset, *l2_table, cluster_offset;
> - int nb_csectors;
> -
> - ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
> - if (ret == 0)
> - return 0;
> -
> cluster_offset = be64_to_cpu(l2_table[l2_index]);
> - if (cluster_offset & QCOW_OFLAG_COPIED)
> - return cluster_offset & ~QCOW_OFLAG_COPIED;
> -
> - if (cluster_offset)
> - free_any_clusters(bs, cluster_offset, 1);
> -
> - cluster_offset = alloc_bytes(bs, compressed_size);
> - nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
> - (cluster_offset >> 9);
> -
> - cluster_offset |= QCOW_OFLAG_COMPRESSED |
> - ((uint64_t)nb_csectors << s->csize_shift);
> -
> - /* update L2 table */
> -
> - /* compressed clusters never have the copied flag */
> -
> - l2_table[l2_index] = cpu_to_be64(cluster_offset);
> - if (bdrv_pwrite(s->hd,
> - l2_offset + l2_index * sizeof(uint64_t),
> - l2_table + l2_index,
> - sizeof(uint64_t)) != sizeof(uint64_t))
> - return 0;
> -
> - return cluster_offset;
> -}
> -
> -typedef struct QCowL2Meta
> -{
> - uint64_t offset;
> - int n_start;
> - int nb_available;
> - int nb_clusters;
> -} QCowL2Meta;
> -
> -static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
> - QCowL2Meta *m)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int i, j = 0, l2_index, ret;
> - uint64_t *old_cluster, start_sect, l2_offset, *l2_table;
> -
> - if (m->nb_clusters == 0)
> - return 0;
> -
> - if (!(old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t))))
> - return -ENOMEM;
> -
> - /* copy content of unmodified sectors */
> - start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9;
> - if (m->n_start) {
> - ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start);
> - if (ret < 0)
> - goto err;
> + if (!cluster_offset) {
> + if (!allocate)
> + return cluster_offset;
> + } else if (!(cluster_offset & QCOW_OFLAG_COPIED)) {
> + if (!allocate)
> + return cluster_offset;
> + /* free the cluster */
> + if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> + int nb_csectors;
> + nb_csectors = ((cluster_offset >> s->csize_shift) &
> + s->csize_mask) + 1;
> + free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> + nb_csectors * 512);
> + } else {
> + free_clusters(bs, cluster_offset, s->cluster_size);
> + }
> + } else {
> + cluster_offset &= ~QCOW_OFLAG_COPIED;
> + return cluster_offset;
> }
> -
> - if (m->nb_available & (s->cluster_sectors - 1)) {
> - uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
> - ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
> - m->nb_available - end, s->cluster_sectors);
> - if (ret < 0)
> - goto err;
> + if (allocate == 1) {
> + /* allocate a new cluster */
> + cluster_offset = alloc_clusters(bs, s->cluster_size);
> +
> + /* we must initialize the cluster content which won't be
> + written */
> + if ((n_end - n_start) < s->cluster_sectors) {
> + uint64_t start_sect;
> +
> + start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> + ret = copy_sectors(bs, start_sect,
> + cluster_offset, 0, n_start);
> + if (ret < 0)
> + return 0;
> + ret = copy_sectors(bs, start_sect,
> + cluster_offset, n_end, s->cluster_sectors);
> + if (ret < 0)
> + return 0;
> + }
> + tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> + } else {
> + int nb_csectors;
> + cluster_offset = alloc_bytes(bs, compressed_size);
> + nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
> + (cluster_offset >> 9);
> + cluster_offset |= QCOW_OFLAG_COMPRESSED |
> + ((uint64_t)nb_csectors << s->csize_shift);
> + /* compressed clusters never have the copied flag */
> + tmp = cpu_to_be64(cluster_offset);
> }
> -
> - ret = -EIO;
> /* update L2 table */
> - if (!get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index))
> - goto err;
> -
> - for (i = 0; i < m->nb_clusters; i++) {
> - if(l2_table[l2_index + i] != 0)
> - old_cluster[j++] = l2_table[l2_index + i];
> -
> - l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> - (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
> - }
> -
> - if (bdrv_pwrite(s->hd, l2_offset + l2_index * sizeof(uint64_t),
> - l2_table + l2_index, m->nb_clusters * sizeof(uint64_t)) !=
> - m->nb_clusters * sizeof(uint64_t))
> - goto err;
> -
> - for (i = 0; i < j; i++)
> - free_any_clusters(bs, old_cluster[i], 1);
> -
> - ret = 0;
> -err:
> - qemu_free(old_cluster);
> - return ret;
> - }
> -
> -/*
> - * alloc_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * If the offset is not found, allocate a new cluster.
> - *
> - * Return the cluster offset if successful,
> - * Return 0, otherwise.
> - *
> - */
> -
> -static uint64_t alloc_cluster_offset(BlockDriverState *bs,
> - uint64_t offset,
> - int n_start, int n_end,
> - int *num, QCowL2Meta *m)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int l2_index, ret;
> - uint64_t l2_offset, *l2_table, cluster_offset;
> - int nb_clusters, i = 0;
> -
> - ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
> - if (ret == 0)
> + l2_table[l2_index] = tmp;
> + if (bdrv_pwrite(s->hd,
> + l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp))
> return 0;
> -
> - nb_clusters = size_to_clusters(s, n_end << 9);
> -
> - nb_clusters = MIN(nb_clusters, s->l2_size - l2_index);
> -
> - cluster_offset = be64_to_cpu(l2_table[l2_index]);
> -
> - /* We keep all QCOW_OFLAG_COPIED clusters */
> -
> - if (cluster_offset & QCOW_OFLAG_COPIED) {
> - nb_clusters = count_contiguous_clusters(nb_clusters, s->cluster_size,
> - &l2_table[l2_index], 0, 0);
> -
> - cluster_offset &= ~QCOW_OFLAG_COPIED;
> - m->nb_clusters = 0;
> -
> - goto out;
> - }
> -
> - /* for the moment, multiple compressed clusters are not managed */
> -
> - if (cluster_offset & QCOW_OFLAG_COMPRESSED)
> - nb_clusters = 1;
> -
> - /* how many available clusters ? */
> -
> - while (i < nb_clusters) {
> - i += count_contiguous_clusters(nb_clusters - i, s->cluster_size,
> - &l2_table[l2_index], i, 0);
> -
> - if(be64_to_cpu(l2_table[l2_index + i]))
> - break;
> -
> - i += count_contiguous_free_clusters(nb_clusters - i,
> - &l2_table[l2_index + i]);
> -
> - cluster_offset = be64_to_cpu(l2_table[l2_index + i]);
> -
> - if ((cluster_offset & QCOW_OFLAG_COPIED) ||
> - (cluster_offset & QCOW_OFLAG_COMPRESSED))
> - break;
> - }
> - nb_clusters = i;
> -
> - /* allocate a new cluster */
> -
> - cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
> -
> - /* save info needed for meta data update */
> - m->offset = offset;
> - m->n_start = n_start;
> - m->nb_clusters = nb_clusters;
> -
> -out:
> - m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
> -
> - *num = m->nb_available - n_start;
> -
> return cluster_offset;
> }
>
> static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
> int nb_sectors, int *pnum)
> {
> + BDRVQcowState *s = bs->opaque;
> + int index_in_cluster, n;
> uint64_t cluster_offset;
>
> - *pnum = nb_sectors;
> - cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
> -
> + cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
> + index_in_cluster = sector_num & (s->cluster_sectors - 1);
> + n = s->cluster_sectors - index_in_cluster;
> + if (n > nb_sectors)
> + n = nb_sectors;
> + *pnum = n;
> return (cluster_offset != 0);
> }
>
> @@ -1102,9 +723,11 @@
> uint64_t cluster_offset;
>
> while (nb_sectors > 0) {
> - n = nb_sectors;
> - cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
> + cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
> index_in_cluster = sector_num & (s->cluster_sectors - 1);
> + n = s->cluster_sectors - index_in_cluster;
> + if (n > nb_sectors)
> + n = nb_sectors;
> if (!cluster_offset) {
> if (bs->backing_hd) {
> /* read from the base image */
> @@ -1143,18 +766,15 @@
> BDRVQcowState *s = bs->opaque;
> int ret, index_in_cluster, n;
> uint64_t cluster_offset;
> - int n_end;
> - QCowL2Meta l2meta;
>
> while (nb_sectors > 0) {
> index_in_cluster = sector_num & (s->cluster_sectors - 1);
> - n_end = index_in_cluster + nb_sectors;
> - if (s->crypt_method &&
> - n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
> - n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
> - cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
> - index_in_cluster,
> - n_end, &n, &l2meta);
> + n = s->cluster_sectors - index_in_cluster;
> + if (n > nb_sectors)
> + n = nb_sectors;
> + cluster_offset = get_cluster_offset(bs, sector_num << 9, 1, 0,
> + index_in_cluster,
> + index_in_cluster + n);
> if (!cluster_offset)
> return -1;
> if (s->crypt_method) {
> @@ -1165,10 +785,8 @@
> } else {
> ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512, buf, n * 512);
> }
> - if (ret != n * 512 || alloc_cluster_link_l2(bs, cluster_offset, &l2meta) < 0) {
> - free_any_clusters(bs, cluster_offset, l2meta.nb_clusters);
> + if (ret != n * 512)
> return -1;
> - }
> nb_sectors -= n;
> sector_num += n;
> buf += n * 512;
> @@ -1186,33 +804,8 @@
> uint64_t cluster_offset;
> uint8_t *cluster_data;
> BlockDriverAIOCB *hd_aiocb;
> - QEMUBH *bh;
> - QCowL2Meta l2meta;
> } QCowAIOCB;
>
> -static void qcow_aio_read_cb(void *opaque, int ret);
> -static void qcow_aio_read_bh(void *opaque)
> -{
> - QCowAIOCB *acb = opaque;
> - qemu_bh_delete(acb->bh);
> - acb->bh = NULL;
> - qcow_aio_read_cb(opaque, 0);
> -}
> -
> -static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
> -{
> - if (acb->bh)
> - return -EIO;
> -
> - acb->bh = qemu_bh_new(cb, acb);
> - if (!acb->bh)
> - return -EIO;
> -
> - qemu_bh_schedule(acb->bh);
> -
> - return 0;
> -}
> -
> static void qcow_aio_read_cb(void *opaque, int ret)
> {
> QCowAIOCB *acb = opaque;
> @@ -1222,12 +815,13 @@
>
> acb->hd_aiocb = NULL;
> if (ret < 0) {
> -fail:
> + fail:
> acb->common.cb(acb->common.opaque, ret);
> qemu_aio_release(acb);
> return;
> }
>
> + redo:
> /* post process the read buffer */
> if (!acb->cluster_offset) {
> /* nothing to do */
> @@ -1253,9 +847,12 @@
> }
>
> /* prepare next AIO request */
> - acb->n = acb->nb_sectors;
> - acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
> + acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9,
> + 0, 0, 0, 0);
> index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> + acb->n = s->cluster_sectors - index_in_cluster;
> + if (acb->n > acb->nb_sectors)
> + acb->n = acb->nb_sectors;
>
> if (!acb->cluster_offset) {
> if (bs->backing_hd) {
> @@ -1268,16 +865,12 @@
> if (acb->hd_aiocb == NULL)
> goto fail;
> } else {
> - ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> - if (ret < 0)
> - goto fail;
> + goto redo;
> }
> } else {
> /* Note: in this case, no need to wait */
> memset(acb->buf, 0, 512 * acb->n);
> - ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> - if (ret < 0)
> - goto fail;
> + goto redo;
> }
> } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
> /* add AIO support for compressed blocks ? */
> @@ -1285,9 +878,7 @@
> goto fail;
> memcpy(acb->buf,
> s->cluster_cache + index_in_cluster * 512, 512 * acb->n);
> - ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> - if (ret < 0)
> - goto fail;
> + goto redo;
> } else {
> if ((acb->cluster_offset & 511) != 0) {
> ret = -EIO;
> @@ -1316,7 +907,6 @@
> acb->nb_sectors = nb_sectors;
> acb->n = 0;
> acb->cluster_offset = 0;
> - acb->l2meta.nb_clusters = 0;
> return acb;
> }
>
> @@ -1340,8 +930,8 @@
> BlockDriverState *bs = acb->common.bs;
> BDRVQcowState *s = bs->opaque;
> int index_in_cluster;
> + uint64_t cluster_offset;
> const uint8_t *src_buf;
> - int n_end;
>
> acb->hd_aiocb = NULL;
>
> @@ -1352,11 +942,6 @@
> return;
> }
>
> - if (alloc_cluster_link_l2(bs, acb->cluster_offset, &acb->l2meta) < 0) {
> - free_any_clusters(bs, acb->cluster_offset, acb->l2meta.nb_clusters);
> - goto fail;
> - }
> -
> acb->nb_sectors -= acb->n;
> acb->sector_num += acb->n;
> acb->buf += acb->n * 512;
> @@ -1369,22 +954,19 @@
> }
>
> index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> - n_end = index_in_cluster + acb->nb_sectors;
> - if (s->crypt_method &&
> - n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
> - n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
> -
> - acb->cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
> - index_in_cluster,
> - n_end, &acb->n, &acb->l2meta);
> - if (!acb->cluster_offset || (acb->cluster_offset & 511) != 0) {
> + acb->n = s->cluster_sectors - index_in_cluster;
> + if (acb->n > acb->nb_sectors)
> + acb->n = acb->nb_sectors;
> + cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 1, 0,
> + index_in_cluster,
> + index_in_cluster + acb->n);
> + if (!cluster_offset || (cluster_offset & 511) != 0) {
> ret = -EIO;
> goto fail;
> }
> if (s->crypt_method) {
> if (!acb->cluster_data) {
> - acb->cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS *
> - s->cluster_size);
> + acb->cluster_data = qemu_mallocz(s->cluster_size);
> if (!acb->cluster_data) {
> ret = -ENOMEM;
> goto fail;
> @@ -1397,7 +979,7 @@
> src_buf = acb->buf;
> }
> acb->hd_aiocb = bdrv_aio_write(s->hd,
> - (acb->cluster_offset >> 9) + index_in_cluster,
> + (cluster_offset >> 9) + index_in_cluster,
> src_buf, acb->n,
> qcow_aio_write_cb, acb);
> if (acb->hd_aiocb == NULL)
> @@ -1571,7 +1153,7 @@
>
> memset(s->l1_table, 0, l1_length);
> if (bdrv_pwrite(s->hd, s->l1_table_offset, s->l1_table, l1_length) < 0)
> - return -1;
> + return -1;
> ret = bdrv_truncate(s->hd, s->l1_table_offset + l1_length);
> if (ret < 0)
> return ret;
> @@ -1637,10 +1219,8 @@
> /* could not compress: write normal cluster */
> qcow_write(bs, sector_num, buf, s->cluster_sectors);
> } else {
> - cluster_offset = alloc_compressed_cluster_offset(bs, sector_num << 9,
> - out_len);
> - if (!cluster_offset)
> - return -1;
> + cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
> + out_len, 0, 0);
> cluster_offset &= s->cluster_offset_mask;
> if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) {
> qemu_free(out_buf);
> @@ -2225,19 +1805,26 @@
> BDRVQcowState *s = bs->opaque;
> int i, nb_clusters;
>
> - nb_clusters = size_to_clusters(s, size);
> -retry:
> - for(i = 0; i < nb_clusters; i++) {
> - int64_t i = s->free_cluster_index++;
> - if (get_refcount(bs, i) != 0)
> - goto retry;
> - }
> + nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
> + for(;;) {
> + if (get_refcount(bs, s->free_cluster_index) == 0) {
> + s->free_cluster_index++;
> + for(i = 1; i < nb_clusters; i++) {
> + if (get_refcount(bs, s->free_cluster_index) != 0)
> + goto not_found;
> + s->free_cluster_index++;
> + }
> #ifdef DEBUG_ALLOC2
> - printf("alloc_clusters: size=%lld -> %lld\n",
> - size,
> - (s->free_cluster_index - nb_clusters) << s->cluster_bits);
> + printf("alloc_clusters: size=%lld -> %lld\n",
> + size,
> + (s->free_cluster_index - nb_clusters) << s->cluster_bits);
> #endif
> - return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
> + return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
> + } else {
> + not_found:
> + s->free_cluster_index++;
> + }
> + }
> }
>
> static int64_t alloc_clusters(BlockDriverState *bs, int64_t size)
> @@ -2301,7 +1888,8 @@
> int new_table_size, new_table_size2, refcount_table_clusters, i, ret;
> uint64_t *new_table;
> int64_t table_offset;
> - uint8_t data[12];
> + uint64_t data64;
> + uint32_t data32;
> int old_table_size;
> int64_t old_table_offset;
>
> @@ -2340,10 +1928,13 @@
> for(i = 0; i < s->refcount_table_size; i++)
> be64_to_cpus(&new_table[i]);
>
> - cpu_to_be64w((uint64_t*)data, table_offset);
> - cpu_to_be32w((uint32_t*)(data + 8), refcount_table_clusters);
> + data64 = cpu_to_be64(table_offset);
> if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_offset),
> - data, sizeof(data)) != sizeof(data))
> + &data64, sizeof(data64)) != sizeof(data64))
> + goto fail;
> + data32 = cpu_to_be32(refcount_table_clusters);
> + if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_clusters),
> + &data32, sizeof(data32)) != sizeof(data32))
> goto fail;
> qemu_free(s->refcount_table);
> old_table_offset = s->refcount_table_offset;
> @@ -2572,7 +2163,7 @@
> uint16_t *refcount_table;
>
> size = bdrv_getlength(s->hd);
> - nb_clusters = size_to_clusters(s, size);
> + nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
> refcount_table = qemu_mallocz(nb_clusters * sizeof(uint16_t));
>
> /* header */
> @@ -2624,7 +2215,7 @@
> int refcount;
>
> size = bdrv_getlength(s->hd);
> - nb_clusters = size_to_clusters(s, size);
> + nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
> for(k = 0; k < nb_clusters;) {
> k1 = k;
> refcount = get_refcount(bs, k);
>
>
>