From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4997502D.1080401@codemonkey.ws>
Date: Sat, 14 Feb 2009 17:13:49 -0600
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
References: <4988AD96.6090308@codemonkey.ws> <20090213084023.GA1020@kos.to> <20090213163043.GJ18471@shareable.org> <4995A723.9010208@codemonkey.ws> <20090213190419.GB20328@shareable.org>
In-Reply-To: <20090213190419.GB20328@shareable.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

Jamie Lokier wrote:
> Anthony Liguori wrote:
>
>>> Simply reverting the qcow2 code appears to fix those problems, so it
>>> needn't hold up cutting a release. That's what I recommend.
>>>
>> Send some patches.
>>
>
> I did already.
>
> Here it is again. This should fix my bug and Marc's bug, according to
> his report that reverting qcow2.c fixes it.

Well, such a large reversion is a bad idea. Can you git bisect to the
actual changeset that introduced the bug you see? You're effectively
reverting a very large number of changes, whereas only one is likely
causing your problem.

Regards,

Anthony Liguori

> -- Jamie
>
>
> Subject: Revert block-qcow2.c to kvm-72 version due to corruption reports
>
> This fixes two kinds of qcow2 corruption observed in kvm-83 (actually
> kvm-73 and later), from three bug reports.
>
>
> Bug 1: Windows 2000 guests complain of corrupt registry.
>
> Many Windows 2000 guests which boot and run fine in kvm-72 fail with
> a blue screen indicating file corruption errors in kvm-73 through to
> kvm-83 (the latest), and succeed if we replace block-qcow2.c with the
> version from kvm-72.
>
> The blue screen appears towards the end of the boot sequence, and
> shows only briefly before rebooting. It says:
>
>     STOP: c0000218 (Registry File Failure)
>     The registry cannot load the hive (file):
>     \SystemRoot\System32\Config\SOFTWARE
>     or its log or alternate.
>     It is corrupt, absent, or not writable.
>
>     Beginning dump of physical memory
>     Physical memory dump complete. Contact your system administrator or
>     technical support [...?]
>
> This is narrowed down to the difference in block-qcow2.c between
> kvm-72 and kvm-73 (not -83). From kvm-73 to kvm-83, there have been
> more changes to block-qcow2.c, but the observed corruption still occurs.
>
> The bug isn't evident when only reading. When using "qemu-img
> convert" to convert a qcow2 file to a raw file, the broken and fixed
> versions of block-qcow2.c produce the same raw file.
> Also, when using "-snapshot" with qemu, the blue screen doesn't occur.
>
> This bug was observed by Jamie Lokier and
> confirmed for multiple Windows 2000 guests by Marc Bevand.
>
>
> Bug 2: Windows 2003 guests complain of corrupt registry.
>
> According to
> http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
>
> Windows 2003 32-bit guests randomly spew disk corruption messages
> like this:
>
>     Windows – Registry Hive Recovered
>     Registry hive (file): SOFTWARE was corrupted and it has
>     been recovered. Some data might have been lost.
>
> and
>
>     The system cannot log on due to the following error:
>     Unable to complete the requested operation because of
>     either a catastrophic media failure or a data structure
>     corruption on the disk.
>
> This bug was reported by gerdwachs and
> confirmed by Marc Bevand, noting:
>
>     kvm-73+ also causes some of my Windows 2003 guests to exhibit this
>     exact registry corruption error. [...] This bug is also fixed by
>     reverting block-qcow2.c to the version from kvm-72.
>
> Worryingly, gerdwachs' bug report says it's for kvm-70, implying this
> patch may not fix all the Windows 2003 guest corruption problems.
>
> At least Marc says his observed problem goes away with kvm-72's qcow2.
>
>
> Bug 3: Corruption of the qcow2 index rendering the file unusable.
>
> Marc Bevand writes:
>
>     I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because
>     of the qcow2 performance regression caused by the default writethrough
>     caching policy) but it randomly triggers an even worse bug: the moment
>     I shut down a guest by typing "quit" in the monitor, it sometimes
>     overwrites the first 4kB of the disk image with mostly NUL bytes (!),
>     which completely destroys it. I am familiar with the qcow2 format, and
>     apparently this 4kB block is an L2 table with most entries
>     set to zero. I have had to restore at least 6 or 7 disk images from
>     backup after occurrences of that bug. My intuition tells me this may be
>     the qcow2 code trying to allocate a cluster to write a new L2 table,
>     but not noticing the allocation failed (represented by a 0 offset),
>     and writing the L2 table at that 0 offset, overwriting the qcow2
>     header.
>
> Fortunately this bug is also fixed by running kvm-75 with
> block-qcow2.c reverted to its kvm-72 version.
>
> Basically, qcow2 in kvm-73 or newer is completely unreliable.
>
>
> Reverting block-qcow2.c to the version in kvm-72 appears to fix the
> corruption symptoms reported by Marc and Jamie, although gerdwachs'
> related bug is against kvm-70, so it may not fix that.
>
> Unfortunately this reverts some optimisations, but fixing corruption
> is more important until the new code is reliable.
>
> This patch reverts block-qcow2.c in kvm-83 to the version in kvm-72,
> except that the "cache=writeback" default performance tweak is retained
> and there's no need to define "offsetof".
>
> Signed-off-by: Jamie Lokier
>
>
> --- kvm-83-real/qemu/block-qcow2.c	2009-01-13 13:29:42.000000000 +0000
> +++ kvm-83/qemu/block-qcow2.c	2009-02-13 18:51:12.000000000 +0000
> @@ -52,8 +52,6 @@
>  #define QCOW_CRYPT_NONE 0
>  #define QCOW_CRYPT_AES 1
>
> -#define QCOW_MAX_CRYPT_CLUSTERS 32
> -
>  /* indicate that the refcount of the referenced cluster is exactly one.
*/ > #define QCOW_OFLAG_COPIED (1LL << 63) > /* indicate that the cluster is compressed (they never have the copied flag) */ > @@ -269,8 +267,7 @@ > if (!s->cluster_cache) > goto fail; > /* one more sector for decompressed data alignment */ > - s->cluster_data = qemu_malloc(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size > - + 512); > + s->cluster_data = qemu_malloc(s->cluster_size + 512); > if (!s->cluster_data) > goto fail; > s->cluster_cache_offset = -1; > @@ -437,7 +434,8 @@ > int new_l1_size, new_l1_size2, ret, i; > uint64_t *new_l1_table; > uint64_t new_l1_table_offset; > - uint8_t data[12]; > + uint64_t data64; > + uint32_t data32; > > new_l1_size = s->l1_size; > if (min_size <= new_l1_size) > @@ -467,10 +465,13 @@ > new_l1_table[i] = be64_to_cpu(new_l1_table[i]); > > /* set new table */ > - cpu_to_be32w((uint32_t*)data, new_l1_size); > - cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset); > - if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), data, > - sizeof(data)) != sizeof(data)) > + data64 = cpu_to_be64(new_l1_table_offset); > + if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_table_offset), > + &data64, sizeof(data64)) != sizeof(data64)) > + goto fail; > + data32 = cpu_to_be32(new_l1_size); > + if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), > + &data32, sizeof(data32)) != sizeof(data32)) > goto fail; > qemu_free(s->l1_table); > free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t)); > @@ -483,549 +484,169 @@ > return -EIO; > } > > -/* > - * seek_l2_table > +/* 'allocate' is: > * > - * seek l2_offset in the l2_cache table > - * if not found, return NULL, > - * if found, > - * increments the l2 cache hit count of the entry, > - * if counter overflow, divide by two all counters > - * return the pointer to the l2 cache entry > + * 0 not to allocate. > * > - */ > - > -static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset) > -{ > - int i, j; > - > - for(i = 0; i < L2_CACHE_SIZE; i++) { > - if (l2_offset == s->l2_cache_offsets[i]) { > - /* increment the hit count */ > - if (++s->l2_cache_counts[i] == 0xffffffff) { > - for(j = 0; j < L2_CACHE_SIZE; j++) { > - s->l2_cache_counts[j] >>= 1; > - } > - } > - return s->l2_cache + (i << s->l2_bits); > - } > - } > - return NULL; > -} > - > -/* > - * l2_load > + * 1 to allocate a normal cluster (for sector indexes 'n_start' to > + * 'n_end') > * > - * Loads a L2 table into memory. If the table is in the cache, the cache > - * is used; otherwise the L2 table is loaded from the image file. > + * 2 to allocate a compressed cluster of size > + * 'compressed_size'. 'compressed_size' must be > 0 and < > + * cluster_size > * > - * Returns a pointer to the L2 table on success, or NULL if the read from > - * the image file failed. > + * return 0 if not allocated. 
> */ > - > -static uint64_t *l2_load(BlockDriverState *bs, uint64_t l2_offset) > -{ > - BDRVQcowState *s = bs->opaque; > - int min_index; > - uint64_t *l2_table; > - > - /* seek if the table for the given offset is in the cache */ > - > - l2_table = seek_l2_table(s, l2_offset); > - if (l2_table != NULL) > - return l2_table; > - > - /* not found: load a new entry in the least used one */ > - > - min_index = l2_cache_new_entry(bs); > - l2_table = s->l2_cache + (min_index << s->l2_bits); > - if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) != > - s->l2_size * sizeof(uint64_t)) > - return NULL; > - s->l2_cache_offsets[min_index] = l2_offset; > - s->l2_cache_counts[min_index] = 1; > - > - return l2_table; > -} > - > -/* > - * l2_allocate > - * > - * Allocate a new l2 entry in the file. If l1_index points to an already > - * used entry in the L2 table (i.e. we are doing a copy on write for the L2 > - * table) copy the contents of the old L2 table into the newly allocated one. > - * Otherwise the new table is initialized with zeros. > - * > - */ > - > -static uint64_t *l2_allocate(BlockDriverState *bs, int l1_index) > -{ > - BDRVQcowState *s = bs->opaque; > - int min_index; > - uint64_t old_l2_offset, tmp; > - uint64_t *l2_table, l2_offset; > - > - old_l2_offset = s->l1_table[l1_index]; > - > - /* allocate a new l2 entry */ > - > - l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t)); > - > - /* update the L1 entry */ > - > - s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED; > - > - tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED); > - if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp), > - &tmp, sizeof(tmp)) != sizeof(tmp)) > - return NULL; > - > - /* allocate a new entry in the l2 cache */ > - > - min_index = l2_cache_new_entry(bs); > - l2_table = s->l2_cache + (min_index << s->l2_bits); > - > - if (old_l2_offset == 0) { > - /* if there was no old l2 table, clear the new table */ > - memset(l2_table, 0, s->l2_size * sizeof(uint64_t)); > - } else { > - /* if there was an old l2 table, read it from the disk */ > - if (bdrv_pread(s->hd, old_l2_offset, > - l2_table, s->l2_size * sizeof(uint64_t)) != > - s->l2_size * sizeof(uint64_t)) > - return NULL; > - } > - /* write the l2 table to the file */ > - if (bdrv_pwrite(s->hd, l2_offset, > - l2_table, s->l2_size * sizeof(uint64_t)) != > - s->l2_size * sizeof(uint64_t)) > - return NULL; > - > - /* update the l2 cache entry */ > - > - s->l2_cache_offsets[min_index] = l2_offset; > - s->l2_cache_counts[min_index] = 1; > - > - return l2_table; > -} > - > -static int size_to_clusters(BDRVQcowState *s, int64_t size) > -{ > - return (size + (s->cluster_size - 1)) >> s->cluster_bits; > -} > - > -static int count_contiguous_clusters(uint64_t nb_clusters, int cluster_size, > - uint64_t *l2_table, uint64_t start, uint64_t mask) > -{ > - int i; > - uint64_t offset = be64_to_cpu(l2_table[0]) & ~mask; > - > - if (!offset) > - return 0; > - > - for (i = start; i < start + nb_clusters; i++) > - if (offset + i * cluster_size != (be64_to_cpu(l2_table[i]) & ~mask)) > - break; > - > - return (i - start); > -} > - > -static int count_contiguous_free_clusters(uint64_t nb_clusters, uint64_t *l2_table) > -{ > - int i = 0; > - > - while(nb_clusters-- && l2_table[i] == 0) > - i++; > - > - return i; > -} > - > -/* > - * get_cluster_offset > - * > - * For a given offset of the disk image, return cluster offset in > - * qcow2 file. 
> - * > - * on entry, *num is the number of contiguous clusters we'd like to > - * access following offset. > - * > - * on exit, *num is the number of contiguous clusters we can read. > - * > - * Return 1, if the offset is found > - * Return 0, otherwise. > - * > - */ > - > static uint64_t get_cluster_offset(BlockDriverState *bs, > - uint64_t offset, int *num) > -{ > - BDRVQcowState *s = bs->opaque; > - int l1_index, l2_index; > - uint64_t l2_offset, *l2_table, cluster_offset; > - int l1_bits, c; > - int index_in_cluster, nb_available, nb_needed, nb_clusters; > - > - index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1); > - nb_needed = *num + index_in_cluster; > - > - l1_bits = s->l2_bits + s->cluster_bits; > - > - /* compute how many bytes there are between the offset and > - * the end of the l1 entry > - */ > - > - nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1)); > - > - /* compute the number of available sectors */ > - > - nb_available = (nb_available >> 9) + index_in_cluster; > - > - cluster_offset = 0; > - > - /* seek the the l2 offset in the l1 table */ > - > - l1_index = offset >> l1_bits; > - if (l1_index >= s->l1_size) > - goto out; > - > - l2_offset = s->l1_table[l1_index]; > - > - /* seek the l2 table of the given l2 offset */ > - > - if (!l2_offset) > - goto out; > - > - /* load the l2 table in memory */ > - > - l2_offset &= ~QCOW_OFLAG_COPIED; > - l2_table = l2_load(bs, l2_offset); > - if (l2_table == NULL) > - return 0; > - > - /* find the cluster offset for the given disk offset */ > - > - l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1); > - cluster_offset = be64_to_cpu(l2_table[l2_index]); > - nb_clusters = size_to_clusters(s, nb_needed << 9); > - > - if (!cluster_offset) { > - /* how many empty clusters ? */ > - c = count_contiguous_free_clusters(nb_clusters, &l2_table[l2_index]); > - } else { > - /* how many allocated clusters ? */ > - c = count_contiguous_clusters(nb_clusters, s->cluster_size, > - &l2_table[l2_index], 0, QCOW_OFLAG_COPIED); > - } > - > - nb_available = (c * s->cluster_sectors); > -out: > - if (nb_available > nb_needed) > - nb_available = nb_needed; > - > - *num = nb_available - index_in_cluster; > - > - return cluster_offset & ~QCOW_OFLAG_COPIED; > -} > - > -/* > - * free_any_clusters > - * > - * free clusters according to its type: compressed or not > - * > - */ > - > -static void free_any_clusters(BlockDriverState *bs, > - uint64_t cluster_offset, int nb_clusters) > -{ > - BDRVQcowState *s = bs->opaque; > - > - /* free the cluster */ > - > - if (cluster_offset & QCOW_OFLAG_COMPRESSED) { > - int nb_csectors; > - nb_csectors = ((cluster_offset >> s->csize_shift) & > - s->csize_mask) + 1; > - free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511, > - nb_csectors * 512); > - return; > - } > - > - free_clusters(bs, cluster_offset, nb_clusters << s->cluster_bits); > - > - return; > -} > - > -/* > - * get_cluster_table > - * > - * for a given disk offset, load (and allocate if needed) > - * the l2 table. > - * > - * the l2 table offset in the qcow2 file and the cluster index > - * in the l2 table are given to the caller. 
> - * > - */ > - > -static int get_cluster_table(BlockDriverState *bs, uint64_t offset, > - uint64_t **new_l2_table, > - uint64_t *new_l2_offset, > - int *new_l2_index) > + uint64_t offset, int allocate, > + int compressed_size, > + int n_start, int n_end) > { > BDRVQcowState *s = bs->opaque; > - int l1_index, l2_index, ret; > - uint64_t l2_offset, *l2_table; > - > - /* seek the the l2 offset in the l1 table */ > + int min_index, i, j, l1_index, l2_index, ret; > + uint64_t l2_offset, *l2_table, cluster_offset, tmp, old_l2_offset; > > l1_index = offset >> (s->l2_bits + s->cluster_bits); > if (l1_index >= s->l1_size) { > - ret = grow_l1_table(bs, l1_index + 1); > - if (ret < 0) > + /* outside l1 table is allowed: we grow the table if needed */ > + if (!allocate) > + return 0; > + if (grow_l1_table(bs, l1_index + 1) < 0) > return 0; > } > l2_offset = s->l1_table[l1_index]; > + if (!l2_offset) { > + if (!allocate) > + return 0; > + l2_allocate: > + old_l2_offset = l2_offset; > + /* allocate a new l2 entry */ > + l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t)); > + /* update the L1 entry */ > + s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED; > + tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED); > + if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp), > + &tmp, sizeof(tmp)) != sizeof(tmp)) > + return 0; > + min_index = l2_cache_new_entry(bs); > + l2_table = s->l2_cache + (min_index << s->l2_bits); > > - /* seek the l2 table of the given l2 offset */ > - > - if (l2_offset & QCOW_OFLAG_COPIED) { > - /* load the l2 table in memory */ > - l2_offset &= ~QCOW_OFLAG_COPIED; > - l2_table = l2_load(bs, l2_offset); > - if (l2_table == NULL) > + if (old_l2_offset == 0) { > + memset(l2_table, 0, s->l2_size * sizeof(uint64_t)); > + } else { > + if (bdrv_pread(s->hd, old_l2_offset, > + l2_table, s->l2_size * sizeof(uint64_t)) != > + s->l2_size * sizeof(uint64_t)) > + return 0; > + } > + if (bdrv_pwrite(s->hd, l2_offset, > + l2_table, s->l2_size * sizeof(uint64_t)) != > + s->l2_size * sizeof(uint64_t)) > return 0; > } else { > - if (l2_offset) > - free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t)); > - l2_table = l2_allocate(bs, l1_index); > - if (l2_table == NULL) > + if (!(l2_offset & QCOW_OFLAG_COPIED)) { > + if (allocate) { > + free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t)); > + goto l2_allocate; > + } > + } else { > + l2_offset &= ~QCOW_OFLAG_COPIED; > + } > + for(i = 0; i < L2_CACHE_SIZE; i++) { > + if (l2_offset == s->l2_cache_offsets[i]) { > + /* increment the hit count */ > + if (++s->l2_cache_counts[i] == 0xffffffff) { > + for(j = 0; j < L2_CACHE_SIZE; j++) { > + s->l2_cache_counts[j] >>= 1; > + } > + } > + l2_table = s->l2_cache + (i << s->l2_bits); > + goto found; > + } > + } > + /* not found: load a new entry in the least used one */ > + min_index = l2_cache_new_entry(bs); > + l2_table = s->l2_cache + (min_index << s->l2_bits); > + if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) != > + s->l2_size * sizeof(uint64_t)) > return 0; > - l2_offset = s->l1_table[l1_index] & ~QCOW_OFLAG_COPIED; > } > - > - /* find the cluster offset for the given disk offset */ > - > + s->l2_cache_offsets[min_index] = l2_offset; > + s->l2_cache_counts[min_index] = 1; > + found: > l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1); > - > - *new_l2_table = l2_table; > - *new_l2_offset = l2_offset; > - *new_l2_index = l2_index; > - > - return 1; > -} > - > -/* > - * alloc_compressed_cluster_offset > - * > - * For a given offset of the 
disk image, return cluster offset in > - * qcow2 file. > - * > - * If the offset is not found, allocate a new compressed cluster. > - * > - * Return the cluster offset if successful, > - * Return 0, otherwise. > - * > - */ > - > -static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs, > - uint64_t offset, > - int compressed_size) > -{ > - BDRVQcowState *s = bs->opaque; > - int l2_index, ret; > - uint64_t l2_offset, *l2_table, cluster_offset; > - int nb_csectors; > - > - ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index); > - if (ret == 0) > - return 0; > - > cluster_offset = be64_to_cpu(l2_table[l2_index]); > - if (cluster_offset & QCOW_OFLAG_COPIED) > - return cluster_offset & ~QCOW_OFLAG_COPIED; > - > - if (cluster_offset) > - free_any_clusters(bs, cluster_offset, 1); > - > - cluster_offset = alloc_bytes(bs, compressed_size); > - nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) - > - (cluster_offset >> 9); > - > - cluster_offset |= QCOW_OFLAG_COMPRESSED | > - ((uint64_t)nb_csectors << s->csize_shift); > - > - /* update L2 table */ > - > - /* compressed clusters never have the copied flag */ > - > - l2_table[l2_index] = cpu_to_be64(cluster_offset); > - if (bdrv_pwrite(s->hd, > - l2_offset + l2_index * sizeof(uint64_t), > - l2_table + l2_index, > - sizeof(uint64_t)) != sizeof(uint64_t)) > - return 0; > - > - return cluster_offset; > -} > - > -typedef struct QCowL2Meta > -{ > - uint64_t offset; > - int n_start; > - int nb_available; > - int nb_clusters; > -} QCowL2Meta; > - > -static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset, > - QCowL2Meta *m) > -{ > - BDRVQcowState *s = bs->opaque; > - int i, j = 0, l2_index, ret; > - uint64_t *old_cluster, start_sect, l2_offset, *l2_table; > - > - if (m->nb_clusters == 0) > - return 0; > - > - if (!(old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t)))) > - return -ENOMEM; > - > - /* copy content of unmodified sectors */ > - start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9; > - if (m->n_start) { > - ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start); > - if (ret < 0) > - goto err; > + if (!cluster_offset) { > + if (!allocate) > + return cluster_offset; > + } else if (!(cluster_offset & QCOW_OFLAG_COPIED)) { > + if (!allocate) > + return cluster_offset; > + /* free the cluster */ > + if (cluster_offset & QCOW_OFLAG_COMPRESSED) { > + int nb_csectors; > + nb_csectors = ((cluster_offset >> s->csize_shift) & > + s->csize_mask) + 1; > + free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511, > + nb_csectors * 512); > + } else { > + free_clusters(bs, cluster_offset, s->cluster_size); > + } > + } else { > + cluster_offset &= ~QCOW_OFLAG_COPIED; > + return cluster_offset; > } > - > - if (m->nb_available & (s->cluster_sectors - 1)) { > - uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1); > - ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9), > - m->nb_available - end, s->cluster_sectors); > - if (ret < 0) > - goto err; > + if (allocate == 1) { > + /* allocate a new cluster */ > + cluster_offset = alloc_clusters(bs, s->cluster_size); > + > + /* we must initialize the cluster content which won't be > + written */ > + if ((n_end - n_start) < s->cluster_sectors) { > + uint64_t start_sect; > + > + start_sect = (offset & ~(s->cluster_size - 1)) >> 9; > + ret = copy_sectors(bs, start_sect, > + cluster_offset, 0, n_start); > + if (ret < 0) > + return 0; > + ret = copy_sectors(bs, start_sect, > + cluster_offset, n_end, 
s->cluster_sectors); > + if (ret < 0) > + return 0; > + } > + tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED); > + } else { > + int nb_csectors; > + cluster_offset = alloc_bytes(bs, compressed_size); > + nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) - > + (cluster_offset >> 9); > + cluster_offset |= QCOW_OFLAG_COMPRESSED | > + ((uint64_t)nb_csectors << s->csize_shift); > + /* compressed clusters never have the copied flag */ > + tmp = cpu_to_be64(cluster_offset); > } > - > - ret = -EIO; > /* update L2 table */ > - if (!get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index)) > - goto err; > - > - for (i = 0; i < m->nb_clusters; i++) { > - if(l2_table[l2_index + i] != 0) > - old_cluster[j++] = l2_table[l2_index + i]; > - > - l2_table[l2_index + i] = cpu_to_be64((cluster_offset + > - (i << s->cluster_bits)) | QCOW_OFLAG_COPIED); > - } > - > - if (bdrv_pwrite(s->hd, l2_offset + l2_index * sizeof(uint64_t), > - l2_table + l2_index, m->nb_clusters * sizeof(uint64_t)) != > - m->nb_clusters * sizeof(uint64_t)) > - goto err; > - > - for (i = 0; i < j; i++) > - free_any_clusters(bs, old_cluster[i], 1); > - > - ret = 0; > -err: > - qemu_free(old_cluster); > - return ret; > - } > - > -/* > - * alloc_cluster_offset > - * > - * For a given offset of the disk image, return cluster offset in > - * qcow2 file. > - * > - * If the offset is not found, allocate a new cluster. > - * > - * Return the cluster offset if successful, > - * Return 0, otherwise. > - * > - */ > - > -static uint64_t alloc_cluster_offset(BlockDriverState *bs, > - uint64_t offset, > - int n_start, int n_end, > - int *num, QCowL2Meta *m) > -{ > - BDRVQcowState *s = bs->opaque; > - int l2_index, ret; > - uint64_t l2_offset, *l2_table, cluster_offset; > - int nb_clusters, i = 0; > - > - ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index); > - if (ret == 0) > + l2_table[l2_index] = tmp; > + if (bdrv_pwrite(s->hd, > + l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp)) > return 0; > - > - nb_clusters = size_to_clusters(s, n_end << 9); > - > - nb_clusters = MIN(nb_clusters, s->l2_size - l2_index); > - > - cluster_offset = be64_to_cpu(l2_table[l2_index]); > - > - /* We keep all QCOW_OFLAG_COPIED clusters */ > - > - if (cluster_offset & QCOW_OFLAG_COPIED) { > - nb_clusters = count_contiguous_clusters(nb_clusters, s->cluster_size, > - &l2_table[l2_index], 0, 0); > - > - cluster_offset &= ~QCOW_OFLAG_COPIED; > - m->nb_clusters = 0; > - > - goto out; > - } > - > - /* for the moment, multiple compressed clusters are not managed */ > - > - if (cluster_offset & QCOW_OFLAG_COMPRESSED) > - nb_clusters = 1; > - > - /* how many available clusters ? 
*/ > - > - while (i < nb_clusters) { > - i += count_contiguous_clusters(nb_clusters - i, s->cluster_size, > - &l2_table[l2_index], i, 0); > - > - if(be64_to_cpu(l2_table[l2_index + i])) > - break; > - > - i += count_contiguous_free_clusters(nb_clusters - i, > - &l2_table[l2_index + i]); > - > - cluster_offset = be64_to_cpu(l2_table[l2_index + i]); > - > - if ((cluster_offset & QCOW_OFLAG_COPIED) || > - (cluster_offset & QCOW_OFLAG_COMPRESSED)) > - break; > - } > - nb_clusters = i; > - > - /* allocate a new cluster */ > - > - cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size); > - > - /* save info needed for meta data update */ > - m->offset = offset; > - m->n_start = n_start; > - m->nb_clusters = nb_clusters; > - > -out: > - m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end); > - > - *num = m->nb_available - n_start; > - > return cluster_offset; > } > > static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num, > int nb_sectors, int *pnum) > { > + BDRVQcowState *s = bs->opaque; > + int index_in_cluster, n; > uint64_t cluster_offset; > > - *pnum = nb_sectors; > - cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum); > - > + cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0); > + index_in_cluster = sector_num & (s->cluster_sectors - 1); > + n = s->cluster_sectors - index_in_cluster; > + if (n > nb_sectors) > + n = nb_sectors; > + *pnum = n; > return (cluster_offset != 0); > } > > @@ -1102,9 +723,11 @@ > uint64_t cluster_offset; > > while (nb_sectors > 0) { > - n = nb_sectors; > - cluster_offset = get_cluster_offset(bs, sector_num << 9, &n); > + cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0); > index_in_cluster = sector_num & (s->cluster_sectors - 1); > + n = s->cluster_sectors - index_in_cluster; > + if (n > nb_sectors) > + n = nb_sectors; > if (!cluster_offset) { > if (bs->backing_hd) { > /* read from the base image */ > @@ -1143,18 +766,15 @@ > BDRVQcowState *s = bs->opaque; > int ret, index_in_cluster, n; > uint64_t cluster_offset; > - int n_end; > - QCowL2Meta l2meta; > > while (nb_sectors > 0) { > index_in_cluster = sector_num & (s->cluster_sectors - 1); > - n_end = index_in_cluster + nb_sectors; > - if (s->crypt_method && > - n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors) > - n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors; > - cluster_offset = alloc_cluster_offset(bs, sector_num << 9, > - index_in_cluster, > - n_end, &n, &l2meta); > + n = s->cluster_sectors - index_in_cluster; > + if (n > nb_sectors) > + n = nb_sectors; > + cluster_offset = get_cluster_offset(bs, sector_num << 9, 1, 0, > + index_in_cluster, > + index_in_cluster + n); > if (!cluster_offset) > return -1; > if (s->crypt_method) { > @@ -1165,10 +785,8 @@ > } else { > ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512, buf, n * 512); > } > - if (ret != n * 512 || alloc_cluster_link_l2(bs, cluster_offset, &l2meta) < 0) { > - free_any_clusters(bs, cluster_offset, l2meta.nb_clusters); > + if (ret != n * 512) > return -1; > - } > nb_sectors -= n; > sector_num += n; > buf += n * 512; > @@ -1186,33 +804,8 @@ > uint64_t cluster_offset; > uint8_t *cluster_data; > BlockDriverAIOCB *hd_aiocb; > - QEMUBH *bh; > - QCowL2Meta l2meta; > } QCowAIOCB; > > -static void qcow_aio_read_cb(void *opaque, int ret); > -static void qcow_aio_read_bh(void *opaque) > -{ > - QCowAIOCB *acb = opaque; > - qemu_bh_delete(acb->bh); > - acb->bh = NULL; > - qcow_aio_read_cb(opaque, 0); > -} > - > -static int qcow_schedule_bh(QEMUBHFunc 
*cb, QCowAIOCB *acb) > -{ > - if (acb->bh) > - return -EIO; > - > - acb->bh = qemu_bh_new(cb, acb); > - if (!acb->bh) > - return -EIO; > - > - qemu_bh_schedule(acb->bh); > - > - return 0; > -} > - > static void qcow_aio_read_cb(void *opaque, int ret) > { > QCowAIOCB *acb = opaque; > @@ -1222,12 +815,13 @@ > > acb->hd_aiocb = NULL; > if (ret < 0) { > -fail: > + fail: > acb->common.cb(acb->common.opaque, ret); > qemu_aio_release(acb); > return; > } > > + redo: > /* post process the read buffer */ > if (!acb->cluster_offset) { > /* nothing to do */ > @@ -1253,9 +847,12 @@ > } > > /* prepare next AIO request */ > - acb->n = acb->nb_sectors; > - acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n); > + acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, > + 0, 0, 0, 0); > index_in_cluster = acb->sector_num & (s->cluster_sectors - 1); > + acb->n = s->cluster_sectors - index_in_cluster; > + if (acb->n > acb->nb_sectors) > + acb->n = acb->nb_sectors; > > if (!acb->cluster_offset) { > if (bs->backing_hd) { > @@ -1268,16 +865,12 @@ > if (acb->hd_aiocb == NULL) > goto fail; > } else { > - ret = qcow_schedule_bh(qcow_aio_read_bh, acb); > - if (ret < 0) > - goto fail; > + goto redo; > } > } else { > /* Note: in this case, no need to wait */ > memset(acb->buf, 0, 512 * acb->n); > - ret = qcow_schedule_bh(qcow_aio_read_bh, acb); > - if (ret < 0) > - goto fail; > + goto redo; > } > } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) { > /* add AIO support for compressed blocks ? */ > @@ -1285,9 +878,7 @@ > goto fail; > memcpy(acb->buf, > s->cluster_cache + index_in_cluster * 512, 512 * acb->n); > - ret = qcow_schedule_bh(qcow_aio_read_bh, acb); > - if (ret < 0) > - goto fail; > + goto redo; > } else { > if ((acb->cluster_offset & 511) != 0) { > ret = -EIO; > @@ -1316,7 +907,6 @@ > acb->nb_sectors = nb_sectors; > acb->n = 0; > acb->cluster_offset = 0; > - acb->l2meta.nb_clusters = 0; > return acb; > } > > @@ -1340,8 +930,8 @@ > BlockDriverState *bs = acb->common.bs; > BDRVQcowState *s = bs->opaque; > int index_in_cluster; > + uint64_t cluster_offset; > const uint8_t *src_buf; > - int n_end; > > acb->hd_aiocb = NULL; > > @@ -1352,11 +942,6 @@ > return; > } > > - if (alloc_cluster_link_l2(bs, acb->cluster_offset, &acb->l2meta) < 0) { > - free_any_clusters(bs, acb->cluster_offset, acb->l2meta.nb_clusters); > - goto fail; > - } > - > acb->nb_sectors -= acb->n; > acb->sector_num += acb->n; > acb->buf += acb->n * 512; > @@ -1369,22 +954,19 @@ > } > > index_in_cluster = acb->sector_num & (s->cluster_sectors - 1); > - n_end = index_in_cluster + acb->nb_sectors; > - if (s->crypt_method && > - n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors) > - n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors; > - > - acb->cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9, > - index_in_cluster, > - n_end, &acb->n, &acb->l2meta); > - if (!acb->cluster_offset || (acb->cluster_offset & 511) != 0) { > + acb->n = s->cluster_sectors - index_in_cluster; > + if (acb->n > acb->nb_sectors) > + acb->n = acb->nb_sectors; > + cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 1, 0, > + index_in_cluster, > + index_in_cluster + acb->n); > + if (!cluster_offset || (cluster_offset & 511) != 0) { > ret = -EIO; > goto fail; > } > if (s->crypt_method) { > if (!acb->cluster_data) { > - acb->cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS * > - s->cluster_size); > + acb->cluster_data = qemu_mallocz(s->cluster_size); > if (!acb->cluster_data) { > ret = -ENOMEM; > goto fail; > 
@@ -1397,7 +979,7 @@ > src_buf = acb->buf; > } > acb->hd_aiocb = bdrv_aio_write(s->hd, > - (acb->cluster_offset >> 9) + index_in_cluster, > + (cluster_offset >> 9) + index_in_cluster, > src_buf, acb->n, > qcow_aio_write_cb, acb); > if (acb->hd_aiocb == NULL) > @@ -1571,7 +1153,7 @@ > > memset(s->l1_table, 0, l1_length); > if (bdrv_pwrite(s->hd, s->l1_table_offset, s->l1_table, l1_length) < 0) > - return -1; > + return -1; > ret = bdrv_truncate(s->hd, s->l1_table_offset + l1_length); > if (ret < 0) > return ret; > @@ -1637,10 +1219,8 @@ > /* could not compress: write normal cluster */ > qcow_write(bs, sector_num, buf, s->cluster_sectors); > } else { > - cluster_offset = alloc_compressed_cluster_offset(bs, sector_num << 9, > - out_len); > - if (!cluster_offset) > - return -1; > + cluster_offset = get_cluster_offset(bs, sector_num << 9, 2, > + out_len, 0, 0); > cluster_offset &= s->cluster_offset_mask; > if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) { > qemu_free(out_buf); > @@ -2225,19 +1805,26 @@ > BDRVQcowState *s = bs->opaque; > int i, nb_clusters; > > - nb_clusters = size_to_clusters(s, size); > -retry: > - for(i = 0; i < nb_clusters; i++) { > - int64_t i = s->free_cluster_index++; > - if (get_refcount(bs, i) != 0) > - goto retry; > - } > + nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits; > + for(;;) { > + if (get_refcount(bs, s->free_cluster_index) == 0) { > + s->free_cluster_index++; > + for(i = 1; i < nb_clusters; i++) { > + if (get_refcount(bs, s->free_cluster_index) != 0) > + goto not_found; > + s->free_cluster_index++; > + } > #ifdef DEBUG_ALLOC2 > - printf("alloc_clusters: size=%lld -> %lld\n", > - size, > - (s->free_cluster_index - nb_clusters) << s->cluster_bits); > + printf("alloc_clusters: size=%lld -> %lld\n", > + size, > + (s->free_cluster_index - nb_clusters) << s->cluster_bits); > #endif > - return (s->free_cluster_index - nb_clusters) << s->cluster_bits; > + return (s->free_cluster_index - nb_clusters) << s->cluster_bits; > + } else { > + not_found: > + s->free_cluster_index++; > + } > + } > } > > static int64_t alloc_clusters(BlockDriverState *bs, int64_t size) > @@ -2301,7 +1888,8 @@ > int new_table_size, new_table_size2, refcount_table_clusters, i, ret; > uint64_t *new_table; > int64_t table_offset; > - uint8_t data[12]; > + uint64_t data64; > + uint32_t data32; > int old_table_size; > int64_t old_table_offset; > > @@ -2340,10 +1928,13 @@ > for(i = 0; i < s->refcount_table_size; i++) > be64_to_cpus(&new_table[i]); > > - cpu_to_be64w((uint64_t*)data, table_offset); > - cpu_to_be32w((uint32_t*)(data + 8), refcount_table_clusters); > + data64 = cpu_to_be64(table_offset); > if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_offset), > - data, sizeof(data)) != sizeof(data)) > + &data64, sizeof(data64)) != sizeof(data64)) > + goto fail; > + data32 = cpu_to_be32(refcount_table_clusters); > + if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_clusters), > + &data32, sizeof(data32)) != sizeof(data32)) > goto fail; > qemu_free(s->refcount_table); > old_table_offset = s->refcount_table_offset; > @@ -2572,7 +2163,7 @@ > uint16_t *refcount_table; > > size = bdrv_getlength(s->hd); > - nb_clusters = size_to_clusters(s, size); > + nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits; > refcount_table = qemu_mallocz(nb_clusters * sizeof(uint16_t)); > > /* header */ > @@ -2624,7 +2215,7 @@ > int refcount; > > size = bdrv_getlength(s->hd); > - nb_clusters = size_to_clusters(s, size); > + nb_clusters = (size + s->cluster_size - 
1) >> s->cluster_bits; > for(k = 0; k < nb_clusters;) { > k1 = k; > refcount = get_refcount(bs, k); > > >