From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42980) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xc4Om-0003tC-SA for qemu-devel@nongnu.org; Wed, 08 Oct 2014 23:28:21 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Xc4Oh-0002Rt-Ly for qemu-devel@nongnu.org; Wed, 08 Oct 2014 23:28:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:8051) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xc4Oh-0002Rl-EK for qemu-devel@nongnu.org; Wed, 08 Oct 2014 23:28:11 -0400 Message-ID: <5435C414.5030203@redhat.com> Date: Wed, 08 Oct 2014 17:09:08 -0600 From: Eric Blake MIME-Version: 1.0 References: <1409348463-16627-1-git-send-email-mreitz@redhat.com> <1409348463-16627-9-git-send-email-mreitz@redhat.com> In-Reply-To: <1409348463-16627-9-git-send-email-mreitz@redhat.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="KjgfkX0Qs9rcxQvbxKu9bgwtxUUcbQO1f" Subject: Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz , qemu-devel@nongnu.org Cc: Kevin Wolf , Stefan Hajnoczi , =?UTF-8?B?QmVub8OudCBDYW5ldA==?= This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --KjgfkX0Qs9rcxQvbxKu9bgwtxUUcbQO1f Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 08/29/2014 03:41 PM, Max Reitz wrote: > The previous commit introduced the "rebuild" variable to qcow2's > implementation of the image consistency check. Now make use of this by > adding a function which creates a completely new refcount structure > based solely on the in-memory information gathered before. >=20 > The old refcount structure will be leaked, however. Might be worth mentioning in the commit message that a later commit will deal with the leak. >=20 > Signed-off-by: Max Reitz > --- > block/qcow2-refcount.c | 286 +++++++++++++++++++++++++++++++++++++++++= +++++++- > 1 file changed, 283 insertions(+), 3 deletions(-) >=20 > diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c > index 6300cec..318c152 100644 > --- a/block/qcow2-refcount.c > +++ b/block/qcow2-refcount.c > @@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState = *bs, BdrvCheckResult *res, > } > =20 > /* > + * Allocates a cluster using an in-memory refcount table (IMRT) in con= trast to > + * the on-disk refcount structures. > + * > + * *first_free_cluster does not necessarily point to the first free cl= uster, but > + * may point to one cluster as close as possible before it. The offset= returned > + * will never be before that cluster. Took me a couple reads of the comment and code to understand that. If I'm correct, this alternative wording may be better: On input, *first_free_cluster tells where to start looking, and need not actually be a free cluster; the returned offset will not be before that cluster. On output, *first_free_cluster points to the actual first free cluster found. Or, depending on the semantics you intended [1]: On input, *first_free_cluster tells where to start looking, and need not actually be a free cluster; the returned offset will not be before that cluster. On output, *first_free_cluster points to the first gap found, even if that gap was too small to be used as the returned offset. > + * > + * Note that *first_free_cluster is a cluster index whereas the return= value is > + * an offset. > + */ > +static int64_t alloc_clusters_imrt(BlockDriverState *bs, > + int cluster_count, > + uint16_t **refcount_table, > + int64_t *nb_clusters, > + int64_t *first_free_cluster) > +{ > + BDRVQcowState *s =3D bs->opaque; > + int64_t cluster =3D *first_free_cluster, i; > + bool first_gap =3D true; > + int contiguous_free_clusters; > + > + /* Starting at *first_free_cluster, find a range of at least clust= er_count > + * continuously free clusters */ > + for (contiguous_free_clusters =3D 0; > + cluster < *nb_clusters && contiguous_free_clusters < cluster_= count; > + cluster++) > + { > + if (!(*refcount_table)[cluster]) { > + contiguous_free_clusters++; > + if (first_gap) { > + /* If this is the first free cluster found, update > + * *first_free_cluster accordingly */ > + *first_free_cluster =3D cluster; > + first_gap =3D false; > + } > + } else if (contiguous_free_clusters) { > + contiguous_free_clusters =3D 0; > + } [1] Should you be resetting first_gap in the 'else'? If you don't, then *first_free_cluster is NOT the start of the cluster just allocated, but the first free cluster encountered on the way to the eventual allocation. I guess it depends on how the callers are using the information; since the function is static, I guess I'll find out later in my review. > + } > + > + /* If contiguous_free_clusters is greater than zero, it contains t= he number > + * of continuously free clusters until the current cluster; the fi= rst free > + * cluster in the current "gap" is therefore > + * cluster - contiguous_free_clusters */ > + > + /* If no such range could be found, grow the in-memory refcount ta= ble > + * accordingly to append free clusters at the end of the image */ > + if (contiguous_free_clusters < cluster_count) { > + int64_t old_nb_clusters =3D *nb_clusters; > + > + /* There already is a gap of contiguous_free_clusters; we need= s/gap/tail/, since we are at the end of the table? > + * cluster_count clusters; therefore, we have to allocate > + * cluster_count - contiguous_free_clusters new clusters at th= e end of > + * the image (which is the current value of cluster; note that= cluster > + * may exceed old_nb_clusters if *first_free_cluster pointed b= eyond the > + * image end) */ > + *nb_clusters =3D cluster + cluster_count - contiguous_free_clu= sters; > + *refcount_table =3D g_try_realloc(*refcount_table, > + *nb_clusters * sizeof(uint16_t= )); > + if (!*refcount_table) { > + return -ENOMEM; > + } > + > + memset(*refcount_table + old_nb_clusters, 0, > + (*nb_clusters - old_nb_clusters) * sizeof(uint16_t)); Is this calculation unnecessarily hard-coded to refcount_order=3D=3D4? > + } > + > + /* Go back to the first free cluster */ > + cluster -=3D contiguous_free_clusters; > + for (i =3D 0; i < cluster_count; i++) { > + (*refcount_table)[cluster + i] =3D 1; > + } > + > + return cluster << s->cluster_bits; > +} > + > +/* > + * Creates a new refcount structure based solely on the in-memory info= rmation > + * given through *refcount_table. All necessary allocations will be re= flected > + * in that array. > + * > + * On success, the old refcount structure is leaked (it will be covere= d by the > + * new refcount structure). > + */ > +static int rebuild_refcount_structure(BlockDriverState *bs, > + BdrvCheckResult *res, > + uint16_t **refcount_table, > + int64_t *nb_clusters) > +{ > + BDRVQcowState *s =3D bs->opaque; > + int64_t first_free_cluster =3D 0, rt_ofs =3D -1, cluster =3D 0; > + int64_t rb_ofs, rb_start, rb_index; > + uint32_t reftable_size =3D 0; > + uint64_t *reftable =3D NULL; > + uint16_t *on_disk_rb; > + int i, ret =3D 0; ret is 0... > + struct { > + uint64_t rt_offset; > + uint32_t rt_clusters; > + } QEMU_PACKED rt_offset_and_clusters; > + > + qcow2_cache_empty(bs, s->refcount_block_cache); > + > +write_refblocks: > + for (; cluster < *nb_clusters; cluster++) { > + if (!(*refcount_table)[cluster]) { > + continue; > + } > + > + rb_index =3D cluster >> s->refcount_block_bits; > + rb_start =3D rb_index << s->refcount_block_bits; > + > + /* Don't allocate a cluster in a refblock already written to d= isk */ > + if (first_free_cluster < rb_start) { > + first_free_cluster =3D rb_start; > + } > + rb_ofs =3D alloc_clusters_imrt(bs, 1, refcount_table, nb_clust= ers, > + &first_free_cluster); [1] looking back at my earlier question, you are starting each iteration no earlier than the current rb_start. But if you end up jumping back to write_refblocks, are you guaranteed that rb_start is safely far enough into the file, even if first_free_cluster is pointing to a gap that was too small for an allocation? > + if (rb_ofs < 0) { > + fprintf(stderr, "ERROR allocating refblock: %s\n", strerro= r(-ret)); =2E..but if we hit this error on the first time through the for loop, strerror(0) is NOT what you meant to print. Did you mean strerror(-rb_ofs) here? > + res->check_errors++; > + ret =3D rb_ofs; Narrowing from int64_t to int; but I guess we know that if rb_ofs < 0, it is only -1, and not something weird like -0x100000000. Is the goal that ret is -1/0, or are you trying to encode negative errno values in the return? > + goto fail; > + } > + > + if (reftable_size <=3D rb_index) { > + uint32_t old_rt_size =3D reftable_size; > + reftable_size =3D ROUND_UP((rb_index + 1) * sizeof(uint64_= t), > + s->cluster_size) / sizeof(uint64_= t); > + reftable =3D g_try_realloc(reftable, > + reftable_size * sizeof(uint64_t))= ; > + if (!reftable) { > + res->check_errors++; > + ret =3D -ENOMEM; > + goto fail; > + } > + > + memset(reftable + old_rt_size, 0, > + (reftable_size - old_rt_size) * sizeof(uint64_t)); > + > + /* The offset we have for the reftable is now no longer va= lid; > + * this will leak that range, but we can easily fix that b= y running > + * a leak-fixing check after this rebuild operation */ > + rt_ofs =3D -1; > + } > + reftable[rb_index] =3D rb_ofs; > + > + /* If this is apparently the last refblock (for now), try to s= queeze the > + * reftable in */ > + if (rb_index =3D=3D (*nb_clusters - 1) >> s->refcount_block_bi= ts && > + rt_ofs < 0) > + { > + rt_ofs =3D alloc_clusters_imrt(bs, size_to_clusters(s, ref= table_size * > + sizeof(u= int64_t)), > + refcount_table, nb_clusters, > + &first_free_cluster); > + if (rt_ofs < 0) { > + fprintf(stderr, "ERROR allocating reftable: %s\n", > + strerror(-ret)); Again, -ret looks wrong here. > + res->check_errors++; > + ret =3D rt_ofs; > + goto fail; > + } > + } > + > + ret =3D qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluste= r_size); > + if (ret < 0) { > + fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-= ret)); > + goto fail; > + } > + > + on_disk_rb =3D g_malloc0(s->cluster_size); Why g_try_malloc earlier, but abort()ing g_malloc0 here? > + for (i =3D 0; i < s->cluster_size / sizeof(uint16_t) && > + rb_start + i < *nb_clusters; i++) > + { > + on_disk_rb[i] =3D cpu_to_be16((*refcount_table)[rb_start += i]); > + } > + > + ret =3D bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE, > + (void *)on_disk_rb, s->cluster_sectors); > + g_free(on_disk_rb); > + if (ret < 0) { > + fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-= ret)); > + goto fail; > + } > + > + /* Go to the end of this refblock */ > + cluster =3D rb_start + s->cluster_size / sizeof(uint16_t) - 1;= > + } > + > + if (rt_ofs < 0) { > + int64_t post_rb_start =3D ROUND_UP(*nb_clusters, > + s->cluster_size / sizeof(uint= 16_t)); > + > + /* Not pretty but simple */ > + if (first_free_cluster < post_rb_start) { > + first_free_cluster =3D post_rb_start; > + } > + rt_ofs =3D alloc_clusters_imrt(bs, size_to_clusters(s, reftabl= e_size * > + sizeof(uint6= 4_t)), > + refcount_table, nb_clusters, > + &first_free_cluster); > + if (rt_ofs < 0) { > + fprintf(stderr, "ERROR allocating reftable: %s\n", strerro= r(-ret)); Another wrong -ret? --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --KjgfkX0Qs9rcxQvbxKu9bgwtxUUcbQO1f Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJUNcQUAAoJEKeha0olJ0NqwcwH/RH6qlbbp7mquJPMtz2caNOJ mi4BzY4OKND3qmcePZhr09CFEAaerkq8HBe3979Y3UycbsFuoXedh5MzoYdAAppZ LQO5Xq33m1vCIwmEEY/qsfy/rbX0K9wz+GwJ5qrWHDHCFv0jD0WUs+BpgJq4P2JS qi3g+YIZ3RrIJ4s8Z74BrfQmy9B6xjMJ9Jjtzs0taPszQyR9kK1Oi+qtXy6a69T3 ND91fCL9PeObiEHa0wveN/bcoqkcDaCaKTUryXgSwVxIrYdsIQMl0lJaIL0x3Fpt jGHWU8oEn/Ua1RYFe0zm/QljVd1Ih2nfEPZJY83OSQP4MLwxxbaJjS/Jibij+tQ= =I+Bd -----END PGP SIGNATURE----- --KjgfkX0Qs9rcxQvbxKu9bgwtxUUcbQO1f--